PerceptAI is not an AI assistant. It is open-source infrastructure — the perception layer developers embed into their own agent products.
Any screen. Any app. Zero DOM required.
EasyOCR extracts every text element with real pixel coordinates. Groq Vision understands the UI structure. Together they build a complete world model of any interface.
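To picture what the perception layer produces, here is a minimal sketch (not PerceptAI's actual API; the `ocr_to_targets` helper and the result shape are assumptions) that turns EasyOCR-style results, i.e. (bounding-box quad, text, confidence) tuples as returned by `reader.readtext()`, into clickable targets with pixel-center coordinates:

```python
def ocr_to_targets(ocr_results, min_conf=0.5):
    """Convert EasyOCR-style results into click targets.

    EasyOCR's reader.readtext() returns (bbox, text, confidence) tuples,
    where bbox is four [x, y] corner points in pixel coordinates.
    """
    targets = []
    for bbox, text, conf in ocr_results:
        if conf < min_conf:
            continue  # drop low-confidence detections
        xs = [point[0] for point in bbox]
        ys = [point[1] for point in bbox]
        targets.append({
            "text": text,
            # click point: the geometric center of the detected box
            "center": (sum(xs) / len(xs), sum(ys) / len(ys)),
            "confidence": conf,
        })
    return targets

# Fake OCR result for a "File" menu label near the top-left of the screen
fake = [([[10, 5], [50, 5], [50, 25], [10, 25]], "File", 0.98)]
print(ocr_to_targets(fake))  # center = (30.0, 15.0)
```

Centering the click inside the detected box is what lets the agent act on real pixels rather than DOM nodes.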
Groq LLaMA 3.3 converts your plain English instruction into precise executable steps. Open apps. Navigate URLs. Click elements. Type text. All planned automatically.
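The plan an LLM planner returns can be pictured as a JSON task with a list of step objects. This is an illustrative shape only, not PerceptAI's exact schema; the action names here are assumptions:

```python
import json

# A hypothetical plan, in the shape an LLM planner might return as JSON
plan_json = """
{
  "task": "open notepad and write my meeting notes",
  "steps": [
    {"action": "open_app", "target": "notepad"},
    {"action": "type_text", "text": "Meeting notes\\n- agenda item 1"},
    {"action": "hotkey", "keys": ["ctrl", "s"]}
  ]
}
"""

plan = json.loads(plan_json)
# Every step must name an action the executor understands
assert all("action" in step for step in plan["steps"])
print(f'{plan["task"]}: {len(plan["steps"])} steps')
```

Keeping the plan as plain data like this is what makes it checkable before anything touches the screen.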
PyAutoGUI executes actions on the real screen with coordinate precision. Clicks, keyboard input, scrolling, hotkeys — everything a human can do, your agent can do.
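Execution reduces to dispatching each planned step to the matching PyAutoGUI call (`click`, `write`, `hotkey` are real pyautogui functions). A minimal sketch with the backend injected so it runs headless; the step names and the `execute_steps` helper are assumptions, not PerceptAI's actual action set:

```python
def execute_steps(steps, gui):
    """Dispatch planned steps to a PyAutoGUI-like backend.

    `gui` is expected to expose click(x, y), write(text), and hotkey(*keys),
    matching pyautogui's functions of the same names.
    """
    for step in steps:
        action = step["action"]
        if action == "click":
            x, y = step["target"]
            gui.click(x, y)
        elif action == "type_text":
            gui.write(step["text"])
        elif action == "hotkey":
            gui.hotkey(*step["keys"])
        else:
            raise ValueError(f"unknown action: {action}")

# A recording stub stands in for pyautogui so the sketch runs without a display
class RecordingGUI:
    def __init__(self):
        self.calls = []
    def click(self, x, y):
        self.calls.append(("click", x, y))
    def write(self, text):
        self.calls.append(("write", text))
    def hotkey(self, *keys):
        self.calls.append(("hotkey", keys))

gui = RecordingGUI()
execute_steps(
    [{"action": "click", "target": (30, 15)},
     {"action": "type_text", "text": "hello"}],
    gui,
)
print(gui.calls)  # [('click', 30, 15), ('write', 'hello')]
```

Swapping the stub for the real `pyautogui` module gives the same dispatch loop live control of the mouse and keyboard.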
```shell
# Clone and set up
git clone https://github.com/Neeraj04-CY/PerceptAi
cd PerceptAi
python -m venv .venv
.venv\Scripts\activate    # Windows; use `source .venv/bin/activate` on macOS/Linux
pip install -r requirements.txt

# Add your free Groq API key
echo GROQ_API_KEY=your_key > .env

# Run
python examples/natural_language_demo.py
```
```python
from core.perception import perceive
from core.planner import plan_task
from core.agent import PerceptAgent

# Plain English. Full execution.
instruction = "open notepad and write my meeting notes"

screen = perceive()                    # build a world model of the current screen
plan = plan_task(instruction, screen)  # turn the instruction into executable steps
agent = PerceptAgent(plan["task"])
agent.run(plan["steps"])
```
Free. Open source. No credit card. No cloud required.