Universal Perception Layer · v0.1 · Open Source

Give agents eyes on any screen.

PerceptAI is not an AI assistant. It is open-source infrastructure — the perception layer developers embed into their own agent products.
Any screen. Any app. Zero DOM required.

Get Started → See How It Works
natural_language_demo.py
# One instruction. Full execution.
 
→ Instruction: "open chrome and search for AI agents"
 
[1/3] Scanning environment...
[2/3] 47 elements detected
[3/3] Planning with AI...
 
Step 1: Opening chrome...
Step 2: Navigating to google.com...
Step 3: Typing query...
Step 4: Pressing enter...
 
✓ TASK COMPLETED SUCCESSFULLY
Any Screen
Natural Language
Autonomous Execution
Desktop Apps
Legacy Software
Zero DOM Required
Open Source
Free to Use
How It Works
01 / PERCEIVE
👁
See Any Screen

EasyOCR extracts every text element with real pixel coordinates. Groq Vision understands the UI structure. Together they build a complete world model of any interface.
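As a rough sketch of the idea (not PerceptAI's actual internals), OCR detections with pixel bounding boxes can be collapsed into clickable elements with center coordinates. The detection tuples below mirror the shape of EasyOCR's `readtext` output; the `Element` type and `build_world_model` helper are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Element:
    """A hypothetical screen element: visible text plus a click point."""
    text: str
    x: int  # center x in screen pixels
    y: int  # center y in screen pixels
    confidence: float

def build_world_model(detections):
    """Turn raw OCR detections into elements with click targets.

    Each detection mimics EasyOCR's readtext() output:
    ([(x1, y1), (x2, y1), (x2, y2), (x1, y2)], text, confidence).
    """
    elements = []
    for box, text, conf in detections:
        xs = [p[0] for p in box]
        ys = [p[1] for p in box]
        elements.append(Element(
            text=text.strip(),
            x=round(sum(xs) / len(xs)),
            y=round(sum(ys) / len(ys)),
            confidence=conf,
        ))
    return elements

# Example: one fake detection for a "Search" button.
model = build_world_model([
    ([(100, 40), (180, 40), (180, 60), (100, 60)], "Search", 0.97),
])
print(model[0].x, model[0].y)  # → 140 50
```

The vision model's job is then to attach structural meaning (button, field, menu) to these text-anchored coordinates.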

02 / PLAN
🧠
Plan The Steps

Groq LLaMA 3.3 converts your plain English instruction into precise executable steps. Open apps. Navigate URLs. Click elements. Type text. All planned automatically.
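A plan coming back from an LLM is structured data that should be validated before anything touches the screen. As a minimal sketch (the `action`/`target` step schema is an assumption, not PerceptAI's actual format), a parser can reject hallucinated actions at the boundary:

```python
import json

# Whitelist of actions the executor knows how to perform (assumed set).
ALLOWED_ACTIONS = {"open_app", "navigate", "click", "type", "press"}

def parse_plan(raw: str) -> list:
    """Parse and validate a JSON plan returned by the LLM.

    Unknown actions raise immediately so a hallucinated step
    can never reach the execution layer.
    """
    plan = json.loads(raw)
    steps = plan.get("steps", [])
    for i, step in enumerate(steps, 1):
        if step.get("action") not in ALLOWED_ACTIONS:
            raise ValueError(f"step {i}: unknown action {step.get('action')!r}")
    return steps

# Example: the demo instruction rendered as a hypothetical plan.
raw = json.dumps({"task": "search for AI agents", "steps": [
    {"action": "open_app", "target": "chrome"},
    {"action": "navigate", "target": "google.com"},
    {"action": "type", "target": "AI agents"},
    {"action": "press", "target": "enter"},
]})
for step in parse_plan(raw):
    print(step["action"], "→", step["target"])
```

Validating at this seam keeps the planner free to be creative while the executor stays deterministic.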

03 / ACT
Execute Precisely

PyAutoGUI executes actions on the real screen with coordinate precision. Clicks, keyboard input, scrolling, hotkeys — everything a human can do, your agent can do.
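The execution step can be sketched as a small dispatcher that maps plan steps onto input primitives. In real code those primitives would be PyAutoGUI calls (`pyautogui.click(x, y)`, `pyautogui.write(text)`, `pyautogui.press(key)`); here the backend is injectable so the dispatch logic runs without a display. The step keys are illustrative assumptions:

```python
def execute_steps(steps, backend):
    """Run each plan step through an injected input backend.

    `backend` supplies the real primitives (e.g. a thin wrapper over
    pyautogui.click / pyautogui.write / pyautogui.press); swapping in
    a fake makes the dispatch logic testable and dry-runnable.
    """
    dispatch = {
        "click": lambda s: backend.click(s["x"], s["y"]),
        "type": lambda s: backend.write(s["text"]),
        "press": lambda s: backend.press(s["key"]),
    }
    for step in steps:
        action = step["action"]
        if action not in dispatch:
            raise ValueError(f"unsupported action: {action!r}")
        dispatch[action](step)

class RecordingBackend:
    """Fake backend that records calls instead of touching the OS."""
    def __init__(self):
        self.calls = []
    def click(self, x, y): self.calls.append(("click", x, y))
    def write(self, text): self.calls.append(("write", text))
    def press(self, key): self.calls.append(("press", key))

backend = RecordingBackend()
execute_steps([
    {"action": "click", "x": 140, "y": 50},
    {"action": "type", "text": "AI agents"},
    {"action": "press", "key": "enter"},
], backend)
print(backend.calls)
```

Because coordinates come from the perception layer's world model, the same dispatcher works on a browser, a desktop app, or legacy software alike.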

The Difference
Other Browser Agents
Websites only via DOM access
Breaks when DOM changes
Can't touch desktop apps
Needs API or web access
Requires cloud API or sandbox
Legacy software is invisible
VS
PerceptAI
Any app with pixels on screen
Vision-based — no DOM needed
Full desktop OS control
Works fully offline
Runs locally — zero API lock-in
Legacy software works perfectly
Quick Start
bash
# Clone and setup
git clone https://github.com/Neeraj04-CY/PerceptAi
cd PerceptAi
python -m venv .venv
.venv\Scripts\activate   # Windows — on macOS/Linux use: source .venv/bin/activate
pip install -r requirements.txt

# Add your free Groq API key
echo GROQ_API_KEY=your_key > .env

# Run
python examples/natural_language_demo.py
python
from core.perception import perceive
from core.planner import plan_task
from core.agent import PerceptAgent

# Plain English. Full execution.
instruction = "open notepad and write my meeting notes"

screen = perceive()
plan = plan_task(instruction, screen)
agent = PerceptAgent(plan["task"])
agent.run(plan["steps"])
The Stack
Groq LLaVA
Vision understanding
FREE
LLaMA 3.3
Task planning
FREE
EasyOCR
Text + coordinates
LOCAL
PyAutoGUI
Action execution
LOCAL
Python ctypes
OS & window control
NATIVE

Start building
today.

Free. Open source. No credit card. No cloud required.

View on GitHub → Get Free API Key