Universal Perception Layer · v0.1 · Open Source

Give agents eyes on any screen.

PerceptAI is not an AI assistant. It is open-source infrastructure — the perception layer developers embed into their own agent products.
Any screen. Any app. Zero DOM required.

Get Started → See How It Works
natural_language_demo.py
# One instruction. Full execution.
 
→ Instruction: "open chrome and search for AI agents"
 
[1/3] Scanning environment...
[2/3] 47 elements detected
[3/3] Planning with AI...
 
Step 1: Opening chrome...
Step 2: Navigating to google.com...
Step 3: Typing query...
Step 4: Pressing enter...
 
✓ TASK COMPLETED SUCCESSFULLY
Any Screen
Natural Language
Autonomous Execution
Desktop Apps
Legacy Software
Zero DOM Required
Open Source
Free to Use
How It Works
01 / PERCEIVE
👁
See Any Screen

EasyOCR extracts every text element with real pixel coordinates. Groq Vision understands the UI structure. Together they build a complete world model of any interface.
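As a rough sketch of the idea (not PerceptAI's actual internals), OCR detections with pixel bounding boxes can be collapsed into clickable elements with center coordinates. The detection tuples below mirror the shape of EasyOCR's `readtext` output; the `Element` type and `build_world_model` helper are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Element:
    """A hypothetical screen element: visible text plus a click point."""
    text: str
    x: int  # center x in screen pixels
    y: int  # center y in screen pixels
    confidence: float

def build_world_model(detections):
    """Turn raw OCR detections into elements with click targets.

    Each detection mimics EasyOCR's readtext() output:
    ([(x1, y1), (x2, y1), (x2, y2), (x1, y2)], text, confidence).
    """
    elements = []
    for box, text, conf in detections:
        xs = [p[0] for p in box]
        ys = [p[1] for p in box]
        elements.append(Element(
            text=text.strip(),
            x=round(sum(xs) / len(xs)),
            y=round(sum(ys) / len(ys)),
            confidence=conf,
        ))
    return elements

# Example: one fake detection for a "Search" button.
model = build_world_model([
    ([(100, 40), (180, 40), (180, 60), (100, 60)], "Search", 0.97),
])
print(model[0].x, model[0].y)  # → 140 50
```

The vision model's job is then to attach structural meaning (button, field, menu) to these text-anchored coordinates.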

02 / PLAN
🧠
Plan The Steps

Groq LLaMA 3.3 converts your plain English instruction into precise executable steps. Open apps. Navigate URLs. Click elements. Type text. All planned automatically.
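A plan coming back from an LLM is structured data that should be validated before anything touches the screen. As a minimal sketch (the `action`/`target` step schema is an assumption, not PerceptAI's actual format), a parser can reject hallucinated actions at the boundary:

```python
import json

# Whitelist of actions the executor knows how to perform (assumed set).
ALLOWED_ACTIONS = {"open_app", "navigate", "click", "type", "press"}

def parse_plan(raw: str) -> list:
    """Parse and validate a JSON plan returned by the LLM.

    Unknown actions raise immediately so a hallucinated step
    can never reach the execution layer.
    """
    plan = json.loads(raw)
    steps = plan.get("steps", [])
    for i, step in enumerate(steps, 1):
        if step.get("action") not in ALLOWED_ACTIONS:
            raise ValueError(f"step {i}: unknown action {step.get('action')!r}")
    return steps

# Example: the demo instruction rendered as a hypothetical plan.
raw = json.dumps({"task": "search for AI agents", "steps": [
    {"action": "open_app", "target": "chrome"},
    {"action": "navigate", "target": "google.com"},
    {"action": "type", "target": "AI agents"},
    {"action": "press", "target": "enter"},
]})
for step in parse_plan(raw):
    print(step["action"], "→", step["target"])
```

Validating at this seam keeps the planner free to be creative while the executor stays deterministic.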

03 / ACT
Execute Precisely

PyAutoGUI executes actions on the real screen with coordinate precision. Clicks, keyboard input, scrolling, hotkeys — everything a human can do, your agent can do.
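The execution step can be sketched as a small dispatcher that maps plan steps onto input primitives. In real code those primitives would be PyAutoGUI calls (`pyautogui.click(x, y)`, `pyautogui.write(text)`, `pyautogui.press(key)`); here the backend is injectable so the dispatch logic runs without a display. The step keys are illustrative assumptions:

```python
def execute_steps(steps, backend):
    """Run each plan step through an injected input backend.

    `backend` supplies the real primitives (e.g. a thin wrapper over
    pyautogui.click / pyautogui.write / pyautogui.press); swapping in
    a fake makes the dispatch logic testable and dry-runnable.
    """
    dispatch = {
        "click": lambda s: backend.click(s["x"], s["y"]),
        "type": lambda s: backend.write(s["text"]),
        "press": lambda s: backend.press(s["key"]),
    }
    for step in steps:
        action = step["action"]
        if action not in dispatch:
            raise ValueError(f"unsupported action: {action!r}")
        dispatch[action](step)

class RecordingBackend:
    """Fake backend that records calls instead of touching the OS."""
    def __init__(self):
        self.calls = []
    def click(self, x, y): self.calls.append(("click", x, y))
    def write(self, text): self.calls.append(("write", text))
    def press(self, key): self.calls.append(("press", key))

backend = RecordingBackend()
execute_steps([
    {"action": "click", "x": 140, "y": 50},
    {"action": "type", "text": "AI agents"},
    {"action": "press", "key": "enter"},
], backend)
print(backend.calls)
```

Because coordinates come from the perception layer's world model, the same dispatcher works on a browser, a desktop app, or legacy software alike.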

The Difference
Other Browser Agents
Websites only via DOM access
Breaks when DOM changes
Can't touch desktop apps
Needs API or web access
Requires cloud API or sandbox
Legacy software is invisible
VS
PerceptAI
Any app with pixels on screen
Vision-based — no DOM needed
Full desktop OS control
Works fully offline
Runs locally — zero API lock-in
Legacy software works perfectly
Quick Start
bash
# Clone and setup
git clone https://github.com/Neeraj04-CY/PerceptAi
cd PerceptAi
python -m venv .venv
.venv\Scripts\activate   # Windows — on macOS/Linux use: source .venv/bin/activate
pip install -r requirements.txt

# Add your free Groq API key
echo GROQ_API_KEY=your_key > .env

# Run
python examples/natural_language_demo.py
python
from core.perception import perceive
from core.planner import plan_task
from core.agent import PerceptAgent

# Plain English. Full execution.
instruction = "open notepad and write my meeting notes"

screen = perceive()
plan = plan_task(instruction, screen)
agent = PerceptAgent(plan["task"])
agent.run(plan["steps"])
The Stack
Groq LLaVA
Vision understanding
FREE
LLaMA 3.3
Task planning
FREE
EasyOCR
Text + coordinates
LOCAL
PyAutoGUI
Action execution
LOCAL
Python ctypes
OS & window control
NATIVE

Start building
today.

Free. Open source. No credit card. No cloud required.

View on GitHub → Get Free API Key