A Chrome extension that turns your browser into an AI agent. Type a command in plain language and the agent clicks, types, navigates, reads pages, and finishes the task for you.
New to Crab-Agent? Start with the guides below.
| Guide | Description |
|---|---|
| Getting Started | Installation, setup, and your first task |
| Use Cases | Copy-paste examples for search, email, forms, scraping, and more |
| Workflows | Record and replay browser actions like macros |
| Memory and Rules | How Crab learns and remembers across sessions |
| Scheduling | Timers, reminders, and recurring tasks |
| Canvas and Design | Posters, generative art, and Figma or Miro workflows |
| Providers | Choose and set up your AI provider |
| Tips and Troubleshooting | Best practices, common errors, and fixes |
- Download or clone this repository.
- Open `chrome://extensions` and enable Developer mode.
- Click Load unpacked and select the repository folder.
- Open any website, then click the Crab-Agent icon to open the side panel.
- Go to Settings and enter your chosen provider's API key, or pick ChatGPT (Sign in) to authenticate with OAuth.
Open the side panel on any page, type what you want done, and send. The agent will:
- Take a screenshot of the active tab.
- Read the page DOM and accessibility tree.
- Call the LLM with the screenshot and page context.
- Execute the chosen action (click, type, scroll, navigate).
- Repeat until the task is complete.
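
The loop above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: `Action`, `Observation`, `AgentDeps`, `runAgentLoop`, and the `maxSteps` cap are all hypothetical names and assumptions introduced here.

```typescript
// Hypothetical types: none of these names come from the Crab-Agent source.
type Action =
  | { kind: "click"; selector: string }
  | { kind: "type"; selector: string; text: string }
  | { kind: "scroll"; deltaY: number }
  | { kind: "navigate"; url: string }
  | { kind: "done"; summary: string };

interface Observation {
  screenshot: string; // e.g. a base64-encoded capture of the active tab
  pageContext: string; // serialized DOM and accessibility tree
}

// The three capabilities the loop needs, injected so they can be swapped or mocked.
interface AgentDeps {
  observe(): Promise<Observation>;
  decide(task: string, obs: Observation, history: Action[]): Promise<Action>;
  execute(action: Action): Promise<void>;
}

// Observe, decide, act, and repeat until the model reports "done"
// or the (assumed) step budget runs out.
async function runAgentLoop(
  task: string,
  deps: AgentDeps,
  maxSteps = 20,
): Promise<Action[]> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const obs = await deps.observe();
    const action = await deps.decide(task, obs, history);
    history.push(action);
    if (action.kind === "done") break;
    await deps.execute(action);
  }
  return history;
}
```

Injecting `observe`, `decide`, and `execute` keeps the loop itself independent of Chrome APIs and of any particular LLM provider.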
| Mode | Description |
|---|---|
| Ask (default) | Prompts before acting on a new domain |
| Auto | Runs automatically without prompting |
| Strict | Asks before every action |
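
One way to read the three modes is as a single confirmation check. The sketch below assumes that "a new domain" means a hostname the user has not approved yet; `Mode`, `needsConfirmation`, and `approvedDomains` are hypothetical names, not taken from the extension.

```typescript
type Mode = "ask" | "auto" | "strict";

// Returns true when the agent should prompt the user before acting on `url`.
function needsConfirmation(
  mode: Mode,
  url: string,
  approvedDomains: Set<string>,
): boolean {
  switch (mode) {
    case "auto":
      return false; // never prompt
    case "strict":
      return true; // prompt before every action
    case "ask":
      // Assumption: prompt only for hostnames not approved before.
      return !approvedDomains.has(new URL(url).hostname);
  }
}
```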
Record a sequence of browser actions, then replay it with different parameters.
- Click record, perform the actions, click stop, and save the workflow.
- When you enter a related command later, the agent matches and runs the saved workflow.
- The AI auto-detects which values should be parameters (names, emails, URLs) so the same recording works with different inputs.
See Workflows for the full guide.
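
To illustrate the idea of parameterized replay, here is a minimal sketch. It assumes recorded steps store detected parameters as `{{name}}` placeholders; that storage format, along with `RecordedStep`, `fillTemplate`, and `replayWith`, is hypothetical and may differ from how Crab-Agent saves workflows.

```typescript
// Hypothetical shape of one recorded step.
interface RecordedStep {
  action: "navigate" | "click" | "type";
  target: string; // URL or selector, possibly containing {{placeholders}}
  value?: string; // text to type, possibly containing {{placeholders}}
}

// Replaces every {{key}} with params[key]; fails loudly on a missing parameter.
function fillTemplate(template: string, params: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => {
    const value = params[key];
    if (value === undefined) {
      throw new Error(`Missing workflow parameter: ${key}`);
    }
    return value;
  });
}

// Produces concrete steps for one replay without mutating the recording.
function replayWith(
  steps: RecordedStep[],
  params: Record<string, string>,
): RecordedStep[] {
  return steps.map((step) => {
    const filled: RecordedStep = {
      action: step.action,
      target: fillTemplate(step.target, params),
    };
    if (step.value !== undefined) {
      filled.value = fillTemplate(step.value, params);
    }
    return filled;
  });
}
```

With this shape, a single recording such as "type `{{email}}` into `#email`" can be replayed for any address by passing a different `params` object.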
The agent remembers information you share across sessions (name, preferences, site-specific rules). After several sessions it cleans up duplicates automatically through a process called Dream Consolidation.
See Memory and Rules for the full guide.
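
The duplicate cleanup can be pictured with a small sketch. It assumes two entries are duplicates when their text matches after trimming, lowercasing, and collapsing whitespace, and that the newer entry wins. The real Dream Consolidation process may work quite differently; `MemoryEntry` and `consolidate` are hypothetical names.

```typescript
// Hypothetical memory record.
interface MemoryEntry {
  text: string;
  updatedAt: number; // milliseconds since the epoch
}

// Keeps one entry per normalized text, preferring the most recently updated.
function consolidate(entries: MemoryEntry[]): MemoryEntry[] {
  const byKey = new Map<string, MemoryEntry>();
  for (const entry of entries) {
    const key = entry.text.trim().toLowerCase().replace(/\s+/g, " ");
    const existing = byKey.get(key);
    if (existing === undefined || entry.updatedAt > existing.updatedAt) {
      byKey.set(key, entry);
    }
  }
  return Array.from(byKey.values());
}
```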
Schedule tasks to run once at a future time, or on a recurring cron schedule. Tasks run headlessly through Chrome alarms and survive browser restarts.
See Scheduling for the full guide.
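
As a rough illustration of how a schedule could map onto Chrome alarms, the sketch below builds the options object that `chrome.alarms.create(name, alarmInfo)` accepts (`when`, `delayInMinutes`, `periodInMinutes`). The `Schedule` type and `toAlarmInfo` are hypothetical. The sketch covers one-time and fixed-interval schedules only; a cron expression would additionally need a parser to compute each next run time, which is not shown.

```typescript
// Mirrors the fields accepted by chrome.alarms.create(name, alarmInfo).
interface AlarmInfo {
  when?: number; // absolute time, milliseconds since the epoch
  delayInMinutes?: number;
  periodInMinutes?: number;
}

// Hypothetical schedule description.
type Schedule =
  | { kind: "once"; at: Date }
  | { kind: "every"; minutes: number; startAt?: Date };

function toAlarmInfo(schedule: Schedule): AlarmInfo {
  if (schedule.kind === "once") {
    return { when: schedule.at.getTime() };
  }
  if (schedule.minutes <= 0) {
    throw new Error("Interval must be positive");
  }
  const info: AlarmInfo = { periodInMinutes: schedule.minutes };
  if (schedule.startAt) {
    info.when = schedule.startAt.getTime();
  } else {
    // Without an explicit start, fire first after one full interval.
    info.delayInMinutes = schedule.minutes;
  }
  return info;
}

// Inside an extension service worker this could be used as:
//   chrome.alarms.create("daily-report", toAlarmInfo({ kind: "every", minutes: 1440 }));
//   chrome.alarms.onAlarm.addListener((alarm) => { /* run the task named alarm.name */ });
```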
Enable Quick Mode in Settings for lower-latency execution on simple tasks. The agent uses compact text commands instead of structured tool calls. Best for quick navigation, clicks, and short lookups.
| Provider | Notes |
|---|---|
| Anthropic | Best tool-use accuracy. Recommended: Claude Sonnet 4.6 or higher |
| OpenAI | Strong general performance with GPT-4o or o3 |
| Google Gemini | Good balance of speed and quality |
| OpenRouter | One key, many models (Claude, GPT, Gemini, Llama, and more) |
| Ollama | Local and free; full privacy |
| OpenAI-compatible | Any endpoint that follows the OpenAI chat completions format |
| ChatGPT (Sign in) | OAuth to your ChatGPT account; no API key needed |
See Providers for setup instructions and model recommendations.
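
For the OpenAI-compatible option, a request in the chat completions format looks roughly like the sketch below: a POST to `<base URL>/chat/completions` with a bearer token and a `messages` array. `buildChatRequest` and its parameters are hypothetical helpers for illustration; how Crab-Agent actually builds its requests is not documented here.

```typescript
// Hypothetical helper; the name and parameters are illustrative only.
interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  prompt: string,
  screenshotDataUrl?: string,
): ChatRequest {
  const content: ContentPart[] = [{ type: "text", text: prompt }];
  if (screenshotDataUrl) {
    // Images travel as an image_url part, here carrying a data URL.
    content.push({ type: "image_url", image_url: { url: screenshotDataUrl } });
  }
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages: [{ role: "user", content }] }),
    },
  };
}
```

The result can be passed straight to `fetch(url, init)`. Because only the base URL, key, and model name change, the same builder would work against any endpoint that accepts this format.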
Crab-Agent is extensively tested with Claude Sonnet 4.6, Claude Opus 4.x, and Gemini-3.5-Flash. Top-tier models give the fewest hallucinated tool calls, the strongest multi-step planning, and the best handling of edge cases.
Smaller or older models can still work for simple tasks but tend to make more mistakes on long sequences.
The agent picks the right tool at each step. The tools fall into the categories below.
| Category | Tools |
|---|---|
| Browser | click, type, scroll, drag, navigate, back, forward, create or switch or close tab |
| Page | read DOM, find element, extract text, read console, inspect network, fill form |
| File | upload, download, image upload |
| Advanced | JavaScript execution, canvas toolkit, code editor, document generator (DOCX, HTML), GIF recorder, SVG visualizer |
| Agent | memory CRUD, rule suggestion, plan updates, task scheduling, workflow execution |
There are 30+ tools in total. See the developer docs in the source repository for the full list.
| Layer | Technology |
|---|---|
| UI | React 18 |
| Language | TypeScript |
| Build | Vite with @crxjs/vite-plugin |
| Styling | Tailwind CSS |
| State | Zustand |
| Platform | Chrome MV3 |
- Clawd Tank by Marcio Granzotto for the crab mascot pixel art and SVG animations.
- Claude Computer Use by Anthropic for the agent loop concept (screenshot, observe, decide, act).
MIT. Built by Hert4.