Automate Any Electron App with agent-browser and CDP: A Developer's Guide
Slack, VS Code, Discord, Notion, Figma, and Obsidian: what do these wildly different applications have in common? Their desktop clients are all built on Electron, which means that under the hood they are all running Chromium, the open-source engine behind Chrome. And that means you can automate them using the Chrome DevTools Protocol (CDP) and a tool called agent-browser.
This post covers how the technique works, how to set it up, and where it fits into real-world AI automation workflows.
Why Electron Apps Are Secretly Just Chrome
Electron wraps a Chromium browser engine inside a native desktop shell. From the operating system's perspective, you launched a desktop app. From a developer's perspective, you launched a browser tab with elevated system permissions.
This has an important implication: Chromium ships with a built-in remote debugging interface called the Chrome DevTools Protocol (CDP). CDP is the same protocol your browser's DevTools uses to inspect elements, capture network traffic, and run JavaScript. Electron applications expose this interface too — they just don't advertise it.
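To activate it, pass --remote-debugging-port as a launch argument. The examples here use macOS; quit the application first, because open only forwards --args to a newly started process: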
# Launch Slack with CDP enabled on port 9222
open -a "Slack" --args --remote-debugging-port=9222
# Launch VS Code on a separate port to avoid conflicts
open -a "Visual Studio Code" --args --remote-debugging-port=9223
Once the application is running with a debugging port open, anything that speaks CDP can connect to it and take full control of the UI.
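Before connecting anything, it can help to confirm that the port is actually listening. Chromium's remote debugging server answers plain HTTP at /json/version, so a small shell helper can wait for it. This is a minimal sketch: the function names cdp_version_url and wait_for_cdp are illustrative, and it assumes curl is installed and that the port is bound on localhost.

```shell
# Build the URL of the CDP HTTP discovery endpoint for a given port.
cdp_version_url() {
  printf 'http://127.0.0.1:%s/json/version\n' "$1"
}

# Wait until the endpoint answers, retrying once per second.
# Usage: wait_for_cdp PORT [MAX_TRIES]   (MAX_TRIES defaults to 10)
# Returns 0 as soon as the endpoint responds, 1 if it never does.
wait_for_cdp() {
  _port=$1
  _tries=${2:-10}
  while [ "$_tries" -gt 0 ]; do
    if curl --silent --fail --max-time 2 \
        "$(cdp_version_url "$_port")" >/dev/null 2>&1; then
      return 0
    fi
    _tries=$((_tries - 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for_cdp 9222 15 returns successfully only once the application launched on port 9222 responds, which makes it a reasonable guard to run before any further automation.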
agent-browser: AI-Native Electron Automation
agent-browser is an open-source tool from Vercel Labs designed to give AI agents — particularly Claude — a structured way to interact with browsers and browser-like environments. It speaks CDP natively, which makes it an ideal driver for Electron automation.
After launching your target application with a remote debugging port, you connect agent-browser to it:
agent-browser connect 9222
From that point, the full suite of browser automation primitives becomes available:
# Get a structured snapshot of all UI elements on screen
agent-browser snapshot -i
# Click an element by its reference ID
agent-browser click @e5
# Fill a text field
agent-browser fill @e12 "Hello from automation"
# Take a screenshot
agent-browser screenshot
The snapshot command is particularly useful for AI-driven workflows. It returns a structured representation of the application's UI — not a raw screenshot, but a machine-readable tree of interactive elements. An AI agent can reason over this structure, identify the right elements, and chain together interactions without any hardcoded selectors.
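To show how a script (or an agent) might go from snapshot text to a clickable reference, here is a minimal sketch. It assumes snapshot lines look roughly like - button "Send" [ref=e5]; that line format is an assumption for illustration, so check the real output of agent-browser snapshot -i before relying on it. The helper name find_ref is hypothetical.

```shell
# Print the first ref (prefixed with @) whose snapshot line contains the
# given quoted label. Reads snapshot text on stdin.
# ASSUMPTION: lines look like   - button "Send" [ref=e5]
find_ref() {
  grep -F -- "\"$1\"" |
    sed -n 's/.*\[ref=\([A-Za-z0-9]*\)].*/@\1/p' |
    head -n 1
}
```

Under that assumption, agent-browser snapshot -i | find_ref Send would print a reference such as @e5, which can then be passed to agent-browser click. An AI agent does the same lookup by reasoning over the text rather than by pattern matching.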
Official Skills from Vercel Labs
The agent-browser repository ships with pre-built skills — structured instruction files that teach an AI agent how to handle a specific application or task type:
- Electron Skill (agent-browser/skills/electron/SKILL.md): Generic guidance for launching and controlling any Electron application via CDP.
- Slack Skill (agent-browser/skills/slack/SKILL.md): Application-specific automation patterns for Slack, covering message sending, channel navigation, and monitoring.
These skills are designed to be loaded directly into Claude Code or similar agentic coding environments, giving the agent the context it needs to operate the application reliably without manual intervention.
Real-World Automation Use Cases
The combination of CDP, Electron, and an AI agent capable of reading UI snapshots unlocks a category of automation that was previously difficult or impossible without official API support.
Customer Support Automation via WhatsApp Desktop
WhatsApp Desktop has shipped as an Electron application, although newer native builds on some platforms are not Electron, so confirm which build you are running. Many businesses using WhatsApp for customer communication have no programmatic access, because the official Business API is expensive and restricted. With agent-browser and CDP, you can build an automation layer directly on top of the desktop client:
- Monitor incoming messages by polling the UI snapshot
- Route messages to the appropriate team member or trigger a response workflow
- Send templated replies based on keywords or classification logic
The session stays authenticated because you are operating the actual logged-in desktop application, not an API token that needs rotation.
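Polling the snapshot comes down to comparing each capture with the previous one and acting only on what is new. The sketch below shows one way to do that with temporary files. The helper names new_lines and poll_once and the AB_BIN variable are illustrative, not part of agent-browser, and a line-by-line diff is a simplification: it treats any changed line as new.

```shell
# Binary to invoke; override AB_BIN to point somewhere else (hypothetical knob).
: "${AB_BIN:=agent-browser}"

# Print lines present in file $2 (current) but not in file $1 (previous).
new_lines() {
  if [ -s "$1" ]; then
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -F -x -v -f "$1" "$2" || true   # "no new lines" is not an error
  else
    cat "$2"                             # nothing seen yet: everything is new
  fi
}

# One polling step: capture a snapshot, print what changed since the last
# step, then remember the snapshot in the state file given as $1.
poll_once() {
  _state=$1
  _cur=$(mktemp "${TMPDIR:-/tmp}/snapshot.XXXXXX") || return 1
  "$AB_BIN" snapshot -i > "$_cur"
  new_lines "$_state" "$_cur"
  mv "$_cur" "$_state"
}
```

A monitoring loop would create an empty state file once, then call poll_once on it every few seconds and pipe the output into whatever routing or reply logic you need. The interval is a trade-off between latency and load on the desktop client.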
VS Code Workflow Automation
VS Code exposes a rich UI through CDP. Agentic workflows can:
- Open files, navigate the file tree, and switch tabs programmatically
- Trigger built-in commands through the Command Palette
- Read terminal output and respond to build errors
- Automate repetitive editing patterns across large codebases
This complements VS Code's extension API without replacing it — CDP-based automation can drive the editor the same way a human would, which means it works even when no extension exists for a particular task.
Slack Monitoring and Messaging
Slack's official API is powerful but rate-limited and requires OAuth setup. The desktop app has no such restrictions from the user's perspective. With the Slack skill from agent-browser, you can automate:
- Sending scheduled messages or reports to specific channels
- Monitoring channels for keywords and triggering downstream actions
- Cross-posting content between channels based on rules
- Extracting thread summaries for async teams
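Keyword-based monitoring can start as something very small. The following sketch maps a message to a route name with shell glob patterns. The keywords and the route names (billing, oncall, ignore) are made-up examples, and in practice the message text would come from the snapshot rather than from a literal string.

```shell
# Map a message to a route name using case-insensitive keyword matching.
# The rules below are hypothetical examples; replace them with your own.
route_message() {
  _msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $_msg in
    *refund*|*invoice*)   echo billing ;;
    *outage*|*"is down"*) echo oncall ;;
    *)                    echo ignore ;;
  esac
}
```

The printed route can then drive the downstream action, for instance choosing which channel to cross-post to. Once simple rules stop being enough, the same function boundary is a natural place to call a language model for classification instead.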
Notion and Obsidian Knowledge Management
Notion's Electron client and Obsidian (also Electron-based) can be driven through CDP to automate note creation, database updates, and content organization without relying on Notion's API quota limits or Obsidian's plugin ecosystem.
Key Technical Advantages Over Traditional API Approaches
No API required. Many Electron applications either have no public API, have a restricted or expensive API, or gate automation behind developer accounts. CDP bypasses this entirely — if a human can use the app, your agent can too.
Session persistence. You operate inside the user's active, authenticated session. There is no token management, no OAuth dance, no re-authentication. The app thinks a human is using it.
Full UI control. You are not constrained by what an API exposes. If a button exists in the UI, you can click it. If a field exists, you can fill it. The automation surface is the entire application.
AI-readable structure. Unlike raw screenshot-based automation (which requires vision models to interpret pixel positions), agent-browser's snapshot command returns structured element data. This is faster, more reliable, and works cleanly with language models that reason over structured text.
Getting Started
- Install agent-browser from agent-browser.dev or clone the Vercel Labs repository.
- Launch your target Electron application with --remote-debugging-port=<port>.
- Run agent-browser connect <port> to establish a CDP session.
- Use agent-browser snapshot -i to inspect available UI elements.
- Build your automation sequence using click, fill, type, and screenshot commands.
- For Claude Code integration, load the relevant skill file from agent-browser/skills/ into your agent context.
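The steps above can be strung together in a small script. This is a sketch rather than a finished tool: the wrapper ab, the DRY_RUN and AB_BIN variables, and the function send_message are illustrative names, and the refs @e12 and @e5 are placeholders that you would replace with values read from your own snapshot.

```shell
# Run an agent-browser subcommand, or just print it when DRY_RUN=1.
# Note: the dry-run output joins arguments with spaces and drops quoting.
ab() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s' "${AB_BIN:-agent-browser}"
    for _arg in "$@"; do printf ' %s' "$_arg"; done
    printf '\n'
  else
    "${AB_BIN:-agent-browser}" "$@"
  fi
}

# Connect, inspect, fill a field, click a button, capture the result.
# Usage: send_message PORT TEXT
# @e12 and @e5 are placeholder refs, not real element IDs.
send_message() {
  ab connect "$1" &&
  ab snapshot -i &&
  ab fill @e12 "$2" &&
  ab click @e5 &&
  ab screenshot
}
```

Running DRY_RUN=1 send_message 9222 "Hello from automation" prints the five commands without touching the application, which is a cheap way to review a sequence before letting it drive a logged-in session.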
Conclusion
The insight that Electron applications are just Chrome in disguise is not new, but pairing it with an AI-native automation tool like agent-browser creates something genuinely useful: a general-purpose automation layer for virtually every popular desktop productivity application, with no API contracts required.
For developers building AI agents, support automation systems, or internal tooling, this approach deserves serious attention. The combination of CDP's maturity, Electron's ubiquity, and agent-browser's structured UI interface represents one of the cleaner paths to desktop automation available today — and it works with the applications your users are already running.