Automate Any Electron App with agent-browser and CDP: A Developer's Guide

Slack, VS Code, Discord, Notion, Figma, Obsidian: what do these wildly different applications have in common? They are all built on Electron, which means that, under the hood, they all run Chromium, the open-source engine behind Chrome. And that means you can typically automate them using the Chrome DevTools Protocol (CDP) and a tool called agent-browser.

This post covers how the technique works, how to set it up, and where it fits into real-world AI automation workflows.

Why Electron Apps Are Secretly Just Chrome

Electron wraps a Chromium browser engine inside a native desktop shell. From the operating system's perspective, you launched a desktop app. From a developer's perspective, you launched a browser tab with elevated system permissions.

This has an important implication: Chromium ships with a built-in remote debugging interface called the Chrome DevTools Protocol (CDP). CDP is the same protocol your browser's DevTools uses to inspect elements, capture network traffic, and run JavaScript. Electron applications expose this interface too — they just don't advertise it.

To activate it, pass --remote-debugging-port as a launch argument. The examples below are for macOS; quit the application first, because open only forwards arguments to a freshly launched instance:

# Launch Slack with CDP enabled on port 9222
open -a "Slack" --args --remote-debugging-port=9222

# Launch VS Code on a separate port to avoid conflicts
open -a "Visual Studio Code" --args --remote-debugging-port=9223

Once the application is running with a debugging port open, anything that speaks CDP can connect to it and take full control of the UI.
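One way to confirm the port is open before connecting anything: Chromium's debugging server also answers plain HTTP requests, with /json/version describing the browser and /json/list enumerating the inspectable targets. The sketch below assumes curl is installed, that the app runs on the local machine, and that /json/list prints one key per line as Chromium normally does. The function names are illustrative and not part of any tool.

```shell
# Illustrative helpers (hypothetical names, not part of agent-browser).

# Build the URL of a CDP HTTP endpoint on this machine.
cdp_url() {
  printf 'http://127.0.0.1:%s/%s\n' "$1" "${2:-json/version}"
}

# Poll until the debugging port answers, or give up after N one-second tries.
wait_for_cdp() {
  local port="$1"
  local tries="${2:-10}"
  while [ "$tries" -gt 0 ]; do
    if curl --silent --fail --max-time 1 "$(cdp_url "$port")" >/dev/null; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Read /json/list output on stdin and print each target's WebSocket URL.
ws_urls() {
  grep '"webSocketDebuggerUrl"' | cut -d '"' -f 4
}
```

A typical call would be wait_for_cdp 9222 && curl -s "$(cdp_url 9222 json/list)" | ws_urls, which prints one ws:// address per window or webview the app exposes.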

agent-browser: AI-Native Electron Automation

agent-browser is an open-source tool from Vercel Labs designed to give AI agents — particularly Claude — a structured way to interact with browsers and browser-like environments. It speaks CDP natively, which makes it an ideal driver for Electron automation.

After launching your target application with a remote debugging port, you connect agent-browser to it:

agent-browser connect 9222

From that point, the full suite of browser automation primitives becomes available:

# Get a structured snapshot of all UI elements on screen
agent-browser snapshot -i

# Click an element by its reference ID
agent-browser click @e5

# Fill a text field
agent-browser fill @e12 "Hello from automation"

# Take a screenshot
agent-browser screenshot

The snapshot command is particularly useful for AI-driven workflows. It returns a structured representation of the application's UI — not a raw screenshot, but a machine-readable tree of interactive elements. An AI agent can reason over this structure, identify the right elements, and chain together interactions without any hardcoded selectors.
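To make that concrete, here is a small sketch of how a plain script, rather than a model, might pick an element out of a snapshot. It assumes each interactive element appears on its own line with a [ref=eN] marker, as in the sample in the comments. That line format is an assumption, so check it against what your version prints. find_ref is a hypothetical helper name.

```shell
# Assumed snapshot line format (verify against your agent-browser version):
#   - textbox "Message to #general" [ref=e12]
#   - button "Send now" [ref=e5]

# Read snapshot text on stdin; print the @ref of the first line containing the label.
# Prints nothing when no line matches.
find_ref() {
  grep -F -- "$1" | sed -n 's/.*\[ref=\([A-Za-z0-9]*\)].*/@\1/p' | head -n 1
}
```

Used with the commands above, this could look like ref="$(agent-browser snapshot -i | find_ref 'Send now')" followed by agent-browser click "$ref" once the result is confirmed to be non-empty. Because references are reissued with each snapshot, the lookup should run again after every interaction that changes the UI.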

Official Skills from Vercel Labs

The agent-browser repository ships with pre-built skills — structured instruction files that teach an AI agent how to handle a specific application or task type:

Electron Skill (agent-browser/skills/electron/SKILL.md): Generic guidance for launching and controlling any Electron application via CDP.
Slack Skill (agent-browser/skills/slack/SKILL.md): Application-specific automation patterns for Slack, covering message sending, channel navigation, and monitoring.

These skills are designed to be loaded directly into Claude Code or similar agentic coding environments, giving the agent the context it needs to operate the application reliably without manual intervention.

Real-World Automation Use Cases

The combination of CDP, Electron, and an AI agent capable of reading UI snapshots unlocks a category of automation that was previously difficult or impossible without official API support.

Customer Support Automation via WhatsApp Desktop

WhatsApp Desktop has shipped as an Electron application, although newer native builds on some platforms are not Electron, so confirm which build you are running. Many businesses using WhatsApp for customer communication have no programmatic access, because the official Business API is expensive and restricted. With agent-browser and CDP, you can build an automation layer directly on top of an Electron-based desktop client:

Monitor incoming messages by polling the UI snapshot
Route messages to the appropriate team member or trigger a response workflow
Send templated replies based on keywords or classification logic

The session stays authenticated because you are operating the actual logged-in desktop application, not an API token that needs rotation.
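The polling step can be sketched as a loop that takes a snapshot, compares it with the previous one, and hands it to a handler only when something changed. The helper names, the cache file, and the handler convention are assumptions made for illustration. Only agent-browser snapshot -i comes from the commands shown earlier.

```shell
# Hypothetical helper: succeed only when the snapshot differs from the cached copy,
# and refresh the cache so the next call compares against the newest state.
snapshot_changed() {
  local cache="$1"
  local current="$2"
  if [ -f "$cache" ] && [ "$(cat "$cache")" = "$current" ]; then
    return 1
  fi
  printf '%s\n' "$current" > "$cache"
}

# Poll forever; pipe each changed snapshot into the handler command.
# Usage: watch_snapshots CACHE_FILE INTERVAL_SECONDS HANDLER [ARGS...]
watch_snapshots() {
  local cache="$1"
  local interval="$2"
  local current
  shift 2
  while true; do
    current="$(agent-browser snapshot -i)"
    if snapshot_changed "$cache" "$current"; then
      printf '%s\n' "$current" | "$@"
    fi
    sleep "$interval"
  done
}
```

Comparing whole snapshots is deliberately crude: any UI change, not just a new message, triggers the handler, so the handler itself has to decide whether anything relevant arrived.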

VS Code Workflow Automation

VS Code exposes a rich UI through CDP. Agentic workflows can:

Open files, navigate the file tree, and switch tabs programmatically
Trigger built-in commands through the Command Palette
Read terminal output and respond to build errors
Automate repetitive editing patterns across large codebases

This complements VS Code's extension API without replacing it — CDP-based automation can drive the editor the same way a human would, which means it works even when no extension exists for a particular task.

Slack Monitoring and Messaging

Slack's official API is powerful but rate-limited and requires OAuth setup. The desktop app has no such restrictions from the user's perspective. With the Slack skill from agent-browser, you can automate:

Sending scheduled messages or reports to specific channels
Monitoring channels for keywords and triggering downstream actions
Cross-posting content between channels based on rules
Extracting thread summaries for async teams
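The keyword-triggered part of such a workflow needs nothing more than a lookup from message text to an action. Here is a minimal rule-based sketch. The keywords and action labels are invented for illustration, and a real setup might swap this function for a call to a classifier.

```shell
# Hypothetical routing table: map a message to an action label by keyword.
route_message() {
  local text
  # Lowercase the message so matching is case-insensitive.
  text="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$text" in
    *outage*|*"is down"*) echo "page-oncall" ;;
    *invoice*|*refund*)   echo "notify-billing" ;;
    *)                    echo "ignore" ;;
  esac
}
```

The first matching pattern wins, so the more urgent rules belong at the top of the case statement.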

Notion and Obsidian Knowledge Management

Notion's Electron client and Obsidian (also Electron-based) can be driven through CDP to automate note creation, database updates, and content organization without relying on Notion's API quota limits or Obsidian's plugin ecosystem.

Key Technical Advantages Over Traditional API Approaches

No API required. Many Electron applications either have no public API, have a restricted or expensive API, or gate automation behind developer accounts. CDP bypasses this entirely — if a human can use the app, your agent can too.

Session persistence. You operate inside the user's active, authenticated session. There is no token management, no OAuth dance, no re-authentication. The app thinks a human is using it.

Full UI control. You are not constrained by what an API exposes. If a button exists in the UI, you can click it. If a field exists, you can fill it. The automation surface is the entire application.

AI-readable structure. Unlike raw screenshot-based automation (which requires vision models to interpret pixel positions), agent-browser's snapshot command returns structured element data. This is faster, more reliable, and works cleanly with language models that reason over structured text.

Getting Started

1. Install agent-browser from agent-browser.dev or clone the Vercel Labs repository.
2. Launch your target Electron application with --remote-debugging-port=<port>.
3. Run agent-browser connect <port> to establish a CDP session.
4. Use agent-browser snapshot -i to inspect available UI elements.
5. Build your automation sequence using click, fill, type, and screenshot commands.
6. For Claude Code integration, load the relevant skill file from agent-browser/skills/ into your agent context.
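Strung together, the launch, connect, and inspect steps might look like the following sketch. It reuses only commands shown earlier in this post (the macOS open invocation, connect, and snapshot -i). The run wrapper, the DRY_RUN variable, the automate function, and the fixed five-second wait are assumptions made for illustration.

```shell
# Echo the command instead of executing it when DRY_RUN=1, so the
# sequence can be reviewed before it touches a real application.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Launch an app with CDP enabled (macOS), attach to it, and list its UI elements.
# Usage: automate APP_NAME PORT
automate() {
  local app="$1"
  local port="$2"
  run open -a "$app" --args "--remote-debugging-port=$port"
  run sleep 5   # crude wait for the app to finish starting
  run agent-browser connect "$port"
  run agent-browser snapshot -i
}
```

Setting DRY_RUN=1 before calling automate Slack 9222 prints the four commands without running them. Note that the dry-run output joins arguments with spaces, so an app name containing spaces will appear unquoted there even though the real invocation passes it as a single argument.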

Conclusion

The insight that Electron applications are just Chrome in disguise is not new, but pairing it with an AI-native automation tool like agent-browser creates something genuinely useful: a general-purpose automation layer for virtually every popular desktop productivity application, with no API contracts required.

For developers building AI agents, support automation systems, or internal tooling, this approach deserves serious attention. The combination of CDP's maturity, Electron's ubiquity, and agent-browser's structured UI interface represents one of the cleaner paths to desktop automation available today — and it works with the applications your users are already running.
