Browser automation, one binary v1.6.1

Scout

The simpler alternative to Playwright. One statically-linked binary — no Node, no Python, no runtime. Drive a real browser from a Go library, a CLI, an MCP server for AI agents, or a chat UI. Same engine. Any caller.

$ brew install klarlabs-studio/tap/scout


Why Scout

One binary. Any caller.

Scout is a single statically-linked binary that drives a real Chrome — from a Go program, a shell, an MCP-aware agent, or a chat UI. No runtime to install. No protocol layer between you and the browser.

Direct CDP. Zero abstractions.

Direct WebSocket to Chrome DevTools Protocol — no wrapper layer, no protocol of its own on top. One binary, one connection, every CDP surface available.
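Every CDP interaction is a JSON message sent over that WebSocket. Here is a minimal sketch of how a `Page.navigate` command is framed. It is not Scout's internal code: `cdpCommand` and `newCommand` are hypothetical names, and only the JSON shape (`id`, `method`, `params`) follows the protocol. The `id` lets the caller match Chrome's reply to the request.

```go
package main

import (
	"encoding/json"
	"sync/atomic"
)

// cdpCommand mirrors the shape of a Chrome DevTools Protocol request:
// a caller-chosen id, a "Domain.method" name, and optional params.
// The type name is hypothetical; only the JSON shape follows CDP.
type cdpCommand struct {
	ID     int64          `json:"id"`
	Method string         `json:"method"`
	Params map[string]any `json:"params,omitempty"`
}

// nextID hands out increasing ids so each reply can be matched to its request.
var nextID int64

// newCommand builds and serializes one CDP request frame.
func newCommand(method string, params map[string]any) ([]byte, error) {
	cmd := cdpCommand{
		ID:     atomic.AddInt64(&nextID, 1),
		Method: method,
		Params: params,
	}
	return json.Marshal(cmd)
}
```

Chrome answers each frame with a JSON object carrying the same `id` and either a `result` or an `error` field. Domain events arrive on the same connection without an `id`.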

Agent-first design.

Structured JSON output, DOM diffing that saves 50-80% tokens, content distillation at 5 levels, token budgets, and semantic form filling. Built for LLMs, not humans.

MCP built in.

Run `scout mcp serve` and any MCP-aware agent (Claude Desktop, Cursor, Cline, or a custom client) gets 74 browser tools. No second project to install.

Gin-like middleware.

Compose retry, timeout, circuit breaker, rate limit, and bulkhead patterns exactly like Gin HTTP middleware. If you know Gin, you know Scout.
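The Gin comparison comes down to one idea: middleware and tasks share a single handler signature, and each middleware decides when to call `Next()` to run the rest of the chain. The self-contained sketch below shows that pattern. `Engine`, `Context`, `HandlerFunc`, and `trace` are hypothetical stand-ins, not Scout's `browse` package.

```go
package main

// HandlerFunc is the single signature shared by middleware and tasks.
type HandlerFunc func(c *Context)

// Context carries the chain and its position; Log records execution order
// purely for demonstration.
type Context struct {
	handlers []HandlerFunc
	index    int
	aborted  bool
	Log      []string
}

// Next runs the remaining handlers in order. A middleware that calls Next
// runs "around" everything registered after it.
func (c *Context) Next() {
	for c.index < len(c.handlers) && !c.aborted {
		h := c.handlers[c.index]
		c.index++
		h(c)
	}
}

// Abort stops any handlers that have not run yet.
func (c *Context) Abort() { c.aborted = true }

// Engine holds middleware applied to every task.
type Engine struct{ middleware []HandlerFunc }

// Use appends middleware that wraps every task.
func (e *Engine) Use(m ...HandlerFunc) { e.middleware = append(e.middleware, m...) }

// Run executes the middleware chain followed by the task.
func (e *Engine) Run(task HandlerFunc) *Context {
	chain := append(append([]HandlerFunc{}, e.middleware...), task)
	c := &Context{handlers: chain}
	c.Next()
	return c
}

// trace is an example middleware that logs before and after the rest of the chain.
func trace(name string) HandlerFunc {
	return func(c *Context) {
		c.Log = append(c.Log, name+":before")
		c.Next()
		c.Log = append(c.Log, name+":after")
	}
}
```

With `trace("outer")` and `trace("inner")` registered, the log reads outer:before, inner:before, task, inner:after, outer:after. A middleware that never calls `Next()` (or calls `Abort()`) keeps the task from running, which is how a timeout or circuit breaker can short-circuit work.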

Versus Playwright

Same browser. Less ceremony.

Playwright is the gold standard for browser automation, but it comes with a ~600 MB Node tree, a separate language wrapper, and a separate MCP project to wire any of it to AI. Scout collapses all of that into one binary.

Capability	Scout	Playwright
Install footprint	One ~15 MB binary. `brew install` or download.	npm + ~600 MB browser cache + per-language wrapper.
Runtime dependency	None. Statically linked.	Node.js (always). Python/Java/.NET wrappers are second-class.
Languages it drives from	Any. Go library, CLI from any shell, MCP from any LLM host, HTTP-shaped from any agent.	TypeScript / JavaScript first-class. Others lag releases.
AI-agent native	Built-in MCP server. `scout mcp serve`.	Separate `playwright-mcp` project. Extra setup, extra runtime.
Token-aware extraction	DOM diff, content distillation, observation budgets. 50–80% fewer tokens.	Not provided. Page text dumped raw.
Action playbooks	Record & replay deterministic JSON playbooks. No LLM at replay time.	Codegen produces a script you maintain by hand.
Containerized deploy	Drop the binary into `scratch` or `distroless`. Tens of MB.	Carry Node + browser binaries. Hundreds of MB.
Video recording	Built-in. `start_screen_recording` / `stop_screen_recording` — survives navigations and tab switches, encodes to webm/mp4 via ffmpeg or returns frames if ffmpeg is absent. Always a file path, never base64.	`recordVideo` context option in JS/TS only; not exposed via the MCP variant.
CDP access	Direct WebSocket. Zero abstraction layer.	Internal protocol over CDP. Some surfaces hidden.

Code

Four interfaces. One engine.

Embed it in a Go program, run it from any shell, plug it into an AI agent, or talk to it from a chat UI. Same browser session model, four access points.

For Go developers building automation scripts and pipelines

Use the core API when you need full control: named tasks, middleware composition (retry, timeout, circuit breaker, stealth), grouped workflows, and the familiar Gin-style Engine → Context → HandlerFunc pattern. Ideal for scrapers, testing tools, monitoring systems, and CI pipelines.

// Gin-like engine with middleware composition
engine := browse.Default(browse.WithHeadless(true))
engine.MustLaunch()
defer engine.Close()

// Add resilience middleware
engine.Use(middleware.Timeout(30 * time.Second))
engine.Use(middleware.Retry(middleware.RetryConfig{MaxAttempts: 3}))
engine.Use(middleware.Stealth())

// Define and run tasks
engine.Task("scrape-prices", func(c *browse.Context) {
    c.MustNavigate("https://shop.example.com")
    c.El("input[name=q]").MustInput("mechanical keyboard")
    c.El("button[type=submit]").MustClick()

    prices := c.ElAll(".product .price").MustTexts()
    c.Set("prices", prices)
})

engine.Run("scrape-prices")

For AI agent developers who need structured browser interaction

Use the agent API when building autonomous agents that browse the web. Every method returns JSON-serializable structs, content is auto-truncated for LLM context windows, all operations auto-wait, and the session is goroutine-safe. DOM diffing, semantic form filling, annotated screenshots, network capture, and token budgets are built in — your agent thinks less and acts more.

// Session-based API optimized for AI agents
session, _ := agent.NewSession(agent.SessionConfig{Headless: true})
defer session.Close()

// Navigate and observe — structured JSON, not HTML soup
session.Navigate("https://app.example.com")
obs, _ := session.Observe()        // links, inputs, buttons, text

// Semantic form filling — no CSS selectors needed
session.FillFormSemantic(map[string]string{
    "Email":    "user@example.com",
    "Password": "secret123",
})

// DOM diff — only what changed (saves 50-80% tokens)
session.Click("#login")
_, diff, _ := session.ObserveDiff()
// diff.Added: [{Tag:"div", ID:"dashboard", Text:"Welcome!"}]

// Visual grounding — click by label number, not selector
result, _ := session.AnnotatedScreenshot()
session.ClickLabel(7)  // click element labeled [7]

// Network capture — read API responses directly
session.EnableNetworkCapture("/api/")
captured := session.CapturedRequests("/api/users")
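`session.ObserveDiff()` above returns only what changed, for example a new `dashboard` div after login. The sketch below shows the general technique on a simplified element list. `Element`, `Diff`, and `diffObservations` are hypothetical. Keying elements by tag, id, and text is an assumption made for illustration; a real implementation might use stable DOM node ids instead.

```go
package main

// Element is a hypothetical, simplified observed element.
type Element struct {
	Tag  string
	ID   string
	Text string
}

// Diff lists elements that appeared or disappeared between two observations.
type Diff struct {
	Added   []Element
	Removed []Element
}

// key identifies an element across observations (assumed: tag + id + text).
// Elements with identical keys are treated as one.
func key(e Element) string { return e.Tag + "#" + e.ID + "|" + e.Text }

// diffObservations returns only what changed, so an agent can send the usually
// much smaller diff to the model instead of the full page.
func diffObservations(before, after []Element) Diff {
	seen := make(map[string]bool, len(before))
	for _, e := range before {
		seen[key(e)] = true
	}
	now := make(map[string]bool, len(after))
	var d Diff
	for _, e := range after {
		now[key(e)] = true
		if !seen[key(e)] {
			d.Added = append(d.Added, e)
		}
	}
	for _, e := range before {
		if !now[key(e)] {
			d.Removed = append(d.Removed, e)
		}
	}
	return d
}
```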

For quick tasks, shell scripts, and CI pipelines

Use the CLI for one-shot operations without writing Go code. Each command launches a browser, navigates, performs one action, outputs the result, and exits. Pipe JSON output to jq, save screenshots in CI, extract data in shell scripts, or quickly inspect a page before writing automation code.

# Navigate and observe
$ scout observe https://example.com
{
  "title": "Example Domain",
  "links": ["More information..."],
  "inputs": [],
  "buttons": []
}

# Get page as markdown
$ scout markdown https://news.ycombinator.com

# Screenshot
$ scout screenshot https://github.com --output gh.png
Saved screenshot to gh.png (284519 bytes)

# Extract data
$ scout extract https://example.com h1
Example Domain

# Discover form fields
$ scout form discover https://login.example.com

# Detect frontend frameworks
$ scout frameworks https://react.dev
react
nextjs
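Because the CLI prints JSON, its output can be consumed from any language, not just `jq`. This sketch decodes the `scout observe` sample shown above. The `observation` struct is a hypothetical mapping of only the fields visible in that sample. `inputs` and `buttons` are kept as raw JSON because their element shape is not shown, and any extra fields the real output carries are simply ignored by `encoding/json`.

```go
package main

import "encoding/json"

// observation maps the fields visible in the `scout observe` sample output.
// Inputs and Buttons stay raw because their element shape is not documented here.
type observation struct {
	Title   string            `json:"title"`
	Links   []string          `json:"links"`
	Inputs  []json.RawMessage `json:"inputs"`
	Buttons []json.RawMessage `json:"buttons"`
}

// parseObservation decodes the JSON that `scout observe <url>` prints to stdout.
func parseObservation(data []byte) (observation, error) {
	var obs observation
	err := json.Unmarshal(data, &obs)
	return obs, err
}
```

In a program, the bytes would typically come from `exec.Command("scout", "observe", url).Output()` in the standard `os/exec` package.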

For giving AI assistants (Claude, Cursor, etc.) browser superpowers

Use the MCP server to let your AI assistant browse the web, fill forms, extract data, and take screenshots — all via the standard Model Context Protocol. A single Go binary with 74 tools, zero runtime dependencies. Just install and add one line to your MCP config. The AI decides which tools to call; Scout handles the browser.

# Add to Claude Code
$ claude mcp add scout -- scout mcp serve

# Or configure in claude_desktop_config.json / mcp.json
{
  "mcpServers": {
    "scout": {
      "command": "scout",
      "args": ["mcp", "serve"]
    }
  }
}

# 74 tools available, including:
#
# Navigation:  navigate, observe, observe_diff, observe_with_budget,
#              hybrid_observe
# Interaction: click, click_label, type, hover, double_click,
#              right_click, select_option, scroll_to, scroll_by,
#              focus, drag_drop, dispatch_event,
#              select_by_prompt, batch, find_by_coordinates
# Forms:       fill_form, fill_form_semantic, discover_form
# Extraction:  extract, extract_all, extract_table,
#              markdown, readable_text, accessibility_tree
# Capture:     screenshot, annotated_screenshot, pdf
# Network:     enable_network_capture, network_requests
# Tabs:        open_tab, switch_tab, close_tab, list_tabs
# Frames:      switch_to_frame, switch_to_main_frame
# Frameworks:  wait_spa, detect_frameworks,
#              component_state, app_state
# Playback:    start_recording, stop_recording,
#              save_playbook, replay_playbook
# Video:       start_screen_recording, stop_screen_recording
# Tracing:     start_trace, stop_trace
# Performance: web_vitals
# Utility:     has_element, wait_for, configure
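Under the hood, an MCP host talks to `scout mcp serve` over stdio using JSON-RPC 2.0. After the `initialize` handshake, each tool invocation is a `tools/call` request that names one of the tools listed above. This sketch frames such a request. The `url` argument for `navigate` is an assumption about that tool's input schema, and `toolCall` and `newToolCall` are hypothetical names.

```go
package main

import "encoding/json"

// toolCall is the JSON-RPC 2.0 request an MCP client sends to invoke a tool.
type toolCall struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  toolCallParams `json:"params"`
}

// toolCallParams names the tool and passes its arguments.
type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// newToolCall serializes one request. The stdio transport separates messages
// with newlines, so a single trailing '\n' is appended (json.Marshal emits none).
func newToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	b, err := json.Marshal(toolCall{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: tool, Arguments: args},
	})
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}
```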

Token efficiency

Five levels of content distillation.

Pages can return megabytes of HTML. Scout gives your agent exactly the right amount of information.

Method	Output size	Best for
`Observe()`	~2-5 KB	Deciding what to click or fill
`ObserveDiff()`	~0.5-2 KB	Seeing only what changed after an action
`Markdown()`	~2-8 KB	Reading page content in a compact format
`ReadableText()`	~1-4 KB	Main article or body text only
`AccessibilityTree()`	~1-4 KB	Compact semantic element tree
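To put the sizes in token terms, assume roughly four bytes of English text per token. That is a common rule of thumb, not Scout's tokenizer. Under that assumption, the 2-5 KB `Observe()` row is about 500 to 1,250 tokens and a 0.5-2 KB `ObserveDiff()` is about 125 to 500. This sketch applies the same heuristic to enforce a token budget. `estimateTokens` and `truncateToBudget` are hypothetical helpers.

```go
package main

import "unicode/utf8"

// bytesPerToken is an assumed rule of thumb, not how Scout counts tokens.
const bytesPerToken = 4

// estimateTokens approximates the token cost of s, rounding up.
func estimateTokens(s string) int { return (len(s) + bytesPerToken - 1) / bytesPerToken }

// truncateToBudget trims s so its estimated cost fits maxTokens, cutting on a
// UTF-8 boundary so no character is split. It reports whether it trimmed.
func truncateToBudget(s string, maxTokens int) (string, bool) {
	limit := maxTokens * bytesPerToken
	if limit < 0 {
		limit = 0
	}
	if len(s) <= limit {
		return s, false
	}
	cut := limit
	// Step back over UTF-8 continuation bytes so the cut lands on a rune start.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut], true
}
```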

Features

Everything, composed.

Organized by layer. Each feature is a building block that composes with the rest.

Core Engine

Pure CDP over WebSocket

Gin-like Engine / Context / Group

Auto-wait on navigation and elements

CSS selectors with chaining

Page pool for concurrent tasks

Remote CDP for cloud browsers

Table extraction to structured data

PDF generation and screenshots

Shadow DOM piercing with flattened cache

Iframe switching for nested content

Agent Package

Structured JSON output

DOM diffing (50-80% token savings)

Token budgets for observation

Semantic form fill by field name

Annotated screenshots with labels

Visual grounding via click_label

Network capture for XHR/fetch

Persistent profiles (cookies + storage)

Screenshot auto-compress for LLMs

5-level content distillation

Natural-language selectors by prompt text

Vision hybrid mode with bounding boxes

Batch operations for multi-action

Trace export to zip files

Web vitals LCP/CLS/INP extraction

click_text visible-text shortcut

Cookie tools list/clear/set for stale-session repair

Network 4xx/5xx in console_errors + failed_requests

Active tab/route in observe

Selector diagnostics with similar candidates

Checkbox/radio in fill_form_semantic with state echo

Middleware

Retry with backoff

Timeout per task

Circuit breaker (fortify)

Rate limit requests

Bulkhead isolation

Stealth mode anti-detection

Auth (Bearer, Basic, Cookie)

Resource blocking (images, fonts)

Viewport and slow motion

Screenshot on error

Stealth v2 canvas/audio/WebRTC noise

UA rotation with 27 realistic agents

Human-like delays with randomized timing

Framework Support

React + Next.js + Gatsby

Vue 2/3 + Nuxt

Angular

Svelte + SvelteKit

Solid + Preact + Lit

Alpine + HTMX + Stimulus

Qwik + Astro + Remix

Ember

Architecture

Layer stack.

Four layers from user-facing interfaces down to the wire protocol.

CLI
scout observe / screenshot / extract

MCP Server
74 tools via stdio

Agent Session
observe / diff / semantic fill / annotate / network capture / profiles

Engine
Task / Context / Group

Middleware
retry / timeout / stealth / auth

Page + Selection
elements / tables / forms

CDP WebSocket
pure Chrome DevTools Protocol -- no rod, no chromedp

Chrome / Chromium
local process or remote via WithRemoteCDP

Built on bolt (structured logging), fortify (resilience patterns), statekit (state machines), and mcp-go (Model Context Protocol).

Supply chain

Vulnerability-driven security.

Every push and PR runs nox. Findings flow to GitHub code scanning, are gated against a committed baseline, and are annotated inline on PRs. Dependabot is replaced by nox-remediate: a weekly and on-demand workflow that runs nox fix against fresh OSV.dev findings and opens a single review-ready PR.

Local scan

$ nox scan -severity-threshold high .

Local fix

$ nox fix -input findings.json

Status badge

Auto-refreshed from each main-branch CI run. The README header reflects the current grade.

Get started

Install Scout.

Install the binary, connect to your AI agent, and start browsing. Go library also available.

Homebrew (recommended)

$ brew install klarlabs-studio/tap/scout

Claude Code (MCP)

$ claude mcp add scout -- scout mcp serve

Claude Desktop / Cursor (MCP)

{
  "mcpServers": {
    "scout": {
      "command": "scout",
      "args": ["mcp", "serve"]
    }
  }
}

Binary (any platform)

$ curl -fsSL https://raw.githubusercontent.com/klarlabs-studio/scout/main/install.sh | bash

Go library (developers)

$ go get go.klarlabs.de/scout

Go install (CLI)

$ go install go.klarlabs.de/scout/cmd/scout@latest