10 changes: 9 additions & 1 deletion docs.json
@@ -47,7 +47,15 @@
"pages": [
"docs/use-cases/coding-agents",
"docs/use-cases/computer-use",
- "docs/use-cases/ci-cd"
+ "docs/use-cases/ci-cd",
+ {
+ "group": "Browser use",
+ "icon": "globe",
+ "pages": [
+ "docs/use-cases/browser-use",
+ "docs/use-cases/agent-browser"
+ ]
+ }
]
},
{
235 changes: 235 additions & 0 deletions docs/use-cases/agent-browser.mdx
@@ -0,0 +1,235 @@
---
title: "Agent remote browser"
description: "Run an autonomous AI agent inside an E2B sandbox that browses the web using a Kernel cloud browser and the Browser Use framework."
icon: "robot"
---

Run an AI agent inside an E2B sandbox that autonomously controls a [Kernel](https://www.kernel.computer/) cloud browser. The agent decides what to click, what to type, and where to navigate; you only supply the task.

This builds on the [remote browser](/docs/use-cases/browser-use) pattern by adding the [Browser Use](https://docs.browser-use.com/) framework, which turns an LLM into a browser-controlling agent.

## Architecture

1. **E2B Sandbox** — isolated environment where the agent code runs. Pre-installed with Kernel SDK, Playwright, and Browser Use.
2. **Kernel Cloud Browser** — remote Chromium instance the agent controls via CDP.
3. **Browser Use** — agent framework that connects an LLM to Playwright. The LLM sees screenshots and decides actions (click, type, scroll, navigate).

The orchestrator (the Python script you run locally) creates the sandbox and starts the agent. The agent then runs autonomously inside the sandbox: it creates a Kernel browser, connects Browser Use to it, and executes the task.

## Prerequisites

- An [E2B API key](https://e2b.dev/dashboard?tab=keys)
- A [Kernel API key](https://www.kernel.computer/)
- An LLM API key (Anthropic, OpenAI, or other [supported model](https://docs.browser-use.com/customize/supported-models))
- Python 3.10+

```bash
pip install e2b-code-interpreter
```

Set your keys in the environment:

```bash .env
E2B_API_KEY=e2b_***
KERNEL_API_KEY=kernel_***
ANTHROPIC_API_KEY=sk-ant-***
```
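
Before creating the sandbox, it can help to fail fast when a key is missing. The sketch below is one possible way to do that; `require_env` and `REQUIRED_KEYS` are hypothetical names introduced for illustration and are not part of the E2B or Kernel SDKs.

```python
import os

# Keys this guide expects; adjust if you use a different LLM provider.
REQUIRED_KEYS = ("E2B_API_KEY", "KERNEL_API_KEY", "ANTHROPIC_API_KEY")


def require_env(names, environ=None):
    """Return {name: value} for every name, or raise listing the missing ones."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty values the same way.
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

Calling `require_env(REQUIRED_KEYS)` at the top of the orchestrator script raises a single error that names every missing key. Python does not read a `.env` file on its own, so the variables must already be exported in the shell or loaded by a tool of your choice.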

## How it works

<Steps>
<Step title="Create the sandbox">
Start an E2B sandbox using the `kernel-agent-browser` template, which comes with Kernel SDK, Playwright, and Browser Use pre-installed. Pass the API keys the agent will need.

```python
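import os

from e2b_code_interpreter import Sandbox

sandbox = Sandbox.create(
    "kernel-agent-browser",
    # Environment variables made available to the agent inside the sandbox
    envs={
        "KERNEL_API_KEY": os.environ["KERNEL_API_KEY"],
        "ANTHROPIC_API_KEY": os.environ["ANTHROPIC_API_KEY"],
    },
    timeout=300,  # sandbox lifetime in seconds
)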
```
</Step>

<Step title="Write the agent script">
The agent script creates a Kernel browser, connects Browser Use to it, and runs a task autonomously.

```python
AGENT_SCRIPT = '''
import asyncio
from kernel import Kernel
from browser_use import Agent, Browser, ChatAnthropic

async def main():
    # Create a Kernel cloud browser (reads KERNEL_API_KEY from the environment)
    kernel = Kernel()
    kb = kernel.browsers.create()

    # Point Browser Use at the Kernel browser's CDP endpoint
    browser = Browser(cdp_url=kb.cdp_ws_url)

    agent = Agent(
        task="Go to Hacker News, find the top 3 AI stories, and summarize them",
        llm=ChatAnthropic(model="claude-sonnet-4-20250514"),
        browser=browser,
    )
    result = await agent.run()
    print(result)

asyncio.run(main())
'''

sandbox.files.write("/home/user/agent_task.py", AGENT_SCRIPT)
```
</Step>

<Step title="Run the agent">
Execute the agent inside the sandbox. The agent will autonomously browse, click, type, and navigate to complete the task.

```python
result = sandbox.commands.run(
    "python3 /home/user/agent_task.py",
    timeout=180,
)
print(result.stdout)
```
</Step>
</Steps>

## Full example

```python agent_browser.py expandable
"""
Agent Remote Browser — E2B + Kernel + Browser Use

Spins up an E2B sandbox with Browser Use framework and Kernel cloud browser.
An AI agent autonomously browses the web to complete a research task.
"""

import os

from e2b_code_interpreter import Sandbox

AGENT_SCRIPT = '''
import asyncio
from kernel import Kernel
from browser_use import Agent, Browser, ChatAnthropic

async def main():
    # Create a Kernel cloud browser
    kernel = Kernel()
    kb = kernel.browsers.create()
    print(f"Kernel browser created: {kb.id}")

    # Connect Browser Use to the Kernel browser via CDP
    browser = Browser(cdp_url=kb.cdp_ws_url)

    # Create an AI agent that autonomously browses
    agent = Agent(
        task="""
        Go to https://news.ycombinator.com and find the top 3 stories
        that are about AI or machine learning. For each story:
        1. Note the title and point count
        2. Click through to the comments page
        3. Read the top comment

        Return a summary of your findings.
        """,
        llm=ChatAnthropic(model="claude-sonnet-4-20250514"),
        browser=browser,
        max_actions_per_step=4,
    )

    result = await agent.run()
    print("\\n" + "=" * 60)
    print("AGENT RESULT:")
    print("=" * 60)
    print(result)

asyncio.run(main())
'''


def main():
    sandbox = Sandbox.create(
        "kernel-agent-browser",
        envs={
            "KERNEL_API_KEY": os.environ["KERNEL_API_KEY"],
            "ANTHROPIC_API_KEY": os.environ["ANTHROPIC_API_KEY"],
        },
        timeout=300,
    )

    try:
        # Upload the agent script, then run it inside the sandbox
        sandbox.files.write("/home/user/agent_task.py", AGENT_SCRIPT)

        result = sandbox.commands.run(
            "python3 /home/user/agent_task.py",
            timeout=180,
        )

        if result.exit_code != 0:
            print(f"Agent failed: {result.stderr}")
        else:
            print(result.stdout)

    finally:
        # Always release the sandbox, even if the agent fails
        sandbox.kill()


if __name__ == "__main__":
    main()
```
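
The agent script prints a banner (`=` repeated 60 times, `AGENT RESULT:`, and the same rule again) before the final result, so the orchestrator can separate the answer from earlier output such as the "Kernel browser created" line. The helper below is a minimal sketch of that idea; `extract_agent_result` is a hypothetical name, and it assumes the banner reaches stdout exactly as the script above prints it.

```python
BANNER = "=" * 60


def extract_agent_result(stdout: str) -> str:
    """Return the text printed after the AGENT RESULT banner.

    Falls back to the whole (stripped) output when the banner is absent,
    for example if the script exited before the agent finished.
    """
    marker = f"{BANNER}\nAGENT RESULT:\n{BANNER}\n"
    _, found, tail = stdout.partition(marker)
    return tail.strip() if found else stdout.strip()
```

In the full example this would be applied as `extract_agent_result(result.stdout)` in place of printing `result.stdout` directly.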

## Key concepts

| Concept | Detail |
|---|---|
| **E2B template** | `kernel-agent-browser` — pre-built with Kernel SDK, Playwright, and Browser Use |
| **Kernel browser** | `kernel.browsers.create()` spins up a remote Chromium; connect via `kb.cdp_ws_url` |
| **Browser Use** | `Browser(cdp_url=...)` connects the agent framework to Kernel's CDP endpoint |
| **LLM choice** | Browser Use supports `ChatAnthropic`, `ChatOpenAI`, `ChatGoogle`, and more |
| **Autonomous agent** | The LLM sees the page (via screenshots) and decides what actions to take |

## Choosing an LLM

Browser Use supports multiple LLM providers. Import the one you need:

```python
# Anthropic (recommended)
from browser_use import ChatAnthropic
llm = ChatAnthropic(model="claude-sonnet-4-20250514")

# OpenAI
from browser_use import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o")

# Google
from browser_use import ChatGoogle
llm = ChatGoogle(model="gemini-2.5-flash")
```

Pass the corresponding API key in the sandbox `envs`.
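
One way to keep the `envs` dictionary in step with the chosen provider is a small lookup table, sketched below. The helper `sandbox_envs` is hypothetical, and the variable names `OPENAI_API_KEY` and `GOOGLE_API_KEY` are assumptions that should be checked against the Browser Use and provider documentation; only `ANTHROPIC_API_KEY` and `KERNEL_API_KEY` appear in this guide.

```python
import os

# Assumed environment variable names per provider (verify before relying on them).
PROVIDER_KEY_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def sandbox_envs(provider, environ=None):
    """Build the envs dict for the sandbox: the Kernel key plus one LLM key."""
    environ = os.environ if environ is None else environ
    key_var = PROVIDER_KEY_VARS[provider]  # KeyError for an unknown provider
    return {
        "KERNEL_API_KEY": environ["KERNEL_API_KEY"],
        key_var: environ[key_var],
    }
```

The returned dictionary can be passed as `envs=sandbox_envs("openai")` when creating the sandbox, so only the keys the agent actually needs are forwarded.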

## Adapting this example

- **Different tasks** — change the `task` string to any web research, form filling, or data extraction task.
- **Custom actions** — Browser Use supports [custom actions](https://docs.browser-use.com/customize/custom-actions) to extend agent capabilities.
- **Vision control** — set `use_vision="auto"` on the Agent to let it decide when to use screenshots vs DOM.
- **Multiple agents** — run several agents in parallel, each with their own Kernel browser, for concurrent research.
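
For the multiple-agents case, a bounded `asyncio.gather` is one possible shape. The sketch below is deliberately independent of Browser Use and Kernel: `run_all` is a hypothetical helper, and `run_one` stands in for a coroutine you would write that creates a Kernel browser, builds an `Agent` for the given task, and awaits `agent.run()`.

```python
import asyncio


async def run_all(tasks, run_one, max_concurrent=3):
    """Run run_one(task) for every task, at most max_concurrent at a time."""
    limit = asyncio.Semaphore(max_concurrent)

    async def guarded(task):
        # Hold a slot for the duration of one agent run.
        async with limit:
            return await run_one(task)

    # Results come back in input order; a failed run yields its exception
    # object instead of cancelling the other agents.
    return await asyncio.gather(
        *(guarded(task) for task in tasks),
        return_exceptions=True,
    )
```

Capping concurrency keeps the number of simultaneously open cloud browsers and LLM requests predictable; the limit of 3 is an arbitrary default, not a recommendation from either service.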

## Related guides

<CardGroup cols={3}>
<Card title="Remote browser" icon="globe" href="/docs/use-cases/browser-use">
Programmatic browser automation with Playwright + Kernel
</Card>
<Card title="Computer use" icon="desktop" href="/docs/use-cases/computer-use">
Build AI agents that control virtual desktops
</Card>
<Card title="Sandbox lifecycle" icon="rotate" href="/docs/sandbox">
Create, manage, and control sandbox lifecycle
</Card>
</CardGroup>