Commits
26 commits
6ff2255
feat: implementing the first tool
daveomri Mar 18, 2026
5f01fa3
chore: renamed platform integration tag
daveomri Mar 18, 2026
c685127
feat: updading apify tool docs
daveomri Mar 19, 2026
4deefa8
feat: update Apify tool dependencies and enhance documentation
daveomri Mar 20, 2026
dd2d6fd
feat: add task execution tools to Apify integration and create unit t…
daveomri Mar 20, 2026
f823eae
feat: edit docs for apify tools
daveomri Mar 20, 2026
bcf0950
feat: enhance Apify tool with validation methods and default parameters
daveomri Mar 23, 2026
d15a337
feat: create validation tests
daveomri Mar 23, 2026
2d8bfbe
feat: standardize terminology in apify tool documentation and code
daveomri Mar 24, 2026
1ef943d
feat: refactor Apify tools into core module and update docs
daveomri Mar 25, 2026
ff2494e
feat: refactor the tool, use one file only
daveomri Mar 25, 2026
488a168
docs: add missing tools parameters
daveomri Mar 26, 2026
1b9675f
fix: Update Apify tools documentation for improved clarity and expand…
jirispilka Apr 2, 2026
46daa97
docs: keep most important tools in readme
daveomri Apr 2, 2026
19500c7
feat: update crawler type constants in Apify tool
daveomri Apr 2, 2026
4405ebe
feat: use Literal for crawler types in Apify tool
daveomri Apr 2, 2026
ab930ad
feat: add comment for tracking header
daveomri Apr 2, 2026
b07d7c1
feat: add error handling for missing actor run data and dataset in Ap…
daveomri Apr 2, 2026
30412f7
feat: add unit tests for new tools guarding
daveomri Apr 2, 2026
4732185
fix: ensure explicit empty input is correctly passed to Apify actor
daveomri Apr 2, 2026
b1a792c
fix: add error status message for None
daveomri Apr 2, 2026
07799a1
fix: Improve docs using apify-writing-style
jirispilka Apr 8, 2026
42ed0aa
Merge branch 'feat/strands-core-apify-tools' into feat/strands-core-a…
jirispilka Apr 8, 2026
81810c4
fix: Improve docs using apify-writing-style
jirispilka Apr 8, 2026
2eab80c
fix: Improve docs using apify-writing-style
jirispilka Apr 14, 2026
38ccda7
Merge pull request #4 from jirispilka/feat/strands-core-apify-tools
daveomri Apr 15, 2026
49 changes: 49 additions & 0 deletions README.md
@@ -99,6 +99,8 @@ Below is a comprehensive table of all available tools, how to use them with an a
| Tool | Agent Usage | Use Case |
|------|-------------|----------|
| a2a_client | `provider = A2AClientToolProvider(known_agent_urls=["http://localhost:9000"]); agent = Agent(tools=provider.tools)` | Discover and communicate with A2A-compliant agents, send messages between agents |
| apify_run_actor | `agent.tool.apify_run_actor(actor_id="apify/website-content-crawler", run_input={"startUrls": [{"url": "https://example.com"}]})` | Run any Apify Actor with arbitrary input |
| apify_scrape_url | `agent.tool.apify_scrape_url(url="https://example.com")` | Scrape a URL and return its content as markdown |
| file_read | `agent.tool.file_read(path="path/to/file.txt")` | Reading configuration files, parsing code files, loading datasets |
| file_write | `agent.tool.file_write(path="path/to/file.txt", content="file content")` | Writing results to files, creating new files, saving output data |
| editor | `agent.tool.editor(command="view", path="path/to/file.py")` | Advanced file operations like syntax highlighting, pattern replacement, and multi-file edits |
@@ -960,6 +962,47 @@ result = agent.tool.mongodb_memory(
)
```

### Apify

```python
from strands import Agent
from strands_tools.apify import APIFY_CORE_TOOLS

agent = Agent(tools=APIFY_CORE_TOOLS)

# Scrape a single URL and get Markdown content
content = agent.tool.apify_scrape_url(url="https://example.com")

# Run an Actor and get results in one step
result = agent.tool.apify_run_actor_and_get_dataset(
actor_id="apify/website-content-crawler",
run_input={"startUrls": [{"url": "https://example.com"}]},
dataset_items_limit=50,
)

# Run a saved task (pre-configured Actor with default inputs)
run_info = agent.tool.apify_run_task(task_id="user/my-task")

# Run a task and get results in one step
result = agent.tool.apify_run_task_and_get_dataset(
task_id="user/my-task",
task_input={"query": "override default input"},
dataset_items_limit=50,
)

# Run an Actor (get metadata only)
run_info = agent.tool.apify_run_actor(
actor_id="apify/google-search-scraper",
run_input={"queries": "AI agent frameworks"},
)

# Fetch dataset items separately
items = agent.tool.apify_get_dataset_items(
dataset_id="abc123",
limit=100,
)
```

## 🌍 Environment Variables Configuration

Agents Tools provides extensive customization through environment variables. This allows you to configure tool behavior without modifying code, making it ideal for different environments (development, testing, production).
@@ -1068,6 +1111,12 @@ The Mem0 Memory Tool supports three different backend configurations:
- If `NEPTUNE_ANALYTICS_GRAPH_IDENTIFIER` is set, the tool will configure Neptune Analytics as graph store to enhance memory search
- LLM configuration applies to all backend modes and allows customization of the language model used for memory processing

#### Apify Tool

| Environment Variable | Description | Default |
|----------------------|-------------|---------|
| APIFY_API_TOKEN | Apify API token for authentication (required) | None |

#### Bright Data Tool

| Environment Variable | Description | Default |
205 changes: 205 additions & 0 deletions docs/apify_tool.md
@@ -0,0 +1,205 @@
# Apify

The Apify tools (`apify.py`) enable [Strands Agents](https://strandsagents.com/) to interact with the [Apify](https://apify.com) platform — running any [Actor](https://apify.com/store) or [task](https://docs.apify.com/platform/actors/running/tasks) by ID, fetching dataset results, and scraping individual URLs.

## Installation

```bash
pip install strands-agents-tools[apify]
```

## Configuration

Set your Apify API token as an environment variable:

```bash
export APIFY_API_TOKEN=apify_api_your_token_here
```

Get your token from [Apify Console](https://console.apify.com/account/integrations) → Settings → API & Integrations → Personal API tokens.

## Usage

Register all core tools at once:

```python
from strands import Agent
from strands_tools.apify import APIFY_CORE_TOOLS

agent = Agent(tools=APIFY_CORE_TOOLS)
```

Or pick individual tools:

```python
from strands import Agent
from strands_tools import apify

agent = Agent(tools=[
apify.apify_run_actor,
apify.apify_scrape_url,
])
```

### Scrape a URL

The simplest way to extract content from any web page. Uses the [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor under the hood and returns the page content as Markdown:

```python
content = agent.tool.apify_scrape_url(url="https://example.com")
```

### Run an Actor

Execute any Actor from [Apify Store](https://apify.com/store) by its ID. The call blocks until the Actor run finishes or the timeout is reached:

```python
result = agent.tool.apify_run_actor(
actor_id="apify/website-content-crawler",
run_input={"startUrls": [{"url": "https://example.com"}]},
timeout_secs=300,
)
```

The result is a JSON string containing run metadata: `run_id`, `status`, `dataset_id`, `started_at`, and `finished_at`.
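
Because the metadata comes back as a JSON string, a typical follow-up is to parse it, check the run status, and feed `dataset_id` into `apify_get_dataset_items`. A minimal sketch (the metadata literal below is illustrative, not real platform output):

```python
import json

# Illustrative run metadata, shaped like the JSON string apify_run_actor returns
raw = (
    '{"run_id": "r1", "status": "SUCCEEDED", "dataset_id": "d1", '
    '"started_at": "2026-04-01T00:00:00Z", "finished_at": "2026-04-01T00:05:00Z"}'
)

meta = json.loads(raw)
if meta["status"] != "SUCCEEDED":
    raise RuntimeError(f"Run {meta['run_id']} ended with status {meta['status']}")

dataset_id = meta["dataset_id"]
# With a live agent, you would now fetch the results:
# items = agent.tool.apify_get_dataset_items(dataset_id=dataset_id, limit=100)
```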

### Run an Actor and get results

Combine running an Actor and fetching its dataset results in a single call:

```python
result = agent.tool.apify_run_actor_and_get_dataset(
actor_id="apify/website-content-crawler",
run_input={"startUrls": [{"url": "https://example.com"}]},
dataset_items_limit=50,
)
```

### Run a task

Execute a saved [Actor task](https://docs.apify.com/platform/actors/running/tasks) — a pre-configured Actor with preset inputs. Use this when a task has already been set up in Apify Console:

```python
result = agent.tool.apify_run_task(
task_id="user~my-task",
task_input={"query": "override input"},
timeout_secs=300,
)
```

The result is a JSON string containing run metadata: `run_id`, `status`, `dataset_id`, `started_at`, and `finished_at`.

### Run a task and get results

Combine running a task and fetching its dataset results in a single call:

```python
result = agent.tool.apify_run_task_and_get_dataset(
task_id="user~my-task",
dataset_items_limit=50,
)
```

### Fetch dataset items

Retrieve results from a dataset by its ID. Useful after running an Actor to get the structured results separately, or to access any existing dataset:

```python
items = agent.tool.apify_get_dataset_items(
dataset_id="abc123",
limit=100,
offset=0,
)
```
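
The `limit` and `offset` parameters support paging through large datasets. A sketch of a pagination loop, using a stand-in `fetch_page` function in place of `agent.tool.apify_get_dataset_items` (here it simulates a five-item dataset so the loop logic can be seen end to end):

```python
import json

def fetch_page(dataset_id, limit, offset):
    # Stand-in for agent.tool.apify_get_dataset_items; returns a JSON string.
    # Simulates a dataset of 5 items for illustration.
    data = [{"i": n} for n in range(5)]
    return json.dumps(data[offset:offset + limit])

def fetch_all(dataset_id, page_size=2):
    items, offset = [], 0
    while True:
        page = json.loads(fetch_page(dataset_id, page_size, offset))
        items.extend(page)
        if len(page) < page_size:  # a short page means we reached the end
            break
        offset += page_size
    return items

all_items = fetch_all("abc123")
```

With the real tool, `fetch_page` would call `apify_get_dataset_items` with the same `limit`/`offset` arguments; the termination condition stays the same.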

## Tool Parameters

### apify_scrape_url

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `url` | string | Yes | — | The URL to scrape |
| `timeout_secs` | int | No | 120 | Maximum time in seconds to wait for scraping to finish |
| `crawler_type` | string | No | `"cheerio"` | Crawler engine to use. One of `"cheerio"` (fastest, no JS rendering), `"playwright:adaptive"` (fast, renders JS if present), or `"playwright:firefox"` (reliable, renders JS, best at avoiding blocking but slower) |

**Returns:** Markdown content of the scraped page as a plain string.
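
Since `"cheerio"` is fastest but cannot render JavaScript, one reasonable pattern is to try engines in order of speed and fall back when a page comes back empty. A hypothetical wrapper (`scrape` stands in for `agent.tool.apify_scrape_url`; this is a sketch, not part of the tool's API):

```python
def scrape_with_fallback(scrape, url):
    """Try the fast engine first, then fall back to JS-rendering engines."""
    for engine in ("cheerio", "playwright:adaptive", "playwright:firefox"):
        content = scrape(url=url, crawler_type=engine)
        if content and content.strip():
            return content
    raise RuntimeError(f"No content returned for URL: {url}")

# Usage with the real tool would be:
# content = scrape_with_fallback(agent.tool.apify_scrape_url, "https://example.com")
```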

### apify_run_actor

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `actor_id` | string | Yes | — | Actor identifier (e.g., `apify/website-content-crawler`) |
| `run_input` | dict | No | None | JSON-serializable input for the Actor |
| `timeout_secs` | int | No | 300 | Maximum time in seconds to wait for the Actor run to finish |
| `memory_mbytes` | int | No | None | Memory allocation in MB for the Actor run (uses Actor default if not set) |
| `build` | string | No | None | Actor build tag or number to run a specific version (uses latest build if not set) |

**Returns:** JSON string with run metadata: `run_id`, `status`, `dataset_id`, `started_at`, `finished_at`.

### apify_run_task

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `task_id` | string | Yes | — | Task identifier (e.g., `user~my-task` or a task ID) |
| `task_input` | dict | No | None | JSON-serializable input to override the task's default input |
| `timeout_secs` | int | No | 300 | Maximum time in seconds to wait for the task run to finish |
| `memory_mbytes` | int | No | None | Memory allocation in MB for the task run (uses task default if not set) |

**Returns:** JSON string with run metadata: `run_id`, `status`, `dataset_id`, `started_at`, `finished_at`.

### apify_run_task_and_get_dataset

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `task_id` | string | Yes | — | Task identifier (e.g., `user~my-task` or a task ID) |
| `task_input` | dict | No | None | JSON-serializable input to override the task's default input |
| `timeout_secs` | int | No | 300 | Maximum time in seconds to wait for the task run to finish |
| `memory_mbytes` | int | No | None | Memory allocation in MB for the task run (uses task default if not set) |
| `dataset_items_limit` | int | No | 100 | Maximum number of dataset items to return |
| `dataset_items_offset` | int | No | 0 | Number of dataset items to skip for pagination |

**Returns:** JSON string with run metadata plus an `items` array containing the dataset results.

### apify_get_dataset_items

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `dataset_id` | string | Yes | — | The Apify dataset ID to fetch items from |
| `limit` | int | No | 100 | Maximum number of items to return |
| `offset` | int | No | 0 | Number of items to skip for pagination |

**Returns:** JSON string containing an array of dataset items.

### apify_run_actor_and_get_dataset

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `actor_id` | string | Yes | — | Actor identifier (e.g., `apify/website-content-crawler`) |
| `run_input` | dict | No | None | JSON-serializable input for the Actor |
| `timeout_secs` | int | No | 300 | Maximum time in seconds to wait for the Actor run to finish |
| `memory_mbytes` | int | No | None | Memory allocation in MB for the Actor run (uses Actor default if not set) |
| `build` | string | No | None | Actor build tag or number to run a specific version (uses latest build if not set) |
| `dataset_items_limit` | int | No | 100 | Maximum number of dataset items to return |
| `dataset_items_offset` | int | No | 0 | Number of dataset items to skip for pagination |

**Returns:** JSON string with run metadata plus an `items` array containing the dataset results.

## Troubleshooting

| Error | Cause | Fix |
|-------|-------|-----|
| `APIFY_API_TOKEN environment variable is not set` | Token not configured | Set the `APIFY_API_TOKEN` environment variable |
| `apify-client package is required` | Optional dependency not installed | Run `pip install strands-agents-tools[apify]` |
| `Actor ... finished with status FAILED` | Actor execution error | Check Actor input parameters and run logs in [Apify Console](https://console.apify.com) |
| `Task ... finished with status FAILED` | Task execution error | Check task configuration and run logs in [Apify Console](https://console.apify.com) |
| `Actor/task ... finished with status TIMED-OUT` | Timeout too short for the workload | Increase the `timeout_secs` parameter |
| `Task ... returned no run data` | Task `call()` returned `None` (wait timeout) | Increase the `timeout_secs` parameter |
| `No content returned for URL` | Website Content Crawler returned empty results | Verify the URL is accessible and returns content |
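
To surface the most common of these errors before the agent makes its first tool call, a small preflight check can validate the environment up front. A sketch (the helper name is illustrative, not part of the package):

```python
import os

def check_apify_env():
    """Fail fast at startup instead of at the first Apify tool call."""
    token = os.environ.get("APIFY_API_TOKEN", "")
    if not token:
        raise EnvironmentError("APIFY_API_TOKEN environment variable is not set")
    return token
```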

## References

- [Strands Agents Tools](https://strandsagents.com/latest/user-guide/concepts/tools/tools_overview/)
- [Apify Platform](https://apify.com)
- [Apify API Documentation](https://docs.apify.com/api/v2)
- [Apify Store](https://apify.com/store)
- [Apify Python Client](https://docs.apify.com/api/client/python/docs)
7 changes: 5 additions & 2 deletions pyproject.toml
@@ -61,6 +61,9 @@ Homepage = "https://github.com/strands-agents/tools"
Documentation = "https://strandsagents.com/"

[project.optional-dependencies]
apify = [
"apify-client>=2.5.0,<3.0.0",
]
build = [
"hatch>=1.16.5",
]
@@ -122,7 +125,7 @@ mongodb-memory = [
]

[tool.hatch.envs.hatch-static-analysis]
features = ["mem0-memory", "local-chromium-browser", "agent-core-browser", "agent-core-code-interpreter", "a2a-client", "diagram", "rss", "use-computer", "twelvelabs", "elasticsearch-memory", "mongodb-memory"]
features = ["mem0-memory", "local-chromium-browser", "agent-core-browser", "agent-core-code-interpreter", "a2a-client", "diagram", "rss", "use-computer", "twelvelabs", "elasticsearch-memory", "mongodb-memory", "apify"]
dependencies = [
"strands-agents>=1.0.0",
"mypy>=0.981,<1.0.0",
@@ -141,7 +144,7 @@ lint-check = [
lint-fix = ["ruff check --fix"]

[tool.hatch.envs.hatch-test]
features = ["mem0-memory", "local-chromium-browser", "agent-core-browser", "agent-core-code-interpreter", "a2a-client", "diagram", "rss", "use-computer", "twelvelabs", "elasticsearch-memory", "mongodb-memory"]
features = ["mem0-memory", "local-chromium-browser", "agent-core-browser", "agent-core-code-interpreter", "a2a-client", "diagram", "rss", "use-computer", "twelvelabs", "elasticsearch-memory", "mongodb-memory", "apify"]
extra-dependencies = [
"moto>=5.1.0,<6.0.0",
"pytest>=8.0.0,<10.0.0",