---
title: API Debug Env
emoji: 🔧
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---
A real-world OpenEnv environment where AI agents learn to debug broken HTTP API requests.
Built for the Scaler × Meta PyTorch × HuggingFace OpenEnv Hackathon 2026.
An AI agent is given a broken HTTP request and a task description. It interacts with a live mock REST API running inside the same container, reads the real HTTP error responses (401, 422, 429, 500), and iteratively fixes its requests — adjusting headers, methods, body types, and authentication flows — until it receives a successful 200 response.
Every step produces a real HTTP call. The grader is fully deterministic: no LLM, no fuzzy matching. The reward is derived entirely from HTTP status codes and response schema matching.
| Property | Value |
|---|---|
| Framework | OpenEnv (openenv-core 0.2.3) |
| Tasks | 3 difficulty levels — 9 tasks total |
| Max Steps per Episode | 5 |
| Reward Range | 0.0 – 1.0 |
| Grader | Fully deterministic (no LLM) |
| Mock API | Internal FastAPI router, same container |
| Port | 7860 (HF Spaces) |
The agent sends a structured HTTP request at each step.
| Field | Type | Description |
|---|---|---|
| `method` | `str` | HTTP method — GET, POST, PUT, DELETE, PATCH |
| `url` | `str` | Endpoint path, e.g. `/mock_api/users` |
| `headers` | `dict` | Request headers, e.g. `{"Authorization": "Bearer token"}` |
| `body` | `dict` | Request body — `null` for GET requests |
| `query_params` | `dict` | URL query parameters, e.g. `{"q": "python"}` |
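For concreteness, the action schema above can be sketched as a dataclass. This is an illustrative sketch only; the actual `models.py` in this repo defines the real `APIAction` (likely as a Pydantic model), and only the field names here come from the table.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class APIAction:
    """One structured HTTP request, mirroring the field table above."""
    method: str                      # "GET", "POST", "PUT", "DELETE", "PATCH"
    url: str                         # endpoint path, e.g. "/mock_api/users"
    headers: dict = field(default_factory=dict)
    body: Optional[dict] = None      # None for GET requests
    query_params: dict = field(default_factory=dict)

# Example: the corrected request for the easy_auth task
action = APIAction(
    method="GET",
    url="/mock_api/users",
    headers={"Authorization": "Bearer demo_token_123"},
)
```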
After each step the agent receives:
| Field | Type | Description |
|---|---|---|
| `task_id` | `str` | Current task — easy, medium, or hard |
| `task_description` | `str` | Plain-English description of what needs to be fixed |
| `broken_request` | `dict` | The original broken request shown at episode start |
| `last_status_code` | `int` | HTTP status from last step (0 = not yet tried) |
| `last_response_headers` | `dict` | Response headers from last step |
| `last_response_body` | `str` | Raw response body from last step |
| `step_feedback` | `str` | Human-readable hint based on last error |
| `current_score` | `float` | Running reward, 0.0–1.0 |
| `reward` | `float` | Reward for the last step |
| `done` | `bool` | Whether the episode has ended |
| `attempt` | `int` | Current step number |
Simple fixes to authentication headers and request format.
| Task ID | Bug | What the Agent Needs to Fix |
|---|---|---|
| `easy_auth` | Missing `Authorization` header | Add `Authorization: Bearer demo_token_123` to `GET /mock_api/users` |
| `easy_content_type` | Wrong `Content-Type: text/plain` | Change to `application/json` and add `{"name": "book"}` body to `POST /mock_api/items` |
| `easy_query_param` | Missing `?q=` query param | Add `q=python` to `GET /mock_api/search` |
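As a minimal illustration of what an agent must do for `easy_auth`, the fix is a single header addition. The sketch below works on plain dicts (the real agent emits a structured action; the function name is hypothetical, the token value comes from the task table):

```python
def fix_easy_auth(broken_request: dict) -> dict:
    """Return a copy of the broken request with the missing
    Authorization header added (token value from the task table)."""
    fixed = {**broken_request, "headers": dict(broken_request.get("headers") or {})}
    fixed["headers"]["Authorization"] = "Bearer demo_token_123"
    return fixed

broken = {"method": "GET", "url": "/mock_api/users", "headers": {}}
fixed = fix_easy_auth(broken)
```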
Requires understanding API semantics and request structure.
| Task ID | Bug | What the Agent Needs to Fix |
|---|---|---|
| `medium_wrong_method` | GET instead of POST | Change method to POST on `/mock_api/orders` |
| `medium_type_mismatch` | `product_id: "five"` (string) | Fix to integer `5` in body |
| `medium_nested_field` | `address: "123 Main St"` (string) | Fix to dict `{"street": "123 Main St", "city": "NY"}` |
Requires multi-step reasoning and handling stateful API behaviour.
| Task ID | Bug | What the Agent Needs to Fix |
|---|---|---|
| `hard_token_exchange` | Stale/expired Bearer token | First POST to `/mock_api/auth/token` to get a fresh token, then use it on `GET /mock_api/protected` |
| `hard_rate_limit` | No backoff after 429 | After receiving 429, include `X-Retry-After: 2` header on next request |
| `hard_pagination` | Never follows cursor | Call `GET /mock_api/logs?cursor=<value>` until `has_more` is `false` |
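The `hard_pagination` loop can be sketched as a plain function. Here `fetch_page` stands in for a real `GET /mock_api/logs` call, and the response shape (`entries`, `cursor`, `has_more`) is assumed from the task description rather than taken from the server code:

```python
def follow_cursor(fetch_page):
    """Accumulate log entries by following the cursor chain until
    the API reports has_more == False."""
    entries, cursor = [], None
    while True:
        page = fetch_page(cursor)      # e.g. GET /mock_api/logs?cursor=<value>
        entries.extend(page.get("entries", []))
        if not page.get("has_more"):
            return entries
        cursor = page["cursor"]

# Stub pages standing in for the live endpoint
pages = {
    None: {"entries": [1, 2], "cursor": "c1", "has_more": True},
    "c1": {"entries": [3], "cursor": "c2", "has_more": True},
    "c2": {"entries": [4], "has_more": False},
}
logs = follow_cursor(lambda cur: pages[cur])
```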
```
reward = schema_match_score × attempt_bonus
attempt_bonus = max(0.0, 1.0 − (attempt / max_steps) × 0.15)
```
Agents are rewarded more for solving tasks in fewer steps. Partial credit is given at every step:
| HTTP Status Received | Partial Reward | Meaning |
|---|---|---|
| 0 | 0.00 | Request never sent / connection error |
| 401 / 403 / 404 | 0.05 | Hit the endpoint, auth failed |
| 405 / 415 | 0.10 | Auth ok, wrong method or content-type |
| 422 | 0.15 | Auth ok, body validation failed |
| 429 | 0.20 | Hit endpoint, needs rate limit handling |
| 200 (schema mismatch) | 0.70 | Right status, wrong response structure |
| 200 (schema match) | 0.85–1.00 | Correct — higher reward for fewer attempts |
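A worked instance of the reward formula above, assuming a perfect schema match (`schema_match_score = 1.0`) and the environment's `max_steps = 5`:

```python
def attempt_bonus(attempt: int, max_steps: int = 5) -> float:
    """attempt_bonus = max(0.0, 1.0 - (attempt / max_steps) * 0.15)"""
    return max(0.0, 1.0 - (attempt / max_steps) * 0.15)

def reward(schema_match_score: float, attempt: int, max_steps: int = 5) -> float:
    return schema_match_score * attempt_bonus(attempt, max_steps)

# Solving on the 1st attempt pays more than on the 5th:
r1 = reward(1.0, 1)   # 1.0 * (1 - 0.03) = 0.97
r5 = reward(1.0, 5)   # 1.0 * (1 - 0.15) = 0.85, the table's lower bound
```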
All endpoints are mounted at /mock_api/ inside the same container — no external calls, fully reproducible.
| Method | Path | What it Does | Common Error |
|---|---|---|---|
| GET | `/mock_api/users` | Returns user list | 401 if no Bearer token |
| POST | `/mock_api/items` | Creates an item | 415 if wrong Content-Type, 422 if no `name` |
| GET | `/mock_api/search` | Search results | 422 if no `?q=` param |
| POST | `/mock_api/orders` | Creates an order | 405 if GET, 422 if wrong types |
| POST | `/mock_api/profile` | Updates profile | 422 if `address` is not a dict |
| POST | `/mock_api/auth/token` | Issues a token | 401 if wrong credentials |
| GET | `/mock_api/protected` | Protected resource | 401 if token not from `/auth/token` |
| GET | `/mock_api/rate_limited` | Rate-limited resource | 429 after 3 requests without `X-Retry-After` |
| GET | `/mock_api/logs` | Paginated log entries | Returns cursor — must follow chain |
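The `/mock_api/rate_limited` behaviour implies a simple retry policy: once a 429 arrives, the next request must carry the `X-Retry-After` header. Sketched on plain dicts (the function name is illustrative; the header value comes from the `hard_rate_limit` task):

```python
def headers_for_retry(last_status: int, headers: dict) -> dict:
    """Return headers for the next attempt, adding X-Retry-After
    after a 429 (value per the hard_rate_limit task)."""
    nxt = dict(headers)
    if last_status == 429:
        nxt["X-Retry-After"] = "2"
    return nxt

h = headers_for_retry(429, {"Accept": "application/json"})
```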
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Health check — returns `{"status": "ok"}` |
| POST | `/reset` | Start new episode. Body: `{"task_id": "easy"}` |
| POST | `/step` | Execute action. Body: `{"action": {...}}` |
| GET | `/state` | Get current episode state |
| GET | `/docs` | Auto-generated Swagger UI |
```python
from client import APIDebugEnv
from models import APIAction

with APIDebugEnv(base_url="https://ProthamD-api-debug-env.hf.space").sync() as env:
    # Start an episode
    obs = env.reset(task_id="easy")
    print(obs.task_description)
    print(obs.broken_request)

    # Fix the request and step
    result = env.step(APIAction(
        method="GET",
        url="/mock_api/users",
        headers={"Authorization": "Bearer demo_token_123"},
        body=None,
        query_params={}
    ))
    print(result.reward)  # 0.85+
    print(result.observation.step_feedback)
```

Install the client package:

```bash
pip install git+https://huggingface.co/spaces/ProthamD/api-debug-env
```

Or run the environment locally with Docker:

```bash
docker pull registry.hf.space/ProthamD-api-debug-env:latest
docker run -p 7860:7860 registry.hf.space/ProthamD-api-debug-env:latest
```

```bash
# Set environment variables
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.3
export HF_TOKEN=hf_your_token_here
export ENV_URL=https://ProthamD-api-debug-env.hf.space

# Run
python inference.py
```

Example output:

```
[START] task=easy env=api_debug_env model=mistralai/Mistral-7B-Instruct-v0.3
[STEP] step=1 action={"method":"GET","url":"/mock_api/users",...} reward=0.05 done=false error=null
[STEP] step=2 action={"method":"GET","url":"/mock_api/users","headers":{"Authorization":"Bearer demo_token_123"},...} reward=0.92 done=true error=null
[END] success=true steps=2 score=0.920 rewards=0.05,0.92
```

Project structure:

```
api_debug_env/
├── inference.py              ← Baseline inference script (root, mandatory)
├── models.py                 ← APIAction, APIObservation, APIState
├── client.py                 ← APIDebugEnv(EnvClient)
├── openenv.yaml              ← Environment manifest
├── pyproject.toml            ← Package config
├── Dockerfile                ← HF Spaces Dockerfile (port 7860)
├── tasks/
│   ├── easy.py               ← 3 easy tasks
│   ├── medium.py             ← 3 medium tasks
│   ├── hard.py               ← 3 hard tasks
│   └── registry.py           ← TASK_REGISTRY dict
├── graders/
│   └── grader.py             ← Deterministic reward logic
└── server/
    ├── app.py                ← FastAPI app with create_app()
    ├── api_debug_environment.py ← Environment logic
    ├── mock_api.py           ← Internal mock REST API router
    └── requirements.txt
```
```bash
git clone https://huggingface.co/spaces/ProthamD/api-debug-env
cd api-debug-env
pip install openenv-core fastapi uvicorn httpx pydantic openai python-dotenv

# Windows (PowerShell)
$env:PYTHONPATH = "path\to\api-debug-env"
# Linux/Mac
export PYTHONPATH=$(pwd)

uvicorn server.app:app --host 0.0.0.0 --port 7860
```

Test it:

```bash
curl http://localhost:7860/health
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "easy"}'
```

| Variable | Required | Description |
|---|---|---|
| `API_BASE_URL` | Yes | LLM API endpoint, e.g. `https://router.huggingface.co/v1` |
| `MODEL_NAME` | Yes | Model identifier, e.g. `mistralai/Mistral-7B-Instruct-v0.3` |
| `HF_TOKEN` | Yes | HuggingFace token with inference access |
| `ENV_URL` | No | Override environment URL (default: localhost) |
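The required variables can be validated up front before starting a run. A minimal sketch (the function name and defaults are illustrative, not taken from `inference.py`):

```python
import os

REQUIRED = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")

def load_config(env=None) -> dict:
    """Read the variables from the table above, failing fast on missing ones."""
    env = os.environ if env is None else env
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {', '.join(missing)}")
    return {
        "api_base_url": env["API_BASE_URL"],
        "model_name": env["MODEL_NAME"],
        "hf_token": env["HF_TOKEN"],
        "env_url": env.get("ENV_URL", "http://localhost:7860"),  # optional override
    }

cfg = load_config({
    "API_BASE_URL": "https://router.huggingface.co/v1",
    "MODEL_NAME": "mistralai/Mistral-7B-Instruct-v0.3",
    "HF_TOKEN": "hf_xxx",
})
```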
Debugging broken HTTP requests is one of the most common real-world developer tasks; every backend developer, DevOps engineer, and API integrator does it daily. Yet existing OpenEnv environments cover games, code execution, and financial simulations, leaving this domain without an environment.
Key advantages of this domain for RL training:
- Deterministic grading — HTTP status codes are binary, no LLM judge needed
- Rich partial reward signal — agent gets meaningful feedback at every step
- Stateful multi-turn reasoning — hard tasks require chaining multiple requests
- Real-world transferability — skills learned here apply directly to production debugging
MIT
Pratham Dey (ProthamD) — IIEST Shibpur, Information Technology