Skip to content

6 bugs surfaced by running agents-cli in a non-interactive CI sandbox (consolidated) #18

@ivanmkc

Description

@ivanmkc

Surfacing 5 bugs we hit while running agents-cli inside the benchmark suite at ivanmkc/agent-generator (full triage at issue #574). Each block has the verbatim error / observed behaviour, the case that hit it, and a suggested fix direction.

Filing as one issue because they share an environment (CI / sandboxed CLI invocation) and a team. Happy to split into individual issues if that's easier to triage.


1. agents-cli deploy doesn't auto-detect GCP project from ADC

Verbatim error (captured in claude_opus:CLI-DEV-INCIDENT-RESPONSE-001-EXP transcript, event 177):

Error: Could not determine GCP project. Pass --project <PROJECT_ID> or set a default: gcloud config set project <PROJECT_ID>

Setup: ADC was valid (gcloud auth application-default print-access-token succeeded), GOOGLE_CLOUD_PROJECT env var was set, but agents-cli deploy failed without an explicit --project flag.

Suggested fix: fall back to google.auth.default() to read the project from ADC (or os.environ['GOOGLE_CLOUD_PROJECT']) when --project is missing.


2. agents-cli infra cicd doesn't auto-detect $GH_TOKEN from env

Observed: when running agents-cli infra cicd without --github-pat, the agent prompts for input. Even with GH_TOKEN (and GITHUB_TOKEN) set in the environment and gh auth status succeeding, the CLI doesn't pick those up.

Setup: cases that drive infra cicd from a non-interactive runner can't satisfy the prompt. We worked around this in benchmark prompts by passing --github-pat "$GH_TOKEN" --github-app-installation-id "$GITHUB_APP_INSTALLATION_ID" explicitly, but auto-detection would be more ergonomic.

Suggested fix: when --github-pat is not supplied, read GH_TOKEN then GITHUB_TOKEN from env (matching the gh CLI's own precedence).


3. agents-cli deploy to agent_runtime doesn't poll long-running ops

Observed (gemini_pro:CLI-DEV-SELF-TUNING-SUPPORT-001-EXP, event 144):

agents-cli deploy --project adk-coding-agents --region global
# returns 0 after a few seconds

Workspace deployment_metadata.json after the call:

{
  "remote_agent_runtime_id": "None",
  "pending_operation": {
    "operation_name": "projects/.../reasoningEngines/.../operations/...",
    "started_at": "..."
  }
}

The deploy started the Vertex op but didn't wait for it to complete. remote_agent_runtime_id is the string "None" (which downstream code then can't use).

Suggested fix: expose a --wait flag (or make polling-to-completion the default) that polls the operation, blocks until done, and writes the real remote_agent_runtime_id.


4. agents-cli deploy to cloud_run requires gcloud beta but doesn't check

Observed (gemini_pro:CLI-DEV-INCIDENT-RESPONSE-001-EXP, agent reasoning at event 86):

"I'm encountering an issue where the deployment failed because gcloud beta isn't installed or requires sudo."

The deploy succeeded for everyone with gcloud beta pre-installed but silently failed in our sandbox where it wasn't installed by default.

Suggested fix: at the start of deploy --target cloud_run, run gcloud components list --filter='id:beta' and emit an actionable error like "agents-cli deploy --target cloud_run requires gcloud beta. Install via gcloud components install beta."


5. agents-cli eval run records empty actual_tools_called even when tools were called

Observed (both INCIDENT-RESPONSE cases, claude_opus + gemini_pro):

The eval framework records actual_tools_called: [] for all eval cases even though the agent transcripts clearly show the agent calling the relevant tools during the original conversation. Both cases worked around this by removing tool_trajectory_avg_score from eval_config.json to make the eval "pass" — defeating the metric.

agent-project/app/.adk/eval_history/*.json contains the empty arrays.

Suggested fix: trace through where actual_tools_called gets populated in agents-cli eval (might be an adk eval issue upstream). Likely either replay-vs-live mode mismatch or schema-name drift between the evalset format and the runner's expectation.


6. agents-cli data-ingestion: uv venv excludes pip by default, breaks KFP/swifter

Observed (gemini_pro:CLI-DEV-INSTITUTIONAL-MEMORY-001-EXP, events 61-82):

agents-cli data-ingestion triggers a uv venv setup that doesn't include pip. Downstream code (KFP / swifter / dask) tries to import pip and crashes. Agent had to patch pyproject.toml and inject a sitecustomize.py to silence psutil.NoSuchProcess errors.

Suggested fix: either include pip in the uv venv invocation (uv venv --seed or uv venv --python-preference managed --seed), or document the dependency requirement clearly so callers know to opt in.


Context

All 6 bugs surfaced via the agents-cli-golden benchmark suite in ivanmkc/agent-generator, which we use to track agent CLI evaluation. We've worked around each one in case prompts / sandbox config, but the workarounds are brittle and a fix in agents-cli would benefit any non-interactive runner. Happy to share full trace transcripts if useful (each is a gzipped JSON in our viewer).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions