feat: wire telemetry into all remove.* commands by Hweinstock · Pull Request #1069 · aws/agentcore-cli

Hweinstock · 2026-04-30T21:27:48Z

Description

Wire telemetry into all remove.* commands, following the pattern from #1050 (add commands).

Renamed cliCommandRun → runCliCommand, withAddTelemetry → withCommandRunTelemetry since they are now generalized.
Registered 13 remove commands in COMMAND_SCHEMAS
Wired CLI remove handlers (BasePrimitive + 6 custom primitives) and TUI remove hooks

Not addressed (follow-up): Error classification — remove ops return { success: false, error: string }, losing the original error class. classifyError sees UnknownError instead of a typed error. We'll likely need a larger refactor to how we handle exception flow to a consistent and generalized result type, or conventional error flow.

Related Issue

Closes #1068

Type of Change

New feature

Testing

I ran npm run test:unit and npm run test:integ
I ran npm run typecheck
I ran npm run lint

Added telemetry success + failure assertions for 10 remove commands across 6 integration test files. remove.agent and remove.runtime-endpoint lack coverage (no existing test files exercise removal).

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
My changes generate no new warnings

Hweinstock · 2026-04-30T21:27:54Z

/strands review

Hweinstock · 2026-05-01T15:03:55Z

/strands review

agentcore-cli-automation · 2026-05-01T15:23:33Z

Behavior change in `withCommandRunTelemetry`: thrown errors are now propagated instead of caught

The old withAddTelemetry had a catch-all that converted thrown exceptions from fn() into a { success: false, error: getErrorMessage(err) } result:

} catch (err) {
  if (!result) {
    return { success: false, error: getErrorMessage(err) };
  }
}

The new withCommandRunTelemetry removes that safety net and re-throws instead (if (!result) throw e;). This is a real behavioral change for every TUI call site that was migrated from withAddTelemetry — not just a rename.

Most migrated hooks are fine because they already wrap the call in a try/catch (e.g. useCreateMemory, useCreateEvaluator, useCreateGateway, useCreateOnlineEval, useCreateIdentity). But two call sites only rely on the previous catch-all and now can leak unhandled rejections / freeze the UI if the underlying primitive throws:

src/cli/tui/screens/agent/useAddAgent.ts (lines 149–174) — useAddAgent wraps the call in try/finally, not try/catch. addAgentInner → handleCreatePath / handleByoPath / handleImportPath has no top-level try/catch and calls things that can throw (configIO.readProjectSpec, renderer.render, setupPythonProject, copyFileSync, executeImportAgent, etc.). Previously a throw became { success: false, error }; now it propagates to the caller.

The caller in AddFlow.tsx:
```
void addAgent(config).then(result => { ... });  // no .catch()
```
…has no .catch(), so a throw becomes an unhandled promise rejection and the UI stays stuck on loading: true, loadingMessage: 'Creating agent...'.
src/cli/tui/screens/runtime-endpoint/AddRuntimeEndpointFlow.tsx (lines 81–100) — same pattern:
```
void withCommandRunTelemetry('add.runtime-endpoint', {}, () => runtimeEndpointPrimitive.add(...))
  .then(result => { ... });  // no .catch()
```
If runtimeEndpointPrimitive.add throws, this becomes an unhandled rejection (the old withAddTelemetry would have routed it through setFlow({ name: 'error', ... })).

In practice the individual .add() methods on the primitives each have their own try/catch, so this rarely triggers today — but the previous helper was the backstop for anything the primitive forgot to catch (and addAgentInner in particular does lean on that).

Options

Option A (minimal): Restore the catch-all inside withCommandRunTelemetry so the contract matches the old withAddTelemetry:

} catch (e) {
  if (!result) return { success: false, error: getErrorMessage(e) } as R;
}

This requires the R generic to always be assignable from { success: false; error: string }, which is already the constraint (R extends OperationResult).

Option B: Keep the new re-throw semantics but add try/catch (or .catch(...) on the returned promise) at the two call sites that currently rely on the old behavior (useAddAgent and AddRuntimeEndpointFlow.handleCreateComplete), mapping the error to the appropriate error state.

Option C: Keep re-throw semantics but also wrap addAgentInner in a try/catch internally so it always returns AddAgentInnerResult and never throws — and do the same for the runtime-endpoint add flow.

Option A preserves the old behavior most cheaply; B/C are more explicit but touch more code.

Hweinstock · 2026-05-01T15:31:46Z

format failure is fixed in #1080.

agentcore-cli-automation · 2026-05-01T15:34:46Z

Reviewed the current state of the PR. The two concerns previously raised in comments appear to be addressed:

Unhandled rejections in TUI call sites — the current withCommandRunTelemetry implementation keeps the catch-all that was in the old withAddTelemetry:
```
if (!result) {
  return { success: false, error: getErrorMessage(e) } as R;
}
```
So useAddAgent and AddRuntimeEndpointFlow.handleCreateComplete retain the same 'throw → failure result' contract as before. No behavior change.
remove.runtime-endpoint gap — RuntimeEndpointPrimitive.ts is updated in this diff to wrap this.remove() in withCommandRunTelemetry('remove.runtime-endpoint', ...), and the command is registered in COMMAND_SCHEMAS. Coverage for all 12 removable resource types plus remove.all is now complete.

Spot-checked the rest:

SubCommand<G, S> is sound — uses Extract<Command, ${G}.${S}>, so unregistered sub-commands collapse to never at the call site.
All 12 ResourceType variants have matching remove.* entries in COMMAND_SCHEMAS, so \remove.${this.kind}`inBasePrimitiveanduseRemove` is always a valid key.
RemovalResult ({success:true} | {success:false, error:string}) and AddResult<T> both satisfy the R extends OperationResult constraint.
The runCliCommand refactor preserves the 'telemetry init failure runs fn() untracked and exits 0' behavior; process.exit returning never keeps the subsequent client.withCommandRun call reachable only when client is defined.
Removing the ! non-null assertions on configBundles in the integ tests is safe — the schema declares .default([]), so the field is always present after parse.
Integ test coverage for success + failure telemetry is in place for 10 of 12 remove commands; remove.agent and remove.runtime-endpoint coverage gap is called out in the PR description.

No new blocking issues. LGTM.

github-actions Bot added size/m PR size: M strands-running agentcore-harness-reviewing AgentCore Harness review in progress and removed agentcore-harness-reviewing AgentCore Harness review in progress labels Apr 30, 2026

Hweinstock force-pushed the feat/telemetry-remove-commands branch from 89d3db7 to 7cbfdc9 Compare April 30, 2026 21:36