RFC: Separate infrastructure deploy from application/runtime updates #377

@krokoko

Primary area

Cross-cutting / multiple

Related issue or feature request

Complements #17 (onboarding UX) and future runtime repo-onboarding APIs.

Summary

Split ABCA deployment into two lifecycles: (A) infrastructure — deploy once via CDK; (B) application layer — update AgentCore runtime image, agent code, and repo configuration via CI without re-running full cdk deploy. Enable runtime repository onboarding so new GitHub repos can be registered without editing cdk/src/stacks/agent.ts and redeploying the stack.

Today a single mise //cdk:deploy (or cdk deploy --all from the CI deploy.yml workflow) bundles VPC, API Gateway, DynamoDB, Cognito, Lambda handlers, Docker image build, AgentCore Runtime provisioning, and Blueprint RepoConfig writes. Workshop participants and product teams pay 10+ minutes per iteration even when they have only changed agent Python code or want to add another repo.

Use case and motivation

Who: Operators running workshops, teams iterating on agent behavior daily, platform admins onboarding many repos.

Pain today (from REPO_ONBOARDING.md):

Onboarding is CDK-based. Each repo is an instance of the Blueprint construct in the CDK stack. Deploying the stack = onboarding or updating repos. There is no runtime API for repo CRUD.

Desired operator experience:

mise run install && mise run build    # once
mise //cdk:deploy                     # once — infra only (~10 min first time)
# CI on merge to main:
mise //agent:publish-runtime          # push new AgentCore image / update runtime — no CDK
bgagent repo onboard owner/repo …     # runtime — no CDK (future CLI/API)
bgagent submit --repo owner/repo …    # go

Workshop goal: In a one-day session, participants fork a repo, onboard it, and submit tasks without waiting for another full stack deploy.

Proposal

1. CDK stack split (or logical separation within one stack)

| Layer | Deploy cadence | Examples |
| --- | --- | --- |
| Infra (CDK, rare) | Weeks/months | VPC, API Gateway, Cognito, DynamoDB tables, orchestrator Lambdas, WAF, IAM bootstrap |
| Application (CI, frequent) | Per PR / daily | AgentCore Runtime image + version, agent workflows/, optional Lambda handler zip updates |
| Config (runtime, on demand) | Per repo | RepoConfig rows in RepoTable, per-repo secrets, Cedar policies |

Implementation sketch:

  • Extract AgentCore Runtime image publish + UpdateAgentRuntime (or equivalent AgentCore control-plane API) into a mise //agent:deploy-runtime task invoked from .github/workflows/ independently of deploy.yml's cdk deploy.
  • CDK retains the Runtime resource and IAM wiring; CI updates the artifact (image digest / version pin) out-of-band — similar to ECS UpdateService without replacing the service CFN resource.
  • Gate full cdk deploy on infra/handler changes only; agent-only PRs skip CloudFormation (faster CI, lower blast radius). Align with ADR-013 integ smoke tiers.
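The out-of-band artifact update in the second bullet depends on CI producing an immutable image reference. A minimal sketch of the digest pin, assuming the image lives in ECR; the function names, account ID, and repository name are hypothetical, and the actual AgentCore control-plane call is deliberately left out:

```python
import re

# "sha256:" followed by 64 lowercase hex characters is the standard OCI digest form.
_DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def pinned_image_uri(account_id: str, region: str, repository: str, digest: str) -> str:
    """Return an ECR image reference pinned to an immutable digest (not a mutable tag)."""
    if not _DIGEST_RE.fullmatch(digest):
        raise ValueError(f"not a sha256 image digest: {digest!r}")
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}@{digest}"


def needs_runtime_update(current_uri: str, new_uri: str) -> bool:
    """Skip the control-plane call when the pinned digest has not changed."""
    return current_uri != new_uri


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    digest = "sha256:" + "ab" * 32
    print(pinned_image_uri("123456789012", "us-east-1", "abca-agent", digest))
```

Pinning by digest rather than by tag means the runtime update is a no-op when CI rebuilds an identical image, and a rollback is just a re-pin to the previous digest.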

2. Runtime repository onboarding API

Replace (or supplement) CDK-only Blueprint registration:

  • POST /v1/repos — onboard repo (owner/repo, model, token secret ref, optional overrides) → writes RepoConfig to DynamoDB (same schema as Blueprint construct).
  • GET /v1/repos, GET /v1/repos/{repo}, PATCH, DELETE (soft) — operator CRUD.
  • Auth: Cognito admin role and/or scoped API keys (see webhook API-key RFC).
  • Keep Blueprint construct for IaC-minded users who want repo config in Git; runtime API for dynamic onboarding.

Migration: Existing Blueprint writes remain the source of truth until the runtime API ships; document the precedence between CDK and API writes (last-writer-wins, or CDK-as-code only).
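The two precedence options can be made concrete with a small resolver. This is a sketch under stated assumptions: the `source` and `updated_at` fields are hypothetical and not part of the current RepoConfig schema, "CDK-as-code" is read as "the CDK record wins whenever one exists", and ties under last-writer-wins go to CDK:

```python
from dataclasses import dataclass
from typing import Optional

POLICIES = ("last-writer-wins", "cdk-as-code")


@dataclass(frozen=True)
class RepoConfigWrite:
    repo: str        # "owner/repo"
    source: str      # "cdk" or "api"; hypothetical provenance field
    updated_at: int  # epoch seconds of the write; hypothetical field
    model: str


def resolve(
    cdk: Optional[RepoConfigWrite],
    api: Optional[RepoConfigWrite],
    policy: str,
) -> Optional[RepoConfigWrite]:
    """Pick the effective record for one repo under the given precedence policy."""
    if policy not in POLICIES:
        raise ValueError(f"unknown precedence policy: {policy!r}")
    # With only one writer there is nothing to arbitrate.
    if cdk is None:
        return api
    if api is None:
        return cdk
    if policy == "cdk-as-code":
        return cdk
    # last-writer-wins: the newer write is effective; ties go to CDK (assumption).
    return api if api.updated_at > cdk.updated_at else cdk
```

Whichever policy is chosen, recording the provenance of each write makes it possible to explain to an operator why an API change was overridden by the next stack deploy.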

3. CI workflow changes

  • build.yml: continue synth + test; publish the agent image + handler zips as build artifacts.
  • deploy-infra.yml (rare): cdk deploy on cdk/** changes or manual dispatch.
  • deploy-runtime.yml (frequent): publish image, call AgentCore update API, smoke test — no cdk deploy.
  • Document in DEVELOPER_GUIDE.md and deployment guide.
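The split between the two deploy workflows amounts to a path-based routing decision. A rough sketch, assuming infra and Lambda handler code live under cdk/ and the agent container under agent/ (the handler location is an assumption; the function name is hypothetical):

```python
from typing import Iterable, List

# Path prefixes per workflow. Handler code is assumed to live under cdk/.
INFRA_PREFIXES = ("cdk/",)
RUNTIME_PREFIXES = ("agent/",)


def select_workflows(changed_paths: Iterable[str]) -> List[str]:
    """Return the deploy workflows a change set should trigger, sorted by name."""
    workflows = set()
    for path in changed_paths:
        if path.startswith(INFRA_PREFIXES):
            workflows.add("deploy-infra.yml")
        elif path.startswith(RUNTIME_PREFIXES):
            workflows.add("deploy-runtime.yml")
        # Anything else (docs, README) triggers no deploy at all.
    return sorted(workflows)


if __name__ == "__main__":
    print(select_workflows(["agent/Dockerfile"]))
    print(select_workflows(["cdk/src/stacks/agent.ts", "agent/Dockerfile"]))
```

In practice the same routing would likely be expressed with path filters in the workflow triggers; the function form is only meant to make the intended behavior easy to review and test.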

Out of scope

  • Multi-region active-active (single region assumed).
  • Replacing CDK for initial infra bootstrap.
  • ECS compute variant split (can follow same pattern later).
  • Full blue/green orchestrator Lambda deploy (see ROADMAP "Safe orchestrator deploys") — separate concern.

Potential challenges

  • CloudFormation drift if CI updates the Runtime outside CDK. Options: (a) an aspect that makes the CDK CfnRuntime ignore image changes, (b) an external resource pattern, or (c) keeping image tag pointer updates in CDK and relying on cdk deploy staying fast when only the tag changes.
  • IAM / ECR permissions for CI role distinct from CFN execution role (DEPLOYMENT_ROLES.md).
  • Repo onboarding without CDK must still provision per-repo egress allowlist entries, GitHub token secrets, and Cedar policy validation — may require async provisioning jobs.
  • Workshop safety: runtime POST /v1/repos needs guardrails (allowed org list, max repos, admin-only).
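The workshop guardrails in the last bullet could be enforced as a pure check in front of the POST /v1/repos handler. A minimal sketch; the policy shape, the function name, and the lenient owner/repo pattern are all assumptions rather than an existing ABCA interface:

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List

# Lenient owner/repo shape check; a simplification of GitHub's real naming rules.
_REPO_RE = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


@dataclass(frozen=True)
class OnboardingPolicy:
    allowed_orgs: FrozenSet[str]  # lowercase owner/org names
    max_repos: int                # cap on onboarded repos per deployment


def check_onboard(
    repo: str,
    caller_is_admin: bool,
    current_repo_count: int,
    policy: OnboardingPolicy,
) -> List[str]:
    """Return the reasons to reject an onboarding request; an empty list means allowed."""
    reasons: List[str] = []
    if not caller_is_admin:
        reasons.append("caller is not an admin")
    if not _REPO_RE.fullmatch(repo):
        reasons.append("repo must look like owner/repo")
    elif repo.split("/", 1)[0].lower() not in policy.allowed_orgs:
        reasons.append("owner is not in the allowed org list")
    if current_repo_count >= policy.max_repos:
        reasons.append("repo limit reached")
    return reasons
```

Returning every failed check rather than stopping at the first gives the operator a complete error message in a single round trip, which matters in a time-boxed workshop.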

Dependencies and integrations

  • cdk/src/stacks/agent.ts, cdk/src/constructs/blueprint.ts, agent/ Dockerfile, .github/workflows/deploy.yml, mise.toml tasks.
  • AgentCore control plane APIs (bedrock-agentcore-control).
  • CLI commands (bgagent repo list|onboard|show) — see companion feature request.

Alternative solutions

| Approach | Pros | Cons |
| --- | --- | --- |
| Status quo (redeploy CDK for everything) | Simple mental model | Slow workshops; couples agent and infra |
| BLUEPRINT_REPO env only | No code change for fork | Still requires CDK deploy per repo |
| Separate CDK stacks (infra + app) | Clean CFN boundaries | Two deploys; cross-stack exports |
| Runtime API only (this RFC) | Fastest iteration | More application code; drift management |

Preference: Runtime API for repos + CI-only runtime deploy; keep CDK for infra. Optionally split into two CDK stacks if CFN blast radius warrants it.

Labels: RFC-proposal, agent-runtime, ci-cd, infra-cdk
