fix: pre-set ECR Lambda pull policy to prevent concurrent SetRepositoryPolicy race condition (#8190) #8945

Open
abhishektang wants to merge 4 commits into aws:develop from abhishektang:feat/resolve-image-repos-package

@abhishektang
Contributor

Problem

Fixes #8190.

When deploying a SAM application with multiple Lambda functions referencing the same or different ECR repositories, CloudFormation calls ecr:SetRepositoryPolicy concurrently — once per Lambda. Each call overwrites the existing policy rather than merging it, so whichever write lands last wins and earlier Lambdas lose access, resulting in a 403 on image pull.
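
The overwrite behaviour described above can be illustrated with a toy model. This is only a sketch: `FakeRepository`, `set_policy`, and the `Allow-<name>` SIDs are hypothetical stand-ins, not the real ECR API or the statements Lambda writes, and the two writes run sequentially to keep the outcome deterministic.

```python
from typing import Optional


class FakeRepository:
    """Hypothetical in-memory stand-in for one ECR repository's policy."""

    def __init__(self) -> None:
        self.policy: Optional[dict] = None

    def set_policy(self, policy: dict) -> None:
        # Same semantics as described above: the whole document is replaced,
        # nothing is merged with what was there before.
        self.policy = policy


def policy_for(function_name: str) -> dict:
    """Build a policy that only contains the statement for one function."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": f"Allow-{function_name}",  # illustrative SID, an assumption
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
            }
        ],
    }


repo = FakeRepository()
for name in ("FunctionA", "FunctionB"):
    repo.set_policy(policy_for(name))  # each write discards the previous one

surviving_sids = [s["Sid"] for s in repo.policy["Statement"]]
print(surviving_sids)
```

Only the statement from the last writer survives, which is the failure mode reported in #8190.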

Solution

Before creating the changeset, SAM CLI now pre-sets a stable SAMCliLambdaECRAccess policy SID on every ECR repository referenced by the deployment (via ImageRepository / ImageRepositories). Because the SID is deterministic, repeated calls are idempotent and safe.
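
Why a fixed SID makes the call idempotent can be sketched with a pure merge function. The statement body, the `TeamPull` statement, and the name `upsert_statement` are assumptions for illustration; the actual statement written by this PR may differ, and the sketch assumes `Statement` is a list.

```python
import copy
from typing import Optional

SID = "SAMCliLambdaECRAccess"

# Assumed shape of the statement; only the SID is taken from the PR description.
LAMBDA_STATEMENT = {
    "Sid": SID,
    "Effect": "Allow",
    "Principal": {"Service": "lambda.amazonaws.com"},
    "Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
}


def upsert_statement(policy: Optional[dict], statement: dict) -> dict:
    """Return a new policy where `statement` replaces any statement with the same Sid."""
    merged = copy.deepcopy(policy) if policy else {"Version": "2012-10-17", "Statement": []}
    # Keep every unrelated statement, drop any earlier copy of ours, then append it once.
    others = [s for s in merged.get("Statement", []) if s.get("Sid") != statement["Sid"]]
    merged["Statement"] = others + [statement]
    return merged


existing = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TeamPull",  # hypothetical pre-existing statement
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": ["ecr:BatchGetImage"],
        }
    ],
}

once = upsert_statement(existing, LAMBDA_STATEMENT)
twice = upsert_statement(once, LAMBDA_STATEMENT)
sids = [s["Sid"] for s in twice["Statement"]]
```

Applying the merge a second time yields the same document, and statements with other SIDs are left in place.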

Three private helpers are added to samcli/commands/deploy/deploy_context.py:

| Helper | Responsibility |
| --- | --- |
| _extract_ecr_repo_name | Parse repo name from a full ECR URI |
| _ensure_ecr_lambda_pull_policy | Collect all unique repo names and call _upsert_ecr_lambda_policy for each |
| _upsert_ecr_lambda_policy | Idempotently set or merge the SAMCliLambdaECRAccess statement; retries on concurrent SetRepositoryPolicy conflicts; skips gracefully on AccessDeniedException |
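
As a rough illustration of the parsing step, a repository name can be pulled out of a private ECR URI as below. This is a sketch, not the code in this PR: `extract_repo_name` is a hypothetical name, and it assumes the usual `<account>.dkr.ecr.<region>.amazonaws.com/<repo>[:tag][@digest]` layout.

```python
from typing import Optional


def extract_repo_name(image_uri: str) -> Optional[str]:
    """Return the repository name from a private ECR image URI, or None if it is not one."""
    registry, sep, remainder = image_uri.partition("/")
    # Private ECR registries have hosts of the form <account>.dkr.ecr.<region>.amazonaws.com
    if not sep or ".dkr.ecr." not in registry:
        return None
    name = remainder.split("@", 1)[0]  # drop an optional @sha256:... digest
    name = name.split(":", 1)[0]       # drop an optional :tag
    return name or None


uris = [
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/team/app:latest",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/app@sha256:abc123",
    "nginx:latest",
]
names = [extract_repo_name(u) for u in uris]
unique_repos = sorted({n for n in names if n})
```

Collecting the results into a set mirrors the "all unique repo names" step, so a repository shared by several functions is only touched once.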

_ensure_ecr_lambda_pull_policy is called from DeployContext.run() immediately before create_and_wait_for_changeset.
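
The retry-and-skip behaviour of the upsert helper might look roughly like the sketch below. `FakeClientError` is a hypothetical stand-in that copies the `response["Error"]["Code"]` shape of botocore's `ClientError` so the example runs without AWS; `set_policy_with_retry` and its parameters are also assumptions. The conflict code `ResourceInUseException` is the one named in the commit message, and where this sketch re-raises, the PR raises `ECRPolicySetError`.

```python
import time
from typing import Callable


class FakeClientError(Exception):
    """Hypothetical stand-in mirroring the response shape of botocore's ClientError."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


def set_policy_with_retry(
    set_policy: Callable[[str], None],
    policy_text: str,
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> bool:
    """Return True if the policy was set, False if skipped for lack of permission."""
    for attempt in range(1, max_attempts + 1):
        try:
            set_policy(policy_text)
            return True
        except FakeClientError as err:
            code = err.response["Error"]["Code"]
            if code == "AccessDeniedException":
                return False  # skip gracefully: the caller may lack ECR permissions
            if code == "ResourceInUseException" and attempt < max_attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))  # exponential back-off
                continue
            raise  # unrecoverable, or retries exhausted
    return False


calls = []


def flaky_setter(policy_text: str) -> None:
    """Fail twice with a conflict, then succeed."""
    calls.append(policy_text)
    if len(calls) < 3:
        raise FakeClientError("ResourceInUseException")


def denied_setter(policy_text: str) -> None:
    raise FakeClientError("AccessDeniedException")


retried_ok = set_policy_with_retry(flaky_setter, "{}")
skipped = set_policy_with_retry(denied_setter, "{}")
```

Skipping on access denial keeps deployments working for users whose credentials cannot modify repository policies, at the cost of leaving the original race in place for them.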

Changes

  • samcli/commands/deploy/deploy_context.py — ECR policy helpers + call site
  • samcli/commands/deploy/exceptions.py — new ECRPolicySetError(UserException)
  • tests/unit/commands/deploy/test_ecr_policy_helpers.py — 21 new unit tests covering all branches of the three helpers
  • tests/unit/commands/deploy/test_deploy_context.py — patch _ensure_ecr_lambda_pull_policy at class level to isolate existing deploy tests from ECR side-effects
  • tests/unit/commands/_utils/test_template.py — fix test_updates_imageuri_when_pointing_to_local_archive: replace fragile CWD-relative file creation (which caused a PermissionError on macOS) with a pathlib.Path.is_file mock
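
The `pathlib.Path.is_file` mock mentioned in the last item follows a common pattern, sketched here with a hypothetical helper (`points_to_local_archive` is not the function under test in SAM CLI). Patching the method avoids creating a real file in the current working directory.

```python
from pathlib import Path
from unittest import mock


def points_to_local_archive(image_uri: str) -> bool:
    """Hypothetical helper: treat the value as a local archive if the file exists."""
    return Path(image_uri).is_file()


# Inside the patch, every Path reports that it is a file, so nothing touches the disk.
with mock.patch("pathlib.Path.is_file", return_value=True):
    patched = points_to_local_archive("image.tar.gz")

# Outside the patch the real filesystem check is back in effect.
unpatched = points_to_local_archive("definitely-missing-archive.tar.gz")
```

Because the patch is scoped to the `with` block, other tests in the same module still see the real filesystem.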

Testing

pytest tests/unit/commands/deploy/test_deploy_context.py \
       tests/unit/commands/deploy/test_ecr_policy_helpers.py -v
# 35 passed

pytest --cov samcli --cov schema --cov-fail-under 94 tests/unit \
       --ignore=tests/unit/lib/cfn_language_extensions \
       --cov-config=.coveragerc_no_lang_ext
# 7479 passed, 0 failed, 94.05% coverage

Ruff and mypy checks also pass; the only mypy errors reported are pre-existing and unrelated to this change.

@abhishektang abhishektang requested a review from a team as a code owner May 4, 2026 01:22
@github-actions github-actions Bot added the area/deploy, pr/external, and stage/needs-triage labels May 4, 2026
Implements issue aws#3888 to auto-create ECR repositories during
packaging, matching sam deploy behavior. Enables package-once,
deploy-many CI/CD workflows with managed ECR repos.

- Add --resolve-image-repos CLI option to sam package
- Call sync_ecr_stack() to auto-create managed ECR repositories
- Add validation requiring --s3-bucket when flag is used
- Add conflict detection with --image-repositories
- Add unit tests for validation logic

Closes aws#3888
…ryPolicy race condition (aws#8190)

- Add _ensure_ecr_lambda_pull_policy() called before changeset creation to
  pre-set a stable SAMCliLambdaECRAccess SID on all referenced ECR repos.
- Add _upsert_ecr_lambda_policy() to idempotently set/merge the policy,
  handling AccessDeniedException gracefully and retrying on concurrent
  SetRepositoryPolicy conflicts (ResourceInUseException).
- Add ECRPolicySetError exception for unrecoverable policy failures.
- Add 21 unit tests in test_ecr_policy_helpers.py covering all branches.
- Patch _ensure_ecr_lambda_pull_policy in TestSamDeployCommand to isolate
  deploy-flow tests from ECR side-effects.
- Fix test_updates_imageuri_when_pointing_to_local_archive: replace
  fragile CWD-relative file creation with pathlib.Path.is_file mock.
@abhishektang abhishektang force-pushed the feat/resolve-image-repos-package branch from dcbcc82 to 3ae260f Compare May 9, 2026 15:51
Development

Successfully merging this pull request may close these issues.

Bug: Docker image-based Lambda failures