Skip to content

T keithyao/batch purge Adds ABAC support for purge command with batched token requests#577

Open
keithy1012 wants to merge 26 commits intoAzure:mainfrom
keithy1012:t-keithyao/batch-purge
Open

T keithyao/batch purge Adds ABAC support for purge command with batched token requests#577
keithy1012 wants to merge 26 commits intoAzure:mainfrom
keithy1012:t-keithyao/batch-purge

Conversation

@keithy1012
Copy link

Purpose of the PR

  • Implements batch processing logic for the purge command.
  • ABAC-enabled registries grant permissions at the repository level, not registry-wide. The current auth flow requests a wildcard scope, which fails for ABAC registries because they cannot grant blanket access to all repositories.
  • Solution: Detects ABAC registries by checking for aad_identity in the refresh token. Batch token requests for ABAC registries: instead of one wildcard token, request tokens scoped to batches of 10 repositories at a time. Non-ABAC registries continue using the existing wildcard flow.

Fixes #


// Request a token that covers all repositories in this batch
// The token scope will include: repository:repo1:pull repository:repo1:delete ... for each repo
if err := acrClient.RefreshTokenForAbac(ctx, batch); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The token is only refreshed once at the start of the requests. This seems like it could be problematic as I presume these tokens time out. You should have some strategy to rotate this as time goes on.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the token is refreshed per batch. In practice we want to generally avoid doing the refreshes upfront and just refresh as needed on the token acquisition portion. So for example if you look at the getManifest calls you will see we call refreshAcrCLIClientToken first on those. What you could do is modify that function to refresh the token you have if its necessary dynamically at that stage.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented

Copy link
Member

@lizMSFT lizMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@estebanreyl estebanreyl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good overall. I would recommend adding some test to the cli experimental tests as well. Just some minor things left to address


// Request a token that covers all repositories in this batch
// The token scope will include: repository:repo1:pull repository:repo1:delete ... for each repo
if err := acrClient.RefreshTokenForAbac(ctx, batch); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the token is refreshed per batch. In practice we want to generally avoid doing the refreshes upfront and just refresh as needed on the token acquisition portion. So for example if you look at the getManifest calls you will see we call refreshAcrCLIClientToken first on those. What you could do is modify that function to refresh the token you have if its necessary dynamically at that stage.

cmd/acr/purge.go Outdated
// Process all repositories in this batch
for j, repoName := range batch {
// For ABAC registries, check if token expired and refresh for remaining repos in batch
if acrClient.IsAbac() && acrClient.IsTokenExpired() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The token can fail at any moment, for example let's say there are a lot of images inside of a particular repository that need to be scanned. We could have the token expire mid iteration and thus that repo listing and exploration would fail. My suggestion would be to give the acquirer the ability to refresh itself, so for example I see that you are not storing the token anwyhere in this part of the code, instead its maintained inside of acrClient,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about updating acrsdk.go to handle single repo token refresh:

// In acrsdk.go - modify refreshAcrCLIClientToken
func refreshAcrCLIClientToken(ctx context.Context, c *AcrCLIClient, repoName string) error {
    var scope string
    if c.isAbac && repoName != "" {
        scope = fmt.Sprintf("repository:%s:pull,delete,metadata_read,metadata_write", repoName)
    } else {
        scope = "repository:*:*"
    }
    // ... refresh logic
}

This allows SDK methods like GetAcrTags() to auto-refresh with single repo scope when the token expires mid-operation. But we should still keep the batch-level token refresh logic in purge.go for efficiency. The batch refresh at the start of each batch reduces the number of token requests when processing multiple repositories. And to coordinate between these two, we could add a flag (e.g., NeedsBatchRefresh()) that gets set to true when a single repo refresh occurs. Then in purge.go, after processing each repository, we can check this flag and proactively refresh for the remaining repositories in the batch if needed. What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this approach require 2 token refreshes (one for the current repoName, one for the rest of the batch)? Also, I think this would increase complexity for state management since we require another flag like NeedsBatchRefresh. I think the current implementation is less complex and accomplishes the same task. Does that make sense?


// GetAcrTags list the tags of a repository with their attributes.
func (c *AcrCLIClient) GetAcrTags(ctx context.Context, repoName string, orderBy string, last string) (*acrapi.RepositoryTagsType, error) {
if c.isExpired() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block also needs to be updated. Otherwise, when GetAcrTags detects that the token has expired, it will obtain an RBAC access token instead. This also applies to DeleteAcrTag

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional potential ABAC expiry edge case:

refreshAcrCLIClientToken() seems to request wildcard scope (repository:*:*), and this expiry path seems to be called by SDK methods like GetAcrTags/DeleteAcrTag/GetAcrManifests.

Even with ABAC batch refresh in purge(), if expiry is hit during these SDK operations, it seems like this path could still fall back to wildcard refresh. On ABAC registries, wildcard refresh fallback attempts will likely fail.

Perhaps, you could make the SDK expiry refresh ABAC-aware (repo-scoped) as well? A targeted test in internal/api/acrsdk_test.go for this expiry path will likely help prevent regressions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented!


// GetAcrTags list the tags of a repository with their attributes.
func (c *AcrCLIClient) GetAcrTags(ctx context.Context, repoName string, orderBy string, last string) (*acrapi.RepositoryTagsType, error) {
if c.isExpired() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional potential ABAC expiry edge case:

refreshAcrCLIClientToken() seems to request wildcard scope (repository:*:*), and this expiry path seems to be called by SDK methods like GetAcrTags/DeleteAcrTag/GetAcrManifests.

Even with ABAC batch refresh in purge(), if expiry is hit during these SDK operations, it seems like this path could still fall back to wildcard refresh. On ABAC registries, wildcard refresh fallback attempts will likely fail.

Perhaps, you could make the SDK expiry refresh ABAC-aware (repo-scoped) as well? A targeted test in internal/api/acrsdk_test.go for this expiry path will likely help prevent regressions.

Copy link
Member

@lizMSFT lizMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it would also be helpful if you can follow https://github.com/Azure/acr-cli/blob/main/scripts/experimental/README-purge-testing.md#performance-testing to do some performance testing and include the result in the PR description

.gitignore Outdated
coverage.txt
*.test
*.out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you keep this change in your local env?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented

@balcsida balcsida mentioned this pull request Mar 3, 2026
6 tasks
@keithy1012 keithy1012 marked this pull request as ready for review March 3, 2026 21:05
Copy link
Contributor

@estebanreyl estebanreyl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I validated its working. Left a couple of minor comments but once those are addressed I think we can move forward.

README.md Outdated
--include-locked
```

#### ABAC batch size (environment variable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some information on using ABAC registries in general. Maybe note what permissions are necessary + what happens if partial access is available

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented

assert_equals "$initial_manifests" "$final_manifests" "Manifest should remain"
}

run_test_comp_age() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you trying to validate with this change? I dont think this script change is necessarily super relevant to the abac scenarios.


// Collect all repository names into a slice for batching
repos := make([]string, 0, len(tagFilters))
for repoName := range tagFilters {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Go map iteration is non-deterministic, causing unstable batching and test results. Consider sorting repos before batching

if repo == "catalog" {
scopeParts = append(scopeParts, "registry:catalog:*")
} else {
scopeParts = append(scopeParts, fmt.Sprintf("repository:%s:pull,delete,metadata_read,metadata_write", repo))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Both RefreshTokenForAbac and refreshAcrCLIClientToken functions build ABAC scopes with near-identical logic (the catalog sentinel handling and the repository:%s:pull,delete,metadata_read,metadata_write format string). This is duplicated and could drift.
Consider extracting a shared helper

@keithy1012 keithy1012 force-pushed the t-keithyao/batch-purge branch from 9521565 to 2df3959 Compare March 5, 2026 00:21
Copy link
Contributor

@estebanreyl estebanreyl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Will wait for the license agreement check and merge once thats done :)

@keithy1012
Copy link
Author

@keithy1012 please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree company="Microsoft"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

acr purge command: support ABAC-enabled registries if identity has full data plane permissions

6 participants