
cosmos: handle HTTP 403/sub-status 5300 (AAD_REQUEST_NOT_AUTHORIZED) by refreshing bearer token and retrying#46167

Draft
Copilot wants to merge 10 commits into main from copilot/fix-azure-cosmos-403-error

Contributor

Copilot AI commented Apr 6, 2026

  • Analyze issue: AsyncCosmosBearerTokenCredentialPolicy does not handle HTTP 403 with sub-status 5300 (AAD_REQUEST_NOT_AUTHORIZED) - only 401 is handled
  • Add send() override to AsyncCosmosBearerTokenCredentialPolicy in _auth_policy_async.py to clear cached token and retry on 403/5300
  • Add send() override to sync CosmosBearerTokenCredentialPolicy in _auth_policy.py for the same fix
  • Rewrite tests using realistic Pipeline/AsyncPipeline with MockTransport that returns proper requests.Response objects with headers
  • Tests verify Authorization header format (type=aad&ver=1.0&sig=<token>) in both initial and retry requests
  • Fix spelling: "retriable" → "retryable" in test docstrings
  • Update CHANGELOG.md with bug fix entry referencing PR 46167
  • 14 tests passing (7 sync + 7 async)

bambriz and others added 3 commits April 6, 2026 14:16
…and async)

When Cosmos DB returns HTTP 403 with sub-status 5300 (AAD_REQUEST_NOT_AUTHORIZED),
the cached bearer token is now cleared and the request is retried with a fresh token.
This mirrors how the base class handles HTTP 401, and resolves the issue where
long-running services using managed identity would permanently fail after token expiry.

- Added send() override to CosmosBearerTokenCredentialPolicy (_auth_policy.py)
- Added send() override to AsyncCosmosBearerTokenCredentialPolicy (_auth_policy_async.py)
- Added unit tests for both sync and async policies
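
The clear-and-retry behaviour described above can be illustrated without azure-core. What follows is a minimal stdlib-only sketch, not the code in _auth_policy.py: `SketchResponse`, `SketchBearerPolicy`, and the callable-based transport are hypothetical stand-ins, and the single retry is an assumption based on the description that this mirrors the 401 handling.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

SUBSTATUS_HEADER = "x-ms-substatus"  # header named in the review thread
AAD_REQUEST_NOT_AUTHORIZED = 5300    # sub-status from the PR title


@dataclass
class SketchResponse:
    """Hypothetical stand-in for an HTTP response (not an azure-core type)."""
    status_code: int
    headers: Dict[str, str] = field(default_factory=dict)


def is_aad_not_authorized(response: SketchResponse) -> bool:
    """True only for HTTP 403 carrying sub-status 5300."""
    return (
        response.status_code == 403
        and response.headers.get(SUBSTATUS_HEADER) == str(AAD_REQUEST_NOT_AUTHORIZED)
    )


class SketchBearerPolicy:
    """Hypothetical policy: caches a token and refreshes it on 403/5300."""

    def __init__(
        self,
        get_token: Callable[[], str],
        transport: Callable[[Dict[str, str]], SketchResponse],
    ) -> None:
        self._get_token = get_token
        self._transport = transport
        self._token: Optional[str] = None

    def _authorize(self, headers: Dict[str, str]) -> None:
        # Fetch a token only when none is cached.
        if self._token is None:
            self._token = self._get_token()
        headers["Authorization"] = f"type=aad&ver=1.0&sig={self._token}"

    def send(self, headers: Dict[str, str]) -> SketchResponse:
        self._authorize(headers)
        response = self._transport(dict(headers))
        if is_aad_not_authorized(response):
            # Drop the cached token so the next authorize call fetches a
            # fresh one, then retry once (assumed retry count).
            self._token = None
            self._authorize(headers)
            response = self._transport(dict(headers))
        return response
```

A plain 403 without the 5300 sub-status is returned to the caller unchanged in this sketch, so unrelated authorization failures are not retried.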

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/a5381531-6292-4e5e-be43-586d3267d980

Co-authored-by: bambriz <8497145+bambriz@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix azure.cosmos client recovery from 403 on expired AAD token" to "cosmos: handle HTTP 403/sub-status 5300 (AAD_REQUEST_NOT_AUTHORIZED) by refreshing bearer token and retrying" on Apr 6, 2026
Copilot AI requested a review from bambriz April 6, 2026 21:24
from azure.cosmos.http_constants import HttpHeaders, SubStatusCodes


def _make_request():
Member

@copilot request headers might be needed. Is there not a better way to test this? Something more expansive and using the sdk code more realistically to mimic potential issues that may occur.

Contributor Author

Rewrote both test files (commit cb77da60) to use a realistic Pipeline/AsyncPipeline backed by a MockTransport that extends HttpTransport/AsyncHttpTransport. The transport returns proper requests.Response objects — including the x-ms-substatus header on the 403 — and records every outgoing request so tests can assert on the exact Authorization header value (verifying the Cosmos AAD format type=aad&ver=1.0&sig=<token> and that a fresh token is used on the retry). This mirrors the pattern used in azure-core's own authentication test suite.
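
As a rough illustration of the recording-transport idea, here is a stdlib-only sketch. It does not extend azure-core's HttpTransport and is not the test code from the commit: `RecordingTransport`, the `(status, headers)` response tuple, and `parse_cosmos_aad_header` are hypothetical names introduced for this example.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical response shape for this sketch: (status_code, headers).
Response = Tuple[int, Dict[str, str]]


class RecordingTransport:
    """Hypothetical mock transport: replays scripted responses, records requests."""

    def __init__(self, scripted: List[Response]) -> None:
        self._scripted: Deque[Response] = deque(scripted)
        self.recorded: List[Dict[str, str]] = []

    def send(self, request_headers: Dict[str, str]) -> Response:
        # Copy the headers so later mutation by the caller cannot
        # change what was recorded for this request.
        self.recorded.append(dict(request_headers))
        return self._scripted.popleft()


def parse_cosmos_aad_header(value: str) -> Dict[str, str]:
    """Split 'type=aad&ver=1.0&sig=<token>' into its named fields."""
    fields: Dict[str, str] = {}
    for part in value.split("&"):
        # partition splits on the first '=' only, so '=' padding
        # inside the token stays part of the value.
        key, _, val = part.partition("=")
        fields[key] = val
    return fields


# Usage: script a 403/5300 followed by a 200, then inspect what was sent.
transport = RecordingTransport([(403, {"x-ms-substatus": "5300"}), (200, {})])
transport.send({"Authorization": "type=aad&ver=1.0&sig=old-token"})
transport.send({"Authorization": "type=aad&ver=1.0&sig=new-token"})
sigs = [parse_cosmos_aad_header(h["Authorization"])["sig"] for h in transport.recorded]
```

Recording a copy of each request's headers is what lets a test compare the token in the initial request against the token in the retry.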

bambriz and others added 2 commits April 6, 2026 16:02
Replace MagicMock-based policy isolation tests with full azure-core
Pipeline/AsyncPipeline tests that use a real MockTransport (extending
HttpTransport/AsyncHttpTransport). This approach:

- Returns proper requests.Response objects with real header dicts,
  including x-ms-substatus for the 403/5300 trigger
- Records all outgoing requests so tests can assert on the exact
  Authorization header value (type=aad&ver=1.0&sig=<token>)
- Verifies both the Cosmos AAD header format and token freshness
  end-to-end through the full policy pipeline, matching the pattern
  used in azure-core's own authentication tests
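
An async test in this style could look roughly like the stdlib-only sketch below. It is an illustration under stated assumptions, not the AsyncPipeline test from this commit: `AsyncSketchPolicy`, the coroutine-based transport, and the `(status, headers)` tuple are hypothetical, and the retry is assumed to happen at most once.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Tuple

# Hypothetical response shape for this sketch: (status_code, headers).
Response = Tuple[int, Dict[str, str]]


class AsyncSketchPolicy:
    """Hypothetical async policy: refreshes the token once on 403/5300."""

    def __init__(
        self,
        get_token: Callable[[], Awaitable[str]],
        transport: Callable[[Dict[str, str]], Awaitable[Response]],
    ) -> None:
        self._get_token = get_token
        self._transport = transport
        self._token: Optional[str] = None

    async def _authorize(self, headers: Dict[str, str]) -> None:
        # Fetch a token only when none is cached.
        if self._token is None:
            self._token = await self._get_token()
        headers["Authorization"] = f"type=aad&ver=1.0&sig={self._token}"

    async def send(self, headers: Dict[str, str]) -> Response:
        await self._authorize(headers)
        status, resp_headers = await self._transport(dict(headers))
        if status == 403 and resp_headers.get("x-ms-substatus") == "5300":
            # Clear the cache, fetch a fresh token, and retry exactly once.
            self._token = None
            await self._authorize(headers)
            status, resp_headers = await self._transport(dict(headers))
        return status, resp_headers


async def run_scenario() -> Tuple[int, list]:
    """The service keeps answering 403/5300; the policy must not loop."""
    tokens = iter(["stale", "fresh"])
    sent = []

    async def get_token() -> str:
        return next(tokens)

    async def transport(headers: Dict[str, str]) -> Response:
        sent.append(headers["Authorization"])
        return 403, {"x-ms-substatus": "5300"}

    status, _ = await AsyncSketchPolicy(get_token, transport).send({})
    return status, sent
```

Running `asyncio.run(run_scenario())` exercises the bounded-retry case: the second 403 is handed back to the caller instead of triggering another refresh.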

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/a67aa68c-a3b6-42ef-824f-aa45415226d6

Co-authored-by: bambriz <8497145+bambriz@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 6, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • scanning-api.github.com
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI requested a review from bambriz April 6, 2026 23:10
Contributor Author

Copilot AI commented Apr 7, 2026

Just as a heads up, I was again blocked by firewall rules (scanning-api.github.com, DNS block) while working on your feedback.


Copilot AI requested a review from bambriz April 7, 2026 04:43
Member

bambriz commented Apr 7, 2026

@copilot update changelog to mention the pr and the fix applied.

Contributor Author

Copilot AI commented Apr 7, 2026

@copilot update changelog to mention the pr and the fix applied.

Done in commit f2daec1e — added a bug fix entry to the 4.16.0b3 (Unreleased) section of CHANGELOG.md referencing PR 46167.



Development

Successfully merging this pull request may close these issues.

azure-cosmos: async client does not recover from 403 on expired AAD token (bearer token policy only handles 401)
