[GATEWAY V2]: Add aggressive HTTP timeout policies. #47879
Merged
jeet1995
merged 106 commits into
Azure:main
from
jeet1995:AzCosmos_HttpTimeoutPolicyChangesGatewayV2
Mar 4, 2026
93f03a3
Introducing Gateway V2.0 dedicated HTTP timeout policy.
jeet1995 35d94f3
Introducing Gateway V2.0 dedicated HTTP timeout policy.
jeet1995 3c126cb
Introducing Gateway V2.0 dedicated HTTP timeout policy.
jeet1995 7adaacb
Introducing Gateway V2.0 dedicated HTTP timeout policy.
jeet1995 512d06a
Merge branch 'Azure:main' into AzCosmos_HttpTimeoutPolicyChangesGatew…
jeet1995 94d4cb2
Clean up.
jeet1995 c38f7ae
Clean up.
jeet1995 ab70db3
Clean up.
jeet1995 610ad49
FixFewTests-part2 (#47933)
xinlian12 c93869b
Remove test-jar dependency with copied code (#47917)
alzimmermsft deaa2b3
[VoiceLive]Add Foundry Agent integration, filler responses, and reaso…
xitzhang 57bd4f8
Remove mssql-jdbc dependency and update assertj-core (#47945)
alzimmermsft 2781d4f
[VoiceLive]Release 1.0.0-beta.4 (#47946)
xitzhang d2de5eb
[AutoPR azure-resourcemanager-nginx]-generated-from-SDK Generation - …
azure-sdk fc2da2f
Update wrong data type (#47937)
sandeepdhamija 9d1c4cf
Deprecating azure-resourcemanager-mixedreality (#47943)
MichaelZp0 4ae43d4
[Automation] Generate SDK based on TypeSpec 0.39.1 (#47953)
azure-sdk 486f7d2
Increment package versions for mixedreality releases (#47955)
azure-sdk 96edfbb
fixFewTests - Part3 (#47939)
xinlian12 1999049
Re-enable spring-cloud-azure-starter-monitor for Spring Boot 4 (#47951)
Copilot 3294b99
Clean up.
jeet1995 7238e33
Clean up.
jeet1995 d81307b
Migrate azure-search-documents to TypeSpec (#47819)
alzimmermsft 04d1389
Increment package versions for nginx releases (#47965)
azure-sdk 5fb1ad6
Increment package versions for ai releases (#47947)
azure-sdk e12c45b
escapeNonAscIIPkValueForQueryPlanAndQuery (#47881)
xinlian12 2ef4708
Deprecate AgriFood FarmBeats SDK and code cleanup (#47935)
samvaity d472305
Search TypeSpec migration - remove last few BinaryData APIs from publ…
alzimmermsft 2b6c73d
Fix GraalVM native image compatibility for AzureIdentityEnvVars (#47940)
g2vinay 4e7408b
Release azure-cosmos 4.78.0, azure-cosmos-encryption 2.27.0, and Spar…
xinlian12 2e9ea28
Fix: Include stack trace in token error logs (#47974)
g2vinay 49294c0
[AutoPR azure-resourcemanager-resources-deploymentstacks]-generated-f…
azure-sdk 905a8d6
Set default values for head_sha and repo_url in generate_typespec_pro…
Copilot 7d1046e
Increment package versions for cosmos releases (#47983)
azure-sdk 45ef854
Nregion synchronous commit feature (#47757)
mbhaskar a1dffe9
Add ConnectionDetails support for EventHubs (#47926)
Copilot 88e14b0
Increment package versions for edgeactions releases (#47991)
azure-sdk 753f744
mgmt network, update api-version to 2025-05-01 (#47831)
v-huizhu2 f3d135f
[AutoPR azure-resourcemanager-computebulkactions]-generated-from-SDK …
azure-sdk 047ff77
Increment package versions for disconnectedoperations releases (#47992)
azure-sdk 4dd4c40
Increment package versions for network releases (#47994)
azure-sdk a4bc25d
Deprecating azure-mixedreality-authentication (#47942)
MichaelZp0 276cee5
Remove all MixedReality SDKs (#47885)
MichaelZp0 a4cfbdd
Remove Operational Insights from CODEOWNERS (#47989)
ronniegeraghty 54cd75d
Update service owners to AzureSdkOwners in CODEOWNERS (#47988)
ronniegeraghty 70396e5
Fix java - spring - tests by adding Thread.sleep (#47990)
rujche 0aeccb8
Change PRLabel from %Azure Quantum to %Quantum (#47948)
ronniegeraghty 7f09460
Remove commented Device Provisioning Service owners (#47949)
ronniegeraghty 55920c3
Add checkstyle rule to validate serialization method completeness (#4…
Copilot b4a83ce
Fix pipeline failure about linting-extensions (#48005)
rujche 52002b1
Only publish docs.ms and github.io docs if publishing to Maven (#47997)
danieljurek dd32b1b
avoidExtraQuery (#47996)
xinlian12 17fbc78
Sync eng/common directory with azure-sdk-tools for PR 13968 (#48004)
azure-sdk d8550cc
Configurations: 'specification/codesigning/CodeSigning.Management/ts…
azure-sdk 682229f
[Automation] Generate SDK based on TypeSpec 0.39.2 (#48006)
azure-sdk 4e3c313
[VoiceLive] Update for agent V2, remove foundry tools, rename filler …
xitzhang 81b998a
[VoiceLive] Release 1.0.0-beta.5 (#48013)
xitzhang 416e7b5
Ignore implementation packages when generating docs (#47998)
srnagar 7c4e3cb
Increment package versions for artifactsigning releases (#48017)
azure-sdk 7ecbfd6
[AutoPR azure-resourcemanager-managedops]-generated-from-SDK Generati…
azure-sdk edcf070
[Kafka connector]AddSupportForThroughputBucket (#48009)
xinlian12 289c832
mgmt, trustedsigning, update to next preview (#48016)
weidongxu-microsoft f2e6b14
- Adding nregion feature to changelog (#47987)
mbhaskar 9b1a477
Increment package versions for resources releases (#48020)
azure-sdk 7c3932c
Increment package versions for managedops releases (#48027)
azure-sdk 67073ae
Remove unused UnitSpec from fabric-cosmos-spark-auth_3 (#48010)
Copilot 229de3f
update release date (#48029)
ryazhang-microsoft 5b3d14c
Replace `azd config list` with `azd auth status` in TROUBLESHOOTING.m…
scottaddie 31e1a1b
Bug 47910.count query text block (#47911)
Blackbaud-JasonBodnar 94bc255
Add tests for LAZY indexing mode in Cosmos Java SDK (#48024)
Copilot c98c130
Storage - STG101 Beta Features (#48019)
ibrandes 9f473ec
Increment package versions for ai releases (#48035)
azure-sdk 85a90f3
Adding tests to associate channel lifecycle with Netty's ReadTimeoutE…
jeet1995 f21a5ab
Update changelog and README files for multiple Azure Storage SDK comp…
ibrandes bd4365b
Replace ThreadLocal Collator with instance Collator (#48037)
alzimmermsft 85bd7a0
Update azd section of Identity troubleshooting guide (#48038)
scottaddie dce90cc
Open Storage - STG101 Beta Release Date Bump (#48045)
ibrandes beb5bb9
[SparkConnector]updateTransactionalBulkConfig (#48008)
xinlian12 38064d5
[SparkConnector]IncludeOperationStatusCodeHistoryInStaleProgressLogs …
xinlian12 9c8603b
Update to use JDK's deafult trust CA store for cert validations (#48046)
samvaity 2f66b3a
Fix Netty ByteBuf leak in RxGatewayStoreModel via doFinally safety ne…
kushagraThapar 00621cb
Adding tests to associate channel lifecycle with Netty's ReadTimeoutE…
jeet1995 e4bcbe1
Add Azure Artifacts Feed Setup section to CONTRIBUTING.md (#48032)
raych1 c8c2ec2
Use CFS as the package resolution source (#47901)
raych1 b44d84a
Fix: PublishDevFeedPackage runs in parallel with VerifyReleaseVersion…
Copilot dcf8ecd
Add e2ePolicyCfg to GatewayStatistics for timeout policy diagnostics
jeet1995 fe100e1
Remove implementation/TestSuiteBase.java and consolidate to rx/TestSu…
Copilot 4548099
Part 1: Add multi-parent-channel and retry-parentChannelId tests
jeet1995 152d405
Part 1: Fix retryUsesConsistentParentChannelId + add evidence MD
jeet1995 706edd6
OpenSpec: Rectify Part 2 spec — 1s GW V2 connect/acquire timeout bifu…
jeet1995 1e0cc12
Part 1: ALL 7/7 PASSED — relaxed tc netem assertions for kernel TCP RST
jeet1995 80460bf
Generating `azure-ai-projects` from latest spec (#47875)
jpalvarezl 2dcce8b
Clean up
jeet1995 400894e
Merge branch 'main' of https://github.com/jeet1995/azure-sdk-for-java…
jeet1995 4c2aca9
Merge branch 'Azure:main' into AzCosmos_HttpTimeoutPolicyChangesGatew…
jeet1995 ceb99ba
Clean up.
jeet1995 c582a85
Adding tests with manual packet delay tests.
jeet1995 2d5228f
Addressing comments.
jeet1995 49bfddb
Clean up.
jeet1995 467b8b8
Merge branch 'Azure:main' into AzCosmos_HttpTimeoutPolicyChangesGatew…
jeet1995 cc8c0bd
Clean up.
jeet1995 6460378
Merge branch 'AzCosmos_HttpTimeoutPolicyChangesGatewayV2' of https://…
jeet1995 af4b085
Merge branch 'Azure:main' into AzCosmos_HttpTimeoutPolicyChangesGatew…
jeet1995 794fbbf
Clean up.
jeet1995 618d088
Merge branch 'AzCosmos_HttpTimeoutPolicyChangesGatewayV2' of https://…
jeet1995 32dd112
Merge branch 'main' of https://github.com/jeet1995/azure-sdk-for-java…
jeet1995
119 changes: 119 additions & 0 deletions
sdk/cosmos/azure-cosmos-tests/NETWORK_DELAY_TESTING_README.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,119 @@ | ||
| # Http2ConnectionLifecycleTests — Network Delay Testing | ||
|
|
||
| ## What This Tests | ||
|
|
||
| `Http2ConnectionLifecycleTests` validates that HTTP/2 parent TCP connections (`NioSocketChannel`) survive | ||
| stream-level `ReadTimeoutException`s triggered by real network delay. The tests use Linux `tc netem` to inject | ||
| kernel-level packet delay inside a Docker container. | ||
|
|
||
| **Key invariant proven:** A real Netty `ReadTimeoutException` on an `Http2StreamChannel` does NOT close | ||
| the parent `NioSocketChannel` — the connection pool reuses it for subsequent requests. | ||
|
|
||
| ## Why Not SDK Fault Injection? | ||
|
|
||
| SDK `RESPONSE_DELAY` adds a `Mono.delay()` at the HTTP layer — bytes still flow normally on the wire. | ||
| Netty's `ReadTimeoutHandler` never fires because it monitors actual socket I/O, not application-layer delays. | ||
| Only `tc netem` creates real kernel-level packet delay that triggers the handler. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - Docker Desktop with Linux containers | ||
| - Docker memory: **8 GB+** | ||
| - A Cosmos DB account with thin client enabled | ||
| - Credentials in `sdk/cosmos/cosmos-v4.properties`: | ||
| ```properties | ||
| ACCOUNT_HOST=https://<account>.documents.azure.com:443/ | ||
| ACCOUNT_KEY=<primary-key> | ||
| ``` | ||
|
|
||
| ## Build | ||
|
|
||
| ```bash | ||
| cd sdk/cosmos | ||
|
|
||
| # Build SDK | ||
| mvn clean install -pl azure-cosmos,azure-cosmos-test,azure-cosmos-tests -am \ | ||
| -DskipTests -Dgpg.skip -Dcheckstyle.skip -Dspotbugs.skip \ | ||
| -Drevapi.skip -Dmaven.javadoc.skip -Denforcer.skip -Djacoco.skip | ||
|
|
||
| # Build Docker image | ||
| docker build -t cosmos-netem-test -f azure-cosmos-tests/Dockerfile.netem . | ||
|
|
||
| # Generate Linux classpath | ||
| mvn dependency:build-classpath -f azure-cosmos-tests/pom.xml -DincludeScope=test | ||
| # Convert Windows paths → Linux paths, save to azure-cosmos-tests/target/cp-linux.txt | ||
| ``` | ||
|
|
||
| ## Run | ||
|
|
||
| ```bash | ||
| cd sdk/cosmos | ||
|
|
||
| ACCOUNT_HOST=$(grep "^ACCOUNT_HOST" cosmos-v4.properties | cut -d= -f2- | tr -d ' ') | ||
| ACCOUNT_KEY=$(grep "^ACCOUNT_KEY" cosmos-v4.properties | cut -d= -f2- | tr -d ' ') | ||
|
|
||
| docker run --rm --cap-add=NET_ADMIN --memory 8g \ | ||
| -v "$(pwd):/workspace" \ | ||
| -v "$HOME/.m2:/root/.m2" \ | ||
| -e "ACCOUNT_HOST=$ACCOUNT_HOST" \ | ||
| -e "ACCOUNT_KEY=$ACCOUNT_KEY" \ | ||
| cosmos-netem-test bash -c ' | ||
| cd /workspace && | ||
| CP=$(cat azure-cosmos-tests/target/cp-linux.txt) && | ||
| java --add-opens java.base/java.lang=ALL-UNNAMED \ | ||
| --add-opens java.base/java.util=ALL-UNNAMED \ | ||
| --add-opens java.base/java.net=ALL-UNNAMED \ | ||
| --add-opens java.base/java.io=ALL-UNNAMED \ | ||
| --add-opens java.base/java.nio=ALL-UNNAMED \ | ||
| --add-opens java.base/java.util.concurrent=ALL-UNNAMED \ | ||
| --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED \ | ||
| --add-opens java.base/sun.nio.ch=ALL-UNNAMED \ | ||
| --add-opens java.base/sun.nio.cs=ALL-UNNAMED \ | ||
| --add-opens java.base/sun.security.action=ALL-UNNAMED \ | ||
| --add-opens java.base/sun.util.calendar=ALL-UNNAMED \ | ||
| -cp "$CP" \ | ||
| -DACCOUNT_HOST=$ACCOUNT_HOST \ | ||
| -DACCOUNT_KEY=$ACCOUNT_KEY \ | ||
| -DCOSMOS.THINCLIENT_ENABLED=true \ | ||
| -DCOSMOS.HTTP2_ENABLED=true \ | ||
| org.testng.TestNG /workspace/azure-cosmos-tests/src/test/resources/manual-thinclient-network-delay-testng.xml \ | ||
| -verbose 2 | ||
| ' | ||
| ``` | ||
|
|
||
| ## tc netem Commands Used | ||
|
|
||
| ### Add Global Delay | ||
|
|
||
| ```bash | ||
| tc qdisc add dev eth0 root netem delay 8000ms | ||
| ``` | ||
|
|
||
| Delays ALL outbound packets by 8 seconds, including TCP SYN, data, and ACK packets. | ||
| The delay causes Netty's `ReadTimeoutHandler` to fire: the client's request and ACK packets reach the | ||
| server late, so no response bytes arrive on the socket within the read timeout. | ||
|
|
||
| ### Remove Delay | ||
|
|
||
| ```bash | ||
| tc qdisc del dev eth0 root netem | ||
| ``` | ||
|
|
||
| Restores normal networking. Called in `@AfterMethod` and `@AfterClass` as a safety net. | ||
|
|
||
| ## Tests | ||
|
|
||
| | Test | What It Proves | | ||
| |------|---------------| | ||
| | `connectionReuseAfterRealNettyTimeout` | Parent NioSocketChannel survives ReadTimeoutException; recovery read uses same `parentChannelId` | | ||
| | `multiParentChannelConnectionReuse` | Under concurrent load (>30 streams), multiple parent channels are created and ALL survive timeout | | ||
| | `retryUsesConsistentParentChannelId` | Retry attempts (6s→6s→10s) use consistent parent channel(s); pool recovers post-delay | | ||
| | `connectionSurvivesE2ETimeoutWithRealDelay` | Parent survives when e2e timeout (7s) AND ReadTimeoutHandler both fire | | ||
| | `parentChannelSurvivesE2ECancelWithoutReadTimeout` | Parent survives when e2e cancel (3s) fires BEFORE ReadTimeoutHandler (6s) — stream RST only | | ||
|
|
||
| ## Important Notes | ||
|
|
||
| - Tests run **sequentially** (`parallel="false" thread-count="1"`) — tc netem is interface-global | ||
| - `--cap-add=NET_ADMIN` is required for `tc` commands (Linux `CAP_NET_ADMIN` capability) | ||
| - Each test creates/closes its own client (`@BeforeMethod`/`@AfterMethod`) for connection pool isolation | ||
| - Delay cleanup runs in `finally` blocks AND `@AfterMethod` for reliability |
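
The add and remove commands from the README can be paired in a small shell wrapper so that the delay is always removed, which mirrors in shell what the tests do with `finally` blocks and `@AfterMethod`. This is a minimal sketch and not part of the PR: `TC_BIN`, `add_delay`, `remove_delay`, and `with_delay` are hypothetical names, and only the two `tc qdisc` invocations are taken from the README.

```bash
# Sketch: apply a netem delay around one command and always remove it afterwards.
# TC_BIN and all function names are hypothetical; only the two tc invocations
# come from the README.
TC_BIN="${TC_BIN:-tc}"   # set TC_BIN=echo for a dry run that only prints the commands

add_delay() {            # usage: add_delay <interface> <delay-ms>
  "$TC_BIN" qdisc add dev "$1" root netem delay "${2}ms"
}

remove_delay() {         # usage: remove_delay <interface>
  # Ignore the error tc reports when no netem qdisc is installed.
  "$TC_BIN" qdisc del dev "$1" root netem 2>/dev/null || true
}

with_delay() {           # usage: with_delay <interface> <delay-ms> <command...>
  local dev="$1" ms="$2" status=0
  shift 2
  add_delay "$dev" "$ms" || return 1
  "$@" || status=$?      # remember the command's exit status
  remove_delay "$dev"    # runs whether the command succeeded or failed
  return "$status"
}
```

Inside a container started with `--cap-add=NET_ADMIN`, `with_delay eth0 8000 <command>` would run the command under an 8-second egress delay and then restore normal networking. With `TC_BIN=echo` the functions print the `tc` commands instead of executing them, which is useful for checking the sequence without privileges.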
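
The Build step in the README leaves the Windows-to-Linux classpath conversion as a comment. One way it could be done is sketched below, under stated assumptions: the classpath string was produced on a Windows host, every entry lives under the host's Maven repository, and that repository is mounted at `/root/.m2` in the container (as in the README's `docker run` command). The function name `to_linux_classpath` and the example prefix `C:/Users/me/.m2` are hypothetical.

```bash
# Sketch: rewrite a Windows classpath string (read from stdin) for use in the container.
# Assumes all entries sit under the host's Maven repository; the function name and the
# example prefixes are hypothetical.
to_linux_classpath() {   # usage: to_linux_classpath <windows-m2-prefix> <linux-m2-prefix>
  local win_prefix="$1" linux_prefix="$2"
  # 1. Turn backslashes into slashes and ';' separators into ':'.
  # 2. Swap the host repository prefix (given with forward slashes) for the container one.
  tr '\\;' '/:' | sed "s|${win_prefix}|${linux_prefix}|g"
}
```

For example, piping the generated classpath string through `to_linux_classpath 'C:/Users/me/.m2' '/root/.m2'` and redirecting the result to `azure-cosmos-tests/target/cp-linux.txt` would give the file the Run step reads. The `maven-dependency-plugin` can write the classpath string to a file via its `mdep.outputFile` property, which avoids scraping it from the Maven log. The README does not say whether further entries, such as the module's own compiled test classes, need to be appended, so that part is left open here.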