[GATEWAY V2]: Add aggressive HTTP timeout policies. #47879
Open
jeet1995 wants to merge 103 commits into Azure:main from
Member, Author:
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
* fix few tests part 2 --------- Co-authored-by: Annie Liang <anniemac@Annies-MacBook-Pro.local>
…ning effort configuration (Azure#47772) Co-authored-by: Xiting Zhang <xitzhang@microsoft.com>
* [VoiceLive]Release 1.0.0-beta.4 Updated release date for version 1.0.0-beta.4 and added feature details. * Revise CHANGELOG for clarity and bug fixes Updated changelog to remove breaking changes section and added details about bug fixes.
…Java-5433741 (Azure#46952) * Configurations: 'specification/nginx/Nginx.Management/tspconfig.yaml', API Version: 2025-03-01-preview, SDK Release Type: beta, and CommitSHA: 'aae85aa3e7e4fda95ea2d3abac0ba1d8159db214' in SpecRepo: 'https://github.com/Azure/azure-rest-api-specs' Pipeline run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=5433741 Refer to https://eng.ms/docs/products/azure-developer-experience/develop/sdk-release/sdk-release-prerequisites to prepare for SDK release. * Configurations: 'specification/nginx/Nginx.Management/tspconfig.yaml', API Version: 2025-03-01-preview, SDK Release Type: beta, and CommitSHA: 'de8103ff8e94ea51c56bb22094ded5d2dfc45a6a' in SpecRepo: 'https://github.com/Azure/azure-rest-api-specs' Pipeline run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=5857234 Refer to https://eng.ms/docs/products/azure-developer-experience/develop/sdk-release/sdk-release-prerequisites to prepare for SDK release. --------- Co-authored-by: Weidong Xu <weidxu@microsoft.com>
false can't be assigned to int in java. Updating type to boolean
* Deprecating azure-resourcemanager-mixedreality * Typos * use 1.0.1 as version * Update CHANGELOG.md --------- Co-authored-by: Michael Zappe <michaelzappe@microsoft.com> Co-authored-by: Weidong Xu <weidxu@microsoft.com>
* fix few tests part 3 --------- Co-authored-by: Annie Liang <anniemac@Annies-MacBook-Pro.local> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Member, Author:
@FabianMeiswinkel and @xinlian12 - as discussed offline a Two tracking issues:
…github.com/jeet1995/azure-sdk-for-java into AzCosmos_HttpTimeoutPolicyChangesGatewayV2
Aggressive HTTP Timeout Policies for Gateway V2 (Thin Client)
Description
Adds shorter HTTP timeout policies for document operations routed through Gateway V2 (Thin Client). The default Gateway timeout is 60 seconds, but Thin Client operations are expected to complete in single-digit seconds. This PR introduces HttpTimeoutPolicyForGatewayV2 with 6s/6s/10s timeouts to enable faster failover and retry.
Changes
* New HttpTimeoutPolicyForGatewayV2 with two singleton instances:
  * INSTANCE_FOR_POINT_READ: for point read operations
  * INSTANCE_FOR_QUERY_AND_CHANGE_FEED: for query and change feed operations
* Updated HttpTimeoutPolicy.getTimeoutPolicy() routing logic:
  * useThinClientMode == true and resourceType == Document:
    * OperationType.Read → point read policy
    * OperationType.Query / change feed → query policy
* Extended ResponseTimeoutAndDelays with a Duration-based constructor overload to support zero-duration delays
* HTTP response timeout now surfaced in Gateway diagnostics: ClientSideRequestStatistics.GatewayStatistics captures the httpResponseTimeout from each request and serializes it as httpNwResponseTimeout in the diagnostics JSON. This makes it possible to see exactly which timeout value was applied to each gateway call, aiding debuggability when different policies (default vs. Gateway V2) are in play.
Timeout Comparison
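As a rough illustration of how the routing described under Changes picks a timeout schedule, and how that schedule compares with the 60-second default, here is a minimal self-contained sketch. Every type and name in it (TimeoutRoutingSketch, Policy, the two enums, select) is a hypothetical stand-in rather than the SDK's actual classes. Change feed is modeled as its own enum constant purely for readability, and the fall-through to a single 60s value for all other operations is an assumption, since the description only specifies the two Thin Client routes.

```java
import java.time.Duration;
import java.util.List;

// Illustrative sketch only: hypothetical stand-ins, not the SDK's classes.
public class TimeoutRoutingSketch {
    enum ResourceType { DOCUMENT, OTHER }
    enum OperationType { READ, QUERY, CHANGE_FEED, CREATE }

    /** A named schedule of per-attempt response timeouts. */
    static final class Policy {
        final String name;
        final List<Duration> responseTimeouts;

        Policy(String name, List<Duration> responseTimeouts) {
            this.name = name;
            this.responseTimeouts = responseTimeouts;
        }

        /** Timeout for a zero-based attempt index; the last entry repeats. */
        Duration timeoutForAttempt(int attempt) {
            int idx = Math.min(attempt, responseTimeouts.size() - 1);
            return responseTimeouts.get(idx);
        }
    }

    // 6s/6s/10s progression taken from the PR description.
    static final List<Duration> AGGRESSIVE = List.of(
        Duration.ofSeconds(6), Duration.ofSeconds(6), Duration.ofSeconds(10));

    static final Policy POINT_READ = new Policy("gatewayV2-point-read", AGGRESSIVE);
    static final Policy QUERY_AND_CHANGE_FEED =
        new Policy("gatewayV2-query-and-change-feed", AGGRESSIVE);
    // Assumption: the pre-existing policy is modeled as a single 60s timeout.
    static final Policy DEFAULT = new Policy("default", List.of(Duration.ofSeconds(60)));

    /** Picks a policy following the routing rules listed under Changes. */
    static Policy select(boolean useThinClientMode, ResourceType resourceType, OperationType op) {
        if (useThinClientMode && resourceType == ResourceType.DOCUMENT) {
            if (op == OperationType.READ) {
                return POINT_READ;
            }
            if (op == OperationType.QUERY || op == OperationType.CHANGE_FEED) {
                return QUERY_AND_CHANGE_FEED;
            }
        }
        // Assumption: anything else keeps the default policy.
        return DEFAULT;
    }

    public static void main(String[] args) {
        Policy thin = select(true, ResourceType.DOCUMENT, OperationType.READ);
        for (int attempt = 0; attempt < 3; attempt++) {
            System.out.println(thin.name + " attempt " + attempt + ": "
                + thin.timeoutForAttempt(attempt).getSeconds() + "s");
        }
        Policy regular = select(false, ResourceType.DOCUMENT, OperationType.READ);
        System.out.println(regular.name + " attempt 0: "
            + regular.timeoutForAttempt(0).getSeconds() + "s");
    }
}
```

Keeping the two Thin Client policies as separate instances, even though both use the same progression here, mirrors the description's split between point reads and query/change feed and leaves room for the schedules to diverge later.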
H2 Connection Lifecycle Instrumentation
Every gateway request now surfaces HTTP/2 connection identity in CosmosDiagnostics:
* channelId: H2 stream channel (e.g., f37707c7/2 = stream 2 on parent f37707c7). New stream per request (RFC 9113 §5.1.1).
* parentChannelId: TCP connection (NioSocketChannel) multiplexing all streams. This is the connection reuse identity.
* isHttp2: protocol flag. Only present for HTTP/2.
These appear on every gatewayStatisticsList entry, whether success (200), timeout (408/10002), or e2e cancel (408/20008), enabling support engineers to correlate stream failures to their parent TCP connection.
Additional production changes
* ReactorNettyClientConnectionObserver: captures channel IDs at CONNECTED / ACQUIRED / STREAM_CONFIGURED. Extracted captureChannelIds() and getRequestRecordFromConnection() helpers.
* ReactorNettyRequestRecord: channelId, parentChannelId, isHttp2
* StoreResponse / StoreResponseDiagnostics
* ClientSideRequestStatistics: GatewayStatistics serializes channel IDs; isHttp2 only when true
* BridgeInternal: recordGatewayResponse overload accepting ReactorNettyRequestRecord
* RxGatewayStoreModel: requestUri.
* ThinClientStoreModel: refCnt() guards at all 6 ByteBuf release sites
Connection Lifecycle Tests
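The connection-reuse checks in these tests come down to comparing connection identity across requests. A minimal sketch of that comparison follows, assuming channel IDs always have the parent/stream shape shown in the f37707c7/2 example. The class and method names (ChannelIdSketch, parentOf, streamOf, sameParent) are hypothetical and do not appear in the PR. In the real diagnostics parentChannelId is reported as its own field, so this parsing is only for illustration.

```java
import java.util.List;

// Illustrative sketch only: hypothetical helpers, not part of the SDK.
public class ChannelIdSketch {

    /** Returns the part before '/', or the whole ID when there is no stream suffix. */
    static String parentOf(String channelId) {
        int slash = channelId.indexOf('/');
        return slash < 0 ? channelId : channelId.substring(0, slash);
    }

    /** Returns the stream number after '/', or -1 when there is no stream suffix. */
    static int streamOf(String channelId) {
        int slash = channelId.indexOf('/');
        return slash < 0 ? -1 : Integer.parseInt(channelId.substring(slash + 1));
    }

    /** True when all channel IDs share one parent, i.e. one TCP connection was reused. */
    static boolean sameParent(List<String> channelIds) {
        return channelIds.stream().map(ChannelIdSketch::parentOf).distinct().count() <= 1;
    }

    public static void main(String[] args) {
        // Sample values: the first comes from the PR description, the rest are made up.
        List<String> reused = List.of("f37707c7/2", "f37707c7/4");
        List<String> replaced = List.of("f37707c7/2", "a1b2c3d4/2");

        System.out.println(parentOf("f37707c7/2"));   // f37707c7
        System.out.println(streamOf("f37707c7/2"));   // 2
        System.out.println(sameParent(reused));       // true
        System.out.println(sameParent(replaced));     // false
    }
}
```

A timed-out request followed by a request with the same parent and a new stream number is the pattern that indicates the TCP connection survived the stream-level timeout.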
Why
Before enabling aggressive timeouts, we needed to prove that a stream-level ReadTimeoutException does not close the parent TCP connection. Confirmed from reactor-netty 1.2.13 source: ReadTimeoutHandler is on the H2 stream pipeline (HttpClientOperations.onOutboundComplete()), not the parent.
SDK Fault Injection (in CI)
3 tests in FaultInjectionServerErrorRuleOnGatewayV2Tests: connection reuse after timeout, connection survives e2e timeout, connection survives for next request.
Docker tc netem (manual: ManualNetworkDelayConnectionLifecycleTests)
* connectionReuseAfterRealNettyTimeout: parentChannelId
* connectionSurvivesE2ETimeoutWithRealDelay
* multiParentChannelConnectionReuse: survivalRate=10/10
* parentChannelSurvivesE2ECancelWithoutReadTimeout
* retryUsesConsistentParentChannelId
Follow-ups (separate PRs)
* AzCosmos_H2ConnectAcquireTimeout.
* AzCosmos_H2ChannelHealthChecker.
Testing
* WebExceptionRetryPolicyTest: verify timeout progression (6s/6s/10s) and zero-backoff for both point read and query in Thin Client mode; confirm writes are not retried
* FaultInjectionServerErrorRuleOnGatewayV2Tests: injects a single 61s server response delay. The first attempt times out at 6s, the retry hits the server after the injected delay has cleared and succeeds, asserting end-to-end latency stays under 8 seconds
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines