Flaky test: TestQuerierWithStoreGatewayDataBytesLimits (integration_querier, arm64) — got 500 instead of expected 422 #7606

@sandy2008

Describe the bug

The integration_querier test TestQuerierWithStoreGatewayDataBytesLimits intermittently fails on arm64. The test sets -store-gateway.max-downloaded-bytes-per-request to 1 and expects every query to be rejected with HTTP 422 ("exceeded bytes limit"), but it occasionally receives HTTP 500 instead:

// integration/querier_test.go
resp, body, err := c.QueryRaw(`{job="test"}`, series2Timestamp, map[string]string{}) // line 562
require.NoError(t, err)
require.Equal(t, http.StatusUnprocessableEntity, resp.StatusCode) // line 564  <-- got 500
require.Contains(t, string(body), "exceeded bytes limit")

Observed error:

querier_test.go:564:
    Error: Not equal: expected: 422  actual: 500
    Test:  TestQuerierWithStoreGatewayDataBytesLimits

Store-gateway logs around the failure show "series size exceeded expected size; refetching" ... maxSeriesSize=1. It appears that when the 1-byte limit is enforced mid-fetch, the store-gateway sometimes surfaces a generic 500 instead of the expected 422 "exceeded bytes limit". In other words, the resource-exhausted path seems to race with the "refetch" path. The assertion line has been stable since 2023 (#5286), which points to a timing issue rather than a recent regression.

To Reproduce

Steps to reproduce the behavior:

  1. Start Cortex (recent master)
  2. Run the integration test on arm64 (flaky; multiple runs may be needed):
    go test -tags=integration,integration_querier -count=5 -run TestQuerierWithStoreGatewayDataBytesLimits ./integration/...
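
Because the failure is intermittent, it can help to rerun the command until it fails and keep the output of the failing attempt. The following is a minimal sketch of such a loop, not part of the Cortex repository: runUntilFailure is a hypothetical helper, and the command to run is passed on the command line so nothing is hard-coded.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// runUntilFailure is a hypothetical helper (not part of Cortex). It runs the
// given command up to maxRuns times and returns the 1-based attempt that first
// failed together with its combined stdout/stderr, or 0 if every attempt
// succeeded. A command that cannot be started also counts as a failure.
func runUntilFailure(maxRuns int, name string, args ...string) (int, []byte) {
	for i := 1; i <= maxRuns; i++ {
		out, err := exec.Command(name, args...).CombinedOutput()
		if err != nil {
			return i, out
		}
	}
	return 0, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: repro <command> [args...]")
		return
	}
	// 20 attempts is an arbitrary choice for this sketch.
	attempt, out := runUntilFailure(20, os.Args[1], os.Args[2:]...)
	if attempt == 0 {
		fmt.Println("no failure reproduced")
		return
	}
	fmt.Printf("failed on attempt %d\n", attempt)
	os.Stdout.Write(out)
	os.Exit(1)
}
```

Saved as, say, repro.go outside the integration package, it could be invoked with the test command above as its arguments, using -count=1 so that each attempt is a single test run.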
    

Expected behavior

When the downloaded-bytes limit is exceeded, the querier deterministically returns 422 with "exceeded bytes limit", never a generic 500.

Environment:

  • Infrastructure: GitHub Actions CI, ubuntu-24.04-arm (arm64), integration job, tag integration_querier
  • Deployment tool: N/A (Docker-based integration test)

Additional Context

Observed on CI (arm64, on a PR unrelated to the querier), 2026-06-02:
https://github.com/cortexproject/cortex/actions/runs/26832378915/job/79118070108

Filed from CI failure-log analysis with AI assistance; the run link and querier_test.go:564 were reviewed and verified against master before submitting.
