Flaky test: TestMultitenantAlertmanager_ServeHTTPWithFallbackConfig — data race between ApplyConfig goroutine spawn and Stop() #7603

@sandy2008

Describe the bug

TestMultitenantAlertmanager_ServeHTTPWithFallbackConfig (pkg/alertmanager) intermittently fails under the Go race detector (the test job runs with -race).

When a request hits MultitenantAlertmanager.ServeHTTP for a tenant without an explicit config, alertmanagerFromFallbackConfig lazily creates a per-tenant Alertmanager and calls ApplyConfig, which spawns two goroutines with no synchronization:

// pkg/alertmanager/alertmanager.go
go am.dispatcher.Run(time.Now()) // line 420
go am.inhibitor.Run()            // line 421

The test defers StopAndAwaitTerminated, whose stopping() path calls Alertmanager.Stop():

// pkg/alertmanager/alertmanager.go
func (am *Alertmanager) Stop() {
	if am.inhibitor != nil {
		am.inhibitor.Stop()  // line 430
	}
	if am.dispatcher != nil {
		am.dispatcher.Stop() // line 434
	}
	...
}

Stop() races against the just-spawned dispatcher.Run / inhibitor.Run goroutines, so the race detector reports a DATA RACE and the test fails.
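As an illustration of the general shape of this race (not the actual Alertmanager internals, which this report does not inspect), the sketch below uses a hypothetical `worker` type standing in for the dispatcher or inhibitor. `Run` publishes a cancel function that `Stop` consumes; without the mutex and the `stopped` flag, `Stop` could read `cancel` while the freshly spawned `Run` goroutine is still writing it. All names here are assumptions made for the example.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for a component with Run/Stop methods.
// It is NOT the upstream dispatcher or inhibitor implementation.
type worker struct {
	mu      sync.Mutex
	cancel  context.CancelFunc // written by Run, read by Stop
	stopped bool               // set by Stop so a late Run exits immediately
	done    chan struct{}      // closed when Run returns
}

func newWorker() *worker {
	return &worker{done: make(chan struct{})}
}

// Run blocks until Stop is called. It must be called at most once.
func (w *worker) Run() {
	defer close(w.done)

	w.mu.Lock()
	if w.stopped {
		// Stop ran before this goroutine was scheduled: never start.
		w.mu.Unlock()
		return
	}
	ctx, cancel := context.WithCancel(context.Background())
	w.cancel = cancel
	w.mu.Unlock()

	<-ctx.Done()
}

// Stop is safe to call before, during, or after Run.
func (w *worker) Stop() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.stopped = true
	if w.cancel != nil {
		w.cancel()
	}
}

func main() {
	w := newWorker()
	go w.Run() // mirrors the unsynchronized spawn in ApplyConfig
	w.Stop()   // may execute before Run has taken the lock
	<-w.done
	fmt.Println("worker stopped cleanly")
}
```

Whichever side takes the lock first, `Run` returns: either it sees `stopped` and exits, or `Stop` finds a non-nil `cancel` and cancels the context.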

To Reproduce

Steps to reproduce the behavior:

  1. Check out Cortex at master @ 97b14b1 (the commit observed in the CI run linked below)
  2. Run the test under the race detector repeatedly (or re-run the CI test (amd64) job):
    go test -race -count=50 -run TestMultitenantAlertmanager_ServeHTTPWithFallbackConfig ./pkg/alertmanager/
    

Expected behavior

The test passes deterministically and the race detector reports no data race; Alertmanager.Stop() should be safe to call concurrently with the goroutines started by ApplyConfig.

Environment:

  • Infrastructure: GitHub Actions CI, ubuntu-24.04 (amd64), test job (race detector enabled)
  • Deployment tool: N/A (Go unit test)

Additional Context

Observed on master CI run (2026-06-08): https://github.com/cortexproject/cortex/actions/runs/27117093963 (job test (amd64)).

Race detector output (abridged), pointing at the ApplyConfig goroutine spawn (alertmanager.go:420-421) vs Stop() (alertmanager.go:434):

WARNING: DATA RACE
  ... (*Alertmanager).Stop()       pkg/alertmanager/alertmanager.go:434
  ... (*MultitenantAlertmanager).stopping()
vs (previous read/spawn)
  ... (*Alertmanager).ApplyConfig  pkg/alertmanager/alertmanager.go:420-421
  ... alertmanagerFromFallbackConfig -> ServeHTTP
--- FAIL: TestMultitenantAlertmanager_ServeHTTPWithFallbackConfig
    testing.go: race detected during execution of test

This may have been exposed/aggravated by the Alertmanager v0.32.1 upgrade (#7462).

Filed from CI failure-log analysis with AI assistance; the run link and the cited alertmanager.go lines were reviewed and verified against master before submitting.
