Fix compactor concurrency limit not enforced across compaction levels#7303

Open
adamalexandru4 wants to merge 2 commits into cortexproject:master from
adamalexandru4:fix/compactor-concurrency-limit

Conversation

@adamalexandru4

What this PR does

BucketCompactor.Compact() runs a for loop that calls grouper.Groups() again after each successful compaction (shouldRerun=true). Each call returns at most compactionConcurrency groups, but because the loop calls it repeatedly, the cumulative number of groups compacted in a single pass is unbounded.

With concurrency=1 and blocks [6×2h, 2×12h]:

  • Iteration 1: Groups() → 1 group (6×2h → 12h), compacts, shouldRerun=true
  • Iteration 2: Groups() → 1 group (3×12h → 24h), compacts, shouldRerun=true
  • Iteration 3: Groups() → 0 groups, loop breaks

Result: 2 compactions in one pass despite concurrency=1. Each downloads blocks to local disk, causing unexpected disk usage spikes.
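The walkthrough above can be reduced to a minimal sketch. The types and names here are simplified stand-ins for illustration, not the actual Cortex/Thanos code: the grouper caps each Groups() call at the limit but keeps no cumulative count, so the rerun loop blows past it.

```go
package main

import "fmt"

// perCallLimitedGrouper models the pre-fix behavior: each Groups() call is
// capped at `limit`, but nothing is tracked across calls.
type perCallLimitedGrouper struct {
	pending []string // groups still eligible for compaction
	limit   int      // compactionConcurrency
}

func (g *perCallLimitedGrouper) Groups() []string {
	n := g.limit
	if n > len(g.pending) {
		n = len(g.pending)
	}
	out := g.pending[:n]
	g.pending = g.pending[n:]
	return out
}

// runPass mimics the rerun loop in BucketCompactor.Compact(): keep asking
// for groups and "compacting" them until Groups() returns nothing.
func runPass(g *perCallLimitedGrouper) int {
	total := 0
	for {
		groups := g.Groups()
		if len(groups) == 0 {
			break
		}
		total += len(groups) // compact each group; shouldRerun=true
	}
	return total
}

func main() {
	g := &perCallLimitedGrouper{
		pending: []string{"6x2h -> 12h", "3x12h -> 24h"},
		limit:   1,
	}
	fmt.Println(runPass(g)) // 2 compactions in one pass despite limit=1
}
```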

Fix

Track cumulative groups returned across Groups() calls within one Compact() invocation. Once the total reaches compactionConcurrency, return empty:

func (g *ShuffleShardingGrouper) Groups(blocks map[ulid.ULID]*metadata.Meta) ([]*compact.Group, error) {
    // totalGroupsPlanned accumulates across Groups() calls within one Compact() pass.
    remainingConcurrency := g.compactionConcurrency - g.totalGroupsPlanned
    if remainingConcurrency <= 0 {
        return nil, nil
    }
    // ... existing logic, using remainingConcurrency as the limit ...
    g.totalGroupsPlanned += len(outGroups)
    return outGroups, nil
}

This is safe because the grouper is created fresh in each compactUser() call, so totalGroupsPlanned resets every pass.

The fix currently covers:

  • ShuffleShardingGrouper (sharding_strategy: shuffle-sharding)
  • PartitionCompactionGrouper (sharding_strategy: shuffle-sharding + compaction_strategy: partitioning)

Gap: The DefaultBlocksGrouperFactory path (used when sharding_strategy: default or sharding is disabled) delegates to the Thanos DefaultGrouper, which is not patched, so single-tenant / non-sharded setups are still affected. The plan is to wrap it in a concurrency-limiting adapter.
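One possible shape for that adapter, sketched with an illustrative interface rather than the actual Thanos API: wrap any grouper and enforce the cumulative budget on its behalf, without touching the inner implementation.

```go
package main

import "fmt"

// grouper is a stand-in for the Groups() planner interface.
type grouper interface {
	Groups() []string
}

// limitedGrouper wraps any grouper (e.g. an unpatched default grouper) and
// caps the cumulative number of groups returned across calls in one pass.
type limitedGrouper struct {
	inner   grouper
	limit   int // compactionConcurrency
	planned int // groups already handed out this pass
}

func (g *limitedGrouper) Groups() []string {
	remaining := g.limit - g.planned
	if remaining <= 0 {
		return nil // budget exhausted: ends the rerun loop
	}
	out := g.inner.Groups()
	if len(out) > remaining {
		out = out[:remaining]
	}
	g.planned += len(out)
	return out
}

// staticGrouper always offers the same groups, like a grouper that keeps
// finding new work after each compaction.
type staticGrouper struct{ groups []string }

func (s *staticGrouper) Groups() []string { return s.groups }

func main() {
	g := &limitedGrouper{inner: &staticGrouper{groups: []string{"g1", "g2"}}, limit: 1}
	fmt.Println(len(g.Groups())) // 1: first call capped at the limit
	fmt.Println(len(g.Groups())) // 0: cumulative budget already spent
}
```

Because the adapter would be constructed alongside the grouper in each compactUser() call, its planned counter resets every pass, matching the behavior of the patched groupers.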

Tests

  • Unit tests for both ShuffleShardingGrouper and PartitionCompactionGrouper: call Groups() twice with concurrency=1, assert second call returns 0 groups
  • Integration test reproducing the exact bug scenario (6×2h + 2×12h blocks)

Which issue(s) this PR fixes:
Fixes #7298

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Alexandru Adam <aadam@adobe.com>
@adamalexandru4 force-pushed the fix/compactor-concurrency-limit branch from 72a39f1 to 352b8db on February 28, 2026 10:06
Signed-off-by: Alexandru Adam <45754470+adamalexandru4@users.noreply.github.com>