Add a separate requeue circuit breaker to handle worker corruption #406

PR #378 made the right call decoupling requeues from the failure circuit breaker, but it acknowledged a tradeoff:

> workers which are legitimately in a corrupted state will requeue more tests than before

In practice this plays out badly: a worker with corrupted runtime state (global variable pollution, database connection corruption, etc.) keeps running, reserving tests and then requeueing them because it can't actually complete them. With requeues invisible to `max_consecutive_failures`, nothing kills that worker, and it can requeue tests by the hundreds.

### Proposed fix

Add a second, independent circuit breaker that counts consecutive requeues per worker rather than final failures. Something like `max_consecutive_requeues`, a config option separate from `max_consecutive_failures`. When a worker crosses the threshold, it gets fenced off the same way the existing breaker fences off a worker today.

This keeps the two behaviors cleanly decoupled:
- `max_consecutive_failures` → "these tests are actually broken"
- `max_consecutive_requeues` → "this worker is broken"
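
To make the idea concrete, here is a minimal, language-agnostic sketch written in Python. It is not the project's actual code: the class name `RequeueCircuitBreaker` and its methods are hypothetical, and only `max_consecutive_requeues` comes from the proposal above. It also assumes that any test reaching a final result (pass or final failure) resets the requeue counter, since that shows the worker can still complete work; that reset rule is an assumption open to discussion.

```python
from dataclasses import dataclass


@dataclass
class RequeueCircuitBreaker:
    """Hypothetical per-worker breaker that trips on consecutive requeues."""

    max_consecutive_requeues: int  # proposed config option
    consecutive_requeues: int = 0

    def report_requeue(self) -> None:
        # The worker reserved a test and handed it back without a final result.
        self.consecutive_requeues += 1

    def report_completion(self) -> None:
        # Assumption: any final result (pass or final failure) proves the
        # worker can still finish tests, so the streak starts over.
        self.consecutive_requeues = 0

    @property
    def is_open(self) -> bool:
        # Once open, the caller would fence the worker off, the same way the
        # existing max_consecutive_failures breaker does.
        return self.consecutive_requeues >= self.max_consecutive_requeues
```

A worker loop would call `report_requeue` or `report_completion` after each reserved test and stop reserving once `is_open` is true. Final failures would still be counted separately by the existing `max_consecutive_failures` breaker, so the two counters never feed into each other.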
