Integrate cargo-mutants (diff-only mode) into CI #1053

@zazabap

Summary

Add cargo-mutants to CI in diff-only mode (--in-diff) to catch undertested code in PRs. Full mutation testing is infeasible: at the ~1m43s baseline test time, ~25,800 mutants would take roughly 750 hours run serially. Diff-scoped runs, by contrast, are practical.

Why

We enforce >95% line coverage via codecov, but line coverage doesn't measure test quality — a test that runs code without asserting on its output achieves coverage without catching bugs. Mutation testing fills this gap: if replacing + with - in a function doesn't fail any test, the tests are inadequate.
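
To make the gap concrete, here is a minimal sketch of the `+` to `-` case described above. The function `combined_weight` and both tests are hypothetical and not taken from this repository. Both tests execute the same line, so line coverage is identical, but only the second would fail if the operator were mutated.

```rust
/// Hypothetical function under test (not from this repository).
pub fn combined_weight(a: i64, b: i64) -> i64 {
    a + b // a mutant could rewrite this as `a - b`
}

#[cfg(test)]
mod tests {
    use super::*;

    // Runs the line, so it counts toward coverage, but asserts nothing:
    // the `a - b` mutant survives this test.
    #[test]
    fn weak_test_only_runs_the_code() {
        let _ = combined_weight(2, 3);
    }

    // Pins the result: with `a - b` this would see -1 instead of 5 and fail,
    // so the mutant is caught.
    #[test]
    fn strong_test_checks_the_result() {
        assert_eq!(combined_weight(2, 3), 5);
    }
}
```

Note that the second operand must be non-zero for the assertion to distinguish the two operators; `combined_weight(2, 0)` returns 2 under both.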

Proposal

PR CI job (diff-only)

Add a GitHub Actions job that runs mutations only on lines changed in the PR:

- name: Mutation testing (changed lines only)
  run: |
    cargo mutants \
      --in-diff <(git diff origin/main) \
      --timeout-multiplier 3 \
      -j 4 \
      -- --features ilp-highs

A typical PR touching 5-10 files generates 50-200 mutants. With 4 parallel jobs and ~2 min per mutant, this is 25-100 minutes — further reducible with sharding across runners.
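
The 25-100 minute range follows from a simple back-of-the-envelope model, sketched below. It assumes mutants are spread evenly across workers and that every mutant costs about the same; the helper name `estimated_minutes` is made up for illustration.

```rust
/// Rough wall-clock estimate in minutes for a mutation run, assuming the
/// mutants are divided evenly among `jobs` parallel workers and each one
/// takes about `minutes_per_mutant`.
pub fn estimated_minutes(mutants: u32, minutes_per_mutant: f64, jobs: u32) -> f64 {
    f64::from(mutants) * minutes_per_mutant / f64::from(jobs)
}
```

With the figures above, `estimated_minutes(50, 2.0, 4)` gives 25 and `estimated_minutes(200, 2.0, 4)` gives 100. Real runs will deviate, since parallel jobs compete for CPU and mutants that fail to build finish early.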

Configuration (.cargo/mutants.toml)

timeout_multiplier = 3.0
minimum_test_timeout = 60

# Exclude generated/non-logic code from mutation
exclude_re = ["fn fmt\\(", "fn default\\(", "fn clone\\("]
exclude_globs = ["examples/**", "problemreductions-macros/**"]

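# Passed to every cargo invocation (build and test)
additional_cargo_args = ["--features", "ilp-highs"]
# Parallelism is set on the command line (-j 4), as in the CI step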

Rollout

  1. Phase 1 — Informational: Add the CI job as non-blocking (continue-on-error). Upload mutants.out/ as an artifact. Review surviving mutants manually to calibrate expectations.
  2. Phase 2 — Enforcing: Once the baseline is clean, make the job blocking. Surviving mutants in changed code fail the PR.

Optional: sharding for large PRs

strategy:
  matrix:
    # Shard indices are zero-based: k/n with k in 0..n-1
    shard: [0, 1, 2, 3]
steps:
  - run: cargo mutants --in-diff <(git diff origin/main) --shard ${{ matrix.shard }}/4
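
As an illustration of what a k/n shard means, the sketch below splits mutant indices round-robin across shards. The round-robin rule and the helper `shard_members` are assumptions made for this example; cargo-mutants' own assignment strategy may differ. What matters for CI is that the shard index k runs from 0 to n-1 and that the shards together cover every mutant exactly once.

```rust
/// Indices of the mutants assigned to shard `k` of `n` under a simple
/// round-robin split (an illustrative assumption, not necessarily the
/// strategy cargo-mutants uses internally).
pub fn shard_members(total_mutants: usize, k: usize, n: usize) -> Vec<usize> {
    assert!(n > 0 && k < n, "shard index must satisfy 0 <= k < n");
    (0..total_mutants).filter(|i| i % n == k).collect()
}
```

For 10 mutants and 4 shards, shard 0 gets indices 0, 4, 8 and shard 3 gets 3, 7; an index of 4 is rejected because it is out of range.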

Numbers

| Metric | Value |
| --- | --- |
| Total mutants (full project) | 25,809 |
| src/rules/ | 15,638 |
| src/models/ | 8,318 |
| src/solvers/ | 237 |
| Baseline test time | ~1m43s |
| Full run estimate | ~750 hours (infeasible) |
| Typical PR diff run | 50-200 mutants → 25-100 min |
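
The full-run figure can be cross-checked from the numbers above. The sketch below assumes each mutant costs exactly one baseline test run (1m43s = 103 s) with no build overhead and no parallelism; both helper names are hypothetical.

```rust
/// Serial run time in hours if every mutant costs one baseline test run.
pub fn full_run_hours(mutants: u64, baseline_secs: u64) -> f64 {
    (mutants * baseline_secs) as f64 / 3600.0
}

/// Mutants that fall outside the individually listed directories.
pub fn mutants_elsewhere(total: u64, listed: &[u64]) -> u64 {
    total - listed.iter().sum::<u64>()
}
```

`full_run_hours(25_809, 103)` comes to about 738 hours, consistent with the rounded ~750 hour estimate. `mutants_elsewhere(25_809, &[15_638, 8_318, 237])` gives 1,616, the mutants in code outside the three listed directories.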

Out of scope

  • Full/scheduled mutation testing runs (infeasible at current project size)
  • Mutation-based coverage metrics in codecov
