1/5 build: measure coverage on source only via coverage run by vokracko · Pull Request #120 · jerry-git/pytest-split

vokracko · 2026-04-29T15:00:16Z

First of five stacked PRs that fix #25 — pytest-split group memberships drifting across pytest collection orders, e.g. when parametrise is fed a hash-randomised iterable (set, frozenset, dict.keys()) under PYTHONHASHSEED=random. The stack churns committed .test_durations files and CI group manifests with no real code change, and 0.8.0's earlier fix only covered least_duration; this stack extends the same guarantee to duration_based_chunks and tightens a few related concerns along the way.

Motivation. Subsequent PRs reshape the test suite — consolidating per-attribute assertions, adjusting test counts when algorithm output changes. The current coverage setup makes those changes look like coverage regressions even though source coverage is unchanged. Fix the measurement first so later PRs get clean diffs.

The pytest-cov limitation. pytest_split is itself a pytest plugin. When pytest starts up, it loads installed plugins via entry points. So pytest does roughly:

pytest starts
pytest discovers plugins, including pytest_split and pytest-cov
pytest imports pytest_split (executes class bodies, decorators, enum definitions — anything at module top level)
pytest-cov finishes initialising and starts tracing
Tests run

Class definitions like TestGroup, AlgorithmBase, LeastDurationAlgorithm and the Algorithms enum all execute in step 3 — before coverage starts tracing in step 4. Coverage records them as "uncovered" even though they obviously ran (the plugin wouldn't work otherwise). That's what the warning is telling you:

CoverageWarning: Module pytest_split was previously imported, but not measured

Translation: "I noticed pytest_split was already loaded by the time I started watching, so I can't tell you what ran during its import."

The reported missing lines for algorithms.py were exactly: import statements, the TestGroup NamedTuple, AlgorithmBase class body, LeastDurationAlgorithm class def, DurationBasedChunksAlgorithm class def, and the Algorithms enum. All run-at-import code. None actually untested.

The cookiecutter workaround. The project was generated from wolt-python-package-cookiecutter, which added --cov tests to the addopts. Test files are 100% covered by definition, so this inflated the denominator and pulled the combined average up to ~90%. The threshold of 90 was set against that inflated number — the actual source coverage was always ~74%.

What this PR does. Switch from pytest --cov pytest_split to coverage run -m pytest + coverage report. Running coverage from the outside means tracing starts before Python imports anything, so plugin import-time code is counted. Real source coverage jumps from ~74% to 99% (the only missing lines now are an abstract method's pass and one untested branch in ipynb_compatibility.py).

Drop --cov pytest_split --cov tests --cov-report term-missing --no-cov-on-fail from addopts.
Add [tool.coverage.run] source = ["pytest_split"] so coverage report knows what to measure.
Update CI to run coverage run -m pytest then coverage report.
Set fail_under to 95 (4-point buffer below 99%).

Stack. Each PR is independent enough to review on its own; merge in order.

1/5 — this PR — measure coverage on source only via coverage run
2/5 — assert TestGroup directly → 2/5 refactor: assert TestGroup directly in algorithm tests #121
3/5 — durations dict signature → 3/5 refactor: pass durations dict into algorithms #122
4/5 — extract within-group ordering → 4/5 refactor: extract within-group ordering out of algorithms #123
5/5 — stabilise duration_based_chunks — closes Splits invalid when collection order not deterministic #25 → 5/5 fix: stabilise duration_based_chunks across collection orders #124

The previous setup relied on pytest-cov (``--cov pytest_split --cov tests`` in addopts) which has a long-standing limitation when measuring a pytest plugin: the plugin imports happen during pytest's plugin discovery, *before* pytest-cov starts tracing. Anything that runs at import time (class bodies, decorators, type aliases, enum definitions) is reported as uncovered even though it obviously executed - the plugin wouldn't load otherwise. The ``CoverageWarning: Module pytest_split was previously imported, but not measured`` made that explicit. The cookiecutter template the project was generated from worked around this by adding ``--cov tests`` to the addopts, inflating the denominator with 100%-covered test files so the average reached 90%. That pulled the number up but stopped reporting the actual source coverage. Switch to ``coverage run -m pytest`` followed by ``coverage report``. Running ``coverage`` from the outside means tracing starts before Python imports anything at all, so plugin import-time code is counted. Real source coverage jumps from ~74% to 99%; threshold goes from 90 to 95 to reflect that.

vokracko requested a review from jerry-git as a code owner April 29, 2026 15:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

1/5 build: measure coverage on source only via coverage run#120

1/5 build: measure coverage on source only via coverage run#120
vokracko wants to merge 1 commit intojerry-git:masterfrom
vokracko:refactor/source-only-coverage

vokracko commented Apr 29, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

vokracko commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vokracko commented Apr 29, 2026 •

edited

Loading