Skip to content

1/5 build: measure coverage on source only via coverage run#120

Open
vokracko wants to merge 1 commit intojerry-git:masterfrom
vokracko:refactor/source-only-coverage
Open

1/5 build: measure coverage on source only via coverage run#120
vokracko wants to merge 1 commit intojerry-git:masterfrom
vokracko:refactor/source-only-coverage

Conversation

@vokracko
Copy link
Copy Markdown

@vokracko vokracko commented Apr 29, 2026

First of five stacked PRs that fix #25pytest-split group memberships drifting across pytest collection orders, e.g. when parametrise is fed a hash-randomised iterable (set, frozenset, dict.keys()) under PYTHONHASHSEED=random. The stack churns committed .test_durations files and CI group manifests with no real code change, and 0.8.0's earlier fix only covered least_duration; this stack extends the same guarantee to duration_based_chunks and tightens a few related concerns along the way.

Motivation. Subsequent PRs reshape the test suite — consolidating per-attribute assertions, adjusting test counts when algorithm output changes. The current coverage setup makes those changes look like coverage regressions even though source coverage is unchanged. Fix the measurement first so later PRs get clean diffs.

The pytest-cov limitation. pytest_split is itself a pytest plugin. When pytest starts up, it loads installed plugins via entry points. So pytest does roughly:

  1. pytest starts
  2. pytest discovers plugins, including pytest_split and pytest-cov
  3. pytest imports pytest_split (executes class bodies, decorators, enum definitions — anything at module top level)
  4. pytest-cov finishes initialising and starts tracing
  5. Tests run

Class definitions like TestGroup, AlgorithmBase, LeastDurationAlgorithm and the Algorithms enum all execute in step 3 — before coverage starts tracing in step 4. Coverage records them as "uncovered" even though they obviously ran (the plugin wouldn't work otherwise). That's what the warning is telling you:

CoverageWarning: Module pytest_split was previously imported, but not measured

Translation: "I noticed pytest_split was already loaded by the time I started watching, so I can't tell you what ran during its import."

The reported missing lines for algorithms.py were exactly: import statements, the TestGroup NamedTuple, AlgorithmBase class body, LeastDurationAlgorithm class def, DurationBasedChunksAlgorithm class def, and the Algorithms enum. All run-at-import code. None actually untested.

The cookiecutter workaround. The project was generated from wolt-python-package-cookiecutter, which added --cov tests to the addopts. Test files are 100% covered by definition, so this inflated the denominator and pulled the combined average up to ~90%. The threshold of 90 was set against that inflated number — the actual source coverage was always ~74%.

What this PR does. Switch from pytest --cov pytest_split to coverage run -m pytest + coverage report. Running coverage from the outside means tracing starts before Python imports anything, so plugin import-time code is counted. Real source coverage jumps from ~74% to 99% (the only missing lines now are an abstract method's pass and one untested branch in ipynb_compatibility.py).

  • Drop --cov pytest_split --cov tests --cov-report term-missing --no-cov-on-fail from addopts.
  • Add [tool.coverage.run] source = ["pytest_split"] so coverage report knows what to measure.
  • Update CI to run coverage run -m pytest then coverage report.
  • Set fail_under to 95 (4-point buffer below 99%).

Stack. Each PR is independent enough to review on its own; merge in order.

The previous setup relied on pytest-cov (``--cov pytest_split
--cov tests`` in addopts) which has a long-standing limitation when
measuring a pytest plugin: the plugin imports happen during pytest's
plugin discovery, *before* pytest-cov starts tracing. Anything that
runs at import time (class bodies, decorators, type aliases, enum
definitions) is reported as uncovered even though it obviously
executed - the plugin wouldn't load otherwise. The
``CoverageWarning: Module pytest_split was previously imported, but
not measured`` made that explicit.

The cookiecutter template the project was generated from worked
around this by adding ``--cov tests`` to the addopts, inflating the
denominator with 100%-covered test files so the average reached 90%.
That pulled the number up but stopped reporting the actual source
coverage.

Switch to ``coverage run -m pytest`` followed by ``coverage report``.
Running ``coverage`` from the outside means tracing starts before
Python imports anything at all, so plugin import-time code is
counted. Real source coverage jumps from ~74% to 99%; threshold goes
from 90 to 95 to reflect that.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Splits invalid when collection order not deterministic

1 participant