A contributor's guide to how CoreGraph is tested and how to run each layer
locally. There are three layers: Rust cargo test (unit + integration),
shell-driven end-to-end golden scenarios, and a real-repo tier 2
robustness smoke test. Benchmarks run separately under criterion.
# 1. Rust unit + integration tests across the workspace.
cargo test --workspace
# 2. Build the binary the shell tests need, then run the golden scenarios.
cargo build --workspace
bash tests/e2e/golden/run.sh
# 3. Real-repo smoke test (clones pinned open-source projects).
cargo build --release # tier 2 prefers the release binary
bash tests/e2e/tier2/run.sh
# 4. Benchmarks (optional, never gates a PR).
cargo bench --workspace
The golden and tier 2 runners are plain Bash. They require the coregraph
binary to be built first and use jq for JSON assertions — install jq if you
do not have it.
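Because both runners depend on a built binary and on jq, a small preflight check can fail early with a readable message. The following is a minimal sketch under stated assumptions: require_cmd and require_binary are hypothetical helper names, not functions from the repository.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; the names are illustrative only.

# require_cmd <command>
# Succeeds if the command is on PATH, otherwise prints a hint to stderr.
require_cmd() {
  local cmd="$1"
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "missing dependency: $cmd (install it and retry)" >&2
    return 1
  fi
}

# require_binary <path>
# Succeeds if the path is an executable regular file.
require_binary() {
  local bin="$1"
  if [ ! -f "$bin" ] || [ ! -x "$bin" ]; then
    echo "binary not found: $bin (run cargo build --workspace first)" >&2
    return 1
  fi
}
```

A wrapper could then chain the checks before starting a suite, for example: require_cmd jq && require_binary target/debug/coregraph && bash tests/e2e/golden/run.sh.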
| Layer | Lives in | Run with | What it checks |
|---|---|---|---|
| Unit + integration | crates/*/src/ and crates/*/tests/ | cargo test --workspace | Per-crate logic, parsers, confidence math, daemon lifecycle |
| E2E golden | tests/e2e/golden/ | bash tests/e2e/golden/run.sh | Exact CLI behavior on small hand-built fixtures (precision) |
| E2E tier 2 | tests/e2e/tier2/ | bash tests/e2e/tier2/run.sh | Binary survives large real repos without collapse |
| Benchmarks | crates/extractor/benches/, crates/query/benches/ | cargo bench --workspace | Indexing / impact timing under criterion |
Standard Rust tests: #[test] and #[tokio::test] functions live inline in
crates/*/src/ modules and in dedicated files under crates/*/tests/. Tests
that need a workspace on disk create an isolated temp directory with tempfile.
cargo test --workspace # all crates
cargo test -p coregraph-graph # one crate
cargo test -p coregraph --test daemon_lifecycle -- --test-threads=1
Integration-test files worth knowing:
| File | Covers |
|---|---|
| crates/cli/tests/daemon_lifecycle.rs | Daemon spawn / reuse / stop over the IPC socket (run single-threaded) |
| crates/extractor/tests/cross_language_fixtures.rs | Cross-language linking: ApiPathMatch edges connecting a Java controller and a TS client (against the cross-lang-matched fixture) |
| crates/stack/tests/stack_graphs_backend.rs | The stack-graphs cross-file name resolver |
| crates/stack/tests/go_tsg.rs, rust_tsg.rs, kotlin_tsg.rs | The hand-authored .tsg rules for Go, Rust, and Kotlin |
Snapshot serialization is covered by ordinary unit tests in
crates/graph/src/snapshot.rs: a node/edge round-trip
(snapshot_roundtrip_nodes_and_edges), an empty-graph round-trip
(empty_graph_snapshot), and a guard that loading a snapshot written under an
older schema version is rejected so the daemon falls back to a full rebuild
(load_rejects_wrong_schema_version). See schema-versioning.md for the schema
version (v6) and the forced-rebuild path.
The golden suite is the precision layer. Each scenario is a self-contained
directory under tests/e2e/golden/ with a small inline source fixture (typically
under src/, or split by package/language for the cross-package and multi-language
scenarios) and a scenario.sh that indexes it and asserts exact counts and symbol
names. run.sh executes every scenario in sequence, killing any leftover daemon
between them so cached state from one fixture cannot bleed into the next.
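The run-in-sequence behavior described above can be pictured with a short loop. This is a sketch, not the real run.sh: run_scenarios is a hypothetical name, and stop_daemon is a placeholder because this guide does not show how the actual runner kills a leftover daemon.

```bash
#!/usr/bin/env bash
# Sketch of a sequential scenario runner (not the real run.sh).

# Placeholder: how the real runner stops a leftover daemon is not shown in
# this guide, so this hypothetical hook does nothing.
stop_daemon() {
  :
}

# run_scenarios <golden_dir>
# Runs each <golden_dir>/*/scenario.sh in glob (sorted) order, prints PASS or
# FAIL per scenario, and returns non-zero if any scenario failed.
run_scenarios() {
  local root="$1"
  local failed=0
  local script name
  for script in "$root"/*/scenario.sh; do
    [ -f "$script" ] || continue   # the glob matched nothing
    name="$(basename "$(dirname "$script")")"
    if bash "$script"; then
      echo "PASS $name"
    else
      echo "FAIL $name"
      failed=$((failed + 1))
    fi
    stop_daemon                    # clear cached state before the next fixture
  done
  [ "$failed" -eq 0 ]
}
```

The loop keeps going after a failure so that one broken scenario does not hide the results of the others; only the final status reflects whether anything failed.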
| Scenario | Exercises |
|---|---|
| 01-ts-single-package | stats, query, export, impact, --output-format json on one TS package |
| 02-ts-cross-package | Cross-package name resolution in a TS monorepo |
| 03-orphans | orphans dead-code detection |
| 04-inconsistencies | Cross-enum / api-path / config-key detection |
| 05-multi-lang | Go + Rust + TypeScript in one project |
| 06-python | The Python extractor |
| 07-edge-cases | Barrels, circular imports, dynamic patterns, generics |
| 08-cli-protocols | batch, snapshot save/load, watch, config show |
Shared shell helpers live in tests/e2e/lib/:
- common.sh: locates the binary (defaults to target/debug/coregraph), provides
  run_cg <project_dir> <subcommand> …, and the assertion helpers (assert_eq,
  assert_range, assert_contains, assert_jq, print_summary).
- invariants.sh: structural invariants reused across scenarios.
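To give a feel for what helpers of this kind do, here is an illustrative re-implementation of three of them plus a summary function. The helper names come from common.sh, but the argument order, the counters, and the message format are assumptions; the real signatures may differ. assert_jq is left out because it depends on jq.

```bash
#!/usr/bin/env bash
# Illustrative stand-ins for assertion helpers. Argument order is an
# assumption and may not match the real common.sh.

PASS_COUNT=0
FAIL_COUNT=0

# _record <0|1> <message>: counts a pass (0) or a failure (1).
_record() {
  if [ "$1" -eq 0 ]; then
    PASS_COUNT=$((PASS_COUNT + 1))
  else
    FAIL_COUNT=$((FAIL_COUNT + 1))
    echo "FAIL: $2" >&2
  fi
}

# assert_eq <label> <expected> <actual>: exact string equality.
assert_eq() {
  if [ "$2" = "$3" ]; then
    _record 0 "$1"
  else
    _record 1 "$1: expected '$2', got '$3'"
  fi
}

# assert_range <label> <min> <max> <actual>: inclusive integer range.
assert_range() {
  if [ "$4" -ge "$2" ] && [ "$4" -le "$3" ]; then
    _record 0 "$1"
  else
    _record 1 "$1: $4 is outside [$2, $3]"
  fi
}

# assert_contains <label> <haystack> <needle>: substring match.
assert_contains() {
  case "$2" in
    *"$3"*) _record 0 "$1" ;;
    *)      _record 1 "$1: '$3' not found" ;;
  esac
}

# print_summary: prints the totals and fails if any assertion failed.
print_summary() {
  echo "passed=$PASS_COUNT failed=$FAIL_COUNT"
  [ "$FAIL_COUNT" -eq 0 ]
}
```

Recording failures instead of exiting on the first one lets a scenario report every broken assertion in a single run, with the final status coming from print_summary.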
A scenario asserts concrete facts, not approximate ones. For example,
01-ts-single-package checks that export --format json-graph yields a specific
node count, that --min-confidence 0.0 widens the edge set, and that named
symbols (UserService, getUser, …) are present while a known-absent symbol is
not. To run or debug a single scenario:
bash tests/e2e/golden/01-ts-single-package/scenario.sh
Tier 2 is the robustness layer. It shallow-clones pinned open-source projects and
runs stats, query <well-known-symbol>, and orphans against each, asserting
only that the binary exits 0, produces non-zero output, stays within wide
symbol/edge ranges, and finishes inside a wall-clock limit. It is
collapse-prevention, not precision — a regression that wipes out edges (for
example a broken resolver) drops below min_edges and fails; the golden suite is
where exact correctness is enforced.
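The four tier 2 conditions (exit 0, non-empty output, counts within a wide range, wall-clock limit) can be sketched as two small functions. The names smoke_check and in_range and their argument layout are assumptions for illustration; the real runner's code is not shown in this guide.

```bash
#!/usr/bin/env bash
# Sketch of tier-2-style checks; function names are hypothetical.

# smoke_check <max_wall_clock_sec> <command> [args...]
# Passes only if the command exits 0, prints something on stdout, and
# finishes within the limit (measured in whole seconds).
smoke_check() {
  local limit="$1"
  local start end output status
  shift
  start="$(date +%s)"
  if output="$("$@" 2>/dev/null)"; then
    status=0
  else
    status=$?
  fi
  end="$(date +%s)"
  if [ "$status" -ne 0 ]; then
    echo "non-zero exit: $status" >&2
    return 1
  fi
  if [ -z "$output" ]; then
    echo "empty output" >&2
    return 1
  fi
  if [ $((end - start)) -gt "$limit" ]; then
    echo "too slow: $((end - start))s > ${limit}s" >&2
    return 1
  fi
  return 0
}

# in_range <value> <min> <max>: inclusive integer range check.
in_range() {
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}
```

With a range check like this, a resolver regression that leaves only a handful of edges fails as soon as the count falls under the lower bound, even though the binary itself still exits 0.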
Projects are declared in tests/e2e/tier2/projects.toml, each pinned to a commit
SHA for reproducibility:
| Project | Why it is in the set |
|---|---|
| zustand | Small TypeScript library |
| excalidraw | Large TypeScript codebase |
| spring-petclinic | Java with both Maven and Gradle, plus application.yml config keys |
| requests | Python library |
| cobra | Go CLI library |
| exposed | Kotlin (exercises the hand-authored kotlin.tsg rules) |
Each [[project]] entry sets min_symbols / max_symbols, min_edges /
max_edges, max_wall_clock_sec, and a well_known_symbols list that must
resolve. The ranges are deliberately wide — bump the SHA and the ranges together
when you intentionally change extraction. Clones are cached under
tests/e2e/tier2/cache/, and each run writes a JSON snapshot to
tests/e2e/tier2/snapshots/<project>-<sha>.json for regression tracking. Tier 2
prefers the release binary and warns (then runs slowly) if only a debug build is
present.
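The release-first binary choice and the snapshot naming scheme can be sketched as follows. pick_binary and snapshot_path are hypothetical names, and target/release/coregraph is assumed from the standard Cargo layout (only the debug path is stated in this guide).

```bash
#!/usr/bin/env bash
# Sketch only: function names and the release path are assumptions.

# pick_binary <repo_root>
# Prints the chosen binary path on stdout; warnings and errors go to stderr.
pick_binary() {
  local root="$1"
  local release="$root/target/release/coregraph"
  local debug="$root/target/debug/coregraph"
  if [ -x "$release" ]; then
    echo "$release"
  elif [ -x "$debug" ]; then
    echo "warning: release binary not found, using the slower debug build" >&2
    echo "$debug"
  else
    echo "error: no coregraph binary; run cargo build --release" >&2
    return 1
  fi
}

# snapshot_path <project> <sha>
# Builds the snapshot file name in the <project>-<sha>.json form.
snapshot_path() {
  echo "tests/e2e/tier2/snapshots/$1-$2.json"
}
```

Keeping the warning on stderr means the path on stdout can be captured directly, for example with bin="$(pick_binary .)".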
Benchmarks use criterion and live in crates/extractor/benches/build_graph.rs
(graph construction) and crates/query/benches/impact.rs (impact traversal).
cargo bench --workspace
Benchmarks never gate a pull request. In CI, the ci.yml bench-smoke job only
compiles them (cargo bench --workspace --no-run) to catch breakage, and the
separate bench.yml workflow runs them and posts a comment if a result
regresses past its threshold — it does not fail the build.
The ci.yml workflow runs on every push to master/main and on pull requests,
across Linux, macOS, and Windows:
- cargo fmt --all -- --check
- cargo clippy --workspace --all-targets -- -D warnings
- cargo build --workspace
- cargo test --workspace on Unix. Windows runs the library crates plus the
  daemon module and the daemon_lifecycle integration test (single-threaded),
  because spawning a real daemon under privileged process control is Unix-only
  for now.
The shell-driven golden and tier 2 suites are not wired into CI — run them
locally with bash tests/e2e/golden/run.sh and bash tests/e2e/tier2/run.sh.
Only the Rust tests, a bench compile-smoke, and the VS Code extension build run
in ci.yml; the criterion benchmarks run in the separate bench.yml workflow.
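Since the shell suites only run locally, one option is a small script that mirrors the ci.yml checks and then runs both suites. This is a sketch: run_step, local_checks, and the DRY_RUN switch are hypothetical and not part of the repository. The commands themselves are the ones listed in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical local pre-push script; not part of the repository.

# run_step <command> [args...]
# Echoes the command, then runs it unless DRY_RUN=1 is set.
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    return 0
  fi
  "$@"
}

# Runs the CI checks, then the local-only suites, stopping at the first failure.
local_checks() {
  run_step cargo fmt --all -- --check &&
  run_step cargo clippy --workspace --all-targets -- -D warnings &&
  run_step cargo build --workspace &&
  run_step cargo test --workspace &&
  run_step bash tests/e2e/golden/run.sh &&
  run_step cargo build --release &&
  run_step bash tests/e2e/tier2/run.sh
}
```

Running DRY_RUN=1 local_checks prints the seven commands without executing them, which is a cheap way to review the order before a long run.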
- A new CLI behavior or a precision regression you want pinned → add a golden
  scenario (or extend an existing one) with a minimal inline fixture and exact
  assertions. Reuse the helpers in tests/e2e/lib/common.sh.
- Crate-internal logic → a #[test] next to the code, or a file under that
  crate's tests/ for integration-level coverage. Use tempfile for any on-disk
  workspace so tests stay isolated.
- A new language or extractor → add a fixture under tests/fixtures/ (see the
  existing per-language fixtures such as java-spring, typescript-react, go,
  python-simple, kotlin-simple, rust-simple, and cross-lang-matched) and assert
  against it from crates/extractor/tests/.