Skip to content

Bundle LLM guide files in wheel with get_llm_guide() accessor#310

Merged
igerber merged 7 commits intomainfrom
llm-guide-api
Apr 17, 2026
Merged

Bundle LLM guide files in wheel with get_llm_guide() accessor#310
igerber merged 7 commits intomainfrom
llm-guide-api

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 16, 2026

Summary

  • Move docs/llms*.txt into diff_diff/guides/ so the three LLM guide files ship inside the wheel and are discoverable with no network access.
  • Add diff_diff.get_llm_guide(variant) accessor using importlib.resources; variants: "concise" (default), "full", "practitioner".
  • Surface the accessor in help(diff_diff) via the module docstring so AI agents find it from a pip install alone.
  • Explicit [tool.maturin] include entry as defense-in-depth against auto-inclusion behavior drift across maturin 1.4-2.0.
  • Internal cross-references inside guide files rewritten to point at diff_diff.get_llm_guide("practitioner") so the air-gap breadcrumb works end-to-end.
  • pyproject.toml "Practitioner Guide" URL switched to ReadTheDocs (newly publishes llms-practitioner.txt at /en/stable/llms-practitioner.txt).
  • Path references updated in README.md, CLAUDE.md, docs/doc-deps.yaml, docs/methodology/REGISTRY.md, and .claude/commands/bump-version.md.

Methodology references (required if estimator / math changes)

  • N/A — no estimator, inference, SE, or identification logic changed. REGISTRY.md update is a path reference only (points at the moved practitioner guide).

Validation

  • Tests added/updated: tests/test_guides.py — 18 tests (variant loading, default=concise, full>concise>practitioner length, content-stability assertions, UTF-8 encoding roundtrip, package-resource equivalence, error path for 7 invalid variants, namespace export, docstring hint).
  • pytest tests/test_guides.py -v → 18/18 passing locally.
  • Manual verification: python -c "import diff_diff; help(diff_diff)" shows the AI-agent hint; get_llm_guide(), get_llm_guide("full"), get_llm_guide("practitioner") return 13166 / 56793 / 19807 chars respectively.
  • Deferred to pre-merge / CI: local Sphinx build to confirm parent-relative html_extra_path resolves; wheel-install smoke verifying .txt files are inside the built artifact; live check that the new RTD URL resolves after docs deploy.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

igerber and others added 2 commits April 16, 2026 18:22
Move docs/llms*.txt into diff_diff/guides/ so they ship inside the wheel
and are discoverable via `diff_diff.get_llm_guide()` with no network access.
Surface the helper in the module docstring so help(diff_diff) guides AI
agents to it.

- Move llms.txt, llms-full.txt, llms-practitioner.txt to diff_diff/guides/
- Add get_llm_guide(variant) using importlib.resources
- Explicit [tool.maturin] include entry for defense-in-depth across 1.4-2.0
- Switch pyproject "Practitioner Guide" URL to ReadTheDocs
- Rewrite internal cross-refs inside guide files to point at the accessor
- Update docs/conf.py html_extra_path to the new location (adds
  llms-practitioner.txt to the RTD surface)
- Update all path references (doc-deps.yaml, README, CLAUDE, REGISTRY,
  bump-version)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clarifies that "CONCISE", " concise", etc. raise ValueError so callers
don't have to discover this by experimentation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

This PR is packaging/docs/API-surface work only. No estimator math, weighting, variance/SE, identification checks, or default behaviors changed. Execution-based validation was not possible in this environment because pytest and required runtime deps such as numpy are unavailable.

Executive Summary

  • No methodology-sensitive code changed; the only methodology-adjacent edit is a path update in docs/methodology/REGISTRY.md:L2770.
  • get_llm_guide() is a small whitelist-based resource accessor, so I did not find new correctness, security, or performance risks in the implementation.
  • P3: the bundled practitioner guide still points to a repo-only docs/methodology/REGISTRY.md, which breaks the offline breadcrumb for wheel-only installs.
  • P3-informational: build-surface validation is still thin; the added tests exercise source-tree resources, and related .txt guide CI-validation debt is already tracked in TODO.md:L95.

Methodology

No findings. Affected method(s): none. The REGISTRY change at docs/methodology/REGISTRY.md:L2770 is path-only and preserves the existing Baker et al. adaptation note.

Code Quality

No findings.

Performance

No findings. The new code path is an explicit on-demand text read, not estimator-path logic.

Maintainability

No findings.

Tech Debt

  • P3-informational Impact: The feature depends on the distribution surfaces configured in pyproject.toml:L83-L97 and docs/conf.py:L74-L77, but the added tests in tests/test_guides.py:L35-L42 only validate source-tree resources. Related AI-guide CI debt is already tracked in TODO.md:L95. Concrete fix: when CI is next touched, add a wheel-install smoke test plus a docs-build assertion that the three llms*.txt files are published at the expected output paths.

Security

No findings. get_llm_guide() uses a fixed variant whitelist instead of interpolating user input into filesystem paths, so it does not introduce an arbitrary file-read/path-traversal surface.

Documentation/Tests

  • P3 Impact: The shipped practitioner guide still tells readers to see docs/methodology/REGISTRY.md in diff_diff/guides/llms-practitioner.txt:L6. That path will not exist in a wheel-only/offline install, leaving a dead local breadcrumb inside the artifact this PR is trying to make self-contained. Concrete fix: replace that repo-only reference with an inline summary of the relevant deviation note, or point to another bundled/offline surface instead.

The guide's opening blockquote previously pointed at
`docs/methodology/REGISTRY.md` for details on reorganizations relative to
Baker et al. (2025). That path does not exist inside a wheel-only install,
so it was a dead breadcrumb for offline AI agents - exactly the scenario
this PR is meant to fix.

Inline the four deviation bullets (numbering, PT as Step 3, uncertainty
folding into Step 5, Step 8 = Robustness & Reporting) directly into the
guide so it is self-contained.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 16, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: e65be1665c2d27c2f421619fe1cdcc1164a63cd2


Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This re-review resolves the earlier dead offline-breadcrumb issue. The only remaining item I found is a tracked P3-informational validation gap around built artifacts and docs publication.

Executive Summary

  • No estimator, weighting, variance/SE, identification-assumption, or default-behavior code paths changed.
  • Methodology cross-check passed: the practitioner-guide deviations now inlined in diff_diff/guides/llms-practitioner.txt:L8-L20 match the documented **Note:** entries in docs/methodology/REGISTRY.md:L2774-L2785.
  • The prior P3 about the bundled practitioner guide pointing wheel users at a repo-only path is resolved.
  • Residual P3-informational: guide tests still stop at source-tree resource reads; wheel-install and docs-publication validation remain deferred, but that debt is already tracked in TODO.md:L95-L96.
  • I could not execute pytest or import diff_diff in this environment because numpy is not installed.

Methodology

Affected method(s): none.

Severity: P3-informational
Impact: The only methodology-adjacent change is the bundled practitioner guide text. The inlined Baker et al. adaptation details in diff_diff/guides/llms-practitioner.txt:L8-L20 mirror the documented deviations in docs/methodology/REGISTRY.md:L2774-L2785, so under the review rubric this is a documented deviation, not an undocumented methodology change.
Concrete fix: None.

Code Quality

No findings.

Performance

No findings. The new guide read in diff_diff/_guides_api.py:L13-L48 is on-demand and outside estimator/inference hot paths.

Maintainability

No findings.

Tech Debt

Severity: P3-informational
Impact: The new “wheel content” check in tests/test_guides.py:L35-L42 still compares source-tree package resources rather than a built wheel, and the new public docs surface depends on docs/conf.py:L74-L78 and pyproject.toml:L83-L97. This validation gap is already tracked in TODO.md:L95-L96, so it is not blocking.
Concrete fix: Keep the tracked follow-up to add wheel-install and docs-build smoke checks.

Security

No findings. The new accessor uses a fixed variant whitelist instead of interpolating user input into filesystem paths in diff_diff/_guides_api.py:L6-L48.

Documentation/Tests

No new untracked findings. The added coverage in tests/test_guides.py:L10-L67 is reasonable for variant loading, invalid inputs, namespace export, and docstring surfacing, and the earlier offline-breadcrumb issue is fixed by diff_diff/guides/llms-practitioner.txt:L3-L20. Verification note: I could not run the tests here because importing diff_diff fails without numpy in this review environment.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 16, 2026
maturin 1.13.1's GlobPattern enum only accepts `sdist` or `wheel` for
the `format` field, not `all`. That rejected pyproject.toml at line 97
and broke the build step in every Python Tests job on the PR.

Use the bare-string form `include = ["diff_diff/guides/*.txt"]`, which
includes the glob in both sdist and wheel by default. Verified locally
with `maturin sdist`: all three .txt files land in the sdist tarball.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 16, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: b5f063a6d8a141b43a930100701d6f1d209f8ab8


Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This re-review stays clear of estimator, weighting, variance/SE, identification-assumption, and default-behavior changes. The only issues I found are non-blocking P3 items around docs/dependency tracking and release validation.

Executive Summary

Methodology

  • Severity: P3-informational
  • Impact: Affected method(s): none. The only methodology-adjacent change is the practitioner-guide framing text, and its deviations from Baker et al. are explicitly documented in both diff_diff/guides/llms-practitioner.txt:L3 and docs/methodology/REGISTRY.md:L2770. Under the review rubric, that is a documented deviation, not a defect.
  • Concrete fix: None.

Code Quality

No findings.

Performance

No findings. get_llm_guide() is an on-demand text read outside estimator/inference hot paths in diff_diff/_guides_api.py:L13.

Maintainability

  • Severity: P3
  • Impact: The PR updates many moved-guide references in docs/doc-deps.yaml:L612 and docs/doc-deps.yaml:L628, but it does not add an entry for the new public helper module diff_diff/_guides_api.py. That means docs-impact tooling can miss future documentation drift when the accessor changes.
  • Concrete fix: Add a diff_diff/_guides_api.py entry to docs/doc-deps.yaml mapping it to the bundled guide docs and/or the public API surface, or explicitly document why it is intentionally excluded.

Tech Debt

  • Severity: P3-informational
  • Impact: test_wheel_content_matches_package_resource() in tests/test_guides.py:L35 only compares the accessor against source-tree package resources; it does not prove that the wheel actually contains the files or that the new ReadTheDocs root-path publication configured in docs/conf.py:L74 and linked from pyproject.toml:L83 is present after build/deploy. This is non-blocking and compatible with ✅, especially given the already-tracked docs/CI validation debt in TODO.md:L95 and TODO.md:L96.
  • Concrete fix: Add one wheel-install smoke test that imports diff_diff from the built artifact and calls get_llm_guide(), plus one docs-build smoke test that asserts llms-practitioner.txt is emitted at the expected site-root path.

Security

No findings. The accessor dispatches through a fixed whitelist rather than interpolating user input into filesystem paths in diff_diff/_guides_api.py:L6.

Documentation/Tests

No new untracked findings. Coverage in tests/test_guides.py:L10 is reasonable for variant loading, invalid inputs, namespace export, and docstring surfacing.

Verification note: I could not run pytest tests/test_guides.py or import diff_diff in this environment because numpy, pandas, scipy, and pytest are not installed.

igerber and others added 2 commits April 16, 2026 20:47
…ides

PR #309 added four validation diagnostics to SyntheticDiDResults:
`get_weight_concentration()`, `get_loo_effects_df()`, `in_time_placebo()`,
and `sensitivity_to_zeta_omega()`. Because this PR is the first place the
guides ship inside the wheel, we want them faithful to main's API at the
moment of merge.

- `llms-full.txt`: add the four methods to the SyntheticDiDResults
  Methods line and a short "Validation diagnostics" subsection describing
  each.
- `llms-practitioner.txt`: split the former `SyntheticDiD/TROP` bullet so
  SyntheticDiD now points at the built-in helpers (with the jackknife
  caveat for LOO); TROP keeps the generic guidance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 17, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 70b547bd8e3f905c9deb0b296778edd14dd1930a


Overall Assessment

✅ Looks good

Executive Summary

Methodology

  • Severity: P3-informational. Impact: Affected methods: none in estimator/inference code. The changed practitioner-guide text preserves the already-documented deviations from Baker et al., and those deviations remain mirrored in the Methodology Registry at docs/methodology/REGISTRY.md:L2782-L2797. Concrete fix: None.

Code Quality

No findings.

Performance

No findings.

Maintainability

  • Severity: P3-informational. Impact: diff_diff/_guides_api.py:L1-L48 introduces a new public, doc-bearing source file, but docs/doc-deps.yaml:L616-L637 still has no entry for it. Future accessor/docstring changes can drift without doc-deps tooling noticing. This is already tracked generically in TODO.md:L96-L96.
    Concrete fix: Add a diff_diff/_guides_api.py block to docs/doc-deps.yaml mapping it to the bundled guide docs and/or public API documentation.

Tech Debt

  • Severity: P3-informational. Impact: tests/test_guides.py:L35-L42 proves get_llm_guide() matches package resources in the source tree, but it does not verify the built wheel or the Sphinx extra-path publication configured in pyproject.toml:L85-L98 and docs/conf.py:L74-L78. That leaves the README claim in README.md:L72-L82 validated only indirectly. This is compatible with the tracked .txt-guide CI gap in TODO.md:L95-L96.
    Concrete fix: Add one CI smoke test that installs the built wheel and calls get_llm_guide(), plus one docs-build assertion that llms-practitioner.txt is emitted at site root.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: diff_diff/guides/llms-full.txt:L1032-L1034 describes get_loo_effects_df() as leave-one-out influence “per treated unit” and implies a treated-side support condition, but the method actually returns both control and treated LOO rows and also requires more than one control with nonzero/effective weight to be available, per diff_diff/results.py:L1007-L1049 and diff_diff/synthetic_did.py:L1251-L1281. This is a documentation-accuracy issue only, not a methodology defect.
    Concrete fix: Reword the bullet to “per-unit leave-one-out influence” and either list the full availability conditions or point readers to the method docstring/error path.
  • Verification note: I could not rerun pytest tests/test_guides.py or import diff_diff end-to-end in this environment because pytest and numpy are not installed.

AI review flagged that the bullet said "per treated unit" and implied a
treated-side-only support condition, but the method actually returns a
DataFrame with both control and treated rows and has a broader set of
availability conditions than a single clause captures.

- Change "per treated unit" to "per-unit (both control and treated rows)"
- Defer the full availability conditions to the method docstring, with
  examples of the two most common failure modes (single treated unit,
  only one control with nonzero effective weight)

Code behavior unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 17, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 89ee3378a467c6450b840b662a6154d4fd32e7b2


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review status: still no unmitigated P0/P1 issues. This PR does not change estimator logic, weighting, variance/SE computation, identification assumptions, or methodology-bearing defaults.
  • The methodology-adjacent practitioner-guide edits in diff_diff/guides/llms-practitioner.txt:L3-L20 match the documented Notes in docs/methodology/REGISTRY.md:L2782-L2797, so there is no undocumented deviation from Baker et al.
  • The prior P3 documentation issue around get_loo_effects_df() appears resolved: diff_diff/guides/llms-full.txt:L1032-L1035 now matches diff_diff/results.py:L1007-L1038.
  • Remaining items are P3 only: _guides_api.py is still unmapped in docs/doc-deps.yaml, artifact-level wheel/docs publication is still not exercised by tests, and the SyntheticDiDResults table in llms-full.txt still omits "jackknife" from variance_method.
  • I could not rerun pytest or import diff_diff end-to-end here because this environment does not have pytest or numpy.

Methodology

No findings. Affected methods: none. The only methodology-adjacent change is guide prose, and its explicit deviations from Baker et al. remain documented in both diff_diff/guides/llms-practitioner.txt:L3-L20 and docs/methodology/REGISTRY.md:L2782-L2797.

Code Quality

No findings.

Performance

No findings.

Maintainability

  • Severity: P3-informational. Impact: diff_diff/_guides_api.py:L1-L48 introduces a new public, doc-bearing source file, but the infrastructure block in docs/doc-deps.yaml:L616-L637 still has no entry for it. Changes to the accessor or its docstring can drift without the doc-deps map noticing. This limitation is already tracked generically in TODO.md:L96-L96. Concrete fix: add a diff_diff/_guides_api.py block mapping the accessor to the bundled guides and any public API docs that surface get_llm_guide().

Tech Debt

  • Severity: P3-informational. Impact: pyproject.toml:L83-L98 and docs/conf.py:L74-L77 now advertise wheel-bundled guides and RTD-published text artifacts, but tests/test_guides.py:L35-L67 only exercises source-tree package resources; it does not validate a built wheel or built docs output. Packaging/publishing regressions would therefore slip past CI. This is compatible with the existing tracked guide-validation debt in TODO.md:L95-L95. Concrete fix: add one wheel-install smoke test that imports diff_diff from the built artifact and one docs-build assertion that llms-practitioner.txt is emitted at the site root.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: In the edited SyntheticDiDResults section, diff_diff/guides/llms-full.txt:L1024-L1024 still describes variance_method as "bootstrap" or "placebo" only, even though the same section now documents a jackknife-only diagnostic at diff_diff/guides/llms-full.txt:L1034-L1034 and the implementation supports "bootstrap", "jackknife", and "placebo" in diff_diff/synthetic_did.py:L50-L64 and diff_diff/results.py:L696-L724. This is a minor documentation inconsistency only. Concrete fix: update the attribute row to include "jackknife" or point readers to the estimator/results docstring.
  • Verification note: I could not execute pytest tests/test_guides.py or import diff_diff locally in this environment because pytest and numpy are not installed.

@igerber igerber merged commit bd2b55d into main Apr 17, 2026
19 checks passed
@igerber igerber deleted the llm-guide-api branch April 17, 2026 10:01
igerber added a commit that referenced this pull request Apr 17, 2026
- diff_diff/guides/llms-practitioner.txt: add reversible-treatment branch
  to the Step 4 decision tree pointing at ChaisemartinDHaultfoeuille
  (alias DCDH), and a short Step 5 example showing the
  survey_design + L_max + TSL fallback for dCDH. The practitioner guide
  previously had zero dCDH coverage despite the full API being live.
- diff_diff/guides/llms-full.txt: fix stale SyntheticDiDResults table
  (line 1024) — variance_method now accepts "bootstrap", "jackknife",
  or "placebo" (jackknife was added for synthdid::vcov parity but the
  table only listed bootstrap/placebo).

Bundled-guide paths only. The old docs/llms-*.txt were moved to
diff_diff/guides/ in main (PR #310); these edits target the new
canonical location consumed by get_llm_guide().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant