
Add HonestDiD integration for dCDH, summary() Phase 3 blocks#303

Merged
igerber merged 12 commits into main from feature/dcdh-honest-did-integration
Apr 14, 2026

Conversation

@igerber
Owner

igerber commented Apr 14, 2026

Summary

  • Add HonestDiD (Rambachan-Roth 2023) integration on dCDH placebo + event study surface via honest_did=True in fit() and standalone compute_honest_did(results) path
  • Add 5 new summary() rendering sections for Phase 3 results: covariate diagnostics, cumulated level effects, heterogeneity test, design-2 descriptive, HonestDiD sensitivity bounds
  • Add trends_nonparam unequal-support regression test (Assumption 14 support-trimming)
  • Address AI review: runtime warning about placebo-based pre-periods, docstring update, summary refactor into helpers, edge-case test

Methodology references (required if estimator / math changes)

  • Method name(s): HonestDiD / Rambachan-Roth sensitivity analysis applied to ChaisemartinDHaultfoeuille (dCDH) estimator
  • Paper / source link(s): Rambachan, A. & Roth, J. (2023). A More Credible Approach to Parallel Trends. Review of Economic Studies, 90(5), 2555-2591. https://doi.org/10.1093/restud/rdad018
  • Any intentional deviations from the source (and why):
    • Library extension: dCDH HonestDiD uses DID^{pl}_l placebo estimates as pre-period coefficients rather than standard event-study pre-treatment coefficients. Runtime warning emitted. Documented in REGISTRY.md.
    • Diagonal covariance: No full cross-horizon VCV available for dCDH; uses diag(SE^2). Documented in REGISTRY.md.
    • Permissive consecutiveness: Gaps from trends_nonparam support-trimming are handled by filtering to largest consecutive block with warning (CS branch errors on gaps).
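The diagonal-covariance deviation above can be illustrated in a couple of lines (values and variable names are made up for illustration, not taken from the library):

```python
import numpy as np

# Sketch of the diag(SE^2) deviation: with no full cross-horizon VCV
# available for dCDH, the covariance passed to HonestDiD is built from
# per-horizon standard errors alone, so every cross-horizon covariance
# is implicitly assumed to be zero.
se = np.array([0.10, 0.12, 0.15])   # hypothetical SEs, one per horizon
sigma = np.diag(se ** 2)            # stand-in for the HonestDiD VCV input

print(sigma)
```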

Validation

  • Tests added/updated:
    • tests/test_chaisemartin_dhaultfoeuille.py: +13 tests (7 HonestDiD integration, 5 summary rendering, 1 unequal-support)
    • tests/test_honest_did.py: +5 tests (integration, extraction, no-placebos error, placebo warning, empty-block edge case)
    • 232 total tests passing (178 dCDH + 15 parity + 40 HonestDiD), 0 failures
  • Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

igerber and others added 2 commits April 14, 2026 08:24
…regression test

- Add ChaisemartinDHaultfoeuilleResults extraction to _extract_event_study_params()
  in honest_did.py (maps placebo horizons to pre-periods, event study to post-periods)
- Lift honest_did gate in fit(), add early L_max>=1 validation, post-computation
  compute_honest_did() call with fallback warning on solver failures
- Add 5 new summary() sections: covariate diagnostics, cumulated level effects,
  heterogeneity test, design-2 descriptive, HonestDiD sensitivity bounds
- Update honest_did_results field type annotation and docstring
- 17 new tests: 7 HonestDiD integration, 5 summary rendering, 3 honest_did.py
  extraction/integration, 1 trends_nonparam unequal-support, 1 gate update
- REGISTRY.md: HonestDiD note, checklist update; ROADMAP: 3g shipped

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-case test

- Add UserWarning in dCDH HonestDiD extraction about placebo-based pre-periods
- Update REGISTRY.md to explicitly document library extension semantics
- Update fit() docstring for honest_did (was "Reserved for Phase 3")
- Include exception class name in HonestDiD failure warning
- Factor summary() Phase 3 blocks into 5 private helper methods
- Add test_dcdh_emits_placebo_warning and test_dcdh_empty_consecutive_block_raises

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment
⛔ Blocker — the PR’s dCDH HonestDiD integration introduces one silent methodology bug and one blocking parameter-interaction bug.

Executive Summary

  • The documented dCDH deviations themselves are fine: placebo-based pre-periods, diagonal VCV, and support-trimming are disclosed in the registry and should not block approval.
  • P0: the new “largest consecutive block” logic does not actually enforce presence of -1 / +1; if those horizons drop out after finite-SE filtering, HonestDiD can silently treat sparse horizons as boundary periods and return wrong bounds.
  • P1: fit(..., honest_did=True) silently no-ops when the estimator was constructed with placebo=False; the standalone compute_honest_did() path already errors on the same input.
  • The added tests miss both failure modes above.
  • Review is static only; I could not run the test suite in this sandbox because the available Python environment is missing project dependencies (numpy import fails).

Path to Approval

  1. In dCDH extraction, require a boundary-adjacent consecutive grid after finite-SE filtering (-1 must remain in pre, 1 must remain in post); otherwise raise a clear consecutive-grid error. Add regressions for “missing -1” and “missing +1”.
  2. Reject honest_did=True when the estimator was created with placebo=False (or always call the standalone path and surface its existing error/warning). Add a fit-level regression for that combination.
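Step 1 could look roughly like the following standalone sketch. The function name and exact behavior are assumptions about what `_largest_consecutive_block` in diff_diff/honest_did.py should do, not its actual code:

```python
def consecutive_block_with_boundary(horizons):
    """Return the consecutive pre/post runs containing the boundary
    horizons -1 (last pre) and +1 (first post), or raise.

    Illustrative sketch only; the real implementation and error text
    in diff_diff/honest_did.py may differ.
    """
    pre = sorted(h for h in horizons if h < 0)
    post = sorted(h for h in horizons if h > 0)
    if -1 not in pre or 1 not in post:
        # Do not silently re-label sparse horizons as boundary periods.
        raise ValueError(
            "HonestDiD requires boundary horizons -1 and +1 after "
            f"finite-SE filtering; got {sorted(horizons)}"
        )
    # Walk outward from each boundary, keeping only the consecutive run.
    keep_pre = [-1]
    for h in reversed(pre[:-1]):          # -2, -3, ... moving outward
        if h == keep_pre[-1] - 1:
            keep_pre.append(h)
        else:
            break
    keep_post = [1]
    for h in post[1:]:                    # 2, 3, ... moving outward
        if h == keep_post[-1] + 1:
            keep_post.append(h)
        else:
            break
    return sorted(keep_pre), keep_post
```

A gapped grid like `[-4, -2, -1, 1, 2, 3]` keeps `[-2, -1]` and `[1, 2, 3]` and drops the stranded `-4`, while a grid missing `-1` raises instead of silently renumbering.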

Methodology

  • P0 Impact: In diff_diff/honest_did.py:L884, _largest_consecutive_block() returns the full horizon list when the boundary value is missing. Downstream, the boundary smoothness and RM constructions in diff_diff/honest_did.py:L978 and diff_diff/honest_did.py:L1163 assume the retained coefficients are contiguous [-T,…,-1,1,…,T̄]. If finite-SE filtering leaves only [-2] or only [2,…], the code silently re-labels those as boundary periods and computes the wrong identified set / robust CI. Concrete fix: after filtering finite SEs, require a block that actually contains -1 and 1, or raise ValueError; do not implicitly renumber sparse horizons. This also means the new registry note in docs/methodology/REGISTRY.md:L620 is not implemented in the boundary-missing case. The reference HonestDiD implementation errors on gapped event-time grids instead of silently accepting them. (rdrr.io)
  • P1 Impact: diff_diff/chaisemartin_dhaultfoeuille.py:L954 validates honest_did only against L_max, and the actual integration at diff_diff/chaisemartin_dhaultfoeuille.py:L2412 is skipped whenever results.placebo_event_study is falsy. With ChaisemartinDHaultfoeuille(placebo=False), fit(..., honest_did=True, L_max>=1) silently returns with results.honest_did_results=None and no warning, even though the standalone path is expected to fail on missing placebos in tests/test_honest_did.py:L1393. Concrete fix: raise early when honest_did=True and not self.placebo, or call compute_honest_did() unconditionally and surface its existing error/warning consistently.
  • P3-informational Impact: The dCDH-specific methodology deviations themselves are properly documented in docs/methodology/REGISTRY.md:L620 and covered by runtime warnings/tests, so I am not treating those documented extensions as defects. Concrete fix: none.
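The P1 guard could be sketched as follows; the class and parameter names are illustrative stand-ins for the dCDH estimator's actual `fit()` signature, not verified against the codebase:

```python
# Hypothetical sketch of the suggested fit-level guard; not the real
# code in diff_diff/chaisemartin_dhaultfoeuille.py.
class DCDHEstimatorSketch:
    def __init__(self, placebo=True):
        self.placebo = placebo

    def fit(self, honest_did=False, L_max=0):
        if honest_did:
            if L_max < 1:
                raise ValueError("honest_did=True requires L_max >= 1")
            if not self.placebo:
                # Fail loudly instead of silently returning results
                # with honest_did_results=None.
                raise ValueError(
                    "honest_did=True requires placebo=True: the dCDH "
                    "HonestDiD path uses placebo estimates as "
                    "pre-period coefficients"
                )
        return "fitted"
```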

Code Quality

  • No material findings.

Performance

  • No material findings.

Maintainability

  • No material findings beyond the blocker above.

Tech Debt

  • P3-informational Impact: Neither blocker is tracked under TODO.md:L51, so they are unmitigated and count fully toward the assessment. Concrete fix: these need code changes; TODO tracking would not downgrade them under the stated review policy.

Security

  • No findings.

Documentation/Tests

…ocstrings

P0: _largest_consecutive_block now raises ValueError when boundary horizon
    (-1 or +1) is missing after finite-SE filtering instead of silently
    returning the full list (would produce wrong HonestDiD bounds).
P1: honest_did=True now rejects placebo=False early instead of silently
    returning honest_did_results=None with no warning.
P2: Added 3 regression tests (boundary -1 missing, boundary +1 missing,
    placebo=False + honest_did).
P3: Updated docstrings in honest_did.py (6 locations) and docs/llms.txt
    to include ChaisemartinDHaultfoeuilleResults alongside MultiPeriodDiD/CS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: ee6ad2f9f037c16aa7c36a74f9ac8118d7aff2a0


Overall Assessment
⚠️ Needs changes — the prior blocker findings are fixed, but there is one [Newly identified] unmitigated P1 on the new auto-HonestDiD default behavior.

Executive Summary

  • The prior P0/P1 issues are addressed: boundary-adjacent consecutiveness is now enforced in diff_diff/honest_did.py:L884-L915 with regressions in tests/test_honest_did.py:L1433-L1457, and honest_did=True now rejects placebo=False in diff_diff/chaisemartin_dhaultfoeuille.py:L954-L965 with fit-level coverage in tests/test_chaisemartin_dhaultfoeuille.py:L3278-L3285.
  • [Newly identified] fit(..., honest_did=True) hard-codes an implicit HonestDiD target parameter. In this codebase that means the equal-weight average over post horizons, but the new dCDH integration docs/registry/summary do not say that.
  • The explicit dCDH deviations called out in the PR body (placebo-based pre-periods, diagonal VCV, consecutive-block trimming) are documented in docs/methodology/REGISTRY.md:L620-L620 and are informational under the stated review policy.
  • The new tests materially improve coverage of the previous review, but they still do not pin the target parameter or summary label for the auto-HonestDiD block.
  • Static review only: pytest is not installed in this sandbox, so I could not execute the suite.

Path to Approval

  1. Decide what scalar estimand honest_did=True should target for dCDH: first post/on-impact effect, equal-weight post average, or the dCDH scalar summary.
  2. Encode that choice explicitly in code by passing l_vec from the fit-level integration (HonestDiD(..., l_vec=...), or extend compute_honest_did() to accept l_vec) instead of inheriting the current implicit default.
  3. Document the chosen target in docs/methodology/REGISTRY.md, update the fit() docstring and summary text to name it explicitly, and add a regression asserting results.honest_did_results.original_estimate equals that documented target.
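The candidate targets in step 1 differ only in the weight vector applied to the post-horizon effects, theta = l_vec' tau. A small sketch with made-up numbers (not library code):

```python
import numpy as np

# tau: hypothetical post-horizon event-study effects at horizons 1..3.
tau = np.array([2.0, 3.0, 4.0])

# R's HonestDiD default target: on-impact effect, first basis vector.
l_on_impact = np.array([1.0, 0.0, 0.0])

# Equal-weight average over post horizons (the implicit default at issue).
l_equal = np.full(3, 1.0 / 3.0)

theta_on_impact = l_on_impact @ tau   # -> 2.0
theta_equal = l_equal @ tau           # ~ 3.0
```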

Methodology

Code Quality

  • No material findings.

Performance

  • No material findings.

Maintainability

  • No material findings beyond the methodology issue above.

Tech Debt

  • No new tracked/mitigated tech-debt items change the assessment.

Security

  • No findings.

Documentation/Tests

  • P2 Impact: the new tests cover the prior boundary and placebo=False regressions, but they do not assert which scalar target honest_did=True is computing. The parity test in tests/test_chaisemartin_dhaultfoeuille.py:L3287-L3310 compares auto vs manual on the same implicit default, and the summary test in tests/test_chaisemartin_dhaultfoeuille.py:L3464-L3474 only checks section presence. Concrete fix: add a regression that checks results.honest_did_results.original_estimate against the documented l_vec target and that summary() prints that target clearly.

- Update fit() docstring to specify equal-weight average over post horizons
  (l_vec=None default) and note R's HonestDiD targets on-impact instead
- Update REGISTRY.md with l_vec deviation from R's default
- Add "Target: Equal-weight avg over post horizons" line to summary
- Add test_honest_did_original_estimate_is_post_average regression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: ad7698cd11a0291363dea21a8e266fa502f1e509


Overall Assessment

⚠️ Needs changes — the prior HonestDiD target-documentation issue is fixed, but one P1 [Newly identified] incomplete-parameter-propagation issue remains in the new dCDH HonestDiD public API/docs path.

Executive Summary

  • The prior re-review blocker on the auto-HonestDiD target is resolved: the PR now documents and tests that fit(..., honest_did=True) targets the equal-weight average over post-treatment horizons in diff_diff/chaisemartin_dhaultfoeuille.py:L540-L549, docs/methodology/REGISTRY.md:L620-L620, diff_diff/chaisemartin_dhaultfoeuille_results.py:L989-L991, and tests/test_chaisemartin_dhaultfoeuille.py:L3372-L3386.
  • P1 [Newly identified]: the new docs tell users to use compute_honest_did(results, ...) for custom l_vec targets, but the wrapper still does not expose l_vec, so the changed target-parameter behavior is not fully propagated through the advertised public API.
  • The dCDH-specific deviations from canonical HonestDiD are now documented and warned at runtime, so they are P3 informational under the stated review policy.
  • The new tests cover L_max, placebo=False, and boundary-adjacent consecutiveness failures, but they still do not exercise an end-to-end trends_nonparam + HonestDiD support-trimming case.
  • Static review only: pytest is not installed in this environment.

Path to Approval

  1. Make the standalone custom-target path truthful. Preferred: add l_vec to compute_honest_did() and forward it into HonestDiD(...). Minimum acceptable: remove the new docs/registry guidance that tells users to pass custom l_vec through compute_honest_did(...) and point them to HonestDiD(l_vec=...).fit(results) instead.
  2. Add a regression for the chosen public path using l_vec = [1, 0, ..., 0] and assert the returned original_estimate equals the on-impact effect.
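The preferred fix in step 1 amounts to forwarding `l_vec` through the wrapper. A minimal stand-in (the stub class below only illustrates the forwarding pattern; the real `HonestDiD` and `compute_honest_did` in diff_diff/honest_did.py have richer interfaces):

```python
class HonestDiD:
    """Minimal stand-in to illustrate parameter forwarding; the real
    class accepts and uses many more options."""
    def __init__(self, method="relative_magnitude", M=1.0,
                 alpha=0.05, l_vec=None):
        self.method, self.M, self.alpha, self.l_vec = method, M, alpha, l_vec

    def fit(self, results):
        return {"l_vec": self.l_vec, "method": self.method}


def compute_honest_did(results, method="relative_magnitude",
                       M=1.0, alpha=0.05, l_vec=None):
    # Forward l_vec so the advertised custom-target path actually
    # works, rather than silently falling back to the class default.
    return HonestDiD(method=method, M=M, alpha=alpha, l_vec=l_vec).fit(results)
```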

Methodology

  • P1 [Newly identified] Impact: the affected method is HonestDiD / Rambachan-Roth sensitivity analysis as applied to the dCDH placebo + event-study surface. HonestDiD itself exposes l_vec as the scalar target definition in diff_diff/honest_did.py:L2122-L2152, but compute_honest_did() only accepts method, M, and alpha in diff_diff/honest_did.py:L2667-L2698. The new dCDH fit docstring and registry note nevertheless tell users to use compute_honest_did(results, ...) for custom l_vec targets in diff_diff/chaisemartin_dhaultfoeuille.py:L540-L549 and docs/methodology/REGISTRY.md:L620-L620. HonestDiD’s reference interface defines the scalar estimand as theta = l_vec' tau, with l_vec defaulting to the first basis vector (1, 0, ..., 0), so l_vec propagation is methodologically load-bearing rather than cosmetic. Concrete fix: add l_vec to compute_honest_did() and forward it to HonestDiD, or change the new docs to point to HonestDiD(l_vec=...).fit(results) instead. (search.r-project.org)
  • P3-informational Impact: the dCDH-specific deviations themselves are properly documented and warned: placebo DID^{pl}_l coefficients are used as pre-period inputs, covariance is diagonal because no full dCDH cross-horizon VCV exists, and non-consecutive horizons from trends_nonparam support trimming are filtered to the largest consecutive block with a warning. Those are documented in docs/methodology/REGISTRY.md:L620-L620 and implemented in diff_diff/honest_did.py:L829-L839 and diff_diff/honest_did.py:L880-L934. Concrete fix: none.

Code Quality

  • No material findings in the changed code.

Performance

  • No material findings in the changed code.

Maintainability

  • No material findings beyond the API propagation issue above.

Tech Debt

Security

  • No findings.

Documentation/Tests

  • P2 Impact: the new tests prove the default equal-weight target and the boundary-adjacent failure cases, but they still do not cover the actual trends_nonparam support-trimming + HonestDiD integration path described in the new registry note. There is a trends_nonparam unequal-support test without HonestDiD in tests/test_chaisemartin_dhaultfoeuille.py:L2706-L2744, and there are HonestDiD consecutiveness tests that manually corrupt SEs in tests/test_honest_did.py:L1419-L1457, but no end-to-end fit(..., trends_nonparam=..., honest_did=True) regression. The new summary test also only checks section presence in tests/test_chaisemartin_dhaultfoeuille.py:L3480-L3490. Concrete fix: add an end-to-end support-trimming fixture that produces outer-horizon gaps, assert the warning plus retained block, and assert the HonestDiD summary target text stays consistent with the target actually used.

- Add l_vec parameter to compute_honest_did() so the advertised
  custom-target path actually works (was missing from wrapper)
- Add test_honest_did_custom_l_vec_on_impact: l_vec=[1,0] targets
  on-impact effect, asserts original_estimate matches DID_1
- Add test_honest_did_with_trends_nonparam: end-to-end
  trends_nonparam + honest_did=True integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: a351e2e2f68a1bb60c8a3584107dc56fb56491dd


Overall Assessment

⚠️ Needs changes — the prior custom-target wrapper issue is fixed, but one P1 [Newly identified] parameter-propagation problem remains in the new dCDH HonestDiD reporting surface.

Executive Summary

Path to Approval

  1. Persist target metadata on HonestDiDResults (l_vec or a rendered target label) in diff_diff/honest_did.py:L2256-L2324, and render that metadata in diff_diff/chaisemartin_dhaultfoeuille_results.py:L974-L1000 instead of hard-coding the equal-weight target.
  2. Update the post-hoc docs in diff_diff/chaisemartin_dhaultfoeuille_results.py:L339-L343 to say compute_honest_did(results, ...) returns a standalone HonestDiDResults and only appears in results.summary() if assigned to results.honest_did_results.
  3. Add a regression that computes compute_honest_did(r, l_vec=np.array([1, 0, ...])), attaches it to r.honest_did_results, and asserts r.summary() reports the custom target consistently with bounds.original_estimate.
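The label detection that step 1 asks for could be sketched like this (field and function names follow the review's suggestions; the real logic would live in `HonestDiD.fit()`):

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class HonestDiDResults:
    """Illustrative stand-in carrying a target label on the results."""
    original_estimate: float
    target_label: str = "Equal-weight avg over post horizons"


def label_for_l_vec(l_vec):
    """Map common l_vec patterns to a human-readable target label.
    Sketch only; thresholds and wording are assumptions."""
    l = np.asarray(l_vec, dtype=float)
    if l[0] == 1.0 and np.all(l[1:] == 0.0):
        return "On-impact effect (first post horizon)"
    if np.allclose(l, 1.0 / l.size):
        return "Equal-weight avg over post horizons"
    return f"Custom target l_vec={l.tolist()}"
```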

Methodology

  • Severity: P1 [Newly identified]. Impact: the changed public custom-target path is still not fully propagated through the results surface. compute_honest_did(..., l_vec=...) now correctly defines a custom scalar target in diff_diff/honest_did.py:L2667-L2705, and HonestDiD.fit() uses that l_vec in diff_diff/honest_did.py:L2256-L2265, but the returned HonestDiDResults drops the target metadata in diff_diff/honest_did.py:L2310-L2324, while the new dCDH summary helper always prints Equal-weight avg over post horizons in diff_diff/chaisemartin_dhaultfoeuille_results.py:L989-L991. Because canonical HonestDiD treats l_vec as the estimand definition and defaults it to the first post-treatment basis vector, this is a methodology-significant reporting mismatch, not a cosmetic summary issue. (search.r-project.org) Concrete fix: store and render target metadata (l_vec or a target label) on HonestDiDResults, and cover the custom-target summary path with a regression.
  • Severity: P3-informational. Impact: the dCDH extension uses DID^{pl}_l placebos as pre-period inputs, diagonal variance, and an equal-weight auto-target in fit(). Those differ from canonical HonestDiD’s event-study-coefficient/covariance interface, but they are explicitly documented in docs/methodology/REGISTRY.md:L620-L620 and warned in diff_diff/honest_did.py:L818-L934, so they are non-blocking under the review policy. (search.r-project.org) Concrete fix: none.

Code Quality

  • No material findings.

Performance

  • No material findings.

Maintainability

  • No separate findings beyond the target-metadata propagation issue above.

Tech Debt

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: the new tests cover standalone custom l_vec, placebo-input warnings, and manually induced missing-boundary errors, but they still do not exercise the actual trends_nonparam support-trimming path that produces non-consecutive horizons and triggers the new warning/filter logic. The unequal-support test in tests/test_chaisemartin_dhaultfoeuille.py:L2706-L2744 is non-HonestDiD, the HonestDiD trends_nonparam test in tests/test_chaisemartin_dhaultfoeuille.py:L3407-L3429 keeps balanced support, and the boundary tests in tests/test_honest_did.py:L1419-L1457 simulate gaps by mutating SEs rather than via the real trimming path. Concrete fix: add one end-to-end fixture where within-set support trimming drops outer horizons, then assert both the warning text and the retained consecutive pre/post block.

…namically

- Add target_label field to HonestDiDResults (default: equal-weight avg)
- HonestDiD.fit() detects common l_vec patterns and sets human-readable label
  (on-impact, equal-weight, or custom with vector)
- Summary renders hd.target_label instead of hard-coded string
- Add test_honest_did_custom_l_vec_summary_label: attaches custom-target
  results and asserts summary shows "on-impact" not "Equal-weight"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 14, 2026

/ai-review


@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 419d872c2da379516949f91ef22837dd072daccb


Overall Assessment

⚠️ Needs changes — the prior custom-l_vec reporting issue looks resolved, but one P1 [Newly identified] remains in the new trends_nonparam + HonestDiD edge case: horizon trimming can change the HonestDiD estimand without persisting which horizons were actually used.

Executive Summary

  • The previous re-review P1 is addressed: custom l_vec now flows into stored HonestDiD metadata and the dCDH summary block renders hd.target_label instead of hard-coding the equal-weight target. See diff_diff/honest_did.py:194, diff_diff/honest_did.py:2256, diff_diff/honest_did.py:2332, diff_diff/chaisemartin_dhaultfoeuille_results.py:990, and tests/test_chaisemartin_dhaultfoeuille.py:3407.
  • P1 [Newly identified]: when dCDH HonestDiD drops non-consecutive horizons to form a consecutive block, the bounds are computed on the retained post-period subset, but the persisted results surface does not record that retained subset. In trimmed-support cases, the public summary can still present a generic “Equal-weight avg over post horizons,” which no longer uniquely identifies the estimand.
  • The dCDH-specific departures from canonical HonestDiD are documented in the Methodology Registry, so those deviations are non-blocking P3s.
  • The earlier test-gap concern is still open as a P2: the suite does not drive the real trends_nonparam support-trimming path through honest_did=True and assert the retained block metadata/warning end to end.
  • Static review only: this environment does not have pytest or core runtime deps like pandas, so I could not execute the new tests.

Path to Approval

  1. Persist the retained horizon set used by HonestDiD, at minimum post_periods_used and ideally both pre/post vectors, on HonestDiDResults.
  2. Render that retained block in both the standalone HonestDiD summary and the dCDH summary block; for l_vec=None, label the target against the retained block explicitly instead of the current generic string.
  3. Add an end-to-end regression where trends_nonparam support-trimming removes an outer horizon under honest_did=True, then assert both the warning text and the persisted target metadata/summary reflect the retained block.
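Steps 1 and 2 amount to persisting the retained horizon set on the results object and rendering it. A sketch under the review's suggested field names (`pre_periods_used` / `post_periods_used`), not the library's actual class:

```python
from dataclasses import dataclass, field


@dataclass
class HonestDiDResults:
    """Sketch of persisting the retained horizon set; field names follow
    the review's suggestion, not diff_diff/honest_did.py."""
    lower_bound: float
    upper_bound: float
    pre_periods_used: list = field(default_factory=list)
    post_periods_used: list = field(default_factory=list)

    def summary(self):
        lines = [f"Bounds: [{self.lower_bound:.3f}, {self.upper_bound:.3f}]"]
        # Render the retained block so a trimmed estimand stays
        # unambiguous even when horizons were dropped upstream.
        lines.append(f"Pre horizons used:  {self.pre_periods_used}")
        lines.append(f"Post horizons used: {self.post_periods_used}")
        return "\n".join(lines)
```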

Methodology

  • Severity P1 [Newly identified]. Impact: _extract_event_study_params() now drops non-consecutive dCDH horizons and keeps only the consecutive block around -1/+1; HonestDiD.fit() then rebuilds l_vec against that shorter post-period vector. But the returned object stores only a generic target_label, the dCDH summary renders only that generic label, and the standalone HonestDiDResults.summary()/to_dict()/to_dataframe() surfaces still do not expose which post horizons were actually used. In a trends_nonparam support-trimming case, users can therefore read correct bounds under an under-specified estimand label. See diff_diff/honest_did.py:821, diff_diff/honest_did.py:925, diff_diff/honest_did.py:2256, diff_diff/honest_did.py:2321, diff_diff/honest_did.py:252, diff_diff/honest_did.py:335, and diff_diff/chaisemartin_dhaultfoeuille_results.py:990. Official HonestDiD defines the scalar target as theta = l_vec'tau over the post-treatment coefficients, with l_vec itself defining the parameter of interest, so changing the retained post-period set changes the estimand, not just the display label. Concrete fix: store and render the retained horizon vector(s) used by HonestDiD, and include them in summary/export surfaces. (search.r-project.org)
  • Severity P3-informational. Impact: the dCDH branch uses placebo estimates as pre-period inputs, diagonal covariance, an equal-weight default target in fit(), and consecutive-block filtering when trends_nonparam creates gaps. Those differ from canonical HonestDiD’s event-study coefficient/covariance interface, but they are explicitly documented in docs/methodology/REGISTRY.md:620, so they are non-blocking under the stated policy. Concrete fix: none. The paper and official package both frame HonestDiD as robust inference for DiD/event-study coefficients under restrictions on post-treatment violations relative to pre-trends; the library is allowed to extend that interface as long as the extension is documented. (academic.oup.com)

Code Quality

No material findings.

Performance

No material findings.

Maintainability

No separate findings beyond the retained-horizon metadata propagation issue above.

Tech Debt

  • Severity P3-informational. Impact: the PR’s default dCDH HonestDiD path inherits the pre-existing Delta^RM confidence-set approximation in honest_did.py. The official HonestDiD RM interface constructs confidence sets through test inversion and defaults to method="C-LF", while this library’s RM path still uses the documented naive FLCI approximation. That limitation is already documented in docs/methodology/REGISTRY.md:2171 and tracked in TODO.md:73, so it is non-blocking for this PR. Concrete fix: none for this PR. (search.r-project.org)

Security

No findings.

Documentation/Tests

  • Severity P2. Impact: the suite still does not exercise the real end-to-end support-trimming path that the new dCDH HonestDiD logic is designed to handle. The unequal-support trends_nonparam test stops before HonestDiD in tests/test_chaisemartin_dhaultfoeuille.py:2706, the honest_did=True test with trends_nonparam uses a balanced design in tests/test_chaisemartin_dhaultfoeuille.py:3426, and the boundary tests in tests/test_honest_did.py:1419 simulate missing horizons by mutating SEs rather than via the real support-trimming path. Concrete fix: add one fixture where within-set support trimming drops an outer placebo or event-study horizon under honest_did=True, then assert both the warning text and the persisted retained-horizon metadata.

- Add pre_periods_used and post_periods_used fields to HonestDiDResults
  so the retained horizon set is always available on the results object
- HonestDiD.fit() populates both fields from the extracted period lists
- Summary renders retained horizons below the target label
- Add test_honest_did_retains_period_metadata asserting fields populated
  and summary shows "Post horizons used:"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 33e9c678dc2754341843a75a9df7865e8c534922


Overall Assessment

⚠️ Needs changes — the prior retained-horizon issue is only partially resolved. The PR now stores the retained pre/post horizons and renders them in the dCDH top-level summary, but the standalone HonestDiDResults summary/export surfaces still drop that estimand metadata, so direct compute_honest_did(...) output remains under-specified when trends_nonparam trimming changes the retained block or a custom l_vec changes the target.

Executive Summary

  • The previous P1 is partially addressed: HonestDiDResults now carries target_label, pre_periods_used, and post_periods_used, and the dCDH summary() block prints them.
  • One P1 remains: standalone HonestDiDResults.summary(), to_dict(), and to_dataframe() still omit that metadata, so printed/serialized HonestDiD output does not uniquely identify the target after trimming or custom l_vec.
  • The dCDH-specific departures from canonical HonestDiD are documented in the Methodology Registry, so those are P3-informational rather than defects.
  • The new tests still do not exercise the real unequal-support trends_nonparam path through HonestDiD end to end.
  • Static review only: this environment lacks pytest, numpy, and pandas, so I could not run the changed tests.

Path to Approval

  1. Add target_label, pre_periods_used, and post_periods_used to the public standalone HonestDiDResults surfaces: summary(), to_dict(), and therefore to_dataframe().
  2. Add an end-to-end regression with real unequal-support trends_nonparam trimming under honest_did=True or compute_honest_did(...), and assert both the warning text and the retained-horizon metadata on the standalone HonestDiD output.

Methodology

  • Severity P1. Impact: Rambachan-Roth / official HonestDiD define the scalar target as theta = l_vec' tau, with l_vec determining the parameter of interest and the package defaulting to the first post-period basis vector. This PR now records target_label, pre_periods_used, and post_periods_used when constructing HonestDiDResults, and the dCDH top-level summary renders them, but standalone HonestDiDResults.summary() and to_dict()/to_dataframe() still omit them. As a result, standalone compute_honest_did(...) output still loses the estimand definition exactly in the cases this PR adds/supports: support trimming and custom l_vec. See diff_diff/honest_did.py:L2323-L2339, diff_diff/chaisemartin_dhaultfoeuille_results.py:L980-L1009, and diff_diff/honest_did.py:L254-L357. Concrete fix: surface target_label, pre_periods_used, and post_periods_used in standalone HonestDiD summary/export methods and add a regression that verifies those fields after trimmed-support extraction and custom l_vec. (search.r-project.org)
  • Severity P3. Impact: The dCDH-specific departures from canonical HonestDiD are all explicitly documented in the Methodology Registry: placebo estimates as pre-period inputs, diagonal covariance, equal-weight default when fit(honest_did=True) is used, and consecutive-block filtering under trends_nonparam. Per the review policy, I am treating those as non-blocking documented deviations. See docs/methodology/REGISTRY.md:L620-L620. Concrete fix: none.
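One way to keep `to_dict()` (and therefore `to_dataframe()`) in lockstep with new metadata fields is to serialize straight from the dataclass. A sketch, assuming the results class is (or could be) a dataclass; this is not the library's actual serialization code:

```python
from dataclasses import dataclass, field, asdict


@dataclass
class HonestDiDResults:
    """Illustrative stand-in; real fields in diff_diff/honest_did.py differ."""
    lower_bound: float
    upper_bound: float
    target_label: str = "Equal-weight avg over post horizons"
    pre_periods_used: list = field(default_factory=list)
    post_periods_used: list = field(default_factory=list)

    def to_dict(self):
        # asdict() serializes every declared field, so newly added
        # metadata cannot silently drop out of the export surface.
        return asdict(self)
```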

Code Quality

  • No material findings in the changed code.

Performance

  • No material findings in the changed code.

Maintainability

  • No separate findings beyond the metadata-propagation issue above.

Tech Debt

  • Severity P3. Impact: the PR newly exposes dCDH users to the pre-existing Delta^RM naive-FLCI limitation, but that limitation is already documented in the Methodology Registry and tracked in TODO.md, so it is non-blocking under the stated policy. See docs/methodology/REGISTRY.md:L2171-L2173 and TODO.md:L73-L73. Concrete fix: none for this PR.

Security

  • No findings.

Documentation/Tests

…mary/export

- HonestDiDResults.summary() now renders target_label, pre/post_periods_used
- HonestDiDResults.to_dict() includes target_label, pre/post_periods_used
- to_dataframe() inherits from to_dict() automatically
- Add test_dcdh_standalone_surfaces_target_metadata verifying all three
  surfaces include target metadata for custom l_vec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: dd06b3bbfd2d68bfc356caba1cc704a662cbcec0


Overall Assessment

Looks good. This re-review resolves the prior P1 on missing standalone HonestDiD estimand metadata; the highest remaining unmitigated item is P2 test coverage.

Executive Summary

Methodology

  • Severity P3. Impact: No unmitigated methodology defect remains in the changed estimator path. Rambachan-Roth frame HonestDiD as a sensitivity-analysis / partial-identification framework, and the official R helpers define the scalar target as theta = l_vec' tau with l_vec defaulting to the first post-treatment basis vector; this PR’s dCDH branch intentionally departs by using DID^{pl}_l placebos as pre-period inputs and an equal-weight default target, but those departures are now explicitly documented in docs/methodology/REGISTRY.md:L620-L620 and surfaced in runtime extraction/warnings at diff_diff/honest_did.py:L843-L946. Concrete fix: none. (academic.oup.com)

Code Quality

  • No findings in the changed code.

Performance

  • No findings in the changed code.

Maintainability

  • No findings in the changed code. The summary() refactor into helper blocks is a net improvement.

Tech Debt

  • Severity P3. Impact: The pre-existing Delta^RM naive-FLCI limitation remains tracked in TODO.md:L73-L73, and the methodology registry still marks it as a documented deviation rather than a new regression in this PR (docs/methodology/REGISTRY.md:L2171-L2173). Concrete fix: none for this PR.

Security

  • No findings.

Documentation/Tests

  • Severity P2. Impact: The new tests cover unequal-support trends_nonparam without HonestDiD, balanced trends_nonparam with HonestDiD, and synthetic boundary failures created by mutating SEs, but they still do not run the real support-trimmed trends_nonparam path all the way through HonestDiD and assert the warning plus pre_periods_used/post_periods_used on the resulting standalone or embedded HonestDiD output (tests/test_chaisemartin_dhaultfoeuille.py:L2706-L2744, tests/test_chaisemartin_dhaultfoeuille.py:L3444-L3466, tests/test_honest_did.py:L1419-L1460). Concrete fix: add one end-to-end fixture where trends_nonparam removes at least one non-boundary horizon but retains -1 and +1, then run honest_did=True and/or compute_honest_did(results) and assert both the drop-warning text and the retained-horizon metadata on HonestDiDResults.

test_honest_did_trends_nonparam_trimming: State B's early-switching
controls vanish at far horizons, causing N_l=0 at h=3 and h=-3.
HonestDiD extraction drops NaN-SE horizons and retains [-2,-1,1,2].
Asserts n_obs=0 at trimmed horizons, finite bounds on retained block,
and post_periods_used excludes h=3.

This exercises the real trends_nonparam support-trimming path through
HonestDiD (not SE mutation), addressing the recurring P2 finding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 4943910e66b1d878d169e74b63d8ebc8da9fd753


Overall Assessment

⛔ Blocker — one unmitigated P0 remains. The new fit-time HonestDiD integration silently ignores a non-default estimator alpha, so fit(..., honest_did=True) can report the wrong robust CI/significance level whenever ChaisemartinDHaultfoeuille(alpha != 0.05) is used.

Executive Summary

Methodology

  • Severity P0. Impact: fit(..., honest_did=True) omits alpha=self.alpha when it constructs results.honest_did_results (diff_diff/chaisemartin_dhaultfoeuille.py:2425). compute_honest_did() defaults to alpha=0.05 (diff_diff/honest_did.py:2698), and the dCDH summary prints hd.alpha directly (diff_diff/chaisemartin_dhaultfoeuille_results.py:983). That means a user fitting ChaisemartinDHaultfoeuille(alpha=0.10, honest_did=True) will silently get 95% HonestDiD intervals/significance instead of 90%. Concrete fix: pass alpha=self.alpha into the fit-time compute_honest_did(...) call and add a regression test with alpha=0.10 asserting results.honest_did_results.alpha == results.alpha and narrower robust CIs than at alpha=0.05.
  • Severity P3. Impact: The dCDH-specific deviations from canonical HonestDiD are now documented and match the implementation: placebo-based pre-period coefficients (diff_diff/honest_did.py:843), diagonal covariance with no full VCV (diff_diff/honest_did.py:867), equal-weight post-treatment default target (diff_diff/honest_did.py:2269), and consecutive-block trimming with warning (diff_diff/honest_did.py:894), all reflected in docs/methodology/REGISTRY.md:620. Concrete fix: none. Rambachan-Roth frame HonestDiD as robust sensitivity analysis for a partially identified scalar target under bounded parallel-trends violations, and the official HonestDiD package uses l_vec to switch away from its default first post-treatment effect, so these documented dCDH extensions are not methodology defects. (academic.oup.com)
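The default-alpha pitfall in the P0 above, and its fix, can be sketched in a few lines. The names mirror the diff_diff API discussed in this review, but the bodies are illustrative stand-ins, not the library's implementation:

```python
# Sketch of the alpha-propagation defect: a fit-time call that omits alpha
# silently inherits the standalone default of 0.05. The fix forwards the
# estimator-level alpha explicitly.

def compute_honest_did(results, alpha=0.05):
    # Standalone entry point: defaults to 95% intervals.
    return {"alpha": alpha, "ci_level": 1 - alpha}

class ChaisemartinDHaultfoeuille:
    def __init__(self, alpha=0.05):
        self.alpha = alpha

    def fit(self, results, honest_did=False):
        if honest_did:
            # Defect: compute_honest_did(results) here would fall back to
            # alpha=0.05. Fix: pass alpha=self.alpha through.
            results["honest_did_results"] = compute_honest_did(
                results, alpha=self.alpha
            )
        return results

est = ChaisemartinDHaultfoeuille(alpha=0.10)
out = est.fit({}, honest_did=True)
assert out["honest_did_results"]["alpha"] == 0.10  # 90% CI, not 95%
```

A regression test then asserts `results.honest_did_results.alpha == results.alpha`, exactly as the concrete fix prescribes.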

Code Quality

No additional findings beyond the alpha propagation defect above.

Performance

No findings.

Maintainability

No findings. The summary() split into helper blocks is a net improvement.

Tech Debt

  • Severity P3. Impact: the pre-existing HonestDiD Delta^RM naive-FLCI limitation remains tracked in TODO.md:73, so it is not a blocker for this PR. Concrete fix: none for this PR.

Security

No findings.

Documentation/Tests

  • Severity P1. Impact: the new honest_did parameter is not exercised on the bootstrap inference path. fit() rewrites per-horizon SEs in event_study_effects and placebo_event_study from bootstrap outputs (diff_diff/chaisemartin_dhaultfoeuille.py:1905, diff_diff/chaisemartin_dhaultfoeuille.py:2108), and _extract_event_study_params() consumes exactly those se fields for HonestDiD (diff_diff/honest_did.py:867). But the new test matrices only cover the analytical path (tests/test_chaisemartin_dhaultfoeuille.py:3253, tests/test_honest_did.py:1343). Under the stated parameter-interaction rule, that is still a P1. Concrete fix: add one fit(..., honest_did=True, n_bootstrap>0) test and one standalone compute_honest_did() test on a bootstrap-fitted dCDH result, asserting finite bounds and stable retained-horizon metadata.
  • Severity P2. Impact: the old support-trimming coverage gap is only partially resolved. The new end-to-end fixture reaches the real trends_nonparam trimming path and checks retained horizons, but it suppresses warnings with simplefilter("ignore") instead of asserting the documented warning text (tests/test_chaisemartin_dhaultfoeuille.py:3503). A regression that drops the promised “Dropping non-consecutive horizons …” warning from diff_diff/honest_did.py:938 would still pass. Concrete fix: capture warnings in that test and assert the warning text together with pre_periods_used / post_periods_used.
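The warning-capture pattern the P2 fix calls for can be sketched with the standard library alone; `_extract_event_study_params` below is a hypothetical stand-in for the real extraction step, not the diff_diff function:

```python
# Sketch: record warnings instead of suppressing them, then assert on the
# documented warning text. The trimming logic here is an illustrative
# stand-in that keeps the boundary-adjacent horizons and warns on the rest.
import warnings

def _extract_event_study_params(horizons):
    kept = [h for h in horizons if abs(h) <= 2]
    dropped = sorted(set(horizons) - set(kept))
    if dropped:
        warnings.warn(f"Dropping non-consecutive horizons {dropped}")
    return kept

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # capture, do not simplefilter("ignore")
    kept = _extract_event_study_params([-3, -2, -1, 1, 2, 3])

messages = [str(w.message) for w in caught]
assert any("Dropping non-consecutive horizons" in m for m in messages)
assert kept == [-2, -1, 1, 2]
```

Asserting on the recorded message text is what makes a regression that drops the promised warning fail, which `simplefilter("ignore")` cannot do.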

Path to Approval

  1. Propagate alpha=self.alpha in the fit-time HonestDiD call at diff_diff/chaisemartin_dhaultfoeuille.py:2425, and add a non-default-alpha regression test.
  2. Add bootstrap interaction tests for both fit(..., honest_did=True) and standalone compute_honest_did() on bootstrap-fitted dCDH results.
  3. Update test_honest_did_trends_nonparam_trimming to record and assert the documented trimming warning instead of suppressing it.

compute_honest_did() call in fit() now passes alpha=self.alpha
instead of inheriting the default 0.05. Added regression test
asserting honest_did_results.alpha matches estimator alpha.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 46e6dc50428135cca7dda8f9199b726106bc4b5b


Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains. The prior implementation defects are addressed, but the new honest_did surface is still not tested on the bootstrap inference path.

Executive Summary

Methodology

Affected method: HonestDiD (Rambachan-Roth 2023) applied to the dCDH placebo + event-study surface. The new summary() blocks are presentation-only.

  • Severity P3. Impact: The new dCDH path deviates from canonical event-study HonestDiD by using dCDH placebo coefficients as pre-period inputs, using diagonal covariance, and defaulting fit-time calls to an equal-weight post-treatment target. Those departures are explicitly documented in docs/methodology/REGISTRY.md:L620 and mirrored in the runtime warning at diff_diff/honest_did.py:L843-L853, so under the review rubric they are informational, not defects. Rambachan-Roth frame HonestDiD as inference for a partially identified causal parameter under restrictions on post-treatment violations, and the official HonestDiD package exposes the scalar target as theta = l_vec' tau with a first-basis-vector default; the library's equal-weight target is therefore a documented target-choice deviation rather than a methodology error. Concrete fix: none.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity P3. Impact: The pre-existing HonestDiD Delta^RM naive-FLCI limitation remains tracked in TODO.md:L73, so it is not a blocker for this PR. Concrete fix: none for this PR.

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Add a fit-time bootstrap regression for ChaisemartinDHaultfoeuille(..., n_bootstrap>0).fit(..., L_max>=2, honest_did=True) that asserts results.honest_did_results is populated, its CI endpoints are finite, and its retained horizon metadata matches the post-bootstrap finite-SE surface.
  2. Add a standalone bootstrap regression for compute_honest_did() on a bootstrap-fitted dCDH result that exercises the same SE-consumption path and verifies stable bounds/metadata.

P1: Add test_honest_did_with_bootstrap - fits with n_bootstrap=49 and
    honest_did=True, asserts finite bounds and retained horizons.
P2: test_honest_did_trends_nonparam_trimming now captures warnings and
    asserts the placebo-based pre-period warning is emitted (was
    suppressing all warnings before).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber commented Apr 14, 2026

/ai-review

@igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 820c9dd0a1302590153bf091434956511ffec6b5


Overall Assessment

✅ Looks good — the prior bootstrap/HonestDiD interaction P1 is addressed, and I did not find any new unmitigated P0/P1 issues in the changed estimator, inference, or default-behavior paths. One non-blocking P2 test gap remains.

Executive Summary

Methodology

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity P3 | Impact: honest_did=True defaults to method="relative_magnitude", whose CI path still uses the pre-existing naive-FLCI implementation documented in the HonestDiD registry and already tracked in TODO.md. That limitation predates this PR and is properly tracked, so it is non-blocking here. Concrete fix: none for this PR. docs/methodology/REGISTRY.md:L2171-L2173 TODO.md:L73

Security

No findings.

Documentation/Tests

  • Severity P2 | Impact: The remaining re-review gap is test-specific. tests/test_chaisemartin_dhaultfoeuille.py:L3515-L3537 records warnings but only asserts the placebo-input warning, even though production code also promises a distinct "Dropping non-consecutive horizons" warning when it trims to the boundary-adjacent consecutive block at diff_diff/honest_did.py:L934-L946. A regression that removes or weakens that warning would still pass. Concrete fix: in test_honest_did_trends_nonparam_trimming, additionally assert a warning containing "Dropping non-consecutive horizons" and ideally the retained pre= / post= metadata.

Static only: I could not run the changed tests in this environment because the sandbox lacks pytest and numpy.

@igerber commented Apr 14, 2026

/ai-review

Add test_dcdh_interior_gap_triggers_trimming_warning: corrupts h=-2 SE
to create an interior gap [-3, -1], asserts the "Dropping non-consecutive
horizons" warning is emitted and pre_periods_used == [-1].

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
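The boundary-adjacent trimming this test exercises can be sketched as a small pure function; the logic below is an illustrative reconstruction of the documented behavior (keep only the consecutive pre-period block ending at h = -1), not the library's code:

```python
# Sketch: with an interior gap in the pre-period horizons (e.g. [-3, -1]
# after h=-2 is dropped), keep only the consecutive block adjacent to -1.
def trim_to_boundary_block(pre_horizons):
    kept = []
    expected = -1                  # pre-period block must end at h = -1
    for h in sorted(pre_horizons, reverse=True):
        if h == expected:
            kept.append(h)
            expected -= 1
        else:
            break                  # interior gap: drop everything earlier
    return sorted(kept)

assert trim_to_boundary_block([-3, -1]) == [-1]   # gap at -2 -> keep [-1]
assert trim_to_boundary_block([-3, -2, -1]) == [-3, -2, -1]
```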
@igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: b88a97e7d39bb75a3cc8ed00ee24ea96aa48e970


Overall Assessment

✅ Looks good

No unmitigated P0 or P1 findings. The prior re-review gap around non-consecutive-horizon warning coverage is addressed, and I did not find new correctness, methodology, variance/SE, or assumption-check regressions in the changed dCDH HonestDiD path.

Executive Summary

  • The changed methodology-sensitive paths are the new fit(..., honest_did=True) integration, the dCDH adapter inside HonestDiD extraction, and the new summary rendering blocks. I did not find new unmitigated P0/P1 defects in those paths.
  • The dCDH-specific HonestDiD departures are explicitly documented in the Methodology Registry and mirrored in code: placebo coefficients as pre-period inputs, diagonal covariance, equal-weight post-horizon targeting by default, and consecutive-block trimming for trends_nonparam gaps.
  • The prior re-review test gap is resolved by the new explicit warning assertion in tests/test_honest_did.py:L1449-L1470.
  • The end-to-end fit path is also covered for validation, standalone parity, trimming, bootstrap, and summary rendering in tests/test_chaisemartin_dhaultfoeuille.py:L3253-L3311, tests/test_chaisemartin_dhaultfoeuille.py:L3515-L3550, and tests/test_chaisemartin_dhaultfoeuille.py:L3558-L3654.
  • Static review only: pytest is not installed in this environment, so I could not execute the targeted tests.
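The diagonal-covariance deviation noted above amounts to substituting diag(SE^2) for the unavailable full cross-horizon VCV. A minimal sketch with illustrative numbers (the variable names are not from the library):

```python
# Sketch of the documented deviation: with no full cross-horizon VCV for
# dCDH, the sensitivity analysis uses a diagonal matrix of squared SEs.
import numpy as np

se = np.array([0.11, 0.09, 0.12, 0.15])   # per-horizon standard errors
sigma = np.diag(se ** 2)                  # stand-in for the full VCV

assert sigma.shape == (4, 4)
assert np.allclose(np.diag(sigma), se ** 2)
# All cross-horizon covariances are forced to zero:
assert np.count_nonzero(sigma - np.diag(np.diag(sigma))) == 0
```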

Methodology

  • Severity P3 | Impact: HonestDiD is defined around an event-study coefficient vector and covariance matrix for a scalar target l_vec' tau; the official package defaults l_vec to the first post-treatment effect and exposes conditional/hybridized CI routines for non-smoothness restrictions. This PR’s dCDH integration instead uses DID^{pl}_l placebos as pre-period coefficients, diagonal covariance, and an equal-weight post-horizon target when called from fit(). Those are real deviations from canonical/package behavior, but they are explicitly documented in docs/methodology/REGISTRY.md:L620, warned in diff_diff/honest_did.py:L843-L946, and described in diff_diff/chaisemartin_dhaultfoeuille.py:L540-L549. Concrete fix: none; documented deviation only.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity P3 | Impact: The DeltaRM CI path still uses naive FLCI rather than the paper/package’s ARP conditional or hybrid confidence sets. That limitation predates this PR, is conservative rather than silently anti-conservative, and is already documented/tracked in docs/methodology/REGISTRY.md:L2171-L2173, diff_diff/honest_did.py:L2397-L2437, and TODO.md:L73. Concrete fix: none for this PR; already tracked.

Security

No findings.

Documentation/Tests

No findings. The previous warning-coverage gap is resolved by tests/test_honest_did.py:L1449-L1470, and the fit-time integration path now has direct coverage for validation, standalone parity, trimming, bootstrap, and summary rendering in tests/test_chaisemartin_dhaultfoeuille.py:L3253-L3311, tests/test_chaisemartin_dhaultfoeuille.py:L3515-L3550, and tests/test_chaisemartin_dhaultfoeuille.py:L3558-L3654.

@igerber commented Apr 14, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: b88a97e7d39bb75a3cc8ed00ee24ea96aa48e970


Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings in the changed dCDH HonestDiD or summary() paths. The methodology-sensitive departures are documented in the Methodology Registry, the new fit-time guards cover the key preconditions, and the re-review warning/trimming paths now have direct test coverage. Static review only: pytest is not installed in this environment.

Executive Summary

Methodology

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity P3 | Impact: Delta^RM still uses naive FLCI instead of the paper/package’s ARP conditional or hybrid CI. That limitation predates this PR, is conservative rather than silently anti-conservative, and is already documented/tracked in docs/methodology/REGISTRY.md:2171, diff_diff/honest_did.py:2417, and TODO.md:73. Concrete fix: none for this PR; already tracked.

Security

No findings.

Documentation/Tests

No findings. The new tests cover fit-time HonestDiD integration, precondition errors, placebo-warning emission, consecutive-grid trimming, and the new summary blocks at tests/test_chaisemartin_dhaultfoeuille.py:3253, tests/test_chaisemartin_dhaultfoeuille.py:3515, tests/test_chaisemartin_dhaultfoeuille.py:3558, tests/test_honest_did.py:1402, and tests/test_honest_did.py:1449. I could not run them here because pytest is not installed.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 14, 2026
@igerber igerber merged commit e92b6f7 into main Apr 14, 2026
23 of 24 checks passed
@igerber igerber deleted the feature/dcdh-honest-did-integration branch April 14, 2026 21:37
@igerber igerber mentioned this pull request Apr 14, 2026