feat: Reproduce Boag et al. 2018 medical mistrust pipeline in PyHealth #964

Open
vtewari2 wants to merge 1 commit into sunlabuiuc:master from vtewari2:pr/uiuccs598dlh/paper-pipeline-complete

vtewari2 commented Apr 12, 2026

Description:

Summary

This PR is a consolidated reference branch containing the complete
implementation of the computational medical mistrust pipeline from:

Boag et al. "Racial Disparities and Mistrust in End-of-Life Care."
MLHC 2018. arXiv:1808.03827
Original code: github.com/wboag/eol-mistrust

It consolidates all changes from three focused PRs into a single reviewable
branch for reference and integration testing:

| Component | PR branch |
| --- | --- |
| L1 regularization | pr/uiuccs598dlh/logistic-regression/l1-regularization |
| Mistrust tasks | pr/uiuccs598dlh/mistrust-tasks/interpersonal-features-mimic3 |
| Pipeline example | pr/uiuccs598dlh/paper-pipeline/eol-mistrust-boag-2018 |

Preferred merge path: review and merge the three focused PRs above
in order (#960, #961, #962). Use this branch for end-to-end validation or
as a single-diff reference.


Background

Boag et al. (2018) show that racial disparities in aggressive ICU
end-of-life care (Black patients receive ~879 min more mechanical
ventilation than White patients, p=0.009) are better explained by
medical mistrust than by race alone. Stratifying by mistrust widens
the ventilation gap to ~2,319 min (roughly 2.6×). The paper trains
L1-regularised logistic regression on structured interpersonal interaction
features extracted from CHARTEVENTS to produce continuous mistrust proxy
scores.

This PR brings that methodology natively into the PyHealth framework.


Changes

1. pyhealth/models/logistic_regression.py

Adds an optional l1_lambda: float = 0.0 parameter (fully backward-compatible).
When l1_lambda > 0, a sparsity-inducing L1 penalty is added to the training loss:

loss = BCE(logits, y_true) + l1_lambda * ‖fc.weight‖₁

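Matches sklearn LogisticRegression(penalty='l1', C=C) when
l1_lambda = 1 / (C × n_train), assuming the BCE term is averaged over the
n_train training samples (sklearn minimises ‖w‖₁ + C · Σ loss).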
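For reviewers who want to check the conversion by hand, the following is a minimal pure-Python sketch of the penalised loss and of the C-to-l1_lambda mapping. The helper names `l1_lambda_from_sklearn_c` and `bce_with_l1` are hypothetical and are not part of this PR; the sketch assumes mean-reduced BCE and ignores how the bias term is regularised.

```python
import math


def l1_lambda_from_sklearn_c(c: float, n_train: int) -> float:
    """Map sklearn's inverse-regularisation strength C to l1_lambda.

    Assumes the BCE term is averaged over n_train samples: dividing
    sklearn's objective (||w||_1 + C * sum of losses) by C * n_train
    gives mean loss + (1 / (C * n_train)) * ||w||_1.
    """
    return 1.0 / (c * n_train)


def bce_with_l1(logits, targets, weights, l1_lambda=0.0):
    """Mean binary cross-entropy on logits plus an optional L1 penalty."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    loss = total / len(logits)
    if l1_lambda > 0:
        # Penalise the absolute value of every weight (bias excluded here).
        loss += l1_lambda * sum(abs(w) for w in weights)
    return loss


# C=0.1 with n_train=38,157 gives about 2.62e-4, the value in the Usage snippet.
lam = l1_lambda_from_sklearn_c(0.1, 38157)
print(f"{lam:.3e}")
```

With l1_lambda left at 0.0 the function reduces to plain BCE, which is the backward-compatible behaviour described above.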

2. pyhealth/tasks/mistrust_mimic3.py (new)

build_interpersonal_itemids(d_items_path)
Reads D_ITEMS.csv.gz and returns {itemid: label} for the ~168 CHARTEVENTS
items matched against the interpersonal keyword list from the paper's trust.ipynb.
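As a rough illustration of the keyword matching described above (not the implementation in this PR), the sketch below filters D_ITEMS rows by label. The keyword tuple is a placeholder for illustration only, the helper names are hypothetical, and the sketch assumes the MIMIC-III column names ITEMID, LABEL and LINKSTO.

```python
import csv
import gzip

# Placeholder keywords for illustration only; the real list lives in trust.ipynb.
EXAMPLE_KEYWORDS = ("restraint", "family", "pain")


def match_interpersonal_items(rows, keywords=EXAMPLE_KEYWORDS):
    """Return {itemid: label} for CHARTEVENTS items whose label has a keyword."""
    matched = {}
    for row in rows:
        label = (row.get("LABEL") or "").strip()
        # Keep only items that are charted in CHARTEVENTS.
        if (row.get("LINKSTO") or "").lower() != "chartevents":
            continue
        if any(keyword in label.lower() for keyword in keywords):
            matched[int(row["ITEMID"])] = label
    return matched


def read_d_items(path):
    """Yield D_ITEMS rows as dicts from a gzip-compressed CSV."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle)


example_rows = [
    {"ITEMID": "1", "LABEL": "Restraint Type", "LINKSTO": "chartevents"},
    {"ITEMID": "2", "LABEL": "Heart Rate", "LINKSTO": "chartevents"},
    {"ITEMID": "3", "LABEL": "Pain meds", "LINKSTO": "inputevents_mv"},
]
print(match_interpersonal_items(example_rows))
```

On a real extract, `match_interpersonal_items(read_d_items(path))` would play the role of build_interpersonal_itemids(path), with the paper's keyword list in place of the placeholders.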

MistrustNoncomplianceMIMIC3

  • input_schema = {"interpersonal_features": "sequence"}
  • output_schema = {"noncompliance": "binary"}
  • Label 1 if any NOTEEVENTS note contains "noncompliant", else 0
  • Base rate ≈ 0.88% in MIMIC-III v1.4
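The labelling rule above can be sketched as a one-line check over an admission's notes. This is an illustrative stand-in, not the task code: the function name is hypothetical, and case-insensitive matching is an assumption.

```python
def noncompliance_label(note_texts):
    """Return 1 if any note mentions "noncompliant", else 0.

    Case-insensitive matching is an assumption of this sketch.
    """
    return int(any("noncompliant" in (text or "").lower() for text in note_texts))


notes = ["Pt resting comfortably.", "Patient NONCOMPLIANT with medication."]
print(noncompliance_label(notes))
```

With a base rate near 0.88%, a classifier that always predicts 0 is already about 99% accurate, which is why the results are reported as AUC-ROC rather than accuracy.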

MistrustAutopsyMIMIC3

  • input_schema = {"interpersonal_features": "sequence"}
  • output_schema = {"autopsy_consent": "binary"}
  • Label 1 for autopsy consent (mistrust), 0 for decline (trust)
  • Admissions with both signals excluded as ambiguous
  • Black patients consent at ~39% vs ~26% for White (MIMIC-III v1.4)
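A minimal sketch of the labelling and exclusion rule above, assuming each admission has already been reduced to two booleans (how consent and decline are detected in the source data is outside this sketch). The function name is hypothetical.

```python
from typing import Optional


def autopsy_label(has_consent: bool, has_decline: bool) -> Optional[int]:
    """Return 1 for consent, 0 for decline, None when the admission is dropped."""
    if has_consent and has_decline:
        return None  # both signals present: ambiguous, excluded
    if has_consent:
        return 1  # consent is treated as the mistrust label
    if has_decline:
        return 0  # decline is treated as the trust label
    return None  # no autopsy signal at all: not part of this task


admissions = [(True, False), (False, True), (True, True), (False, False)]
labels = [autopsy_label(c, d) for c, d in admissions]
kept = [label for label in labels if label is not None]
print(labels, kept)
```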

Both tasks apply full feature normalisation from trust.ipynb cell 7
(restraint coarsening, bath categories, skip rules for pain subtypes).
Feature keys take the form "category||normalised_value" and are
tokenised automatically by PyHealth during set_task().
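To make the key format concrete, here is a small sketch of building "category||normalised_value" keys and mapping them to integer tokens. The lower-casing rule stands in for the fuller normalisation in trust.ipynb, the vocabulary builder only approximates what a sequence tokeniser does, and both helper names are hypothetical.

```python
def feature_key(category: str, value: str) -> str:
    """Join a category and a value into a "category||normalised_value" key.

    Stripping and lower-casing is a simplification of the real normalisation.
    """
    return f"{category.strip().lower()}||{value.strip().lower()}"


def build_vocab(sequences):
    """Assign an integer index to every distinct key, in first-seen order."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for sequence in sequences:
        for key in sequence:
            vocab.setdefault(key, len(vocab))
    return vocab


sample = [feature_key(" Restraint ", "Soft Limb"), feature_key("Pain", "None")]
vocab = build_vocab([sample])
print(sample, vocab)
```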

3. pyhealth/tasks/__init__.py

Exports MistrustNoncomplianceMIMIC3, MistrustAutopsyMIMIC3,
build_interpersonal_itemids.

4. examples/mistrust_prediction/mistrust_mimic3_logistic_regression.py (new)

End-to-end pipeline reproducing the paper's classifier experiments:

MIMIC3Dataset(CHARTEVENTS + NOTEEVENTS)
→ build_interpersonal_itemids()
→ MistrustNoncomplianceMIMIC3 / MistrustAutopsyMIMIC3
→ LogisticRegression(l1_lambda=...)
→ Trainer → AUC-ROC

Includes --synthetic flag for smoke-test without PhysioNet access.
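One plausible shape for the smoke-test path is sketched below: an argparse flag plus a generator of fake samples matching the noncompliance task schema. Only the --synthetic flag comes from this PR; the helper names, the feature keys, and the default positive rate are assumptions made for illustration.

```python
import argparse
import random


def parse_args(argv=None):
    """Parse the command line; only --synthetic is taken from the PR."""
    parser = argparse.ArgumentParser(description="Mistrust pipeline (sketch)")
    parser.add_argument(
        "--synthetic",
        action="store_true",
        help="use generated data instead of MIMIC-III",
    )
    return parser.parse_args(argv)


def make_synthetic_samples(n, positive_rate=0.0088, seed=0):
    """Generate n fake samples with hypothetical feature keys."""
    rng = random.Random(seed)
    keys = ["restraint||none", "pain||present", "family||visited"]
    return [
        {
            "patient_id": f"p{i}",
            "interpersonal_features": rng.sample(keys, k=rng.randint(1, len(keys))),
            "noncompliance": int(rng.random() < positive_rate),
        }
        for i in range(n)
    ]


args = parse_args(["--synthetic"])
samples = make_synthetic_samples(5) if args.synthetic else []
print(len(samples))
```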


Expected results (MIMIC-III v1.4)

| Task | n patients | Positive rate | AUC-ROC |
| --- | --- | --- | --- |
| Noncompliance | 54,510 | 0.88% | 0.667 |
| Autopsy consent | 1,009 | 26.8% | 0.531 |
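For readers checking the metric itself, AUC-ROC can be read as the probability that a randomly chosen positive scores higher than a randomly chosen negative. The sketch below computes it directly from that definition; it is a reference implementation for small inputs (quadratic in the class sizes), not the evaluation code used by the pipeline.

```python
def roc_auc(labels, scores):
    """AUC-ROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

On this scale 0.5 is chance, so the autopsy-consent result of 0.531 is only slightly above a random ranking.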

Usage

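from pyhealth.datasets import MIMIC3Dataset, get_dataloader, split_by_patient
from pyhealth.tasks import (
    MistrustNoncomplianceMIMIC3,
    MistrustAutopsyMIMIC3,
    build_interpersonal_itemids,
)
from pyhealth.models import LogisticRegression
from pyhealth.trainer import Trainer

# Map CHARTEVENTS item IDs to labels for the interpersonal features.
itemid_to_label = build_interpersonal_itemids("/path/to/D_ITEMS.csv.gz")

base_dataset = MIMIC3Dataset(
    root="/path/to/mimic-iii/1.4",
    tables=["CHARTEVENTS", "NOTEEVENTS"],
)
nc_dataset = base_dataset.set_task(
    MistrustNoncomplianceMIMIC3(itemid_to_label=itemid_to_label)
)

# Assumed split and batch size: 70% train matches n_train=38,157 of 54,510;
# the validation/test ratios are illustrative.
train_ds, val_ds, test_ds = split_by_patient(nc_dataset, [0.7, 0.1, 0.2])
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)

model = LogisticRegression(
    dataset=nc_dataset,
    l1_lambda=2.62e-4,   # equiv. sklearn C=0.1, n_train=38,157
)

trainer = Trainer(model=model)
trainer.train(train_dataloader=train_loader, val_dataloader=val_loader, epochs=50)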

---
References

┌───────────────────┬─────────────────────────────────────────────┐
│ Resource          │ Link                                        │
├───────────────────┼─────────────────────────────────────────────┤
│ Paper (MLHC 2018) │ https://arxiv.org/abs/1808.03827            │
├───────────────────┼─────────────────────────────────────────────┤
│ Original code     │ https://github.com/wboag/eol-mistrust       │
└───────────────────┴─────────────────────────────────────────────┘

Complete implementation of the interpersonal-feature mistrust classifiers
from "Racial Disparities and Mistrust in End-of-Life Care" (MLHC 2018,
arXiv:1808.03827) using PyHealth. Consolidates all changes from:

  pr/uiuccs598dlh/logistic-regression/l1-regularization
  pr/uiuccs598dlh/mistrust-tasks/interpersonal-features-mimic3
  pr/uiuccs598dlh/paper-pipeline/eol-mistrust-boag-2018

pyhealth/models/logistic_regression.py
  - Add l1_lambda (float, default 0.0) to LogisticRegression
  - In forward(): loss += l1_lambda * ||fc.weight||_1 when l1_lambda > 0
  - Equivalent to sklearn LogisticRegression(penalty='l1', C=C) with
    l1_lambda = 1 / (C * n_train)

pyhealth/tasks/mistrust_mimic3.py  [new]
  - build_interpersonal_itemids(): reads D_ITEMS.csv.gz, returns
    {itemid: label} for ~168 interpersonal CHARTEVENTS items
  - MistrustNoncomplianceMIMIC3: sequence task predicting noncompliance
    label from NOTEEVENTS; base rate 0.88% in MIMIC-III v1.4
  - MistrustAutopsyMIMIC3: sequence task predicting autopsy consent;
    ambiguous admissions excluded; Black consent rate 39% vs White 26%
  - Full feature normalisation from trust.ipynb cell 7

pyhealth/tasks/__init__.py
  - Export MistrustNoncomplianceMIMIC3, MistrustAutopsyMIMIC3,
    build_interpersonal_itemids

examples/mistrust_prediction/mistrust_mimic3_logistic_regression.py  [new]
  - End-to-end pipeline: MIMIC3Dataset -> set_task -> LogisticRegression
    with L1 -> Trainer -> AUC-ROC evaluation
  - --synthetic flag for smoke-test without PhysioNet access
  - Expected AUC: noncompliance 0.667, autopsy 0.531

Co-Authored-By: Varun Tewari <vtewari2@illinois.edu>