feat: Reproduce Boag et al. 2018 medical mistrust pipeline in PyHealth by vtewari2 · Pull Request #964 · sunlabuiuc/PyHealth

vtewari2 · 2026-04-12T00:15:18Z

Description:

Summary

This PR is a consolidated reference branch containing the complete
implementation of the computational medical mistrust pipeline from:

Boag et al. "Racial Disparities and Mistrust in End-of-Life Care."
MLHC 2018. arXiv:1808.03827
Original code: github.com/wboag/eol-mistrust

It consolidates all changes from three focused PRs into a single reviewable
branch for reference and integration testing:

| Component PR | Branch |
|---|---|
| L1 regularization | `pr/uiuccs598dlh/logistic-regression/l1-regularization` |
| Mistrust tasks | `pr/uiuccs598dlh/mistrust-tasks/interpersonal-features-mimic3` |
| Pipeline example | `pr/uiuccs598dlh/paper-pipeline/eol-mistrust-boag-2018` |

Preferred merge path: review and merge the three focused PRs above
in order (#960 → #961 → #962). Use this branch for end-to-end validation or
as a single-diff reference.

Background

Boag et al. 2018 demonstrates that racial disparities in aggressive ICU
end-of-life care (Black patients receiving ~879 min more mechanical
ventilation than White patients, p=0.009) are better explained by
medical mistrust than by race alone. Mistrust stratification widens
the ventilation gap to ~2,319 min (about 2.6×). The paper trains L1-regularised
logistic regression on structured interpersonal interaction features
extracted from CHARTEVENTS to produce continuous mistrust proxy scores.

This PR brings that methodology natively into the PyHealth framework.

Changes

1. `pyhealth/models/logistic_regression.py`

Adds an optional `l1_lambda: float = 0.0` parameter (fully backward-compatible:
the default leaves the loss unchanged). When `l1_lambda > 0`, a
sparsity-inducing L1 penalty is added to the training loss:

loss = BCE(logits, y_true) + l1_lambda * ‖fc.weight‖₁

Equivalent in objective to sklearn `LogisticRegression(penalty='l1', C=C)` with
l1_lambda = 1 / (C × n_train), provided the BCE term is averaged over the
n_train training samples.
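
As a rough illustration of the penalty and the conversion above, the following pure-Python sketch computes a mean BCE-with-logits loss plus an L1 term and derives `l1_lambda` from an sklearn-style `C`. The function names are hypothetical and not part of PyHealth; the sketch assumes the BCE term is a mean over samples and that only the weights (not the bias) are penalised.

```python
import math


def l1_lambda_from_sklearn_c(c: float, n_train: int) -> float:
    """Convert sklearn's inverse regularisation strength C to l1_lambda.

    sklearn minimises ||w||_1 + C * sum(log-loss); dividing by C * n_train
    gives mean(log-loss) + (1 / (C * n_train)) * ||w||_1.
    """
    return 1.0 / (c * n_train)


def bce_with_l1(logits, targets, weights, l1_lambda=0.0):
    """Mean binary cross-entropy on logits plus l1_lambda * ||weights||_1."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z)
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    bce = total / len(logits)
    penalty = l1_lambda * sum(abs(w) for w in weights)
    return bce + penalty


lam = l1_lambda_from_sklearn_c(0.1, 38_157)
print(f"l1_lambda = {lam:.3e}")
print(bce_with_l1([0.0], [1.0], [1.0, -2.0], l1_lambda=0.1))
```

With C=0.1 and n_train=38,157 the conversion gives roughly 2.62e-4, the value passed to `LogisticRegression` in the Usage section.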

2. `pyhealth/tasks/mistrust_mimic3.py` (new)

build_interpersonal_itemids(d_items_path)
Reads D_ITEMS.csv.gz and returns {itemid: label} for the ~168 CHARTEVENTS
items matched by the interpersonal keyword list from the paper's trust.ipynb.
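
The keyword matching described above might look roughly like the sketch below. It is not the PR's implementation: `match_interpersonal_items` and `EXAMPLE_KEYWORDS` are hypothetical names, the keyword tuple is a made-up illustrative subset (the real list lives in trust.ipynb), the item IDs in the sample are toy values, and filtering on the `LINKSTO` column is an assumption. It reads an in-memory CSV rather than the gzipped file to stay self-contained.

```python
import csv
import io

# Hypothetical keyword subset, for illustration only.
EXAMPLE_KEYWORDS = ("family communication", "restraint", "pain level")


def match_interpersonal_items(d_items_csv: str, keywords=EXAMPLE_KEYWORDS) -> dict:
    """Return {itemid: label} for CHARTEVENTS items whose label contains a keyword."""
    matched = {}
    for row in csv.DictReader(io.StringIO(d_items_csv)):
        # Assumption: only items that link to CHARTEVENTS are of interest.
        if row["LINKSTO"].lower() != "chartevents":
            continue
        label = row["LABEL"].strip()
        if any(kw in label.lower() for kw in keywords):
            matched[int(row["ITEMID"])] = label
    return matched


# Toy rows with made-up item IDs (not real MIMIC-III values).
SAMPLE = """ITEMID,LABEL,LINKSTO
1,Family Communication,chartevents
2,Heart Rate,chartevents
3,Reason for Restraint,chartevents
4,Pain Level,inputevents_mv
"""

items = match_interpersonal_items(SAMPLE)
print(items)
```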

MistrustNoncomplianceMIMIC3

input_schema = {"interpersonal_features": "sequence"}
output_schema = {"noncompliance": "binary"}
Label 1 if any NOTEEVENTS note contains "noncompliant", else 0
Base rate ≈ 0.88% in MIMIC-III v1.4
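
A minimal sketch of the labelling rule above, assuming notes arrive as plain strings and that matching is case-insensitive (the case handling is an assumption, and `noncompliance_label` is a hypothetical helper, not the task's actual method):

```python
def noncompliance_label(notes) -> int:
    """Return 1 if any note text mentions 'noncompliant', else 0."""
    # Missing note text (None) is treated as an empty string.
    return int(any("noncompliant" in (text or "").lower() for text in notes))


print(noncompliance_label(["Pt is NONCOMPLIANT with medications."]))
print(noncompliance_label(["Patient resting comfortably."]))
```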

MistrustAutopsyMIMIC3

input_schema = {"interpersonal_features": "sequence"}
output_schema = {"autopsy_consent": "binary"}
Label 1 for autopsy consent (mistrust), 0 for decline (trust)
Admissions with both signals excluded as ambiguous
Black patients consent at ~39% vs ~26% for White (MIMIC-III v1.4)
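
The label assignment and ambiguity exclusion above can be sketched as follows. The helper takes two pre-extracted boolean signals, because how consent and decline are detected in the notes is not described here; `autopsy_label` is a hypothetical name, and dropping admissions with no autopsy signal at all is an assumption.

```python
from typing import Optional


def autopsy_label(has_consent: bool, has_decline: bool) -> Optional[int]:
    """Map autopsy signals to a label; None means the admission is skipped."""
    if has_consent and has_decline:
        return None  # both signals present: ambiguous, excluded
    if has_consent:
        return 1  # consent, treated as the mistrust class
    if has_decline:
        return 0  # decline, treated as the trust class
    return None  # assumption: no autopsy signal, so not part of this cohort


signals = [(True, False), (False, True), (True, True), (False, False)]
labels = [autopsy_label(c, d) for c, d in signals]
kept = [lab for lab in labels if lab is not None]
print(labels, kept)
```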

Both tasks apply full feature normalisation from trust.ipynb cell 7
(restraint coarsening, bath categories, skip rules for pain subtypes).
Feature keys take the form "category||normalised_value" and are
tokenised automatically by PyHealth during set_task().
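
To make the key format concrete, here is a small sketch of how "category||normalised_value" keys could be built and mapped to integer tokens. Both helpers are hypothetical; the lowercase/strip normalisation stands in for the fuller rules from trust.ipynb, and the `<pad>`/`<unk>` vocabulary layout is an assumption rather than PyHealth's actual tokeniser behaviour.

```python
def make_feature_key(category: str, value: str) -> str:
    """Join a chart category and its normalised value with the '||' separator."""
    return f"{category.strip().lower()}||{value.strip().lower()}"


def build_vocab(sequences) -> dict:
    """Assign an integer index to each distinct feature key, in first-seen order."""
    vocab = {"<pad>": 0, "<unk>": 1}  # assumed special tokens
    for seq in sequences:
        for key in seq:
            vocab.setdefault(key, len(vocab))
    return vocab


admission_a = [make_feature_key("Restraint", "None"), make_feature_key("Bath", "Partial")]
admission_b = [make_feature_key(" restraint ", "none")]
vocab = build_vocab([admission_a, admission_b])
print(vocab)
```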

3. `pyhealth/tasks/__init__.py`

Exports MistrustNoncomplianceMIMIC3, MistrustAutopsyMIMIC3,
build_interpersonal_itemids.

4. `examples/mistrust_prediction/mistrust_mimic3_logistic_regression.py` (new)

End-to-end pipeline reproducing the paper's classifier experiments:

MIMIC3Dataset(CHARTEVENTS + NOTEEVENTS)
→ build_interpersonal_itemids()
→ MistrustNoncomplianceMIMIC3 / MistrustAutopsyMIMIC3
→ LogisticRegression(l1_lambda=...)
→ Trainer → AUC-ROC

Includes a `--synthetic` flag for smoke-testing without PhysioNet access.
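
One way such a flag could be wired up is sketched below with `argparse`. Only `--synthetic` comes from this PR; the `--mimic-root` option, `build_parser`, and `resolve_data_source` are hypothetical names used for illustration, and the example script may be organised differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Mistrust pipeline (sketch)")
    parser.add_argument(
        "--synthetic",
        action="store_true",
        help="run on generated data instead of MIMIC-III",
    )
    # Hypothetical option: where the real MIMIC-III files would be read from.
    parser.add_argument("--mimic-root", default=None, help="path to MIMIC-III v1.4")
    return parser


def resolve_data_source(args: argparse.Namespace) -> str:
    """Pick the data source, preferring synthetic data when requested."""
    if args.synthetic:
        return "synthetic"
    if args.mimic_root is None:
        raise SystemExit("Pass --mimic-root or use --synthetic")
    return args.mimic_root


args = build_parser().parse_args(["--synthetic"])
print(resolve_data_source(args))
```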

Expected results (MIMIC-III v1.4)

| Task | n patients | Positive rate | AUC-ROC |
|---|---|---|---|
| Noncompliance | 54,510 | 0.88% | 0.667 |
| Autopsy consent | 1,009 | 26.8% | 0.531 |
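
For readers interpreting the AUC-ROC column, the sketch below computes the metric from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. It is a quadratic-time illustration with a hypothetical `auc_roc` helper, not the evaluation code used by the pipeline. A value of 0.5 corresponds to chance-level ranking.

```python
def auc_roc(y_true, y_score) -> float:
    """AUC-ROC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```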

Usage

from pyhealth.datasets import MIMIC3Dataset
from pyhealth.tasks import (
    MistrustNoncomplianceMIMIC3,
    MistrustAutopsyMIMIC3,
    build_interpersonal_itemids,
)
from pyhealth.models import LogisticRegression
from pyhealth.trainer import Trainer

itemid_to_label = build_interpersonal_itemids("/path/to/D_ITEMS.csv.gz")

base_dataset = MIMIC3Dataset(
    root="/path/to/mimic-iii/1.4",
    tables=["CHARTEVENTS", "NOTEEVENTS"],
)
nc_dataset = base_dataset.set_task(
    MistrustNoncomplianceMIMIC3(itemid_to_label=itemid_to_label)
)

model = LogisticRegression(
    dataset=nc_dataset,
    l1_lambda=2.62e-4,   # equiv. sklearn C=0.1, n_train=38,157
)

trainer = Trainer(model=model)
trainer.train(train_dataloader=..., val_dataloader=..., epochs=50)

---
References

┌───────────────────┬─────────────────────────────────────────────┐
│     Resource      │                    Link                     │
├───────────────────┼─────────────────────────────────────────────┤
│ Paper (MLHC 2018) │ https://arxiv.org/abs/1808.03827            │
├───────────────────┼─────────────────────────────────────────────┤
│ Original code     │ https://github.com/wboag/eol-mistrust       │
└───────────────────┴─────────────────────────────────────────────┘

Complete implementation of the interpersonal-feature mistrust classifiers from
"Racial Disparities and Mistrust in End-of-Life Care" (MLHC 2018,
arXiv:1808.03827) using PyHealth.

Consolidates all changes from:
- pr/uiuccs598dlh/logistic-regression/l1-regularization
- pr/uiuccs598dlh/mistrust-tasks/interpersonal-features-mimic3
- pr/uiuccs598dlh/paper-pipeline/eol-mistrust-boag-2018

pyhealth/models/logistic_regression.py
- Add l1_lambda (float, default 0.0) to LogisticRegression
- In forward(): loss += l1_lambda * ||fc.weight||_1 when l1_lambda > 0
- Equivalent to sklearn LogisticRegression(penalty='l1', C=C) with l1_lambda = 1 / (C * n_train)

pyhealth/tasks/mistrust_mimic3.py [new]
- build_interpersonal_itemids(): reads D_ITEMS.csv.gz, returns {itemid: label} for ~168 interpersonal CHARTEVENTS items
- MistrustNoncomplianceMIMIC3: sequence task predicting noncompliance label from NOTEEVENTS; base rate 0.88% in MIMIC-III v1.4
- MistrustAutopsyMIMIC3: sequence task predicting autopsy consent; ambiguous admissions excluded; Black consent rate 39% vs White 26%
- Full feature normalisation from trust.ipynb cell 7

pyhealth/tasks/__init__.py
- Export MistrustNoncomplianceMIMIC3, MistrustAutopsyMIMIC3, build_interpersonal_itemids

examples/mistrust_prediction/mistrust_mimic3_logistic_regression.py [new]
- End-to-end pipeline: MIMIC3Dataset -> set_task -> LogisticRegression with L1 -> Trainer -> AUC-ROC evaluation
- --synthetic flag for smoke-test without PhysioNet access
- Expected AUC: noncompliance 0.667, autopsy 0.531

Co-Authored-By: Varun Tewari <vtewari2@illinois.edu>

vtewari2 wants to merge 1 commit into sunlabuiuc:master from vtewari2:pr/uiuccs598dlh/paper-pipeline-complete
