Step-by-step causal analysis of EHRs to ground decision-making.

Causal inference enables machine learning methods to estimate treatment effects of medical interventions from electronic health records (EHRs). The prevalence of such observational data and the difficulty for randomized controlled trials (RCT) to cover all population/treatment relationships make the...

Bibliographic Details
Main Authors: Matthieu Doutreligne, Tristan Struja, Judith Abecassis, Claire Morgand, Leo Anthony Celi, Gaël Varoquaux
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-02-01
Series: PLOS Digital Health
Online Access: https://doi.org/10.1371/journal.pdig.0000721
author Matthieu Doutreligne
Tristan Struja
Judith Abecassis
Claire Morgand
Leo Anthony Celi
Gaël Varoquaux
collection DOAJ
description Causal inference enables machine learning methods to estimate treatment effects of medical interventions from electronic health records (EHRs). The prevalence of such observational data, and the difficulty of covering all population/treatment relationships with randomized controlled trials (RCTs), make these methods increasingly attractive for studying causal effects. However, researchers should be wary of many pitfalls. We propose a framework for causal inference and illustrate it by estimating the effect of albumin on mortality in sepsis using an intensive care database (MIMIC-IV), comparing various sensitivity analyses to RCT results as the gold standard. The first step is study design, using the target trial concept and the PICOT framework: Population (patients with sepsis), Intervention (combination of crystalloids and albumin for fluid resuscitation), Control (crystalloids only), Outcome (28-day mortality), Time (intervention start within 24h of admission). We show that overly long treatment-initiation windows induce immortal time bias. The second step is the selection of confounding variables based on expert knowledge. Progressively adding confounders makes it possible to recover the RCT results from observational data. The third step is to assess the influence of multiple models with varying assumptions; a doubly robust estimator (augmented inverse probability weighting, AIPW) with random forests proved the most reliable. Results show that all of these steps are important for valid causal estimates. A valid causal model can then be used to individualize decision making: subgroup analyses showed that albumin was more effective in patients older than 60 years, in males, and in patients with septic shock. Without causal thinking, machine learning is not enough for optimal clinical decisions at the individual patient level. Our step-by-step analytic framework helps avoid many pitfalls of applying machine learning to EHR data, building models that avoid shortcuts and extract the best decision-making evidence.
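The doubly robust (AIPW) estimator named in the description can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic rather than MIMIC-IV, the names `simulate`, `aipw_ate`, and `naive_difference` are hypothetical, and simple per-stratum frequencies stand in for the random forests used in the article. It only shows how combining a propensity model with an outcome model corrects a confounded comparison.

```python
import random


def simulate(n, seed=0):
    """Synthetic cohort (assumption, not MIMIC-IV): rows of (x, t, y).

    x = binary confounder (e.g. high severity), t = treated, y = died.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        # Sicker patients are treated more often: this creates confounding.
        t = 1 if rng.random() < (0.7 if x else 0.3) else 0
        # True mortality risk; the true treatment effect is -0.05.
        risk = 0.2 + 0.3 * x - 0.05 * t
        y = 1 if rng.random() < risk else 0
        rows.append((x, t, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_difference(rows):
    """Unadjusted difference in mortality between treated and controls."""
    treated = mean(y for _, t, y in rows if t == 1)
    control = mean(y for _, t, y in rows if t == 0)
    return treated - control


def aipw_ate(rows):
    """AIPW estimate of the average treatment effect.

    Nuisance models are per-stratum frequencies (a stand-in for the
    random forests of the article): e[x] is the propensity score and
    mu[(x, t)] is the expected outcome given confounder and treatment.
    """
    e, mu = {}, {}
    for x in (0, 1):
        stratum = [row for row in rows if row[0] == x]
        e[x] = mean(row[1] for row in stratum)
        for arm in (0, 1):
            mu[(x, arm)] = mean(row[2] for row in stratum if row[1] == arm)

    scores = []
    for x, t, y in rows:
        m1, m0 = mu[(x, 1)], mu[(x, 0)]
        # Outcome-model contrast plus inverse-probability-weighted residuals.
        score = (m1 - m0
                 + t * (y - m1) / e[x]
                 - (1 - t) * (y - m0) / (1 - e[x]))
        scores.append(score)
    return mean(scores)


if __name__ == "__main__":
    cohort = simulate(50000, seed=0)
    print("naive difference:", round(naive_difference(cohort), 3))
    print("AIPW estimate:   ", round(aipw_ate(cohort), 3))
```

In this simulation the unadjusted comparison makes the treatment look harmful because treated patients are sicker, while the AIPW estimate should land close to the simulated effect of -0.05. With real EHR data the nuisance models would be fitted on many confounders, typically with cross-fitting.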
format Article
id doaj-art-ee120ab0a39242ff8fe0e0e7271abc7d
institution Kabale University
issn 2767-3170
language English
publishDate 2025-02-01
publisher Public Library of Science (PLoS)
record_format Article
series PLOS Digital Health
spelling doaj-art-ee120ab0a39242ff8fe0e0e7271abc7d | 2025-02-09T05:30:53Z | eng | Public Library of Science (PLoS) | PLOS Digital Health | 2767-3170 | 2025-02-01 | 4 | 2 | e0000721 | 10.1371/journal.pdig.0000721 | Step-by-step causal analysis of EHRs to ground decision-making. | Matthieu Doutreligne | Tristan Struja | Judith Abecassis | Claire Morgand | Leo Anthony Celi | Gaël Varoquaux | https://doi.org/10.1371/journal.pdig.0000721
title Step-by-step causal analysis of EHRs to ground decision-making.
url https://doi.org/10.1371/journal.pdig.0000721