by Weissman, Gary E.;
Hubbard, Rebecca A.; Ungar, Lyle H.; Harhay, Michael O.; Greene, Casey S.;
Himes, Blanca E.; Halpern, Scott D.
Objectives:
Early prediction of undesired outcomes among newly hospitalized patients could
improve patient triage and prompt conversations about patients’ goals of care.
We evaluated the performance of logistic regression, gradient boosting machine,
random forest, and elastic net regression models, with and without unstructured
clinical text data, to predict a binary composite outcome of in-hospital death
or ICU length of stay greater than or equal to 7 days using data from the first
48 hours of hospitalization.

Design:
Retrospective cohort study with split sampling for model training and testing.

Setting:
A single urban academic hospital.

Patients:
All hospitalized patients who required ICU care at the Beth Israel Deaconess
Medical Center in Boston, MA, from 2001 to 2012.
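The outcome definition and split-sample design can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the study's pipeline. The `Admission` record, its field names, and the default 30% test fraction are hypothetical, because the abstract describes neither the data schema nor the split proportions. The sketch labels each admission with the composite outcome (in-hospital death, or ICU length of stay of at least 7 days) and then randomly partitions the admissions into training and test sets.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class Admission:
    # Hypothetical per-admission record; field names are illustrative,
    # not the schema used in the study.
    admission_id: int
    died_in_hospital: bool
    icu_los_days: float  # total ICU length of stay, in days


def composite_outcome(adm: Admission, los_threshold_days: float = 7.0) -> int:
    """Return 1 for in-hospital death or ICU stay >= threshold days, else 0."""
    return int(adm.died_in_hospital or adm.icu_los_days >= los_threshold_days)


def split_sample(
    items: Sequence[T], test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly partition items into (train, test) lists.

    The 0.3 default is a hypothetical placeholder; the abstract does not
    state the split ratio.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    cohort = [
        Admission(1, died_in_hospital=False, icu_los_days=2.5),
        Admission(2, died_in_hospital=True, icu_los_days=1.0),
        Admission(3, died_in_hospital=False, icu_los_days=7.0),
        Admission(4, died_in_hospital=False, icu_los_days=6.9),
    ]
    print([composite_outcome(a) for a in cohort])  # [0, 1, 1, 0]
    train, test = split_sample(cohort, test_fraction=0.25, seed=42)
    print(len(train), len(test))  # 3 1
```

This sketch splits at the admission level, so repeat admissions of the same patient can fall into both sets. Grouping the split by a patient identifier is a common alternative when that matters.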
Interventions:
None.

Measurements and Main Results:
Among 25,947 eligible hospital admissions, we observed 5,504 (21.2%) in which
patients died or had an ICU length of stay greater than or equal to 7 days. The
gradient boosting machine model had the highest discrimination both without
text-derived variables (area under the receiver operating characteristic curve,
0.83; 95% CI, 0.81–0.84) and with them (area under the receiver operating
characteristic curve, 0.89; 95% CI, 0.88–0.90). Without text data, both
gradient boosting machines and random forests outperformed logistic regression
(p < 0.001), whereas with text data, all other models outperformed logistic
regression (p < 0.02). The inclusion of text data increased the discrimination
of all four model types (p < 0.001). Among the models using text data,
increasing presence of the terms “intubated” and “poor prognosis” was
positively associated with mortality and ICU length of stay, whereas the term
“extubated” was inversely associated with them.

Conclusions:
Variables extracted with natural language processing techniques from
unstructured clinical text recorded in the first 48 hours of hospital admission
significantly improved the ability of logistic regression and other machine
learning models to predict which patients died or had long ICU stays. Learning
health systems may adapt such models using open-source approaches to capture
local variation in care patterns.
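The results report discrimination as the area under the receiver operating characteristic curve (AUROC) with 95% confidence intervals. The Python sketch below computes AUROC as the probability that a randomly chosen positive admission receives a higher score than a randomly chosen negative one. This is the Mann–Whitney formulation, with ties counted as one half. The sketch also adds a percentile bootstrap interval. The abstract does not say how its intervals or p-values were computed, so the bootstrap is an assumed, swapped-in method and not the authors' procedure. The example labels and scores are made up.

```python
import random
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    labels are 1 for the composite outcome and 0 otherwise; ties count 0.5.
    Runs in O(n_pos * n_neg) time: fine for a sketch, slow for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap (1 - alpha) interval for AUROC (assumed method)."""
    if len(set(labels)) < 2:
        raise ValueError("need both outcome classes to compute AUROC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample admissions with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        # Skip resamples that happen to contain only one outcome class.
        if 0 < sum(y) < n:
            stats.append(auroc(y, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Made-up outcome labels and model scores for eight admissions.
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]
    print(auroc(y, s))  # 0.875
    print(bootstrap_auroc_ci(y, s, n_boot=200))
```

The pairwise loop keeps the definition transparent. For cohorts the size of this one (about 26,000 admissions), a rank-based computation or a library routine would be the practical choice.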