In this post I’ll discuss my final project for Biostatistics 823: Statistical Programming for Big Data, a course I took at Duke University. For this project, my classmates and I built an interactive dashboard to explore ICU admission trends at Beth Israel Deaconess Medical Center. My teammates on this project were Alena Kalodzitsa, Felipe Buchbinder, and Chenxi Wu. This project would not have been possible without them.

MIMIC-III Dataset

The dataset we used is called MIMIC-III. This database contains over 60,000 ICU patient records collected from 2001 to 2012 at Beth Israel Deaconess Medical Center. The data was compiled by the MIT Lab for Computational Physiology and preprocessed to protect patient identity.

Project Goals

The overarching goal of our project was to examine two questions: do certain diseases tend to occur together, and how does disease prevalence vary across demographic groups? Our target audience was hospital administrators and physicians, who could use this information to change how their hospital responds to the needs of its community. For example, if most severe illnesses turn out to occur alongside a preventable underlying condition (such as obesity, smoking, or mismanaged diabetes), a program could be developed to help the local community take ownership of their own health. These efforts could be further targeted by examining which demographic groups tend to experience a given health problem.

Project Flowchart

We began by launching a prebuilt AWS stack that allowed us to access the MIMIC data from an AWS account. We queried the data using Amazon Athena and stored the results in an S3 bucket. Then we read the data into our Streamlit application using Pandas.
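As a rough illustration of the last step, the sketch below reads a CSV of query results into a Pandas DataFrame. The function name, column names, and sample rows are made up for illustration, and an in-memory buffer stands in for the result file that Athena writes to S3. This is not our actual loading code.

```python
import io

import pandas as pd


def load_query_results(source):
    """Read a CSV of query results (with a header row) into a DataFrame.

    `source` can be a local path, a URL, or a file-like object.
    Everything is read as text so that diagnosis codes keep any leading zeros.
    """
    return pd.read_csv(source, dtype=str)


# Hypothetical stand-in for a result file; the columns are illustrative only.
sample_csv = io.StringIO(
    "subject_id,icd9_code,ethnicity\n"
    "1,4019,WHITE\n"
    "2,0389,ASIAN\n"
)
diagnoses = load_query_results(sample_csv)
```

Reading every column as text is a deliberate choice here: diagnosis codes such as `0389` would otherwise be parsed as integers and lose their leading zero.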

Dashboard Components

To answer these two questions, we split our dashboard into five components: general trends, disease to demographics, demographics to disease, a market basket analysis, and a co-occurrence analysis.

General Trends

This section provides information on the most common diseases, patient demographic distributions, and hospital admission locations.

Disease to Demographics

This portion of the dashboard allows you to select a disease and see typical patient characteristics.

Demographics to Disease

The reverse of disease to demographics, this view lets you enter demographic information and see which diseases are most common among patients with those characteristics.
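The idea behind this view can be sketched in a few lines of plain Python: keep only the records that match the selected demographics, then count diagnoses. The field names, the `top_diseases` helper, and the toy records are hypothetical and are not taken from our dashboard code.

```python
from collections import Counter


def top_diseases(records, top_n=3, **filters):
    """Return the most common diagnoses among records matching every filter."""
    matching = [
        record for record in records
        if all(record.get(field) == value for field, value in filters.items())
    ]
    return Counter(record["diagnosis"] for record in matching).most_common(top_n)


# Toy records, invented for illustration only.
records = [
    {"gender": "F", "age_group": "18-39", "diagnosis": "pregnancy complication"},
    {"gender": "F", "age_group": "65+", "diagnosis": "hypertension"},
    {"gender": "M", "age_group": "65+", "diagnosis": "hypertension"},
    {"gender": "M", "age_group": "65+", "diagnosis": "hypertension"},
    {"gender": "M", "age_group": "65+", "diagnosis": "heart failure"},
]
result = top_diseases(records, gender="M", age_group="65+")
```

Disease to demographics works the same way in the other direction: filter on the diagnosis and count the demographic fields instead.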

Market Basket Analysis

This graph shows the results of a market basket analysis of diseases. Specifically, it displays the disease combinations with the highest lift values. Lift is the ratio of how often two diseases are observed together to how often they would be expected to occur together if they were independent, so a lift above 1 means the pair co-occurs more often than chance alone would predict. High lift values indicate a strong association between two disease categories.
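To make lift concrete, here is a minimal, library-free sketch that computes it for every pair of disease categories, using lift(A, B) = P(A and B) / (P(A) * P(B)). The `pairwise_lift` helper and the toy admissions are invented for illustration. The dashboard's own market basket analysis may have been computed differently.

```python
from collections import Counter
from itertools import combinations


def pairwise_lift(admissions):
    """Return {(a, b): lift} for every pair of categories seen together.

    Each admission is a collection of disease categories. Pairs are keyed
    in alphabetical order so (a, b) and (b, a) are never counted separately.
    """
    n = len(admissions)
    single = Counter()
    pair = Counter()
    for diagnoses in admissions:
        categories = sorted(set(diagnoses))
        single.update(categories)
        pair.update(combinations(categories, 2))
    return {
        (a, b): (count / n) / ((single[a] / n) * (single[b] / n))
        for (a, b), count in pair.items()
    }


# Toy data, invented for illustration only.
admissions = [
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes"},
    {"hypertension"},
    {"pneumonia"},
]
lift = pairwise_lift(admissions)
```

In this toy data, diabetes appears in 2 of 4 admissions, hypertension in 3 of 4, and both together in 2 of 4, so the lift is 0.5 / (0.5 * 0.75), or about 1.33.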

Co-occurrence Analysis

Here you can select a disease and view a network of the other conditions that patients with this disease have. This can be used to identify distinct subgroups of patients who share the same diagnosis. The co-occurrence matrix used here was generated using the Natural Language Toolkit (NLTK) package. The nodes and edges of the graph were constructed using NetworkX.
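The sketch below shows the general idea using only the standard library, as a stand-in for the NLTK and NetworkX steps rather than a copy of them. It counts how often each pair of conditions appears in the same patient, then pulls out the weighted edges that touch a selected disease. The helper names and patient lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(patients):
    """Count how many patients have each (alphabetically ordered) pair of conditions."""
    counts = Counter()
    for conditions in patients:
        counts.update(combinations(sorted(set(conditions)), 2))
    return counts


def edges_for(counts, disease, min_count=1):
    """Return (disease, other, count) edges, strongest first, then by name."""
    edges = []
    for (a, b), count in counts.items():
        if count >= min_count and disease in (a, b):
            other = b if a == disease else a
            edges.append((disease, other, count))
    return sorted(edges, key=lambda edge: (-edge[2], edge[1]))


# Toy patients, invented for illustration only.
patients = [
    ["sepsis", "hypertension", "renal failure"],
    ["sepsis", "hypertension"],
    ["sepsis", "pneumonia"],
    ["hypertension", "diabetes"],
]
edges = edges_for(cooccurrence_counts(patients), "sepsis")
```

Triples in this shape can be handed to a graph library as weighted edges. This sketch only keeps edges that touch the selected disease, so links between the neighboring conditions themselves are left out.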

Sample Insights and Conclusion

While we were building the dashboard, we came up with some sample insights to illustrate its potential value. Here’s some of what we found:

  • The majority of patients in the ICU have some form of hypertension.
    • Hypertension is disproportionately present in Black patients.
  • Asian patients are underrepresented relative to the Asian population where the hospital is located (2.5% of patients versus 7% of the population of Boston, MA).
  • Asian patients tend to be young women who are having children.
  • Only 49% of patients are admitted to the ICU from the emergency room.
    • Many patients are coming from long-term care facilities.

Overall, our dashboard could serve as a useful tool for physicians and hospital administrators. It can be used to guide potential community-health interventions and monitor the health of various demographic groups. If you’re interested in looking at the dashboard, feel free to check out the code here.
