HEALTHCARE ML
Three complementary ML models exposing who's being left behind by healthcare access — and where it's heading.
A comprehensive machine learning analysis of healthcare access barriers across 75 demographic subgroups, spanning 2019–2025. Built three distinct models — predictive, time-series, and clustering — to answer different research questions from one unified dataset.
The Challenge
Healthcare access barriers fall unevenly across demographic subgroups, in ways that aggregate statistics hide. The DubsTech Datathon asked teams to use ML to uncover which populations are most at risk, where trends are heading, and which groups are falling through the cracks, all from a single dataset spanning 2019–2025.
The Solution
Instead of a single model, I built three complementary ML analyses: a supervised predictive model to score risk by subgroup, a time-series forecasting pipeline to project 2025 trends, and an unsupervised clustering and anomaly detection system to surface hidden at-risk groups. Together, they answer what's predictable, where we're heading, and who's being missed.
Predictive Model — 93.7% Accuracy
Trained six supervised algorithms (Linear Regression, Ridge, Lasso, Decision Tree, Random Forest, Gradient Boosting) on a 70/30 train-test split to predict cost barriers by demographic subgroup.
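A minimal sketch of how such a six-model comparison could be set up with scikit-learn. The data here is synthetic (`make_regression`), the hyperparameters are placeholders, and R² is used as the score purely as an assumption, since the write-up does not say which metric the reported accuracy refers to.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the subgroup features and cost-barrier target
# (assumption: the real dataset and its encoding are not shown here).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# 70/30 train-test split, matching the description above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The six algorithm families named above; all settings are illustrative.
models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}


def compare_models(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and return held-out R^2 scores."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


scores = compare_models(models, X_train, X_test, y_train, y_test)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>18}: R^2 = {score:.3f}")
```

Scoring every model on the same held-out 30% keeps the comparison fair. With only 75 subgroups, cross-validation would likely give a steadier ranking than a single split.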
Time-Series Forecasting
Multi-model forecasting pipeline (ARIMA-style, Exponential Smoothing, Moving Average, Polynomial Regression) projecting 2025 healthcare barrier trends across all tracked categories.
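A rough sketch of three of the forecasters named above (moving average, simple exponential smoothing, polynomial regression), written with numpy only. The yearly rates are hypothetical placeholders rather than results from the project. The function names, `window`, `alpha`, and `degree` are illustrative assumptions, and the ARIMA-style model is left out.

```python
import numpy as np

years = np.array([2019, 2020, 2021, 2022, 2023, 2024])
# Hypothetical barrier rates (%) for one category; NOT real project data.
rates = np.array([12.0, 10.5, 10.8, 11.2, 12.4, 13.1])


def moving_average_forecast(values, window=3):
    """Forecast the next point as the mean of the last `window` values."""
    return float(np.mean(values[-window:]))


def exp_smoothing_forecast(values, alpha=0.5):
    """Simple exponential smoothing; the final level is the forecast."""
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return float(level)


def polynomial_forecast(x, values, target, degree=2):
    """Fit a polynomial trend and evaluate it at `target`."""
    x0 = x.mean()  # center the years so the fit stays well conditioned
    coeffs = np.polyfit(x - x0, values, degree)
    return float(np.polyval(coeffs, target - x0))


forecasts = {
    "moving_average": moving_average_forecast(rates),
    "exp_smoothing": exp_smoothing_forecast(rates),
    "polynomial": polynomial_forecast(years, rates, target=2025),
}

for name, value in forecasts.items():
    print(f"{name:>15}: {value:.2f}% projected for 2025")
```

Running several simple forecasters side by side is a cheap sanity check on such a short series. When they disagree widely, the projection deserves less trust.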
Clustering & Anomaly Detection
Applied K-Means, Hierarchical, DBSCAN, Isolation Forest, and Local Outlier Factor with PCA dimensionality reduction to identify hidden at-risk subgroups not obvious from raw data.
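One way the PCA, clustering, and outlier-detection steps could fit together, sketched on synthetic data. The 75 rows stand in for the 75 subgroups, but the feature values, the single planted outlier, and every parameter (`n_clusters`, `contamination`, `n_neighbors`) are assumptions for illustration. Hierarchical clustering and DBSCAN are omitted for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# 75 synthetic "subgroups" x 5 hypothetical barrier-rate features (%).
X = rng.normal(loc=8.0, scale=1.5, size=(75, 5))
X[0] += 10.0  # plant one unusually high-barrier row to illustrate detection

# Standardize, then project to 2 components for clustering and plotting.
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Group subgroups with similar barrier profiles.
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

# Two independent outlier detectors; both return -1 for flagged rows.
iso_flags = IsolationForest(contamination=0.05, random_state=0).fit_predict(X_2d) == -1
lof_flags = LocalOutlierFactor(n_neighbors=10, contamination=0.05).fit_predict(X_2d) == -1

# Rows flagged by both detectors are the strongest candidates for review.
flagged_by_both = np.flatnonzero(iso_flags & lof_flags)
print("Cluster sizes:", np.bincount(cluster_labels))
print("Flagged by both detectors:", flagged_by_both)
```

Requiring agreement between two detectors trades recall for precision. That suits a setting where each flagged subgroup is then inspected by hand.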
Language
Python 3.10+
Data & ML
scikit-learn · pandas · numpy · scipy
Visualization
matplotlib · seaborn
Algorithms
Gradient Boosting · Random Forest · DBSCAN · Isolation Forest
The COVID Paradox
Barriers paradoxically decreased in 2020 — likely from expanded coverage and telehealth — but increased sharply post-2022 as policies rolled back. Aggregate trends masked subgroup-level divergence.
Mental Health Divergence
While medical and delayed-care barriers trended downward, mental health access was the only category still worsening. A single aggregate metric would have missed this entirely.
Clustering Over Averages
Unsupervised methods revealed at-risk groups (e.g., Native Hawaiian/Pacific Islander, bisexual individuals) that were statistically buried in aggregate figures but faced barrier rates of 14–23%.
Three Models, One Story
Each model answered a distinct question. Combining them gave a complete picture: what's predictable, where trends are heading, and who's being missed.