Three Complementary ML Models on Healthcare Access Barriers
A machine learning analysis of healthcare access barriers across 75 demographic subgroups, using data spanning 2019–2025. I built three distinct models (predictive, time-series, and clustering) to answer different research questions from one unified dataset.
Healthcare access barriers fall unevenly across demographic subgroups in ways that aggregate statistics hide. The DubsTech Datathon asked teams to use ML to uncover which populations are most at risk, where trends are heading, and which groups are falling through the cracks, all from a single dataset spanning 2019–2025.
Instead of a single model, I built three complementary ML analyses: a supervised predictive model to score risk by subgroup, a time-series forecasting pipeline to project 2025 trends, and an unsupervised clustering and anomaly detection system to surface hidden at-risk groups. Together, they answer what's predictable, where we're heading, and who's being missed.
Trained six supervised algorithms (Linear, Ridge, and Lasso Regression, Decision Tree, Random Forest, and Gradient Boosting) on a 70/30 train-test split to predict cost barriers by demographic subgroup.
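As a rough illustration of that comparison, here is a minimal scikit-learn sketch. The feature matrix and target are synthetic stand-ins (the actual datathon columns are not shown here), and the hyperparameters and the R² metric are assumptions rather than the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 300 rows, 5 numeric features,
# with a target that depends linearly on the first three features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.3, size=300)

# 70/30 train-test split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# The six model families named above; hyperparameters are illustrative only.
models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.01),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

# Fit each model on the training split and score it on the held-out 30%.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Print models from best to worst held-out R^2.
for name, r2 in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} R^2 = {r2:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair; with real data, cross-validation would likely give a more stable ranking than a single split.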
Multi-model forecasting pipeline (ARIMA-style, Exponential Smoothing, Moving Average, Polynomial Regression) projecting 2025 healthcare barrier trends across all tracked categories.
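To make the forecasting step concrete, the sketch below implements three of the listed methods (moving average, simple exponential smoothing, and polynomial regression) with NumPy alone. The annual values are invented for illustration and are not the project's data; the function names, `window`, `alpha`, and `degree` are all assumptions. The ARIMA-style model is left out here because it would typically rely on a dedicated time-series library.

```python
import numpy as np

# Invented annual barrier rates in percent (assumption, not real data).
years = np.array([2019, 2020, 2021, 2022, 2023, 2024], dtype=float)
rates = np.array([10.0, 8.5, 8.8, 9.0, 10.2, 11.0])


def moving_average_forecast(values, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    return float(np.mean(values[-window:]))


def exp_smoothing_forecast(values, alpha=0.5):
    """Simple exponential smoothing; the final level is the one-step forecast."""
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return float(level)


def polynomial_forecast(x, values, target_x, degree=2):
    """Fit a polynomial trend and evaluate it at `target_x`."""
    center = x.mean()  # centre x so large year values stay well conditioned
    coeffs = np.polyfit(x - center, values, degree)
    return float(np.polyval(coeffs, target_x - center))


forecasts = {
    "Moving Average": moving_average_forecast(rates),
    "Exponential Smoothing": exp_smoothing_forecast(rates),
    "Polynomial Regression": polynomial_forecast(years, rates, 2025.0),
}

for name, value in forecasts.items():
    print(f"{name:22s} 2025 forecast = {value:.2f}%")
```

With only a handful of annual points, the methods can disagree noticeably, which is one reason to compare several forecasts rather than trust a single one.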
Applied clustering (K-Means, hierarchical, DBSCAN) and anomaly detection (Isolation Forest, Local Outlier Factor), with PCA for dimensionality reduction, to identify hidden at-risk subgroups not obvious from the raw data.
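One way such a pipeline might be wired together with scikit-learn is sketched below. The 75 rows are synthetic stand-ins for subgroup feature vectors, and every parameter (cluster counts, `eps`, `contamination`, `n_neighbors`) is an illustrative assumption, not the project's configuration.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): 72 "typical" subgroups plus 3 unusual ones,
# each described by 6 numeric barrier features.
rng = np.random.default_rng(0)
typical = rng.normal(loc=0.0, scale=1.0, size=(72, 6))
unusual = rng.normal(loc=6.0, scale=0.5, size=(3, 6))
X = np.vstack([typical, unusual])

# Standardise features, then project to 2 principal components for clustering.
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Three clustering views of the same reduced data.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X_2d)
dbscan_labels = DBSCAN(eps=0.8, min_samples=4).fit_predict(X_2d)  # -1 marks noise

# Two anomaly detectors on the full scaled features; -1 marks an outlier.
iso_flags = IsolationForest(contamination=0.05, random_state=0).fit_predict(X_scaled)
lof_flags = LocalOutlierFactor(n_neighbors=10, contamination=0.05).fit_predict(X_scaled)

# Rows flagged by both detectors are the strongest candidates for review.
flagged_by_both = np.where((iso_flags == -1) & (lof_flags == -1))[0]
print("Rows flagged by both detectors:", flagged_by_both.tolist())
```

Requiring agreement between two detectors trades recall for precision: fewer subgroups are flagged, but each flag is less likely to be an artifact of a single method.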
Barriers paradoxically decreased in 2020 — likely from expanded coverage and telehealth — but increased sharply post-2022 as policies rolled back. Aggregate trends masked subgroup-level divergence.
While medical and delayed-care barriers trended downward, mental health access was the only category still worsening. A single aggregate metric would have missed this entirely.
Unsupervised methods revealed at-risk groups (e.g., Native Hawaiian/Pacific Islander and bisexual individuals) that were statistically buried in the aggregate data but faced barrier rates of 14–23%.
Each model answered a distinct question. Combining them gave a complete picture: what's predictable (Model 1), where trends are heading (Model 2), and who's being missed (Model 3).