Healthcare Revenue Cycle Risk Prediction
Lead Project: ML framework predicting patient-level financial risk from synthetic EHR data. Achieved a 9.4× improvement over baseline in identifying high-risk patients using Random Forest classification and permutation-based feature importance analysis.
Comorbidity burden emerged as the sole reliable predictor, while the socioeconomic factors assumed to matter showed minimal contribution. This suggests ML-based scoring can rely on existing clinical documentation without additional data collection.
Random Forest, stratified cross-validation, imbalanced learning, synthetic data generation (Synthea)
Python, scikit-learn, pandas, matplotlib
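
A minimal sketch of the kind of workflow this entry describes, not the project's actual code. It assumes a toy dataset generated with NumPy rather than Synthea, and the feature names (comorbidity_count, income_index, distance_to_clinic) are hypothetical. The risk label depends only on the comorbidity feature by construction, so the sketch shows how stratified cross-validation and permutation importance would surface that pattern on an imbalanced target.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

rng = np.random.default_rng(0)
n = 2000

# Hypothetical patient-level features (stand-ins for Synthea-derived fields).
X = pd.DataFrame({
    "comorbidity_count": rng.poisson(2.0, n),
    "income_index": rng.normal(0.0, 1.0, n),
    "distance_to_clinic": rng.normal(0.0, 1.0, n),
})

# Assumption: financial risk is driven only by comorbidity burden,
# which yields a minority positive class.
logit = -4.0 + 0.9 * X["comorbidity_count"].to_numpy()
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# class_weight="balanced" is one simple way to handle class imbalance.
model = RandomForestClassifier(
    n_estimators=200, min_samples_leaf=5,
    class_weight="balanced", random_state=0,
)

# Stratified folds keep the positive rate similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="average_precision")
print("Mean average precision:", round(scores.mean(), 3))
print("Positive rate (chance-level AP):", round(y.mean(), 3))

# Permutation importance is measured on a held-out split so it reflects
# generalization rather than memorized training structure.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)
result = permutation_importance(
    model, X_test, y_test,
    scoring="average_precision", n_repeats=10, random_state=0,
)
importances = pd.Series(result.importances_mean, index=X.columns)
print(importances.sort_values(ascending=False))
```

Average precision is used here because accuracy is uninformative on imbalanced labels; comparing it against the positive rate gives a rough lift-over-chance figure. The metric and baseline used in the project itself are not stated above, so this choice is an assumption.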