Hi, I'm

Riya Shet

MSc student applying machine learning to healthcare operations

Currently seeking dissertation supervisor for research in ML-based healthcare workflow optimization

I'm an MSc Health Data Science student at the University of Birmingham, working at the intersection of machine learning and healthcare operations.

My research focuses on using EHR data and predictive models to optimize clinical workflows, improve resource allocation, and support evidence-based healthcare decisions. I'm particularly interested in how machine learning can solve practical operational challenges in healthcare settings.

I work across multiple domains, from healthcare finance to genomics to population health, using both Python-based ML pipelines and R-based statistical analysis.

Current Focus

Completing MSc coursework and developing dissertation proposal in ML applications to healthcare operations

Location

Dubai, UAE

Institution

University of Birmingham
MSc Health Data Science

Interests

  • ML for Healthcare Operations
  • EHR-based Prediction Systems
  • Statistical Genomics
  • Clinical Decision Support

Healthcare Revenue Cycle Risk Prediction

Lead Project

ML framework predicting patient-level financial risk using synthetic EHR data. Achieved 9.4× improvement over baseline in identifying high-risk patients through Random Forest classification and permutation-based feature importance analysis.

Key Finding

Comorbidity burden emerged as the sole reliable predictor, while the socioeconomic factors assumed to matter contributed little. This suggests ML-based scoring can use existing clinical documentation without additional data collection.
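To illustrate how permutation-based feature importance can separate a real predictor from an uninformative one, here is a minimal standard-library sketch. It is not the project's code: the toy data, the `toy_model` stand-in for a fitted classifier, and the use of plain accuracy as the score are all assumptions made for illustration. In a scikit-learn pipeline, `sklearn.inspection.permutation_importance` serves the same purpose.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows the model labels correctly."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column only, leaving every other feature untouched.
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [v] + row[col + 1:]
                        for row, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: column 0 is a comorbidity count, column 1 is noise.
data_rng = random.Random(42)
rows = [[data_rng.randint(0, 6), data_rng.random()] for _ in range(500)]
labels = [1 if row[0] >= 4 else 0 for row in rows]


def toy_model(row):
    """Stand-in for a fitted classifier; it only looks at the comorbidity count."""
    return 1 if row[0] >= 4 else 0


importances = permutation_importance(toy_model, rows, labels)
print(importances)  # large drop for column 0, no drop for column 1
```

Shuffling a column the model relies on breaks its link to the outcome, so the score falls. Shuffling a column the model ignores leaves the score unchanged, which is the pattern described above for the socioeconomic features. On imbalanced data, a score such as average precision would likely be a better choice than accuracy.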

Methods

Random Forest, stratified cross-validation, imbalanced learning, synthetic data generation (Synthea)
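Stratified cross-validation matters when the positive class is rare, because a plain random split can leave some folds with too few high-risk cases. The following is a minimal sketch of the idea using only the standard library; the 5% positive rate and the function name `stratified_folds` are hypothetical, and scikit-learn's `StratifiedKFold` would normally be used in practice.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical imbalanced labels: 50 high-risk patients out of 1000.
labels = [1] * 50 + [0] * 950
folds = stratified_folds(labels, k=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(positives_per_fold)
```

Each fold then serves once as the held-out set, and every fold carries the same share of high-risk cases as the full dataset.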

Stack

Python, scikit-learn, pandas, matplotlib

Differential Expression Analysis of Tumor Tissue

Statistical Analysis

Statistical analysis of RNA-seq data identifying differentially expressed genes between tumor and normal tissue. Demonstrates rigorous bioinformatics workflow with quality control, negative binomial regression, and multiple testing correction.

Methods

PCA showed tumor and normal samples separating along PC1, which explained 30.8% of the variance. Negative binomial regression was selected over Poisson because the counts were overdispersed. Benjamini-Hochberg FDR correction was applied to the resulting p-values.
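Two of the steps above can be sketched briefly. The project itself was done in R; this is an illustrative Python version using only the standard library, and the read counts and p-values are made up. The first function is a simple overdispersion check (under a Poisson model the variance should be close to the mean), and the second computes Benjamini-Hochberg adjusted p-values.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by the mean; roughly 1 under a Poisson model."""
    return variance(counts) / mean(counts)


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical read counts for one gene across eight samples.
gene_counts = [12, 250, 31, 980, 5, 410, 77, 1500]
ratio = dispersion_ratio(gene_counts)

# Hypothetical raw p-values for five genes.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in adj_p]
print(ratio, adj_p, significant)
```

A ratio far above 1 is the kind of evidence that favors a negative binomial model over Poisson. In the p-value example, the third and fourth genes fall below 0.05 before correction but not after it.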

Stack

R, tidyverse, MASS, pheatmap

UAE Workforce Diabetes Risk Survey Protocol

Study Design

Cross-sectional study protocol examining the relationship between digital health literacy (eHEALS) and diabetes risk (FINDRISC). Demonstrates research design capabilities including validated instrument adaptation, a comprehensive codebook, and power analysis.

Protocol design coursework; data collection not performed
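As an illustration of what a power analysis for this kind of protocol can look like, here is a minimal sketch that estimates the sample size needed to detect a correlation between two scores, using the Fisher z approximation. The protocol's actual test, effect size, alpha, and power are not stated here, so the values below (r of 0.2 and 0.3, two-sided alpha of 0.05, power of 0.80) are assumptions chosen only for the example.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n to detect correlation r with a two-sided test (Fisher z)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


# Hypothetical effect sizes for an eHEALS-FINDRISC correlation.
n_small = n_for_correlation(0.2)
n_medium = n_for_correlation(0.3)
print(n_small, n_medium)  # 194 85
```

Smaller expected correlations require much larger samples, which is why the assumed effect size is usually the most consequential choice in the calculation.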

Languages

Python, R, SQL

ML & Statistics

Supervised learning, regression modeling, experimental design, hypothesis testing, feature engineering, model validation

Libraries & Tools

scikit-learn, pandas, tidyverse, ggplot2, Git/GitHub, Jupyter, RStudio

Domain Expertise

EHR data analysis, genomic data, synthetic data generation, healthcare operations, clinical informatics

Open to discussions about dissertation supervision, research collaborations, or opportunities in health data science.

GitHub: @riyashet-hds
LinkedIn: riyashet