Riya Shet

About

I'm an MSc Health Data Science student at the University of Birmingham, working at the intersection of machine learning and healthcare operations.

My research focuses on using EHR data and predictive models to optimize clinical workflows, improve resource allocation, and support evidence-based healthcare decisions. I'm particularly interested in how machine learning can solve practical operational challenges in healthcare settings.

I work across multiple domains, from healthcare finance to genomics to population health, using both Python-based ML pipelines and R-based statistical analysis.

Current Focus

Completing MSc coursework and developing a dissertation proposal on ML applications in healthcare operations

Location

Dubai, UAE

Institution

University of Birmingham
MSc Health Data Science

Interests

ML for Healthcare Operations
EHR-based Prediction Systems
Statistical Genomics
Clinical Decision Support

Selected Work

Healthcare Revenue Cycle Risk Prediction

Lead Project

An ML framework that predicts patient-level financial risk from synthetic EHR data. It achieved a 9.4× improvement over baseline in identifying high-risk patients, using Random Forest classification and permutation-based feature importance analysis.

Key Finding

Comorbidity burden emerged as the sole reliable predictor, while the socioeconomic factors assumed to matter showed minimal contribution. This suggests ML-based scoring can rely on existing clinical documentation without additional data collection.

Methods

Random Forest, stratified cross-validation, imbalanced learning, synthetic data generation (Synthea)
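
As a rough illustration of how these methods fit together (a minimal sketch, not the project's actual code), the example below trains a class-weighted Random Forest, scores it with stratified cross-validation, and ranks features by permutation importance on a held-out split. The toy data from make_classification, the 5% positive rate, the average-precision metric, and all parameter values are assumptions chosen for the example; they stand in for the Synthea-derived features and do not reproduce the reported results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Toy imbalanced data (about 5% positives); an assumption standing in for real features.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=3, n_redundant=0,
    weights=[0.95, 0.05], random_state=0,
)

# class_weight="balanced" is one simple way to handle the rare positive class.
model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)

# Stratified folds keep the positive class represented in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
pr_auc_scores = cross_val_score(model, X, y, cv=cv, scoring="average_precision")

# Permutation importance: shuffle one feature at a time on held-out data
# and measure how much the score drops.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)
result = permutation_importance(
    model, X_test, y_test, scoring="average_precision",
    n_repeats=10, random_state=0,
)

# Feature indices ordered from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]

print("Mean PR-AUC across folds:", pr_auc_scores.mean())
for idx in ranking:
    print(f"feature {idx}: {result.importances_mean[idx]:.4f} "
          f"(+/- {result.importances_std[idx]:.4f})")
```

Measuring importance on held-out data rather than the training split helps avoid crediting features the forest has merely memorised.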

Stack

Python, scikit-learn, pandas, matplotlib

View Repository → Documentation →

Differential Expression Analysis of Tumor Tissue

Statistical Analysis

Statistical analysis of RNA-seq data identifying differentially expressed genes between tumor and normal tissue. Demonstrates rigorous bioinformatics workflow with quality control, negative binomial regression, and multiple testing correction.

Methods

PCA showed that PC1 explained 30.8% of the variance and separated tumor from normal samples. Negative binomial regression was selected over Poisson based on observed overdispersion. Benjamini-Hochberg FDR correction was applied.
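
Two of the steps above can be sketched in a few lines. The original analysis was done in R; the Python below is only an illustrative sketch, and the function names, example counts, and example p-values are hypothetical. It shows a variance-to-mean ratio as a simple overdispersion check (a Poisson model implies a ratio near 1) and the Benjamini-Hochberg adjustment of a list of p-values.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by mean; values well above 1 suggest overdispersion."""
    return variance(counts) / mean(counts)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical read counts for one gene across five samples.
example_counts = [0, 2, 10, 50, 3]
print("variance/mean ratio:", dispersion_ratio(example_counts))

# Hypothetical raw p-values for four genes.
example_p = [0.01, 0.04, 0.03, 0.005]
print("BH-adjusted:", benjamini_hochberg(example_p))
```

Genes whose adjusted p-value falls below the chosen FDR threshold (0.05 is a common choice) would be reported as differentially expressed.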

Stack

R, tidyverse, MASS, pheatmap

View Repository →

UAE Workforce Diabetes Risk Survey Protocol

Study Design

Cross-sectional study protocol examining digital health literacy (eHEALS) and diabetes risk (FINDRISC) relationships. Demonstrates research design capabilities including validated instrument adaptation, comprehensive codebook, and power analysis.

Protocol design coursework; data collection not performed

View Repository →

Technical Capabilities

Languages

Python, R, SQL

ML & Statistics

Supervised learning, regression modeling, experimental design, hypothesis testing, feature engineering, model validation

Libraries & Tools

scikit-learn, pandas, tidyverse, ggplot2, Git/GitHub, Jupyter, RStudio

Domain Expertise

EHR data analysis, genomic data, synthetic data generation, healthcare operations, clinical informatics

Get in Touch

Open to discussions about dissertation supervision, research collaborations, or opportunities in health data science.

GitHub @riyashet-hds LinkedIn riyashet
