Built logistic regression and decision tree models in Python to predict heart disease, kidney disease, and skin cancer from patient health data. Achieved up to 75% accuracy and AUC scores as high as 0.84. Class imbalance proved a key challenge: the models exhibited high recall (e.g., 78%) but low precision for positive cases. Evaluated performance using F1 score and AUC, and checked generalizability by confirming consistent training and testing accuracy, indicating no overfitting.
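The high-recall/low-precision pattern described above falls directly out of the confusion-matrix formulas. A minimal pure-Python sketch, using hypothetical counts for a rare positive class (not the project's actual numbers):

```python
# Precision, recall, and F1 from raw confusion-matrix counts. The counts
# below are illustrative: few positives are missed (high recall), but false
# positives flood the predictions (low precision), dragging down F1.

def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=78, fp=222, fn=22)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# -> precision=0.26 recall=0.78 f1=0.39
```

This is why accuracy alone is misleading on imbalanced data, and why F1 and AUC were used for evaluation instead.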
Completed an internship developing a player performance dashboard in Python and Excel, implementing a full ETL process to extract, clean, and organize performance data. Extracted, combined, and processed over 500 Excel and CSV files, each representing an individual training session or match, collected via wearable Playermaker devices that track metrics such as sprint speed, acceleration, and foot usage. Cleaned, transformed, and filtered the dataset in Python to generate 27 CSV files, one per player. Designed a user-friendly Excel dashboard to visualize this information and collaborated with coaching staff to align the tool with their decision-making needs.
Developed a logistic regression model in Python to predict loan default using LendingClub data. To address class imbalance, implemented various resampling techniques including SMOTE, ADASYN, RandomOverSampler, and RandomUnderSampler. Performed data preprocessing and model evaluation to compare the effectiveness of these strategies, achieving up to 70% recall, 0.70 AUC, and a 0.35 F1-score on unseen test data, demonstrating the utility of advanced sampling strategies in improving predictive accuracy for rare events.
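The project used imbalanced-learn's implementations (SMOTE, ADASYN, and the random samplers); as an illustration of the simplest of these, here is a pure-Python sketch of what random oversampling does for binary labels:

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until both classes are balanced.

    A pure-Python sketch of RandomOverSampler's behavior for binary labels;
    the actual project used imbalanced-learn's implementations, which also
    include synthetic-sample methods like SMOTE and ADASYN.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    majority, minority = sorted(counts, key=counts.get, reverse=True)
    minority_rows = [(xi, yi) for xi, yi in zip(X, y) if yi == minority]
    n_extra = counts[majority] - counts[minority]
    resampled = list(zip(X, y)) + [rng.choice(minority_rows)
                                   for _ in range(n_extra)]
    X_res, y_res = zip(*resampled)
    return list(X_res), list(y_res)
```

SMOTE and ADASYN go further by interpolating new synthetic minority samples between neighbors rather than duplicating rows, which is why they often generalize better on rare-event problems like loan default.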
Worked with real-world web traffic data from grammy.com and recordingacademy.com to evaluate the impact of The Recording Academy's decision to split its main website into two distinct platforms. Using Python and Pandas, I conducted an in-depth analysis to uncover differences in user behavior, traffic trends, and engagement across both sites. Key insights were visualized with the Plotly Express library to support data-driven conclusions and show how the split affected audience interaction.
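One simple way to quantify the effect of a site split is each platform's share of the combined daily traffic. A stdlib-only sketch with a hypothetical `(site, date, pageviews)` record schema standing in for the actual datasets:

```python
from collections import defaultdict

def traffic_share_by_date(records):
    """For (site, date, pageviews) records, return each site's share of the
    combined traffic on each date.

    Hypothetical schema standing in for the grammy.com /
    recordingacademy.com data; the real analysis used Pandas.
    """
    views = defaultdict(lambda: defaultdict(int))
    for site, date, pageviews in records:
        views[date][site] += pageviews
    shares = {}
    for date, per_site in views.items():
        total = sum(per_site.values())
        shares[date] = {site: v / total for site, v in per_site.items()}
    return shares
```

Plotting these shares over time (e.g., with `plotly.express.line`) makes it easy to see whether one platform absorbed most of the audience after the split.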
Used SQL to analyze collision data from the California Statewide Integrated Traffic Records System (SWITRS), focusing on accidents involving alcohol impairment or driver inattention (e.g., texting or phone use). The goal was to identify patterns in the occurrence of these accidents, including when they are most likely to happen. Findings were visualized using Tableau to effectively communicate insights and trends.
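The "when do these accidents happen" question reduces to a filtered GROUP BY over the collision records. A runnable sketch using Python's built-in sqlite3 module; the table and column names here are illustrative, not the actual SWITRS schema:

```python
import sqlite3

# Minimal sketch of the query pattern used: count collisions involving
# alcohol or inattention, grouped by hour of day. Schema is illustrative,
# not the real SWITRS layout.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE collisions (
        id INTEGER PRIMARY KEY,
        hour INTEGER,              -- hour of day, 0-23
        alcohol_involved INTEGER,  -- 1 if alcohol was a factor
        inattention INTEGER        -- 1 if texting/phone use was a factor
    )
""")
conn.executemany(
    "INSERT INTO collisions (hour, alcohol_involved, inattention) "
    "VALUES (?, ?, ?)",
    [(23, 1, 0), (23, 1, 0), (2, 1, 0), (14, 0, 1), (9, 0, 0)],
)
rows = conn.execute("""
    SELECT hour, COUNT(*) AS n
    FROM collisions
    WHERE alcohol_involved = 1 OR inattention = 1
    GROUP BY hour
    ORDER BY n DESC
""").fetchall()
print(rows)  # hours ranked by impaired/inattentive collision counts
```

The hourly counts produced this way are exactly the kind of aggregate that feeds cleanly into a Tableau visualization.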
Developed a single-variable regression model in SAS to predict flight arrival delays at Houston’s IAH airport, using one month (October) of data. The project involved analyzing five independent variables related to delays, such as weather, carrier issues, and late aircraft. Various plots and statistical summaries were generated to explore relationships between variables. The analysis revealed that carrier delay had the strongest impact on overall arrival delay times, providing key insights into operational inefficiencies.
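The single-variable fit behind this analysis reduces to the ordinary least-squares formulas for slope and intercept. A pure-Python stand-in for the SAS procedure, with a toy dataset (not the actual flight data):

```python
def simple_ols(x, y):
    """Least-squares fit y ~ b0 + b1 * x; returns (intercept, slope).

    slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Toy example: if arrival delay grows ~2 minutes per minute of carrier
# delay, the fit recovers that relationship exactly.
intercept, slope = simple_ols([0, 1, 2, 3], [1, 3, 5, 7])  # -> (1.0, 2.0)
```

Comparing the fit quality (e.g., R-squared) across each of the five candidate predictors is how one variable, here carrier delay, is identified as the strongest driver.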
Utilized Python to explore a comprehensive dataset of Olympic medalists from 1896 to 2016. Conducted data cleaning, analysis, and visualization to answer key questions such as: Who are the youngest and oldest medalists of all time? Are there physical differences between Summer and Winter Olympic medalists? Which country has won the most medals overall? This project demonstrates the use of Python for historical data exploration and insights into global athletic performance trends.
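Questions like "youngest and oldest medalist" map directly onto `min`/`max` with a key function over the cleaned records. A sketch with a few illustrative rows (the real dataset has one row per medalist per event):

```python
# Illustrative slice of a medalists table; the real analysis ran over the
# full 1896-2016 dataset after cleaning.
medalists = [
    {"name": "Dimitrios Loundras", "age": 10, "year": 1896},
    {"name": "Oscar Swahn", "age": 72, "year": 1920},
    {"name": "Usain Bolt", "age": 21, "year": 2008},
]

youngest = min(medalists, key=lambda m: m["age"])
oldest = max(medalists, key=lambda m: m["age"])
print(youngest["name"], "->", oldest["name"])
```

The country-level medal counts use the same idea at scale, typically a `groupby("country").size()` in Pandas followed by a sort.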
Utilized SAS to develop predictive models for the number of wins by the Toronto Blue Jays. Built and evaluated multiple linear regression models to determine the best combinations of variables. The most effective two-variable model included hits and batting average, while the top three-variable model incorporated hits, batting average, and RBIs as predictors of team wins.
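A two-predictor model like the hits-plus-batting-average one comes from solving the normal equations X'Xb = X'y. A pure-Python sketch of that computation (the project itself used SAS's regression procedures, not this code):

```python
def ols_two_predictors(x1, x2, y):
    """Fit y ~ b0 + b1*x1 + b2*x2 by solving the normal equations X'Xb = X'y.

    Pure-Python stand-in for a two-variable regression (e.g., hits and
    batting average as predictors of wins). Returns [b0, b1, b2].
    """
    n = len(y)
    X = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Build the 3x3 matrix X'X and the vector X'y.
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(3)]
           for i in range(3)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y].
    A = [row[:] + [t] for row, t in zip(xtx, xty)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]
```

Model selection then amounts to fitting each candidate variable combination this way and comparing fit statistics such as adjusted R-squared.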
After acquiring the GoBike program from Ford, Lyft aimed to boost memberships by understanding how users interact with the service. Using SQL, I analyzed user behavior by joining and querying multiple datasets to compare patterns between former Ford GoBike users and current Lyft users. This analysis helped uncover key differences in usage trends to inform business strategy.
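The join-and-compare pattern at the heart of that analysis can be demonstrated with Python's built-in sqlite3 module. Table and column names below are illustrative, not the actual Lyft/GoBike schema:

```python
import sqlite3

# Sketch of the pattern used: join trips to users, then compare usage
# between former Ford GoBike users and newer Lyft users. Schema is
# illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, origin TEXT);
    CREATE TABLE trips (trip_id INTEGER PRIMARY KEY, user_id INTEGER,
                        duration_min REAL);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "gobike"), (2, "lyft"), (3, "gobike")])
conn.executemany("INSERT INTO trips VALUES (?, ?, ?)",
                 [(10, 1, 12.0), (11, 1, 18.0), (12, 2, 30.0),
                  (13, 3, 10.0)])
rows = conn.execute("""
    SELECT u.origin, COUNT(*) AS trips, AVG(t.duration_min) AS avg_min
    FROM trips t
    JOIN users u ON u.user_id = t.user_id
    GROUP BY u.origin
    ORDER BY u.origin
""").fetchall()
print(rows)  # trip counts and average duration per user cohort
```

Per-cohort aggregates like these are what surface the behavioral differences between legacy and new users that inform business strategy.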
Hi, I’m Ayaan Omair, a Master’s student in Data Science at Texas A&M University, with a strong foundation in statistics and analytics. I recently earned my Bachelor’s degree in Mathematics (Statistics) from Arizona State University, graduating summa cum laude with a 3.90 GPA.
During my undergraduate studies, I built a solid skill set in data analysis, working with tools such as Python, SQL, and R. I’ve applied these tools across a range of personal and professional projects, developing expertise in data cleaning, preprocessing, wrangling, visualization, and exploratory data analysis (EDA).
I also have hands-on experience with a variety of machine learning techniques, including k-means clustering, Gaussian mixture models, decision trees, logistic regression, as well as linear, lasso, and ridge regression. When I’m not working with data, you’ll probably find me watching sports. I’m a huge fan of the Green Bay Packers, Phoenix Suns, and Manchester United.