Data Science Portfolio
A sample of some recent data science projects
NASDAQ Analysis and Stock Price Prediction
As most people could guess, the stock market is unstable and often unpredictable. For a long time, financial researchers have investigated whether time-series data can predict future market trends. As expected, this is very challenging given fluctuations in the market. However, data science approaches such as exploratory data analysis, variance analysis, and machine learning models can provide valuable information for analyzing stock market trends.
Through this project, I explored NASDAQ data, analyzed variability, and built several machine learning models to predict future stock prices. Using an exploratory data analysis approach, I synthesized thousands of stock prices into digestible information. I also used PCA to confirm that events such as COVID-19 caused disruptions to stock prices. Lastly, I was able to predict whether a stock's price increased or decreased from the start to the end of the month with marginal accuracy, which could provide traders with relevant information.
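The monthly up/down target described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `monthly_direction`, the choice of the first open and last close as the "start" and "end" of the month, and the sample prices are all assumptions made for the example. It uses only the Python standard library so it runs without pandas.

```python
from collections import defaultdict
from datetime import date


def monthly_direction(rows):
    """Label each (year, month) with 1 if the last close of the month is
    above the first open of the month, else 0.

    rows: iterable of (date, open_price, close_price) tuples, in any order.
    """
    by_month = defaultdict(list)
    for day, open_price, close_price in rows:
        by_month[(day.year, day.month)].append((day, open_price, close_price))

    labels = {}
    for month, days in sorted(by_month.items()):
        days.sort(key=lambda row: row[0])  # chronological order within the month
        first_open = days[0][1]
        last_close = days[-1][2]
        labels[month] = 1 if last_close > first_open else 0
    return labels


# Made-up prices for illustration only (not taken from the Kaggle dataset).
sample = [
    (date(2020, 2, 3), 100.0, 101.0),
    (date(2020, 2, 28), 95.0, 92.0),
    (date(2020, 4, 1), 80.0, 82.0),
    (date(2020, 4, 30), 88.0, 90.0),
]
labels = monthly_direction(sample)
print(labels)  # {(2020, 2): 0, (2020, 4): 1}
```

Labels like these can then serve as the binary target for a classifier, with features drawn only from data available before the month begins.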
The data from this project is available on Kaggle (https://www.kaggle.com/datasets/jacksoncrow/stock-market-dataset/data)
Image Credit: nasdaq.com
Heart Disease Prediction
The estimated prevalence of heart disease in the United States is about 7% as of 2023, costing the country north of $200 billion annually. The University of California, Irvine (UCI) hosts a dataset from 4 sources (Cleveland, Hungary, Switzerland, and VA Long Beach) with features related to age, sex, chest pain, blood pressure, cholesterol level, and more. Predicting whether an individual will have heart disease using variables screened for at regular doctor's appointments could provide valuable information for early detection of heart disease risk.
In this project, I tested several supervised machine learning models (KNN, SVM, decision trees, logistic regression, etc.) to predict whether an individual has or does not have heart disease (i.e., binary classification). The best-performing model achieved an accuracy of 85%. I used exploratory data analysis to visualize the dataset and encoded categorical variables prior to ML modeling. The results of these models show that supervised ML can inform doctors of patients' risk of heart disease so they can intervene before the disease becomes severe.
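To illustrate two of the steps mentioned above, encoding a categorical variable and classifying with KNN, here is a minimal standard-library sketch. It is not the project's code: the category names, the scaled-age feature, the helper names (`one_hot`, `encode`, `knn_predict`), and the six toy patients are all hypothetical, chosen only to show the mechanics.

```python
import math
from collections import Counter

# Illustrative category names; the real dataset's labels may differ.
CHEST_PAIN_TYPES = ["typical", "atypical", "non-anginal", "asymptomatic"]


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector, one slot per category."""
    return [1.0 if value == c else 0.0 for c in categories]


def encode(age_scaled, chest_pain):
    """Build a numeric feature vector: scaled age plus one-hot chest pain."""
    return [age_scaled] + one_hot(chest_pain, CHEST_PAIN_TYPES)


def knn_predict(train_X, train_y, x, k=3):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up patients: (scaled age, chest pain type, has_disease).
toy = [
    (0.30, "typical", 0),
    (0.40, "atypical", 0),
    (0.35, "non-anginal", 0),
    (0.70, "asymptomatic", 1),
    (0.80, "asymptomatic", 1),
    (0.75, "typical", 1),
]
train_X = [encode(age, cp) for age, cp, _ in toy]
train_y = [label for _, _, label in toy]

pred_older = knn_predict(train_X, train_y, encode(0.72, "asymptomatic"))
pred_younger = knn_predict(train_X, train_y, encode(0.32, "typical"))
print(pred_older, pred_younger)  # 1 0
```

In practice, a library such as scikit-learn would handle the encoding, scaling, and cross-validated comparison of several model types; the sketch only shows why categorical columns must become numbers before a distance-based model can use them.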
The data from this project is available on Kaggle (https://www.kaggle.com/datasets/redwankarimsony/heart-disease-data)
Image Credit: UCI ML Repository, freepik.com
Gait Speed Classification using IMUs and Deep Learning
Gait speed (i.e., how quickly a person ambulates over a certain distance) is an important predictor of aging, pathology, or injury. As people age or suffer from a musculoskeletal disorder, they tend to walk slower. Typically, gait speed is collected in a laboratory environment using timing gates and other similar equipment. Wearable sensors such as inertial measurement units (IMUs) offer an alternative option for recording and analyzing gait outside of a laboratory environment. Additionally, classifying gait speed using wearable sensors could be a viable option for longitudinally monitoring gait speed.
In this project, I explored an open-source dataset of IMU signals from Miraldo et al. to classify gait speed using machine learning. I was able to classify gait speed with over 90% accuracy using angular velocity signals from two IMUs on the right shank. On my GitHub, there are notebooks walking through the initial exploratory data analysis, data pre-processing, building machine learning models, and a deep learning approach. The README file has instructions on how to pre-process the data and train/test the deep learning model.
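One common way to prepare IMU signals for a classifier is to cut them into fixed-length windows and summarize each window with a few features. The sketch below shows that idea under stated assumptions; it is not the pipeline from the notebooks. The window size, step, feature choice, helper names (`sliding_windows`, `window_features`), and the synthetic sine-wave "angular velocity" signals are all hypothetical.

```python
import math
from statistics import mean, pstdev


def sliding_windows(signal, size, step):
    """Split a 1-D signal into overlapping windows of `size` samples."""
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, step)]


def window_features(window):
    """Summarize one window as (mean absolute value, standard deviation)."""
    return (mean(abs(v) for v in window), pstdev(window))


# Synthetic stand-ins for shank angular velocity (not real IMU data):
# faster walking is modeled as a larger, higher-frequency oscillation.
slow = [1.0 * math.sin(2 * math.pi * i / 50) for i in range(200)]
fast = [3.0 * math.sin(2 * math.pi * i / 25) for i in range(200)]

slow_windows = sliding_windows(slow, size=100, step=50)
fast_windows = sliding_windows(fast, size=100, step=50)

slow_feats = [window_features(w) for w in slow_windows]
fast_feats = [window_features(w) for w in fast_windows]

print(len(slow_windows))                    # 3
print(fast_feats[0][1] > slow_feats[0][1])  # True
```

Feature vectors like these could feed a classical model, while a deep learning approach would typically take the raw windows as input instead.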
Open source data available at https://doi.org/10.6084/m9.figshare.7778255.v3
Abnormal Heartbeat Detection using Deep Learning
Image credit: https://www.istockphoto.com/photos/normal-heart-rhythm