NLP sentiment analysis of restaurant reviews: In this project, a student analyzed a dataset of thousands of restaurant reviews to determine whether each review expressed positive or negative sentiment. They trained an NLP model such as BERT to classify each review based on its text. This type of sentiment analysis can be used to gauge customer satisfaction.
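To make the classification step concrete, here is a minimal sketch of word-based sentiment classification. It is not the student's BERT model: it swaps in a much simpler bag-of-words Naive Bayes classifier so the example runs with only the Python standard library. The function names and the four toy reviews are invented for illustration.

```python
import math
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase the text, split on whitespace and strip basic punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def train_naive_bayes(labeled_reviews):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of reviews with that label
    for text, label in labeled_reviews:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy reviews invented for illustration; a real project would use thousands.
TOY_REVIEWS = [
    ("The food was delicious and the service was great", "positive"),
    ("Great atmosphere and friendly staff", "positive"),
    ("Terrible food and rude service", "negative"),
    ("The pasta was cold and the waiter was rude", "negative"),
]

model = train_naive_bayes(TOY_REVIEWS)
print(predict(model, "friendly staff and great food"))
print(predict(model, "rude waiter and cold pasta"))
```

A pretrained transformer such as BERT would replace the word counts with learned contextual representations, but the overall workflow of labeled reviews in and a positive or negative label out stays the same.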
Predicting bike rentals using weather and calendar data: For this project, a student used historical bike rental data along with associated weather and calendar features (holidays, day of week, etc.) to build and evaluate several regression models for predicting the number of bike rentals on a given day. Features like temperature, precipitation and whether it was a weekday significantly improved the models’ ability to forecast demand. The models could help bike rental companies plan fleet sizes.
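As a rough illustration of building and evaluating a regression model for this kind of problem, the sketch below fits a one-feature least-squares line (rentals as a function of temperature) and compares its mean absolute error against a baseline that always predicts the average. The numbers are toy values invented for the example, not results from the project, and a real model would use many more features (precipitation, weekday, holidays).

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Toy data invented for illustration: daily temperature (deg C) and rentals.
train_temps = [5, 10, 15, 20, 25]
train_rentals = [120, 210, 290, 410, 480]
test_temps = [8, 18]
test_rentals = [170, 360]

intercept, slope = fit_line(train_temps, train_rentals)
model_predictions = [intercept + slope * t for t in test_temps]

# Baseline: always predict the average rentals seen in training.
baseline_predictions = [mean(train_rentals)] * len(test_temps)

model_mae = mean_absolute_error(test_rentals, model_predictions)
baseline_mae = mean_absolute_error(test_rentals, baseline_predictions)
print(f"model MAE: {model_mae:.1f}, baseline MAE: {baseline_mae:.1f}")
```

Comparing against a simple baseline on held-out days is one way to check whether a feature such as temperature actually improves the forecast.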
Predicting credit card fraud: Using a dataset of credit card transactions labeled as fraudulent or legitimate, a student developed and optimized machine learning classifiers such as random forests and neural networks to identify transactions with a high likelihood of being fraudulent. Features included transaction amounts, locations, and other attributes. Financial institutions could deploy similar models to automatically flag potentially fraudulent transactions in real time.
Predicting student performance: A student collected datasets containing student demographics, test scores, course grades and other academic performance indicators. Several classification and regression techniques were trained and evaluated on their ability to predict a student’s final grade in a course based on these factors. Factors like standardized test scores, number of absences and previous GPA significantly improved predictions. Such models could help identify students who may need additional support.
Diagnosing pneumonia from chest X-rays: In this project, a student analyzed a large dataset of chest X-ray images that were manually labeled by radiologists as either having signs of pneumonia or being healthy. Using techniques like convolutional neural networks, they developed models that could automatically analyze new chest X-rays and classify them as showing pneumonia or being normal with a high degree of accuracy. This type of diagnostic application using deep learning has real potential to help clinicians.
Predicting housing prices: A student collected data on properties sold in a city including features like number of bedrooms, bathrooms, lot size, age and neighborhood. They developed and compared regression models trained on this data to predict future housing sale prices based on property attributes. Factors like number of bathrooms and lot size significantly impacted prices. Real estate agents could use similar models to estimate prices when listing new homes.
Recommending movies on Netflix: Using Netflix’s anonymized movie rating dataset, a student built collaborative filtering models to predict rating scores for movies that a user has not yet seen based on their ratings history and the ratings from similar users. Evaluation metrics showed the models could reasonably recommend new movies a user might enjoy based on their past preferences and preferences of users with similar tastes. This type of recommendation system is at the core of how Netflix and other platforms suggest new content.
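The idea of predicting a rating from the ratings of similar users can be sketched as follows. This is one simple variant of user-based collaborative filtering, using cosine similarity over co-rated movies and a similarity-weighted average. It is an assumption for illustration, not necessarily the approach the student used. The user IDs, movie IDs and ratings are hypothetical toy data.

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the movies that both users have rated."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[m] * ratings_b[m] for m in shared)
    norm_a = math.sqrt(sum(ratings_a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(ratings_b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for the movie.

    Returns None when no other user with overlapping history rated it.
    """
    weighted_sum = 0.0
    similarity_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        weighted_sum += sim * other_ratings[movie]
        similarity_total += sim
    if similarity_total == 0:
        return None
    return weighted_sum / similarity_total

# Hypothetical ratings on a 1-5 scale: user -> {movie: rating}.
ratings = {
    "u1": {"movie_a": 5, "movie_b": 1, "movie_c": 5},
    "u2": {"movie_a": 5, "movie_b": 1, "movie_c": 4, "movie_d": 5},
    "u3": {"movie_a": 1, "movie_b": 5, "movie_d": 2},
}

# u1 has not rated movie_d; u2 has similar tastes, u3 has opposite tastes.
prediction = predict_rating(ratings, "u1", "movie_d")
print(f"predicted rating for u1 on movie_d: {prediction:.2f}")
```

Because u2's history closely matches u1's, u2's rating of movie_d dominates the weighted average. Production systems typically use more robust methods, such as mean-centered similarities or matrix factorization, evaluated with a metric like RMSE on held-out ratings.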
Predicting flight delays: For their project, a student assembled datasets containing flight records along with associated details like weather at origin/destination airports, aircraft type and airline. Several classification algorithms were developed and evaluated on their ability to predict whether a flight will be delayed based on these features. Factors like temperature inversions, crosswinds and aircraft type significantly impacted delays. Airlines could potentially use such models operationally to plan for and mitigate delays.
Predicting diabetes: Using medical datasets that paired patients’ biometric and exam results with a diagnosis of whether they had diabetes, a student developed and optimized machine learning classification models to identify undiagnosed diabetes cases from these risk factors. Features with the highest predictive value included BMI, glucose levels, blood pressure and family history of diabetes. Physicians could potentially use similar models to help screen patients and supplement their clinical decision making.
As these examples demonstrate, machine learning capstone projects give students opportunities to apply their skills and knowledge to real-world problems. Key benefits include hands-on experience applying machine learning techniques and practice in data preparation, feature engineering, model development, evaluation and interpretation. The projects also help students demonstrate their abilities to potential employers or in applications for further academic study. Capstone projects are an ideal way for students to showcase what they’ve learned while working on meaningful problems.