The capstone project is the final assessment for the Google Data Analytics Certificate program. It gives students the opportunity to demonstrate the skills and knowledge they have gained throughout the program's courses by completing an end-to-end data analytics project on a topic of their choosing.
To start the capstone project, students choose a real-world dataset and formulate a question they want to answer using data analytics. The dataset can come from an open data repository, the student's own data collection, or another publicly available source on the internet. Students are encouraged to pick a topic that genuinely interests them, which helps them stay motivated through a lengthy project.
Once they have chosen a dataset and a question, students begin the multi-step capstone process. The first step is to explore and understand the data using exploratory data analysis (EDA) techniques learned earlier in the program. This involves loading the data, assessing its quality, handling missing values, identifying patterns and relationships, and visualizing the data to gain insights. Students then write a short document summarizing the key findings of this exploratory analysis.
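The EDA step can be sketched in plain Python. This minimal example assumes a small hypothetical ride-share table (the columns `ride_id`, `duration_min`, and `member_type` are invented for illustration) and uses only the standard library; in a real notebook, libraries such as pandas and matplotlib would typically handle loading and plotting. It loads the data, counts missing values as a basic quality check, drops incomplete rows (one of several possible strategies), and compares group means to surface a pattern. Visualization is left out to keep the block self-contained.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical sample data standing in for the student's chosen dataset.
RAW_CSV = """ride_id,duration_min,member_type
1,12.5,member
2,,casual
3,30.0,casual
4,8.0,member
5,45.5,casual
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def missing_counts(rows):
    """Count empty or missing values in each column (a basic quality check)."""
    counts = Counter()
    for row in rows:
        for col, value in row.items():
            if value is None or value.strip() == "":
                counts[col] += 1
    return dict(counts)

def drop_missing(rows, column):
    """One simple missing-value strategy: drop rows where `column` is empty."""
    return [r for r in rows if (r.get(column) or "").strip()]

def summarize_by_group(rows, value_col, group_col):
    """Mean of a numeric column per group, to surface relationships between variables."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group_col], []).append(float(r[value_col]))
    return {g: statistics.mean(vals) for g, vals in groups.items()}

rows = load_rows(RAW_CSV)
print(missing_counts(rows))  # → {'duration_min': 1}
clean = drop_missing(rows, "duration_min")
print(summarize_by_group(clean, "duration_min", "member_type"))  # → {'member': 10.25, 'casual': 37.75}
```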
With a better understanding of the data, students move on to the second step: defining the problem more concretely. Here, they restate the business problem or research question more specifically in light of the exploratory findings. A well-defined question helps scope the rest of the project. Students may need to return to exploratory analysis with a revised question as their understanding improves.
In the third step, students collect any additional data needed to answer their question. This could involve web scraping, querying APIs, or combining external datasets. They document the sources and the collection process so that others can reproduce it.
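One way to make the collection step reproducible is to record provenance alongside each combined dataset. The sketch below is an illustration rather than a prescribed method: it assumes hypothetical trip and weather tables, joins them on a shared `date` key with a hand-written left join, and stores the source, collection method, retrieval time, and SHA-256 checksum of the raw data. The URL is a placeholder, and no network request is made.

```python
import hashlib
import json
from datetime import datetime, timezone

def left_join(left, right, key):
    """Attach columns from `right` to each row of `left` that shares the same `key`.

    Rows in `left` with no match are kept, with the new columns set to None.
    If `right` repeats a key, the last row with that key wins."""
    index = {row[key]: row for row in right}
    new_cols = sorted({col for row in right for col in row if col != key})
    joined = []
    for row in left:
        match = index.get(row[key], {})
        merged = dict(row)
        for col in new_cols:
            merged[col] = match.get(col)
        joined.append(merged)
    return joined

def provenance_record(source, raw_bytes, method):
    """Describe where a dataset came from, so the collection can be repeated and
    the file checked later: a different SHA-256 digest means the bytes changed."""
    return {
        "source": source,
        "method": method,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "size_bytes": len(raw_bytes),
    }

# Hypothetical example: daily ride counts enriched with an external weather table.
trips = [{"date": "2024-05-01", "rides": 120}, {"date": "2024-05-02", "rides": 95}]
weather = [{"date": "2024-05-01", "temp_c": 18.0}]
combined = left_join(trips, weather, "date")
print(combined)
# → [{'date': '2024-05-01', 'rides': 120, 'temp_c': 18.0}, {'date': '2024-05-02', 'rides': 95, 'temp_c': None}]

# In a real project these bytes would be the file or API response exactly as downloaded.
weather_bytes = json.dumps(weather).encode("utf-8")
record = provenance_record("https://example.com/weather.json", weather_bytes, "API download")
print(json.dumps(record, indent=2))
```

Saving `record` as a JSON file next to the data gives a later reader enough to repeat the download and confirm they obtained the same file.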
Armed with a refined question and the collected data, students move to the predictive modeling step, where they build machine learning models to help answer it. They prepare the data, select algorithms, tune parameters, evaluate performance, and compare results, supporting their model selection and parameter-tuning decisions with graphs and written discussion.
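The modeling loop of preparing data, tuning a parameter, evaluating, and comparing can be illustrated without external libraries. This sketch assumes synthetic data (y roughly equal to 3x plus noise) and uses k-nearest-neighbours regression as a stand-in for whatever algorithm a student might choose. It tunes k on a validation split, then reports test-set mean absolute error (MAE) for the tuned model and for a naive baseline that always predicts the mean. In practice, a library such as scikit-learn would typically provide these pieces.

```python
import random
import statistics

def make_data(n, seed=0):
    """Hypothetical synthetic data: y is roughly 3*x plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 3 * x + rng.gauss(0, 1)) for x in xs]

def split(data, frac, seed=0):
    """Shuffle a copy of the data and split it into two parts of size frac and 1 - frac."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * frac)
    return rows[:cut], rows[cut:]

def knn_predict(train, x, k):
    """k-nearest-neighbours regression: average y of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.mean(y for _, y in nearest)

def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return statistics.mean(abs(a - p) for a, p in pairs)

data = make_data(200)
train_val, test = split(data, 0.8)
train, val = split(train_val, 0.75, seed=1)

# Tune k on the validation set only; the test set stays untouched until the end.
val_scores = {k: mae((y, knn_predict(train, x, k)) for x, y in val) for k in (1, 3, 5, 9, 15)}
best_k = min(val_scores, key=val_scores.get)

# Compare the tuned model with a baseline that always predicts the training mean.
baseline = statistics.mean(y for _, y in train_val)
knn_test_mae = mae((y, knn_predict(train_val, x, best_k)) for x, y in test)
baseline_test_mae = mae((y, baseline) for _, y in test)
print(f"best k={best_k}, kNN test MAE={knn_test_mae:.2f}, baseline test MAE={baseline_test_mae:.2f}")
```

Keeping the test set out of tuning is the design choice that matters most here: choosing k on the same data used to report performance would overstate how well the model generalizes. Plotting `val_scores` against k is one kind of graph that can justify the tuning decision.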
Next, students interpret the results of their predictive modeling and answer their original question based on evidence from their analysis. They discuss whether the analysis supported or refuted their hypotheses and identify caveats in their conclusions that arise from limitations in the data or from modeling assumptions. They also propose potential next steps for further analysis.
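The text does not prescribe a particular test for deciding whether the analysis supports a hypothesis; a permutation test is one simple, assumption-light option. This sketch assumes hypothetical ride durations for two groups and estimates how often randomly relabeling the data produces a difference in means at least as large as the one observed.

```python
import random

def mean(values):
    """Arithmetic mean (plain sum/len keeps the resampling loop fast)."""
    return sum(values) / len(values)

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b.

    Returns the observed difference and an estimated p-value: the share of random
    relabelings whose absolute difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value

# Hypothetical ride durations (minutes) for two rider groups.
casual = [30.0, 45.5, 28.0, 52.0, 39.5, 41.0]
member = [12.5, 8.0, 15.0, 10.5, 11.0, 9.5]
observed, p = permutation_test(casual, member)
print(f"difference in means: {observed:.2f} min, estimated p-value: {p:.4f}")
```

A small p-value indicates the observed difference would be unlikely if group membership did not matter; it does not establish a cause, which is exactly the kind of caveat worth stating in the conclusions.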
Throughout the process, clear documentation and code are essential. Students produce Jupyter notebooks that present each step, from data wrangling to visualization to modeling. Notebooks should include explanatory comments and be well structured and modular for clarity.
Students also write a short paper summarizing their overall process and findings. The paper ties together the motivation for the problem, their understanding of the data, the methodology, the results, and the conclusions, and it can reference visuals from the notebooks. Sound writing is expected: a clear structure, correct grammar, and technical concepts explained effectively for a lay audience.
When the project is complete, students submit their Jupyter notebooks, containing code and visuals, along with the short summary paper for evaluation. Instructors assess a range of factors, including the choice of problem and dataset, the quality of the analysis at each step, the documentation and notebooks, the conclusions drawn, and the communication of findings. They then provide feedback to help students continue developing their skills.
Through this comprehensive capstone experience, students demonstrate the core competencies expected of any data analyst: identifying meaningful problems, acquiring and cleaning relevant data, applying analytical tools and techniques, and communicating results and their implications effectively. It serves as a practical culminating project that showcases the skills gained across the entire Google Data Analytics Certificate program.
The capstone project provides a structured yet open-ended process in which students combine everything they have learned into a complete data analytics workflow that addresses a real problem. Though challenging, it gives them hands-on project experience that is valuable when seeking employment as data professionals. Completing the capstone carefully helps students consolidate the core competencies of the data analyst role.