One of the biggest challenges is accessing the required data sources. Students have to identify relevant sources of data for their research questions and then find a way to collect the needed data from those sources. This can be difficult for several reasons. Some potential data sources may be unwilling or unable to share data because of privacy or confidentiality policies. Important data may also sit behind paywalls or not be publicly available. Students need to contact potential data providers well in advance to request data and have Institutional Review Board (IRB) approval in hand if it is needed. They should also have alternative data sources in mind in case their first choice falls through.
A related challenge is lacking the permissions or clearances needed to collect certain types of data. For instance, students may need IRB approval from their university to collect data involving human subjects, or special access permissions to obtain restricted government or commercial datasets. The permissions process can take time, so students need to start it as early as possible in the project planning stages. They also need to understand which data collection methods do and do not require extra approvals.
Data quality problems can also undermine the analysis. Common problems students may encounter include missing or incomplete records, inconsistent data formats, errors or outliers in the values, and outdated or obsolete information. Students should review any data they obtain early on for these problems and be prepared to clean the data before use. They also need to understand that some poor-quality data may be unsuitable for their research, in which case they will have to find an alternative source.
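An early review of this kind can be partly automated. The following is a minimal sketch in Python, assuming the data has already been loaded as a list of dictionaries; the function name `audit_rows`, the field names, and the sample records are all hypothetical and only illustrate the idea. It counts missing values per field and flags numeric outliers with the common 1.5 × IQR rule, which is one reasonable convention rather than the only one.

```python
from statistics import quantiles


def audit_rows(rows, numeric_fields):
    """Summarize missing values and IQR-based outliers in a list of dict records."""
    report = {"missing": {}, "outliers": {}}

    # Collect every field name that appears in at least one record.
    fields = set()
    for row in rows:
        fields.update(row.keys())

    # Treat absent keys, None, and empty strings as missing.
    for field in sorted(fields):
        report["missing"][field] = sum(
            1 for row in rows if row.get(field) in (None, "")
        )

    # Flag values more than 1.5 * IQR outside the middle half of the data.
    for field in numeric_fields:
        values = [
            row[field] for row in rows
            if isinstance(row.get(field), (int, float))
        ]
        if len(values) < 4:
            continue  # too few values for quartiles to be meaningful
        q1, _, q3 = quantiles(values, n=4)
        spread = q3 - q1
        low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
        report["outliers"][field] = [v for v in values if v < low or v > high]

    return report


# Hypothetical survey records, invented purely for illustration.
sample = [
    {"id": 1, "age": 21, "major": "Biology"},
    {"id": 2, "age": 22, "major": ""},
    {"id": 3, "age": 22, "major": "History"},
    {"id": 4, "age": None, "major": "Physics"},
    {"id": 5, "age": 23, "major": "Biology"},
    {"id": 6, "age": 23, "major": "Chemistry"},
    {"id": 7, "age": 24, "major": "History"},
    {"id": 8, "age": 25, "major": "Physics"},
    {"id": 9, "age": 210, "major": "Biology"},
]

report = audit_rows(sample, numeric_fields=["age"])
print(report["missing"])
print(report["outliers"])
```

A flagged value is only a candidate for review. Whether an age of 210 is a typing error or a legitimate extreme depends on the dataset, so the decision to correct, drop, or keep it remains a judgment call.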
Time constraints are another frequent challenge for capstone students when it comes to data gathering. Pulling together large or complex datasets from multiple sources can be very time-intensive, and gaining the required permissions or access to some datasets may take longer than expected. Any delay leaves students less time to analyze the data, which puts them at risk of not finishing their project as planned. To mitigate this risk, students need to finalize their data needs as early as possible and start the collection process well before they actually need the data. Temporary data sources can also serve as backups in case primary sources are delayed.
Limited skills, experience or resources can also hinder data collection efforts. Students aren’t always fully prepared to carry out the specialized data collection methods their project may require. For example, they may lack expertise in survey design, sampling approaches, writing data collection scripts, or using specialized tools. Budget constraints may also prevent them from purchasing commercial data or hiring outside help for complex collections. To overcome these obstacles, students need to build skills through supplemental coursework, online resources or mentorship well in advance of starting their project. They may also choose somewhat simpler data collection approaches that better match their current abilities.
One of the most persistent challenges is collecting enough data to power robust statistical analyses and produce meaningful insights. Capstone projects often involve limited sample sizes due to small budgets, restricted timeframes or difficulty recruiting participants. This creates the risk that datasets will be too small to fully address the research questions or to support generalized conclusions through inferential statistics. Students can mitigate this risk through pilot testing to better predict required sample sizes, focusing research on cases where sufficient data is readily available, using secondary data sources to increase data volume, and setting realistic expectations around study power based on projected dataset sizes.
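To make the idea of a required sample size concrete, here is a rough sketch in Python. It assumes the simplest common case, a two-sided comparison of the means of two equal-sized groups, and uses the standard normal-approximation formula n = 2 × ((z for 1 − α/2 + z for power) / d)², where d is the standardized effect size (Cohen's d). The function name `required_n_per_group` and the default values are illustrative assumptions, and other designs call for different formulas.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-group comparison of means.

    Uses the normal approximation, so results can be slightly lower than
    those from an exact t-test based calculation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Conventional small, medium and large effect sizes (Cohen's d).
for d in (0.2, 0.5, 0.8):
    print(d, required_n_per_group(d))
```

Under these assumptions a medium effect (d = 0.5) needs about 63 participants per group, while a small effect (d = 0.2) needs several hundred. An estimate of the effect size from a pilot study can be plugged in to check whether the planned sample is realistic before full data collection begins.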
While data gathering can present substantial obstacles for student capstone projects, thorough planning, skill development, contingency strategies and an early start are effective ways to overcome many common challenges. With diligent preparation, alternative options and flexibility built into their plans, students can greatly improve their chances of acquiring quality datasets suitable for analysis within project timelines and constraints. The data collection phase requires significant up-front work from capstone students, but those who are well organized and proactively address potential barriers will be far more likely to succeed.