Analyzing climate change data to determine long-term trends:
A student could analyze decades' worth of existing temperature and climate data collected by sources such as NASA, NOAA, and others. The student would look for long-term trends in global temperatures, changes in weather patterns, the frequency of extreme weather events, and sea level rise. Statistical analysis, such as fitting trend lines to annual temperature anomalies or comparing averages across decades, would show how these variables have changed over time and what conclusions can be drawn about human-caused climate change and its impacts. The length and breadth of these records make it possible to separate long-term trends from year-to-year variability and to better understand historical climate change.
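As a minimal sketch of the trend analysis described above, the Python below fits an ordinary least-squares line to annual temperature anomalies, reports the slope per decade, and computes decade-by-decade averages. The function names are hypothetical, and the example data are a synthetic, perfectly linear series rather than real NASA or NOAA values. A real analysis would load the published anomaly series and would also need to account for autocorrelation in the residuals when judging whether a trend is significant.

```python
# Illustrative sketch only; function names are hypothetical.
from typing import Dict, List, Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Return the ordinary least-squares slope of y regressed on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length and at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def trend_per_decade(years: Sequence[int], anomalies: Sequence[float]) -> float:
    """Linear trend converted from units per year to units per decade."""
    return 10.0 * ols_slope(years, anomalies)


def decadal_means(years: Sequence[int], anomalies: Sequence[float]) -> Dict[int, float]:
    """Average anomaly for each calendar decade (1980 covers 1980-1989)."""
    groups: Dict[int, List[float]] = {}
    for year, value in zip(years, anomalies):
        groups.setdefault(year // 10 * 10, []).append(value)
    return {decade: sum(vals) / len(vals) for decade, vals in sorted(groups.items())}


if __name__ == "__main__":
    # Synthetic series rising by 0.02 degrees C per year (not real data).
    years = list(range(1980, 2020))
    anomalies = [0.02 * (y - 1980) for y in years]
    print(f"Trend: {trend_per_decade(years, anomalies):.3f} degrees C per decade")
    for decade, avg in decadal_means(years, anomalies).items():
        print(f"{decade}s: {avg:+.3f}")
```

Comparing decadal means alongside the fitted slope is a simple check that the trend is not driven by a few unusual years.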
Analyzing biomedical data from gene expression studies:
Many universities and research labs have published gene expression datasets from diseased and healthy tissue samples. A student could analyze one such publicly available dataset to address a specific biomedical question. For example, they could compare gene expression patterns in healthy versus tumor tissue samples to identify key genes and pathways that are upregulated or downregulated in cancer. Statistical analysis, such as testing each gene for a difference in expression between the two groups while correcting for the large number of genes tested, would help identify significant changes and support biological conclusions. This approach leverages existing molecular data to advance our understanding of disease mechanisms without generating new experimental data.
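A minimal sketch of a per-gene comparison between tumor and healthy samples follows. It assumes the expression values are already log2-transformed, so the log2 fold change is simply a difference of group means; it computes Welch's t statistic for each gene and includes a Benjamini-Hochberg adjustment, because testing thousands of genes at once inflates false positives. Gene names, values, and function names are invented for illustration. Turning t statistics into p-values requires the t distribution (for example `scipy.stats.ttest_ind(..., equal_var=False)`), and dedicated packages such as limma or DESeq2 are the usual choice for real datasets.

```python
# Illustrative sketch only; function and gene names are hypothetical.
import math
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch's t statistic for a difference in means with unequal variances."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least 2 samples")
    diff = mean(group_a) - mean(group_b)
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        # Both groups constant: infinite t if the means differ, otherwise no difference.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def differential_expression(
    tumor: Dict[str, List[float]], normal: Dict[str, List[float]]
) -> List[Tuple[str, float, float]]:
    """Return (gene, log2 fold change, Welch t) sorted by |t|, largest first.

    Assumes log2 expression values, so fold change = difference in means.
    """
    results = []
    for gene in tumor.keys() & normal.keys():
        log2fc = mean(tumor[gene]) - mean(normal[gene])
        results.append((gene, log2fc, welch_t(tumor[gene], normal[gene])))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (false discovery rate), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Invented log2 expression values for three hypothetical genes.
    tumor_expr = {"GENE_A": [9.1, 9.4, 8.8], "GENE_B": [5.0, 5.2, 4.9], "GENE_C": [7.0, 7.3, 6.8]}
    normal_expr = {"GENE_A": [6.2, 6.0, 6.5], "GENE_B": [5.1, 4.8, 5.0], "GENE_C": [7.9, 8.1, 8.3]}
    for gene, fc, t in differential_expression(tumor_expr, normal_expr):
        print(f"{gene}: log2FC={fc:+.2f}, t={t:+.2f}")
```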
Analyzing satellite remote sensing data to monitor land use changes:
Various government and non-profit organizations provide open satellite remote sensing datasets spanning decades. A student could analyze images of the same landscape from different time periods to map and quantify land use and land cover changes over the years, for example by analyzing forest cover loss in a particular geographical region or mapping urban expansion around a city. Image processing and GIS software can be used to classify multi-temporal remote sensing images, quantify changes, and understand the drivers of land transformation. This allows large-scale spatial and temporal analysis of environmental change at low cost.
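The change-quantification step can be sketched as a cross-tabulation of two classified images of the same area. The sketch below assumes both images have already been classified into the same land cover classes and aligned pixel for pixel; it represents them as plain nested lists and uses hypothetical class codes (0 = water, 1 = forest, 2 = urban) and a default pixel size of 30 m by 30 m, the resolution of many Landsat bands. In practice the rasters would be read from files with a library such as rasterio and processed as arrays.

```python
# Illustrative sketch only; class codes and function names are hypothetical.
from collections import Counter
from typing import List

Raster = List[List[int]]

# Hypothetical land cover class codes for this example.
WATER, FOREST, URBAN = 0, 1, 2


def transition_counts(before: Raster, after: Raster) -> Counter:
    """Count pixels for each (class_before, class_after) pair."""
    if len(before) != len(after) or any(len(a) != len(b) for a, b in zip(before, after)):
        raise ValueError("rasters must have identical dimensions")
    counts: Counter = Counter()
    for row_before, row_after in zip(before, after):
        for c_before, c_after in zip(row_before, row_after):
            counts[(c_before, c_after)] += 1
    return counts


def net_change_hectares(counts: Counter, cls: int, pixel_area_m2: float = 900.0) -> float:
    """Net area gained (+) or lost (-) by one class, in hectares."""
    gained = sum(n for (b, a), n in counts.items() if a == cls and b != cls)
    lost = sum(n for (b, a), n in counts.items() if b == cls and a != cls)
    return (gained - lost) * pixel_area_m2 / 10_000.0


if __name__ == "__main__":
    # Tiny made-up 3x3 scenes; real scenes contain millions of pixels.
    scene_2000 = [[1, 1, 1], [1, 1, 0], [2, 1, 1]]
    scene_2020 = [[1, 1, 2], [1, 2, 0], [2, 1, 1]]
    counts = transition_counts(scene_2000, scene_2020)
    print("Forest -> urban pixels:", counts[(FOREST, URBAN)])
    print("Net forest change (ha):", net_change_hectares(counts, FOREST))
```

Keeping the full transition table, rather than only net totals per class, shows where gains and losses came from (for example, forest converted specifically to urban land).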
Analyzing drug trial data to understand efficacy and adverse effects:
Results from clinical drug trials are often shared publicly after publication, for example as summary results posted to trial registries. A student could pool results from multiple clinical trials of a particular drug class (e.g., statins or SSRIs). Statistical techniques such as meta-analysis can reveal efficacy trends overall and within specific patient subgroups. The student could also analyze adverse event reports to understand how covariates such as age and gender affect safety. This leverages extensive pre-existing trial data to advance understanding of treatment outcomes at a broader population level.
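Trials are usually pooled with a meta-analysis rather than by simply merging raw results. The sketch below shows one standard approach, a fixed-effect inverse-variance meta-analysis: each trial's effect estimate (for example, a log risk ratio) is weighted by the inverse of its variance, and Cochran's Q and the I² statistic summarize how much the trials disagree. The input numbers and function names are made up for illustration. Random-effects models (such as DerSimonian-Laird) are often preferred when trials differ substantially, and established tools such as the R package metafor handle those cases.

```python
# Illustrative sketch only; function names and input values are hypothetical.
import math
from typing import NamedTuple, Sequence


class PooledEstimate(NamedTuple):
    estimate: float
    std_error: float
    ci_low: float
    ci_high: float
    q: float          # Cochran's Q heterogeneity statistic
    i_squared: float  # share of variability attributed to heterogeneity (0 to 1)


def fixed_effect_meta(
    effects: Sequence[float], std_errors: Sequence[float], z: float = 1.96
) -> PooledEstimate:
    """Inverse-variance fixed-effect pooling of per-trial effect estimates."""
    if len(effects) != len(std_errors) or len(effects) == 0:
        raise ValueError("need one standard error per effect and at least one trial")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is only meaningful with 2+ trials; it is truncated at zero when Q <= df.
    i_squared = (q - df) / q if df > 0 and q > df else 0.0
    return PooledEstimate(
        pooled, pooled_se, pooled - z * pooled_se, pooled + z * pooled_se, q, i_squared
    )


if __name__ == "__main__":
    # Made-up log risk ratios and standard errors for three hypothetical trials.
    log_rr = [-0.22, -0.15, -0.30]
    ses = [0.10, 0.08, 0.12]
    result = fixed_effect_meta(log_rr, ses)
    print(
        f"Pooled risk ratio: {math.exp(result.estimate):.3f} "
        f"(95% CI {math.exp(result.ci_low):.3f} to {math.exp(result.ci_high):.3f}), "
        f"I^2 = {100 * result.i_squared:.0f}%"
    )
```

Pooling on the log scale and exponentiating at the end keeps ratio measures symmetric around 1; subgroup analyses can reuse the same function on each subgroup's trials.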
Analyzing genomics datasets to study evolutionary relationships:
Public genomics databases contain whole-genome sequences of diverse species, which allow phylogenetic questions to be studied. A student could analyze genomic DNA sequences of model organisms and their close relatives to reconstruct evolutionary history, identify orthologous genes, and study sequence homology and divergence rates. Sequence alignment and tree-building tools help reveal evolutionary patterns and relationships. This leverages the availability of large pre-existing genomic datasets without generating new sequence data.
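Distance-based tree building starts from pairwise distances between aligned sequences. As a minimal sketch, the Python below computes the proportion of differing sites (p-distance) for each pair of already-aligned sequences, skipping gap positions, and applies the Jukes-Cantor (JC69) correction for multiple substitutions at the same site, d = -(3/4) ln(1 - (4/3)p). The sequences and function names are made-up examples. The resulting matrix is the kind of input used by distance methods such as neighbor-joining or UPGMA, which are implemented in established software (for example, Biopython's `Bio.Phylo.TreeConstruction` module).

```python
# Illustrative sketch only; function names and sequences are hypothetical.
import math
from itertools import combinations
from typing import Dict, Tuple

GAP = "-"


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == GAP or b == GAP:
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) positions")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """JC69 corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        return math.inf  # correction is undefined once sequences look saturated
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """JC69 distance for every unordered pair of named sequences."""
    return {
        (name_a, name_b): jukes_cantor(p_distance(alignment[name_a], alignment[name_b]))
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    # Short made-up aligned sequences for three hypothetical species.
    alignment = {
        "species_1": "ACGTACGTAC",
        "species_2": "ACGTACGTAA",
        "species_3": "ACGAACCTA-",
    }
    for (a, b), d in distance_matrix(alignment).items():
        print(f"{a} vs {b}: {d:.4f}")
```

The correction always yields a distance at least as large as the raw p-distance, reflecting substitutions that are hidden because a site changed more than once.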
In all of the above examples, students analyze extensive pre-existing scientific datasets (often publicly available), spanning long periods of time or large numbers of samples, to address specific questions using appropriate statistical and computational tools. This lets them draw on a wealth of existing data instead of generating new primary data, which fits within the scope and timeline of a capstone project. These analyses can advance scientific understanding of topics such as climate impacts, disease mechanisms, environmental change, drug efficacy, and evolution, all by tapping the large volume of observational and experimental data already accumulated in these domains.
Analysis of extensive pre-existing scientific datasets spanning long time periods or large sample sizes is an excellent option for many science capstone projects. It relies on readily available published data rather than requiring new primary data collection. Complex questions about trends, correlations, and subgroup differences can be addressed with appropriate statistical and computational analyses. This approach allows deeper investigation of important topics within project constraints, while contributing meaningfully to knowledge in the domain through the synthesis and interpretation of accumulated data.