The Human Genome Project was one of the earliest large-scale scientific efforts to depend heavily on high-performance computing, and its impact reached across computer science, biology and medicine. The goal of the project was to sequence the entire human genome and identify all of the approximately 20,000-25,000 genes in human DNA, which required analyzing the roughly 3 billion base pairs it contains. Sequence data was generated at laboratories and bioinformatics centers worldwide, producing enormous volumes of data that had to be stored, assembled and compared on supercomputers. Accomplishing this monumental task would not have been feasible without high-performance computing systems able to process that data in parallel. The project ran from 1990 to 2003, and its success demonstrated the power of HPC in tackling complex biological problems at an unprecedented scale.
The Fast Multipole Method (FMM), particularly in its distributed-memory parallel forms, is an HPC algorithm widely used for the fast evaluation of potentials in large particle systems. It has applications in computational physics and engineering for simulations involving electromagnetic, gravitational or fluid interactions between particles. The key idea behind the FMM is to cluster particles hierarchically and approximate the influence of distant clusters with multipole expansions, which reduces the cost of evaluating all pairwise interactions from O(N^2) to roughly O(N) while keeping the error controllable. This makes it well suited to very large particle systems that can number in the billions. Several HPC projects have focused on implementing efficient parallel versions of the FMM and applying them to cutting-edge simulations; for example, researchers at ORNL implemented a massively parallel FMM code that has been used on their supercomputers to simulate astrophysical problems with up to a trillion particles.
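To make the clustering idea concrete, the following is a minimal, single-node Python sketch rather than a real FMM or a distributed implementation: nearby particles are summed directly, while each distant cell is collapsed to a single monopole (its total charge placed at its centre of charge). The function names, the uniform grid, and the parameters are illustrative assumptions, and this one-level scheme does not reach the full O(N) scaling of a proper multilevel, octree-based FMM with higher-order expansions.

```python
import numpy as np

def direct_potential(pos, q):
    """Reference O(N^2) evaluation of phi_i = sum_{j != i} q_j / |r_i - r_j|."""
    diff = pos[:, None, :] - pos[None, :, :]          # (N, N, 2) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)                    # exclude self-interaction
    return (q[None, :] / dist).sum(axis=1)

def clustered_potential(pos, q, n_cells=8, near=1):
    """One-level particle-cluster approximation: nearby cells are summed
    directly, distant cells are replaced by a single monopole term."""
    # Bin particles into an n_cells x n_cells uniform grid over [0, 1)^2.
    cell_idx = np.minimum((pos * n_cells).astype(int), n_cells - 1)
    cells = {}
    for cx in range(n_cells):
        for cy in range(n_cells):
            mask = (cell_idx[:, 0] == cx) & (cell_idx[:, 1] == cy)
            if mask.any():
                w = q[mask]
                centre = (pos[mask] * w[:, None]).sum(axis=0) / w.sum()
                cells[(cx, cy)] = (mask, w.sum(), centre)
    phi = np.zeros(len(pos))
    for i in range(len(pos)):
        cx, cy = cell_idx[i]
        for (ox, oy), (mask, total_q, centre) in cells.items():
            if abs(ox - cx) <= near and abs(oy - cy) <= near:
                # Near field: direct pairwise sum (excluding the particle itself).
                d = np.linalg.norm(pos[mask] - pos[i], axis=1)
                d[d == 0] = np.inf
                phi[i] += (q[mask] / d).sum()
            else:
                # Far field: one monopole term for the whole cell.
                phi[i] += total_q / np.linalg.norm(centre - pos[i])
    return phi

rng = np.random.default_rng(0)
pts, chg = rng.random((2000, 2)), rng.random(2000)
exact, approx = direct_potential(pts, chg), clustered_potential(pts, chg)
print("max relative error:", np.abs(approx - exact).max() / np.abs(exact).max())
```

Running the script compares the clustered approximation against the O(N^2) direct sum; production FMM codes replace the single monopole with higher-order multipole and local expansions organised in an octree and distribute that tree across the nodes of a supercomputer.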
Molecular dynamics simulations are another area that has greatly benefited from advances in high-performance computing. They model atomic interactions in large biomolecular and material systems over nanosecond to microsecond timescales, providing a way to study complex dynamic processes such as protein folding at an atomistic level. Landmark HPC projects involving molecular dynamics include all-atom simulations of the complete HIV-1 viral capsid and studies of microtubule assembly involving hundreds of millions of atoms on supercomputers. Projects such as Folding@home also use distributed computing to crowdsource massive molecular simulations and contribute to research on disease. The high-fidelity models enabled by ever-increasing computational power are providing biological insights that would not be possible through experimental means alone.
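As a point of reference for what an MD engine does at its core, here is a minimal Python sketch of velocity-Verlet time integration with a Lennard-Jones pair potential. The function names and parameter values are illustrative assumptions; production codes such as NAMD or GROMACS add neighbour lists, periodic boundaries, constraints and parallel domain decomposition on top of this basic loop.

```python
import numpy as np

def lj_forces(pos, eps=1.0, sigma=1.0):
    """Pairwise Lennard-Jones forces: U(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    diff = pos[:, None, :] - pos[None, :, :]          # (N, N, 3) displacement vectors
    r2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(r2, np.inf)                      # no self-interaction
    inv_r6 = (sigma ** 2 / r2) ** 3
    # F_i = sum_j coeff_ij * (r_i - r_j), with coeff = 24*eps*(2/r^12 - 1/r^6)/r^2
    coeff = 24.0 * eps * (2.0 * inv_r6 ** 2 - inv_r6) / r2
    return (coeff[:, :, None] * diff).sum(axis=1)

def velocity_verlet(pos, vel, mass, dt, n_steps):
    """Standard velocity-Verlet integration loop."""
    forces = lj_forces(pos)
    for _ in range(n_steps):
        vel += 0.5 * dt * forces / mass               # half kick
        pos += dt * vel                               # drift
        forces = lj_forces(pos)                       # recompute forces at new positions
        vel += 0.5 * dt * forces / mass               # second half kick
    return pos, vel

# Tiny toy system: 64 atoms on a cubic lattice, initially at rest.
grid = np.stack(np.meshgrid(*[np.arange(4)] * 3, indexing="ij"), axis=-1).reshape(-1, 3)
pos = grid.astype(float) * 1.2                        # lattice spacing of 1.2 sigma
vel = np.zeros_like(pos)
pos, vel = velocity_verlet(pos, vel, mass=1.0, dt=1e-3, n_steps=100)
print("kinetic energy after 100 steps:", 0.5 * (vel ** 2).sum())
```

The O(N^2) force evaluation shown here is exactly the bottleneck that large parallel MD codes attack with spatial decomposition, cutoffs and fast long-range solvers.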
HPC has also transformed various fields within computer science itself through major simulation and modeling initiatives. Notable examples include simulating the behavior of parallel and distributed systems, developing new parallel algorithms, designing and optimizing chip architectures, optimizing compilers for supercomputers, and studying quantum computing architectures. For instance, major hardware vendors routinely simulate future processors containing billions of transistors before physically fabricating them, saving development time and cost. Similarly, studying algorithms for exascale architectures requires first prototyping them on petascale machines through simulation. HPC is thus an enabler for exploring new computational frontiers through in silico experimentation before the actual implementations are realized.
Other important high-performance computing application areas in computer science research that leverage massive computational resources include:
Big data analytics: Analyzing massive datasets from genomics, web search, social networks and similar sources on HPC clusters, often using programming models such as MapReduce (a minimal sketch of the model follows this list). Examples range from analyzing NASA’s satellite data to commercial analytics at companies like Facebook and Google.
Artificial intelligence: Training very large deep neural networks on datasets containing millions or billions of images or records requires HPC resources with GPUs. Examples include self-driving car simulation and protein structure prediction with deep learning.
Cosmology simulations: Modeling the evolution of the universe and the formation of galaxies with computational cosmology codes on some of the largest supercomputers, yielding insights into the distribution of dark matter and the properties of the early universe.
Climate modeling: Running global climate models at unprecedented resolution to study climate change and make projections, with efforts such as CMIP producing and analyzing petascale volumes of climate data.
Cybersecurity: Simulating network traffic, studying botnet behavior, and performing large-scale malware and encrypted-traffic analysis all require high-performance systems.
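As a concrete illustration of the MapReduce programming model mentioned under big data analytics, here is a minimal word-count sketch in Python. It emulates the map and reduce phases with a local process pool rather than a real Hadoop or Spark cluster, and the chunking scheme and function names are illustrative assumptions.

```python
from collections import Counter
from itertools import chain
from multiprocessing import Pool

def map_phase(chunk):
    """Map: emit (word, 1) pairs for one chunk of the input."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts for each key (Counter stands in for the shuffle)."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

if __name__ == "__main__":
    # Toy corpus split into chunks; on a cluster each chunk would live on a
    # different node and the map tasks would run close to the data.
    chunks = [
        ["the genome project produced data", "data analysis at scale"],
        ["parallel map tasks process data", "reduce tasks aggregate counts"],
    ]
    with Pool(processes=2) as pool:
        mapped = pool.map(map_phase, chunks)      # parallel map phase
    totals = reduce_phase(chain.from_iterable(mapped))
    print(totals.most_common(5))
```

The appeal of the model on HPC clusters is that the map tasks are embarrassingly parallel and only the comparatively small per-key aggregates need to be moved across the network for the reduce phase.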
High-performance computing has been instrumental in solving some of the biggest challenges in computer science and has enabled discovery across a wide breadth of scientific domains by providing massively parallel computational capabilities that were previously unimaginable. It will continue to power innovations in exascale simulation, artificial intelligence, and many other emerging areas for the foreseeable future.