Inside the Gordon Bell Prize Finalist Projects

Posted: September 11, 2022 at 1:29 pm

The ACM Gordon Bell Prize, which comes with a $10,000 award courtesy of HPC luminary Gordon Bell, is widely considered the most prestigious prize in high-performance computing. Each year, six finalists are selected who represent the pinnacle of outstanding research achievements in HPC. Last month, listings on the SC22 schedule revealed those finalists. Over the last few weeks, HPCwire got in touch with members of the six finalist teams to learn more about their projects.

Last year, for the first time, the Gordon Bell Prize nominees included two projects powered by exascale computing: specifically, China's new Sunway supercomputer, also known as OceanLight. Those research papers, at the time, constituted the most substantive official reveal of the system (which remains unranked). One of those OceanLight-powered papers, a challenge to Google's quantum supremacy claim, won that year's Gordon Bell Prize.

In 2022, OceanLight has exascale-caliber competition: not one but two of the other five finalist projects used the new American exascale supercomputer, Frontier, which launched earlier this year at Oak Ridge National Lab (ORNL). And, beyond OceanLight and Frontier, previous Top500-toppers Fugaku (RIKEN) and Summit (ORNL) both return to the list under multiple finalist teams, along with Perlmutter (at NERSC, the National Energy Research Scientific Computing Center) and Shaheen-2 (at KAUST, the King Abdullah University of Science and Technology).

And now: the finalist projects.

This year sees OceanLight return to the stage as the sole supercomputer behind a paper titled "2.5 Million-Atom Ab Initio Electronic-Structure Simulation of Complex Metallic Heterostructures with DGDFT," a project involving simulations of millions of atoms that made use of tens of millions of cores on OceanLight.

Abstract: Over the past three decades, ab initio electronic structure calculations of large, complex and metallic systems have been limited to tens of thousands of atoms in both numerical accuracy and computational efficiency on leadership supercomputers. We present a massively parallel discontinuous Galerkin density functional theory (DGDFT) implementation, which adopts adaptive local basis functions to discretize the Kohn-Sham equation, resulting in a block-sparse Hamiltonian matrix. A highly efficient pole expansion and selected inversion (PEXSI) sparse direct solver is implemented in DGDFT to achieve O(N^1.5) scaling for quasi two-dimensional systems. DGDFT allows us to compute the electronic structures of complex metallic heterostructures with 2.5 million atoms (17.2 million electrons) using 35.9 million cores on the new Sunway supercomputer. In particular, the peak performance of PEXSI can achieve 64 PFLOPS (5 percent of theoretical peak), which is unprecedented for sparse direct solvers. This accomplishment paves the way for quantum mechanical simulations into mesoscopic scale for designing next-generation energy materials and electronic devices.
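
The core idea is to avoid a cubic-scaling diagonalization by writing the density matrix as a sum over poles, each of which reduces to a (sparse) linear solve. As a rough illustration of the pole-expansion concept only (not the DGDFT/PEXSI implementation, which uses a far more efficient rational approximation and true selected inversion on a block-sparse Hamiltonian), here is a toy NumPy sketch that builds the Fermi operator of a small dense "Hamiltonian" from a slowly convergent truncated Matsubara sum and checks it against full diagonalization:

    # Toy sketch of the pole-expansion idea: build the density matrix f(H)
    # as a sum of resolvent-like terms instead of diagonalizing H.  Uses the
    # slowly convergent Matsubara expansion purely for illustration; this is
    # NOT the PEXSI pole expansion or its sparse selected-inversion solver.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    A = rng.standard_normal((n, n))
    H = (A + A.T) / 2              # small symmetric "Hamiltonian"
    mu, beta = 0.0, 2.0            # chemical potential, inverse temperature

    # Reference: density matrix via full diagonalization, rho = f(H)
    eps, V = np.linalg.eigh(H)
    rho_exact = (V * (1.0 / (1.0 + np.exp(beta * (eps - mu))))) @ V.T

    # Pole sum: f(x) = 1/2 - (2/beta) * sum_k (x - mu) / ((x - mu)^2 + w_k^2),
    # with Matsubara frequencies w_k = (2k + 1) * pi / beta.  Each term is a
    # shifted resolvent, the kind of object a production code evaluates with
    # a sparse direct solver instead of a dense inverse.
    I = np.eye(n)
    rho_poles = 0.5 * I
    for k in range(2000):
        w = (2 * k + 1) * np.pi / beta
        rho_poles -= (2.0 / beta) * np.linalg.solve(
            (H - mu * I) @ (H - mu * I) + w**2 * I, H - mu * I)

    print("max |rho_poles - rho_exact| =", np.abs(rho_poles - rho_exact).max())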

Per the SC22 schedule, this team includes researchers from the Chinese Academy of Sciences, Peking University, the Pilot National Laboratory for Marine Science and Technology, the National Research Center of Parallel Computer Engineering and Technology, the Qilu University of Technology and the University of Science and Technology of China.

"Our team is highly excited [to be] nominated for the Gordon Bell Prize finalists as we started preparation for this work since last year," said Qingcai Jiang, a researcher at the University of Science and Technology of China (USTC), in an email to HPCwire. "Our work for the first time achieves plane-wave precision electronic structure calculation for large-scale complex metallic heterostructures containing 2.5 million atoms (17.2 million electrons), and our optimization techniques make our work able to achieve peak performance of 64 PFLOPS (5 percent of theoretical peak), which is unprecedented for sparse direct solvers."

The first of the projects powered by Frontier, titled "ExaFlops Biomedical Knowledge Graph Analytics," also made use of ORNL's previous chart-topper, Summit, and focuses on large-scale mining of biomedical research literature.

Abstract: We are motivated by newly proposed methods for mining large-scale corpora of scholarly publications (e.g., full biomedical literature), which consist of tens of millions of papers spanning decades of research. In this setting, analysts seek to discover relationships among concepts. They construct graph representations from annotated text databases and then formulate the relationship-mining problem as an all-pairs shortest paths (APSP) problem and validate connective paths against curated biomedical knowledge graphs (e.g., SPOKE). In this context, we present COAST (Exascale Communication-Optimized All-Pairs Shortest Path) and demonstrate 1.004 EF/s on 9,200 Frontier nodes (73,600 GCDs). We develop hyperbolic performance models (HYPERMOD), which guide optimizations and parametric tuning. The proposed COAST algorithm achieved the memory-constant parallel efficiency of 99 percent in the single-precision tropical semiring. Looking forward, COAST will enable the integration of scholarly corpora like PubMed into the SPOKE biomedical knowledge graph.
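
The phrase "tropical semiring" points to the standard trick of recasting all-pairs shortest paths as matrix multiplication in which addition becomes "min" and multiplication becomes "+", which is what lets APSP ride on highly tuned GPU matrix kernels. The following is a small, hedged NumPy sketch of that recasting on a toy graph; it illustrates the min-plus formulation and repeated squaring only, not COAST's communication-optimized distributed algorithm:

    # Toy illustration of APSP in the (min, +) tropical semiring.  Dense,
    # single-node sketch; not the COAST algorithm itself.
    import numpy as np

    def minplus(A, B):
        # Tropical "matrix product": C[i, j] = min_k (A[i, k] + B[k, j])
        return np.min(A[:, :, None] + B[None, :, :], axis=1)

    n = 64
    rng = np.random.default_rng(1)
    W = rng.uniform(1.0, 10.0, size=(n, n))    # candidate edge weights
    W[rng.random((n, n)) < 0.8] = np.inf       # drop most edges (no edge = inf)
    np.fill_diagonal(W, 0.0)

    # Repeated tropical squaring: after ceil(log2(n)) squarings, D[i, j]
    # holds the length of the shortest path from i to j.
    D = W.copy()
    for _ in range(int(np.ceil(np.log2(n)))):
        D = minplus(D, D)

    print("shortest distance from node 0 to node 5:", D[0, 5])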

Per the SC22 schedule, this team includes researchers from AMD, the Georgia Institute of Technology, ORNL and the University of California, San Francisco.

"The ability to establish paths between any pair of biomedical concepts with the richness of PubMed in a reasonable time has the potential to revolutionize biomedical research and apply national research funds more effectively," said Ramakrishnan Kannan, group leader for discrete algorithms at ORNL, in an email to HPCwire. "The comparison of knowledge encoded within SPOKE, which is largely human-curated, against concept relationships that might be mined automatically from a scholarly database like PubMed will result in faster and automated integration of biomedical information at scale."

According to the team, this project is the first exascale graph AI demonstration to run at over one exaflops. "This first demonstration of exascale computation speed will transform the way we currently conduct search in complex heterogeneous knowledge graphs like SPOKE," the research team told HPCwire. "Specifically, it will enable a new class of algorithms to be implemented in graphs of unprecedented size and complexity. This will greatly improve the quality of biomedical research inquiry, and accelerate the time to patient diagnosis and care like never before."

The second project to use Frontier: "Pushing the Frontier in the Design of Laser-Based Electron Accelerators with Groundbreaking Mesh-Refined Particle-In-Cell Simulations on Exascale-Class Supercomputers." Though the title of the paper, which revolved around kinetic plasma simulations, winks at its use of Frontier, the team actually used four supercomputers: Frontier, Fugaku (RIKEN), Summit and Perlmutter (NERSC), meaning that this one paper used four of the top seven supercomputers on the most recent Top500 list. In an email to HPCwire, Jean-Luc Vay, a senior scientist at Lawrence Berkeley National Lab, outlined the science runs of the research, which were conducted on Frontier (up to 8,192 nodes), Fugaku (up to ~93,000 nodes) and Summit (up to 4,096 nodes).

Abstract: We present a first-of-kind mesh-refined (MR) massively parallel Particle-In-Cell (PIC) code for kinetic plasma simulations optimized on the Frontier, Fugaku, Summit, and Perlmutter supercomputers. Major innovations, implemented in the WarpX PIC code, include: (i) a three-level parallelization strategy that demonstrated performance portability and scaling on millions of A64FX cores and tens of thousands of AMD and Nvidia GPUs, (ii) a groundbreaking mesh refinement capability that provides between 1.5x and 4x savings in computing requirements on the science case reported in this paper, (iii) an efficient load balancing strategy between multiple MR levels. The MR PIC code enabled 3D simulations of laser-matter interactions on Frontier, Fugaku, and Summit, which have so far been out of the reach of standard codes. These simulations helped remove a major limitation of compact laser-based electron accelerators, which are promising candidates for next generation high-energy physics experiments and ultra-high dose rate FLASH radiotherapy.
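
For readers unfamiliar with the method, the Particle-In-Cell cycle that WarpX scales to these machines alternates between a particle picture and a grid picture: deposit charge onto a mesh, solve for the fields, interpolate the fields back to the particles, and push them. The sketch below is a deliberately minimal 1D electrostatic toy in NumPy, with normalized units and arbitrary parameters of our own choosing; it shows only the basic cycle, not WarpX's electromagnetic solver, mesh refinement or load balancing:

    # Minimal 1D electrostatic Particle-In-Cell loop (toy, normalized units).
    # Not WarpX: no mesh refinement, no electromagnetics, nearest-grid-point
    # deposition only.
    import numpy as np

    ng, L, npart, dt, steps = 64, 1.0, 10000, 0.05, 200
    dx = L / ng
    rng = np.random.default_rng(2)
    x = rng.uniform(0.0, L, npart)              # particle positions
    v = 0.01 * np.sin(2 * np.pi * x / L)        # small velocity perturbation
    qm = -1.0                                   # charge-to-mass ratio (electrons)
    weight = ng / npart                         # so mean electron density is 1

    for _ in range(steps):
        # 1) deposit charge: uniform ion background (+1) minus electrons
        idx = (x / dx).astype(int) % ng
        rho = 1.0 - np.bincount(idx, minlength=ng) * weight

        # 2) field solve: periodic Poisson equation via FFT (ik * E_k = rho_k)
        k = 2 * np.pi * np.fft.fftfreq(ng, d=dx)
        rho_k = np.fft.fft(rho)
        E_k = np.zeros_like(rho_k)
        E_k[1:] = -1j * rho_k[1:] / k[1:]
        E = np.real(np.fft.ifft(E_k))

        # 3) gather the field at particle locations and push (leapfrog)
        v += qm * E[idx] * dt
        x = (x + v * dt) % L

    print("mean kinetic energy after", steps, "steps:", 0.5 * np.mean(v**2))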

Per the SC22 schedule, this team includes researchers from Arm, Atos, CEA-Université Paris-Saclay, ENSTA Paris, GENCI, Lawrence Berkeley National Lab and RIKEN.

"Plasma accelerator technologies have the potential to provide particle accelerators that are much more compact than existing ones, opening the door to exciting novel applications in science, industry, security and health," Vay explained. "Exploiting the most powerful supercomputers in the world to boost the research to make these complex machines a reality is so stimulating to all of us."

"It is thrilling for the entire team to be selected as a finalist of the Gordon Bell Prize, even for the one of us (Axel Huebl), for whom it is déjà vu, as he was already a finalist in 2012 with another (PIConGPU) team," Vay added. "It is the vindication of years of hard work from the U.S. DOE Exascale Computing Project participants and longstanding collaborators from CEA Saclay in France, coupled to the more recent hard work with colleagues from various labs and private companies in France (Genci, Arm, Atos) and RIKEN in Japan."

The exascale-enabled research only constitutes half the list. Another finalist paper, "Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications," used Shaheen-2 as well as Fugaku.

Abstract: We extend the capability of space-time geostatistical modeling using algebraic approximations, illustrating application-expected accuracy worthy of double precision from majority low-precision computations and low-rank matrix approximations. We exploit the mathematical structure of the dense covariance matrix whose inverse action and determinant are repeatedly required in Gaussian log-likelihood optimization. Geostatistics augments first-principles modeling approaches for the prediction of environmental phenomena given the availability of measurements at a large number of locations; however, traditional Cholesky-based approaches grow cubically in complexity, gating practical extension to continental and global datasets now available. We combine the linear algebraic contributions of mixed-precision and low-rank computations within a tile-based Cholesky solver with on-demand casting of precisions and dynamic runtime support from PaRSEC to orchestrate tasks and data movement. Our adaptive approach scales on various systems and leverages the Fujitsu A64FX nodes of Fugaku to achieve up to 12X performance speedup against the highly optimized dense Cholesky implementation.
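
The computational heart of this work is a Cholesky factorization of a huge covariance matrix carried out tile by tile, so that individual tiles can be handled in lower precision (or compressed to low rank) where the application tolerates it. The sketch below is a hedged, single-node NumPy illustration of a tile Cholesky with a naive "demote tiles far from the diagonal to single precision" rule; the tile-selection rule, the toy exponential covariance and the nugget value are assumptions for illustration, and the low-rank compression and PaRSEC runtime orchestration of the real solver are omitted entirely:

    # Toy tile Cholesky with per-tile precision demotion (not the paper's
    # solver): far-from-diagonal panel tiles are round-tripped through
    # float32 before being used in the trailing update.
    import numpy as np

    def tile_cholesky_mixed(A, nb, lowprec_dist=2):
        n = A.shape[0]
        nt = n // nb                       # assumes nb divides n evenly
        T = lambda i, j: np.s_[i*nb:(i+1)*nb, j*nb:(j+1)*nb]
        L = np.zeros_like(A)
        W = A.copy()
        for k in range(nt):
            L[T(k, k)] = np.linalg.cholesky(W[T(k, k)])
            for i in range(k + 1, nt):
                # panel tile: solve L[i,k] * L[k,k]^T = W[i,k]
                L[T(i, k)] = np.linalg.solve(L[T(k, k)], W[T(i, k)].T).T
                if i - k >= lowprec_dist:  # demote "far" tiles to float32
                    L[T(i, k)] = L[T(i, k)].astype(np.float32).astype(np.float64)
            for j in range(k + 1, nt):     # trailing (Schur complement) update
                for i in range(j, nt):
                    W[T(i, j)] -= L[T(i, k)] @ L[T(j, k)].T
        return L

    # Toy SPD "covariance": exponential kernel on random 2D locations,
    # plus a nugget term on the diagonal for numerical robustness.
    rng = np.random.default_rng(3)
    pts = rng.uniform(size=(512, 2))
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    C = np.exp(-d / 0.3) + 0.05 * np.eye(512)

    Lf = tile_cholesky_mixed(C, nb=64)
    print("relative factorization error:",
          np.linalg.norm(Lf @ Lf.T - C) / np.linalg.norm(C))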

Per the SC22 schedule, this team includes researchers from KAUST, ORNL and the University of Tennessee. Perhaps notably, the team also includes Jack Dongarra, one of SC22's keynote speakers.

"For our exploratory science runs, and to demonstrate the acceptable accuracy of our algorithmic variations on Cholesky factorization and further manipulation of massive covariance matrices, we used Shaheen-2 at KAUST," explained David Keyes, director of the Extreme Computing Research Center at KAUST, in an email to HPCwire. "Shaheen-2 has only 6,192 nodes, so we applied to use Fugaku at RIKEN to scale further and were generously considered by RIKEN. Fugaku has 158,976 nodes, about 25 times more than Shaheen-2, and each node has 48 cores, 1.5 times more than a Shaheen-2 node. However, each Fugaku node is equipped with only 32GB of memory, one-quarter as much as Shaheen-2's 128GB per node, thus only one-sixth as much per core, which required us to make software adaptations."

"Entering the Gordon Bell competition was exciting for all of the team members, especially the students and postdocs," Keyes said. "It provided an opportunity to run on the world's second-ranked computer. The required algorithmic adaptations to architecture led to improvements in our tools that will be useful at all scales. More importantly, the nomination created excitement with the statistics community since 2022 appears to be the first time after 35 years of the prize that any significant spatial statistics computation, environmental or otherwise, has thus advanced."

The final of Fugaku's three appearances on the finalist list comes courtesy of "Extreme Scale Earthquake Simulation with Uncertainty Quantification," which used the second-ranked system to advance scientific understanding of earthquakes and fields with similar dynamics.

Abstract: We develop a stochastic finite element method with ultra-large degrees of freedom that discretizes probabilistic and physical spaces using unstructured second-order tetrahedral elements with double precision, using a mixed-precision implicit iterative solver that scales to the full Fugaku system and enables fast Uncertainty Quantification (UQ). The developed solver, designed to attain high performance on a variety of CPU/GPU-based supercomputers, enabled solving a 37-trillion degrees-of-freedom problem with 19.8 percent peak FP64 performance on full Fugaku (89.8 PFLOPS) with 87.7 percent weak scaling efficiency, corresponding to a 224-fold speedup over the state-of-the-art solver running on full Summit. This method, which has shown its effectiveness via solving huge (32-trillion degrees-of-freedom) practical problems, is expected to be a breakthrough in damage mitigation, and is expected to facilitate the scientific understanding of earthquake phenomena and have a ripple effect on other fields that similarly require UQ.
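
The "mixed-precision implicit iterative solver" is the key ingredient: the bulk of the arithmetic is done in reduced precision, while double-precision accuracy is recovered by iterating on the FP64 residual. As a hedged, toy-scale illustration of that general pattern only (a dense NumPy iterative-refinement loop with an FP32 Cholesky standing in for the cheap inner solver, nothing like the actual finite element solver on Fugaku), consider:

    # Toy mixed-precision iterative refinement: factor once in float32,
    # then recover double-precision accuracy by correcting with FP64
    # residuals.  Illustrative only; not the earthquake solver.
    import numpy as np

    rng = np.random.default_rng(4)
    n = 500
    A = rng.standard_normal((n, n))
    A = A @ A.T + n * np.eye(n)         # well-conditioned SPD "stiffness" matrix
    b = rng.standard_normal(n)

    # Lower-precision factorization plays the role of the cheap inner solver
    L32 = np.linalg.cholesky(A.astype(np.float32))
    def inner_solve(r):
        y = np.linalg.solve(L32, r.astype(np.float32))
        return np.linalg.solve(L32.T, y).astype(np.float64)

    x = np.zeros(n)
    for it in range(8):
        r = b - A @ x                   # residual computed in FP64
        x = x + inner_solve(r)          # correction from the FP32 solver
        print(f"iteration {it}: residual norm = {np.linalg.norm(r):.3e}")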

Per the SC22 schedule, this team includes researchers from Fujitsu, the Japan Agency for Marine-Earth Science and Technology, RIKEN and the University of Tokyo.

"We are very happy to be selected as finalists," wrote Tsuyoshi Ichimura, a professor with the Earthquake Research Institute at the University of Tokyo, in an email to HPCwire. "We believe that this has a great impact in showing that capability computing can contribute to an unprecedented Uncertainty Quantification (UQ)."

Last, but certainly not least: "Extreme-Scale Many-against-Many Protein Similarity Search," which used the Summit supercomputer to perform protein similarity calculations across hundreds of millions of proteins in just a few hours.

Abstract: Similarity search is one of the most fundamental computations that are regularly performed on ever-increasing protein datasets. Scalability is of paramount importance for uncovering novel phenomena that occur at very large scales. We unleash the power of over 20,000 GPUs on the Summit system to perform all-vs-all protein similarity search on one of the largest publicly available datasets with 405 million proteins, in less than 3.5 hours, cutting the time-to-solution for many use cases from weeks. The variability of protein sequence lengths, as well as the sparsity of the space of pairwise comparisons, make this a challenging problem in distributed memory. Due to the need to construct and maintain a data structure holding indices to all other sequences, this application has a huge memory footprint that makes it hard to scale the problem sizes. We overcome this memory limitation by innovative matrix-based blocking techniques, without introducing additional load imbalance.

Per the SC22 schedule, this team includes researchers from Indiana University, the Institute for Fundamental Biomedical Research, the Department of Energy's Joint Genome Institute, Lawrence Berkeley National Lab, Microsoft, NERSC and the University of California, Berkeley.

In an email to HPCwire, the team stressed the importance of this research area to critical fields. "Many-against-many sequence search is the backbone of biological sequence analysis used in drug discovery, healthcare, bioenergy, and environmental studies," they wrote. "Our work is perhaps the first [Gordon Bell] finalist for a biological sequence analysis problem, which is surprising because sequence analysis is a perfect supercomputing application due to its data- and compute-intensive nature."

"Our pipeline, PASTIS, performs a novel application of sparse matrices to narrow down the search space and to avoid a quadratic number of sequence comparisons," the team added. "Sparse matrix computations are much harder to map efficiently to modern supercomputing hardware, especially to GPU-equipped supercomputers such as the Summit system we have used in this work. Our approach cuts back the turnaround time from days to minutes in discovering similar sequences in huge protein datasets, to complete the subsequent analytical steps in bioinformatics and allow for exploratory analysis of data sets under different parameter settings."
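
To make the sparse-matrix framing concrete, here is a small, hedged Python/SciPy sketch of the general idea (not PASTIS itself; the tiny sequences, the k-mer length and the overlap threshold are made-up illustrative choices): build a sparse sequence-by-k-mer matrix, multiply it by its transpose to count shared k-mers between every pair of sequences, and send only the pairs with nonzero overlap to an actual alignment rather than comparing all n^2 pairs:

    # Toy sketch of sparse-matrix candidate generation for protein
    # similarity search.  Not PASTIS: sequences, k, and the overlap
    # threshold are arbitrary illustrative choices.
    import numpy as np
    from scipy.sparse import csr_matrix

    k = 3
    seqs = ["MKVLAT", "MKVLSS", "GGGHHH", "ATMKVL"]   # toy "proteins"

    # Column index for every k-mer that occurs in the dataset
    kmers = sorted({s[i:i+k] for s in seqs for i in range(len(s) - k + 1)})
    col = {km: j for j, km in enumerate(kmers)}

    rows, cols = [], []
    for i, s in enumerate(seqs):
        for j in range(len(s) - k + 1):
            rows.append(i)
            cols.append(col[s[j:j+k]])
    A = csr_matrix((np.ones(len(rows)), (rows, cols)),
                   shape=(len(seqs), len(kmers)))

    # Sparse product: overlap[i, j] = number of k-mers sequences i and j share
    overlap = (A @ A.T).toarray()
    np.fill_diagonal(overlap, 0)
    pairs = [(int(i), int(j)) for i, j in np.argwhere(overlap > 0) if i < j]
    print("candidate pairs to send to alignment:", pairs)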

That's all of them. For those keeping score at home: three finalist teams used Fugaku; three used Summit; two used Frontier; and OceanLight, Perlmutter and Shaheen-2 were each used by one finalist team. We're still watching for the reveal of the finalists for the Gordon Bell Special Prize for High Performance Computing-Based Covid-19 Research, which will be awarded for the third time at SC22. At SC22 itself, set to be held in Dallas from November 13-18, the finalists for both Gordon Bell Prizes will present their research ahead of the award ceremony.
