Podcast: Delivering Exascale Machine Learning Algorithms at the ExaLearn Project


In this Let’s Talk Exascale podcast, researchers from the Exascale Computing Project (ECP) describe progress at the ExaLearn co-design center, which is focused on machine learning (ML) and related activities that will inform the requirements for the forthcoming exascale machines.

Aurora and Frontier, set to be America’s first exascale supercomputers in 2021, are expected to hasten the convergence of traditional HPC, data analytics, and ML. The El Capitan exascale machine is scheduled to follow in 2023.

While big tech companies such as Google and Amazon use trained ML models to predict what users and customers may want based on their previous behavior, ExaLearn’s focus is entirely different, said Frank Alexander of Brookhaven National Laboratory, who leads the project. He stressed that ExaLearn is providing exascale ML software for scientific research. ExaLearn’s algorithms and tools will be used by ECP applications, other ECP co-design centers, and DOE experimental facilities and leadership-class computing facilities.

As one of ExaLearn’s early successes, Peter Nugent highlighted the use of ML on cosmology datasets to generate surrogate models that make complicated simulations less expensive. Nugent’s team at Lawrence Berkeley National Laboratory (LBNL) set up the project and established a method for creating the surrogate models, which Brian Van Essen’s team at Lawrence Livermore National Laboratory (LLNL) implemented using deep neural networks on the Sierra and Summit supercomputers.
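In broad strokes, the surrogate-modeling pattern looks something like the sketch below. This is a minimal PyTorch illustration, not ExaLearn’s actual code: the `expensive_simulation` function, the parameter dimensions, and the network shape are all placeholders. A small network is fit to input/output pairs generated by a costly simulation, after which evaluating the network can stand in for running the simulation.

```python
# Minimal sketch of a neural-network surrogate (illustrative; not ExaLearn's code).
# A small network learns the mapping from simulation parameters to outputs;
# once trained, evaluating it is far cheaper than running the simulation itself.
import torch
import torch.nn as nn

def expensive_simulation(params: torch.Tensor) -> torch.Tensor:
    """Placeholder for a costly physics simulation (e.g., a cosmology run)."""
    return torch.sin(params).sum(dim=-1, keepdim=True)

# Build training data by running the simulation on sampled parameter sets.
params = torch.rand(1024, 4)            # 4 hypothetical input parameters per run
outputs = expensive_simulation(params)  # 1 summary statistic per run

surrogate = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(500):
    optimizer.zero_grad()
    loss = loss_fn(surrogate(params), outputs)
    loss.backward()
    optimizer.step()

# The trained network now approximates the simulation at a fraction of the cost.
prediction = surrogate(torch.rand(1, 4))
```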

“The hypothesis was that if we could teach neural networks to see all of the data, they could fundamentally get a better understanding of the underlying science and predict better results in the training,” Van Essen said. “So we, as part of the ExaLearn multi-lab collaboration, have been able to work on scaling up and training these deep neural networks on datasets that have been previously unattainable throughout the community. We’ve delivered new results on the CosmoFlow network at Berkeley Lab. Fundamentally, this is a great scientific achievement that we can give back to other ECP projects like ExaSky, but it also provides a foundational capability to the ExaLearn project and to DOE and ECP at large.”
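Scaling training across machines like Sierra and Summit typically relies on data parallelism: each worker trains on a shard of the dataset and gradients are averaged across workers at every step. The sketch below is a generic, single-node illustration of that pattern using PyTorch’s DistributedDataParallel; the model, dataset, and hyperparameters are stand-ins, and this is not the CosmoFlow training code.

```python
# Generic data-parallel training sketch (illustrative; not the CosmoFlow code).
# Launch with, e.g.: torchrun --nproc_per_node=4 train.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dist.init_process_group("nccl" if torch.cuda.is_available() else "gloo")
rank = dist.get_rank()  # assumes one process per GPU on a single node
device = torch.device(f"cuda:{rank}" if torch.cuda.is_available() else "cpu")

# Placeholder dataset: in practice, simulation snapshots or survey data.
data = TensorDataset(torch.randn(4096, 16), torch.randn(4096, 1))
sampler = DistributedSampler(data)               # shard the data across ranks
loader = DataLoader(data, batch_size=64, sampler=sampler)

model = torch.nn.Linear(16, 1).to(device)
model = DDP(model, device_ids=[rank] if torch.cuda.is_available() else None)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()

for epoch in range(3):
    sampler.set_epoch(epoch)                     # reshuffle shards each epoch
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x.to(device)), y.to(device))
        loss.backward()                          # all-reduce averages gradients
        optimizer.step()

dist.destroy_process_group()
```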

Van Essen said that ExaLearn achieved two major technical advances in its first year: it developed a unique ability to train surrogate models, and it impacted scientific research by creating models that are an order of magnitude more capable than previous ones.

The ExaLearn team is developing four major types of ML algorithms, Alexander said: surrogates, control, inverse problem solvers, and design. By the end of the project, ExaLearn hopes to have applied all four within a single domain area, he added.
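Two of those algorithm classes compose naturally: once a differentiable surrogate has been trained, an inverse problem can be posed as optimizing the simulation inputs until the surrogate reproduces an observed output. The sketch below illustrates that generic pattern via gradient descent through a frozen network; it is a hypothetical example, not ExaLearn’s solver, and the surrogate, observation, and dimensions are placeholders.

```python
# Hedged sketch: solving an inverse problem through a trained surrogate.
# Given an observed output, search for input parameters that reproduce it
# by gradient descent through the (frozen) surrogate network.
import torch

surrogate = torch.nn.Sequential(              # stand-in for a trained surrogate
    torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1),
)
for p in surrogate.parameters():
    p.requires_grad_(False)                   # freeze the surrogate's weights

observed = torch.tensor([[0.7]])              # hypothetical measured quantity
guess = torch.rand(1, 4, requires_grad=True)  # initial parameter guess
optimizer = torch.optim.Adam([guess], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    loss = (surrogate(guess) - observed).pow(2).mean()
    loss.backward()                           # gradients flow to the inputs only
    optimizer.step()

# `guess` now holds parameters whose surrogate output matches the observation.
```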

A key legacy of the ExaLearn project, Van Essen said, will be to use the strength of the eight-lab collaboration to find technologies across the science, ML, and HPC space and combine them for the benefit of the DOE national laboratories and the scientific community at large. And, Nugent pointed out, ExaLearn is making all of its datasets publicly available.

Source: Scott Gibson at the Exascale Computing Project

Download the MP3
