Lan hopes to help researchers unlock the mysteries of neutrinos

Unprecedented volume of data collection requires distributed, resilient workflow process

Zhiling Lan

Professor Zhiling Lan received a $1.5 million grant from the U.S. Department of Energy to model a distributed, resilient workflow process for the department’s High Energy Physics program in neutrino and collider science.

The Deep Underground Neutrino Experiment (DUNE) aims to unlock the mysteries of neutrinos, creating a clearer picture of the universe and how it works. DUNE comprises two particle detectors, one at Fermi National Accelerator Laboratory (Fermilab) in Batavia, Illinois, and one under construction at the Long-Baseline Neutrino Facility in South Dakota.

Massive amounts of data will be transferred 800 miles from the experiment site in South Dakota to the Fermilab storage system, then sent on to Argonne National Laboratory in Lemont, Illinois, for processing by the lab’s Aurora supercomputer.

DUNE is expected to generate as much as 6 GB of data every five milliseconds. This translates to a data volume nearly 300 times greater than that captured for equivalent interactions with current technologies.
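
To put that figure in perspective, the short calculation below converts the quoted burst size into a peak rate. It is a back-of-envelope sketch, not a project specification: it assumes decimal units (1 GB = 1,000 MB), treats "as much as 6 GB" as an upper bound rather than a sustained average, and uses an illustrative function name that does not come from the project.

```python
# Back-of-envelope peak data rate from the figures quoted in the article.
# Assumptions (not from the article): decimal units, and the 6 GB figure
# treated as an upper bound on a burst, not a sustained average.

BURST_GB = 6    # "as much as 6 GB of data"
WINDOW_MS = 5   # "every five milliseconds"


def peak_rate_gb_per_s(burst_gb: float, window_ms: float) -> float:
    """Convert a burst size (GB) over a window (ms) into GB per second."""
    return burst_gb * 1000 / window_ms


rate = peak_rate_gb_per_s(BURST_GB, WINDOW_MS)
print(f"Peak rate: {rate:.0f} GB/s (about {rate / 1000:.1f} TB/s)")
```

Under those assumptions the peak works out to roughly 1.2 terabytes per second, which helps explain why the transfer and processing pipeline has to be modeled ahead of time.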

Lan will create a digital twin of the system’s physical environment to model and simulate the distributed workflow between data collection and data analysis, and to provide feedback that shapes the experiment.

“We need to process the data as soon as possible, because we want to do real-time or near-real-time analysis,” Lan said. “Based on the result, we can provide feedback on the operational states of the neutrino detectors and beam facilities and say ‘next time, you need to do this type of data analysis campaign’.”

Lan explained that the modeling will account for storage system and network failures, such as data failing to reach the supercomputer or computer components breaking down.

“We can make sure we can anticipate these things, then provide remedies, or fault tolerances to prevent or mitigate the impact,” Lan said.

She also needs to ensure that the other crucial science applications that run on Aurora can be carried out alongside the DUNE data processing. Lan will devise a scheduling policy that serves these high-energy physics workloads without disrupting the campaign or any other science user.

“This intelligent multi-scale parallel simulation framework, called Tachyon, will enable end-to-end modeling of highly complex, distributed scientific workflow with a real-time component,” Lan said.

The project, Tachyon: Intelligent Multi-Scale Modeling of Distributed Resilient Infrastructure and Workflows for Data Intensive HEP Analyses (DOE), runs through August 2028 and involves five institutions: Rensselaer Polytechnic Institute, Argonne National Laboratory, Fermi National Accelerator Laboratory, University of Illinois Chicago, and University of California at Davis. Each institution received an equal share of funding, with a total budget of $7.5 million.