New faculty member brings high-performance computing expertise


Professor Zhiling Lan joined the computer science department this fall. Lan, who is also a guest research faculty member at Argonne National Laboratory, focuses on parallel and distributed systems and high-performance computing.

Before joining UIC’s faculty, Lan spent more than 20 years at the Illinois Institute of Technology, where she led the Systems for Performance, Energy, and Resiliency Team. She received her doctoral degree in computer engineering from Northwestern University. She is looking forward to increased opportunities for collaboration at UIC.

“I’m very interested in building up the high-performance computing program with other faculty in the department,” Lan said.

Lan’s research vision is to unify systems and artificial intelligence (AI) by creating software systems that expedite AI applications. She categorizes her work into four areas: fault tolerance, power management, resource management and scheduling, and modeling and simulation.

The first area is fault tolerance. She explained that computer component failures are common, and in a large-scale system, the failure rate increases exponentially. To address this, she uses AI-driven log analysis to predict whether a failure is likely to occur soon and, if one does occur, to determine how to recover the system quickly so it can continue serving users.

Another aspect of her work is power management, since supercomputers consume a lot of electricity. Lan says future supercomputers, such as the exascale supercomputer Aurora being installed at Argonne National Laboratory, will consume 50 to 60 megawatts of power, costing over $30 million per year to operate.

“We are trying to come up with a solution so that the machine can use less power, or operate under some power cap,” Lan said.

Lan is collaborating with UIC Professor Michael Papka, who also serves as the deputy associate laboratory director of Computing, Environment, and Life Sciences at Argonne National Laboratory. The duo is working on resource management and scheduling, the key software that decides the order in which jobs run when users submit requests to a system, so that the system is fully utilized.

Her last area of focus is modeling and simulation. She runs simulations using digital twins, or virtual representations of a system, to help design both physical computing systems and software systems that distribute workflows as efficiently and reliably as possible.

“In the long term, my research vision is to unify systems and AI, where I am committed to creating software systems to expedite AI applications, and exploring advanced AI technologies to tackle critical research challenges in computer systems,” Lan said. “Moreover, I am dedicated to broadening the scope of my research from computer clusters to computing continuum spanning from edge devices to large-scale computing facilities.”