Boris Glavic joins department in 2024

Boris Glavic

Boris Glavic will join the computer science department January 1, 2024, as an associate professor. His research interests lie in database systems, including data provenance, data integration and curation, uncertain data, and query execution and optimization.

Glavic spent over a decade at the Illinois Institute of Technology (IIT), where he led the IIT Database Group. He supervised six doctoral students, who worked on a variety of projects to develop solutions to emerging challenges in database systems.

Prior to his time at IIT, Glavic spent two years as a postdoctoral researcher at the University of Toronto, Canada. He received his doctoral degree in computer science from the University of Zurich in Switzerland, and his undergraduate degree from RWTH Aachen University in Germany.

One of Glavic’s research interests is managing uncertainty in data. Uncertainty arises naturally in many application domains due to measurement errors, human error in data entry or transformation, missing data, bias in data collection, or other reasons. When uncertainty is ignored during data preprocessing and analysis, the result is hard-to-trace errors that can have severe real-world implications, such as wrongful convictions and incarcerations, and medical misdiagnoses.

While general models exist to represent and clean these probabilistic or incomplete data sets, traditional techniques are too expensive to apply to larger data sets. Glavic has investigated approximation techniques that hit the sweet spot between performance and accuracy of the representation. These techniques have enabled the evaluation of complex computations over uncertain data that previous techniques did not support.

“You can help every data scientist to be more responsible in what they do because now they can actually determine whether to trust their analysis results or not,” Glavic said.

Another area of Glavic’s research is improving the reproducibility of computational notebooks. These notebooks, such as Jupyter notebooks, are used in data science to interactively build data preparation and analysis pipelines by allowing a user to type some code into a cell and directly observe the result of executing these instructions. From these smaller pieces of code, larger computational pipelines can be built.

“Over time by creating and running these cells, and adding to and updating the code of cells, you can end up in states that are not reproducible; rerunning a notebook does not necessarily produce the same results that the user who developed the notebook observed,” Glavic said. “This is due to the nature of common notebook systems that do not automatically update derived data.”

Together with colleagues from the University at Buffalo and New York University, Glavic has developed a reproducible notebook system called Vizier that parallelizes notebook execution and tracks uncertainty for all notebook operations. He and his colleagues founded a startup, Breadcrumb Analytics, that uses this system to help other companies solve data issues, keeping past versions of the notebooks for reproducibility and to support iterative notebook development.

Glavic is well known for his contributions to data provenance. Data provenance is information about the curation process and origin of data. Provenance can help analysts debug their queries and enable companies to fulfill auditing requirements, among other use cases. Over the course of his career, Glavic has built several systems that enable automatic tracking and management of provenance information. He has a long-running collaboration with software giant Oracle, working on provenance-related topics including using provenance to measure the value of data and to improve the performance of database systems.

Glavic is also involved with another startup, Ocient, as a technical advisory board member. The company creates hyperscale data solutions that can analyze trillions of data records in interactive time.

Glavic is looking forward to joining UIC, citing the colleagues he will join in the data science and databases research areas. He also appreciates the diversity of the student body at the university and UIC’s commitment to equity. He will teach CS 480, database systems, in the Spring 2024 semester.