2021 off to a busy start for Assistant Professor Abolfazl Asudeh
Abolfazl Asudeh has had a very productive 2021. The assistant professor, who focuses on aspects of big data and data science, was the invited author of the January post on the Association for Computing Machinery’s Special Interest Group on Management of Data (ACM SIGMOD) blog. He had a research highlight in the February issue of Communications of the ACM, and he received a $60,000 Google Research Scholar Award.
The blog post appeared on the very selective, high-profile blog for the data management community, which showcases only a few articles per year on data technology trends by prominent researchers. Responsible data science and algorithmic fairness are Asudeh’s main research focus and the subject of the post, “Enabling Responsible Data Science in Practice.”
According to Asudeh, data, especially social data, is almost always biased, as it inherently reflects historical perceptions and stereotypes. Data collection and representation methods often introduce additional bias. The effects can be seen in models that predict how likely a defendant is to commit another crime, in models that decide which consumers will be shown advertisements for various types of housing or jobs, and in models that identify whether a person’s eyes are open or closed in photographs.
Asudeh said the first hurdle in developing better models is in defining fairness.
“A major challenge is the gap between social science and data science,” he said. “We as data scientists are good at mathematics and algorithms. But if we say, ‘define fairness, formally,’ every social scientist has their own perspective, and they are contradictory. So, it is challenging to mathematically define fairness.”
Laws and regulations governing technology have not kept up, and using biased data without paying attention to societal impacts can create a feedback loop and even amplify discrimination.
Asudeh says enabling responsible and ethical practices of data science requires a pipeline of user-friendly tools for data preparation, algorithm design, and generating fitness-for-use signals that would provide immediate feedback about the applicability of a dataset, an algorithm, or a machine learning model being used for the particular task. New, fairer models require human oversight.
“We want to be sure machine learning models are fair and not hurting persons of color, minority groups, or any marginalized people,” Asudeh said.
The proposal behind Asudeh’s Google Research Scholar Award, “End-to-end detection of cherry-picked trendlines,” begins with the Alfred, Lord Tennyson quote, “A lie which is half a truth is ever the blackest of lies.”
The work focuses on the bane of every reporter’s job: partial truths. Outlets such as PolitiFact and FactCheck.org readily identify blatant lies by politicians and others, but a statement that contains a kernel of truth, or a true statement stripped of all context, is more difficult to detect, especially quickly.
Asudeh provided the following statement as an example of cherry-picking: “The summer of 2012 was colder than the winter of that year.” This was largely false. Only 1% of available evidence supports the notion that the summer was in fact colder than the winter that year, so while the statement is indeed based on a portion of the actual data, it certainly isn’t accurate.
“We design efficient algorithms to quickly detect cherry-picking and to mine data in order to find the most accurate statement supported by data,” Asudeh said. “But detecting these half-lies and quickly finding the proof is significantly more challenging than regular fact-checking.”
Asudeh hopes to develop tools that will assist in more quickly putting these partial-truth statements in context, by sweeping data related to the claims.
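One way to make the idea of "available evidence" concrete is to count how many comparisons in the data actually agree with a claim. The sketch below is only an illustration and is not Asudeh's algorithm: the function name `support`, the pairwise definition of evidence, and the temperature values are all assumptions made up for this example.

```python
from itertools import product

def support(summer_temps, winter_temps):
    """Fraction of (summer day, winter day) pairs in which the summer day was colder.

    Hypothetical measure: treats every pair of days as one piece of evidence.
    """
    pairs = list(product(summer_temps, winter_temps))
    colder = sum(1 for s, w in pairs if s < w)
    return colder / len(pairs)

# Made-up temperatures in degrees Celsius, not real 2012 measurements.
summer = [24, 27, 30, 12]
winter = [-3, 2, 5, 14]

# Only the pair (12, 14) backs the claim "summer was colder than winter".
print(support(summer, winter))  # prints 0.0625
```

A claim built on the single pair (12, 14) is technically drawn from the data, yet a support this low signals that it was cherry-picked.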
The Research Highlight in Communications of the ACM includes Asudeh’s article, “Scalable Signal Reconstruction for a Broad Range of Applications,” together with a technical review by Professor Zachary G. Ives.
Asudeh worked with AT&T Labs – Research on the signal reconstruction problem, with an application to computer networks.
The signal reconstruction problem is an important optimization problem in which the objective is to identify the solution to an underdetermined system of linear equations that is closest to a given prior point. It has a substantial number of applications in diverse areas, including network traffic engineering, medical image reconstruction, acoustics, astronomy, and more.
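For a small instance, the problem as defined above has a textbook closed-form answer, sketched below. This is the direct approach, not the scalable algorithm from Asudeh's article. It assumes that "closest" means Euclidean distance and that the rows of the system are linearly independent; the helper names `solve` and `reconstruct` and the numbers in the example are made up for illustration.

```python
def solve(M, v):
    """Solve the square system M y = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (aug[i][n] - s) / aug[i][i]
    return y

def reconstruct(A, b, prior):
    """Return x with A x = b that is closest (Euclidean distance) to prior.

    Closed form: x = prior + A^T (A A^T)^{-1} (b - A prior).
    Assumes A has more columns than rows and full row rank.
    """
    m, n = len(A), len(prior)
    residual = [b[i] - sum(A[i][j] * prior[j] for j in range(n)) for i in range(m)]
    gram = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(m)]
            for i in range(m)]
    y = solve(gram, residual)
    return [prior[j] + sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]

# Two equations, three unknowns: infinitely many solutions exist.
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
b = [10.0, 8.0]
prior = [4.0, 4.0, 4.0]  # hypothetical earlier estimate of the signal

x = reconstruct(A, b, prior)
print([round(v, 3) for v in x])  # prints [5.333, 4.667, 3.333]
```

The work here is dominated by building and solving the system in `gram`, which has one row and one column per equation, so the cost grows quickly as the number of equations increases.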
Unfortunately, most of the common approaches for solving the signal reconstruction problem do not scale to large problem sizes.
Asudeh proposes a novel and scalable algorithm for solving this critical problem. He combines multiple optimization steps that let the algorithm scale from problem sizes on the order of a hundred by a thousand all the way up to a million by a billion on a single machine.
Asudeh directs the Innovative Data Exploration Laboratory (InDeX Lab), an academic research group at UIC. He credits the students in his lab for their great work on his research and is currently accepting new students.