Investigating new threats – and benefits – for online privacy

The battle over online data privacy is constantly in flux, with the boundaries endlessly redrawn by policy, technology and public sentiment. With two new grants from the National Science Foundation, UIC computer scientist Chris Kanich will explore two flashpoints in this conflict: the risks of new tactics in internet ad targeting and the usefulness of privacy “nutrition labels.”

The first project, a $1.2 million grant with collaborators Paul Pearce of Georgia Tech and Damon McCoy of New York University, will tackle the emerging dangers in online advertising as third-party cookies are phased out. Cookies are small pieces of data that websites store in your browser when you visit, allowing those sites and their advertisers to see some of your online history and tailor the content they show you.
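For readers curious about the mechanics, here is a toy sketch of how a third-party ad server assigns a tracking cookie on first contact and recognizes it afterward; the domain and cookie name are hypothetical, not drawn from the research.

```python
# Toy illustration (not the project's code): a third-party ad server mints
# an ID cookie the first time a browser shows up and reads it back later.
from http.cookies import SimpleCookie
import uuid

def handle_ad_request(cookie_header):
    """Return (user_id, set_cookie_header or None) for one ad request."""
    cookies = SimpleCookie(cookie_header)
    if "uid" in cookies:
        # Returning browser: the same ID arrives from every site that
        # embeds this ad server, which is what enables cross-site tracking.
        return cookies["uid"].value, None
    # First contact: mint an ID and tell the browser to store it.
    uid = uuid.uuid4().hex
    c = SimpleCookie()
    c["uid"] = uid
    c["uid"]["domain"] = ".ads.example"  # the third-party domain
    c["uid"]["path"] = "/"
    c["uid"]["secure"] = True
    return uid, c["uid"].OutputString()

# Visiting site A: no cookie yet, so the ad server sets one...
uid, set_cookie = handle_ad_request("")
# ...and on site B later, the browser sends the same cookie back.
uid_again, _ = handle_ad_request("uid=" + uid)
assert uid == uid_again
```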

Until recently, third-party cookies were the preferred method for advertisers to track users across the internet: if you previously shopped for shoes on one site, they could serve ads for footwear on another. But legislation in California and the European Union has required websites to be transparent about tracking, mandating the pop-ups that ask users to consent to cookies, and many popular browsers have gone further and blocked these tracking cookies altogether.

As a result, advertisers are developing new methods based on machine learning that attempt to infer who is looking at a particular website. By taking information the ad provider can see, such as your IP address or details entered into a form, and combining it with data purchased or collected from other sites, the provider can guess who you are and what you want to see. While this targeting method is more complicated and less accurate than tracking via cookies, it opens new doors for misuse.
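A toy example makes the shift concrete. The sketch below is illustrative only, with invented profiles and signals rather than any advertiser’s actual system: it scores candidate identities against whatever a page visit reveals.

```python
# Toy sketch of inference-based targeting: score purchased or collected
# profiles against the signals visible during a page visit. All names,
# addresses and data here are made up for illustration.
PROFILES = [
    {"id": "A", "email": "alice@example.com", "ip_prefix": "203.0.113.", "interests": ["shoes"]},
    {"id": "B", "email": "bob@example.com", "ip_prefix": "198.51.100.", "interests": ["travel"]},
]

def guess_visitor(ip, form_email=None):
    """Pick the best-matching profile; wrong guesses cause 'entanglement'."""
    def score(profile):
        s = 0
        if form_email and form_email == profile["email"]:
            s += 10  # a typed-in email is a strong identity signal
        if ip.startswith(profile["ip_prefix"]):
            s += 3   # an IP prefix is a weak signal shared by many people
        return s
    return max(PROFILES, key=score)

# Anyone who enters Alice's email into a form, from any device or network,
# is matched to her profile and can start receiving "her" ads.
print(guess_visitor("198.51.100.7", form_email="alice@example.com")["id"])  # A
```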

“For these third-party advertisers, it is going to be much more difficult, or in some cases impossible, to have a strong idea of identity,” Kanich said. “So that will mean that it’s more likely that your ads can be entangled with someone else, either accidentally or on purpose by an attacker.”

Entanglement occurs when the provider guesses wrong and serves you the ads that someone else would want to see. It can also be triggered when a user enters somebody else’s information into a website, from any computer or device; with cookies, an attacker would have to use the same device and browser as the target to see their ads. That makes the ads themselves a channel for sensitive information, such as travel plans or searches for products related to pregnancy.

“That’s both creepy and dangerous, because an attacker can get all the ads that you’re supposed to get, and that can be very personal information,” Kanich said. “If we think about cyberstalking or intimate partner violence situations, all you need to know is someone’s email address in order to start receiving the ads that would be sent to them.”

Unlike cookies, which a sophisticated computer user can inspect, these newer tracking methods run on private servers, away from public scrutiny. Kanich and his collaborators will develop new tools to detect these practices: setting up test accounts that mimic real internet browsing, observing which advertisements are served to those accounts based on their activity, and reverse engineering the information collected and the algorithms used to select ads.
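A minimal sketch suggests what such measurement tooling might look like, assuming Selenium with a local Chrome installation; the seed sites and measurement page are hypothetical stand-ins, not the team’s actual crawler.

```python
# Sketch of a "test persona" crawler: build a browsing history, then log
# which ad iframes a page serves back. Assumes Selenium + Chrome; URLs
# are placeholders.
from selenium import webdriver
from selenium.webdriver.common.by import By

SEED_SITES = [  # browsing history that defines this test persona
    "https://example.com/shoes",
    "https://example.com/travel-deals",
]

def collect_ad_sources(driver):
    """Record where the ad iframes on the current page are loaded from."""
    return [
        frame.get_attribute("src")
        for frame in driver.find_elements(By.TAG_NAME, "iframe")
        if frame.get_attribute("src")
    ]

driver = webdriver.Chrome()
try:
    # 1) Build a browsing history for the persona.
    for url in SEED_SITES:
        driver.get(url)
    # 2) Visit a page that serves ads and log what comes back; comparing
    #    personas that differ in a single signal helps reverse engineer
    #    what the targeting system actually uses.
    driver.get("https://example.com/news")
    print(collect_ad_sources(driver))
finally:
    driver.quit()
```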

“We want to give individuals and policy makers a better understanding of all the stuff that’s going on behind the scenes, so that we’ve given people the tools they need to make an informed decision,” Kanich said.

In the second project, Kanich will work with Serge Egelman of the International Computer Science Institute and Adam Aviv of George Washington University to study new, user-friendly privacy labels instituted by app stores in recent years.

First proposed in 2009, these disclosures were modeled after the nutrition labels on groceries as a way to make the lengthy legalese of privacy policies more accessible to the general public. Formatted like simple checklists, the labels disclose what each app does, what data it collects and how that information is used.

Apple started requiring these simpler labels in its App Store in late 2020, and Google followed suit in the Google Play Store in 2022. Almost immediately, Kanich and his colleagues began to collect data on how the labels were used, downloading every privacy label and privacy policy from Apple’s App Store once a week for two years.
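The sketch below suggests, under stated assumptions, what that weekly collection and comparison could look like. The category names mirror Apple’s public label sections, but the fetch step is a stub because the store exposes no stable public API for labels; the real pipeline’s details are not described in this story.

```python
# Hedged sketch of longitudinal label collection: store one snapshot of
# each app's privacy label per week and diff consecutive weeks.
import datetime
import json
import pathlib

def fetch_label(app_id):
    """Stub: the real pipeline would scrape the app's store listing."""
    return {
        "Data Used to Track You": ["Identifiers"],
        "Data Linked to You": ["Contact Info", "Usage Data"],
        "Data Not Linked to You": ["Diagnostics"],
    }

def snapshot(app_id, root):
    """Write this week's label for app_id under root/app_id/YYYY-MM-DD.json."""
    path = pathlib.Path(root) / app_id / (datetime.date.today().isoformat() + ".json")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(fetch_label(app_id), sort_keys=True, indent=2))
    return path

def diff_labels(old, new):
    """Report label categories whose declared data types changed between weeks."""
    return {
        category: (sorted(old.get(category, [])), sorted(new.get(category, [])))
        for category in set(old) | set(new)
        if set(old.get(category, [])) != set(new.get(category, []))
    }
```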

The team will use that dataset as well as interviews with app users, developers and app store administrators to assess the accuracy and effectiveness of the labels.

“We need to understand how they are actually used by developers, how they are actually interpreted by users and how they relate to people’s decisions,” Kanich said.

Together, the two projects reflect Kanich’s research mission: using the tools of computer science to bring more transparency to the flow of data online, and helping internet users make smart and safe choices.

“I really like informing everyday people about all this crazy stuff that’s going on, because I do think with better information, people will be able to make better decisions both individually and collectively,” Kanich said. “A lot of this stuff is not well understood, or it’s actively hidden from us. I want us to make those decisions with our eyes open and with the best possible information.”

This story first appeared on UIC Today July 6, 2023.