“Considering its scale, intrusiveness, and unintended side effects, the privacy problem we investigate deserves more attention from browser vendors, privacy tool developers, and data protection agencies,” warned the authors of the study.
Many users may have unwittingly assumed their personal data was safe when entering their email address on a website to register an account, buy a ticket, or subscribe to a newsletter.
The researchers built automated browser profiles that imitated a real user, visited thousands of websites, and filled in login or registration forms without ever clicking the submit button.
They found that 1,844 websites collected users' email addresses before submission and without consent when visited from the European Union, while 2,950 sites did the same when visited from the United States.
“It certainly exceeded our expectations by a lot,” says Güneş Acar, a professor and researcher at Radboud University, who explained that his team had initially expected to find only a couple hundred sites collecting user data.
“Based on our findings, users should assume that the personal information they enter into web forms may be collected by trackers—even if the form is never submitted,” said the study’s authors.
The study found that in some cases websites collected the data themselves before submission, but most of it was collected by third-party advertising and marketing services such as Taboola, Bizible, and Glassbox Digital, which are embedded in websites to help monetize content.
The method these third parties used to collect data closely resembles “keylogging,” a technique malware uses to record a user’s keystrokes, usually to steal passwords and other confidential information rather than email addresses.
The researchers also “found incidental password collection on 52 websites by third-party session replay scripts,” which captured passwords before the form was submitted.
The researchers have since notified the affected sites’ operators, and the password collection issues have been resolved.
In a follow-up investigation, they found that Meta and TikTok had used in-house invisible marketing trackers to collect personal information from web forms without consent.
On websites using the Meta Pixel or TikTok Pixel, tracking scripts that let site owners measure visitor activity, a feature called “automatic advanced matching” allowed the two social media giants to collect data from the advertiser’s site.
On those sites, an email address or other personal data typed into a form could be sent to Meta or TikTok even if the user never submitted it and instead clicked through to another page.
“The documentation we looked [at] together with Asuman claims that [Meta] only collect[s] this data when users click Submit, but we’ve looked into their code and they were collecting all clicks to any button, any link on the page,” said Acar.
Acar’s team found that 8,438 sites may have been leaking data to Meta through the Pixel when visited from the U.S., and 7,379 sites when visited from the EU.