FLoC: Google’s new means of following you across the web
If you browse the web without an ad blocker, you will have noticed that the ads you see tend to follow you around. Have you been looking around for a new pair of shoes? Be prepared to see more ads for shoes on completely unrelated websites. This advertising technique is called "behavioural retargeting" and is built on recording your web history in a central place, then using that information to select ads that advertisers expect you are more likely to react to. In this article, EDRi member epicenter.works sheds some light on Google's new way of tracking users across the web.
Following you across the web
But how do completely unrelated websites know where you’ve previously been? The reason you can be recognised is that, most of the time, ads aren’t served by the websites you visit. They’re merely embedded into the website and are actually loaded from a different place. That different place is an ad network that has many websites as customers, and it can record where and when ads were loaded into your browser.
However, the ad network still has to solve one problem: out of all the requests for ads it receives from people visiting various websites, how can it know which of these are you? You may have heard that on the internet your device is identified by a number called its IP address. For advertisers, though, IP addresses are not good enough, because your IP address changes whenever you change your internet connection: for example, when your phone switches from your wifi onto the mobile operator’s network as you leave your home, then hops onto the wifi at the café you’re visiting. What advertisers want is to be able to recognise you along that entire journey.
Another technique advertising networks therefore use is cookies. Cookies are small pieces of information that a website can place in your browser and that are transmitted to the website whenever you load a page on that website. Often, cookies are used for purposes such as remembering that you have already logged in to a service. Ad networks, however, use them to place a unique number on your device. This number is transmitted to the ad network whenever an ad is loaded, which allows the network to recognise you across the websites you visit.
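To make the mechanism concrete, here is a minimal sketch of how a cookie lets an ad network link visits to different websites. It is a toy simulation, not how any real ad network is implemented: the class `AdNetwork`, the method `serve_ad` and the site names are all made up for illustration.

```python
import uuid


class AdNetwork:
    """Toy ad network that recognises browsers by a cookie it sets itself."""

    def __init__(self):
        # Maps each cookie identifier to the sites where an ad was loaded.
        self.history = {}

    def serve_ad(self, site, cookie=None):
        # First contact: the browser has no cookie yet, so hand out a
        # fresh random identifier.
        if cookie is None:
            cookie = uuid.uuid4().hex
        self.history.setdefault(cookie, []).append(site)
        # The browser stores the cookie and sends it with the next ad request.
        return cookie


network = AdNetwork()
cookie = network.serve_ad("shoes.example")         # first ad request, no cookie yet
cookie = network.serve_ad("news.example", cookie)  # same browser, unrelated site
print(network.history[cookie])  # ['shoes.example', 'news.example']
```

Note that the IP address plays no role here: the identifier travels with the browser, so the visits stay linked even when the connection changes.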
Third-party cookies are dying
Cookies placed by ad networks differ from cookies that keep track of your login in an important way, however: because ads are loaded from a different origin, your browser can recognise that these aren’t cookies placed by the website you’re visiting. Instead, they’re third-party cookies, placed by the ad network itself.
Because third-party cookies are so prominently used to follow you across the web, browsers have started to block them. For example, in Firefox, this feature is called Enhanced Tracking Protection and is activated by default. Similarly, Safari has started blocking third-party cookies. The online advertising industry has therefore been lamenting the death of third-party cookies, and is working on ways to replace them.
Federated Learning of Cohorts
Google’s main source of revenue is online advertising: its ad network “Google Ads” generates annual revenues of more than $100 billion. But Google also has a different asset: its web browser. Google Chrome is the most popular web browser by some margin almost everywhere in the world. So Google’s idea is: instead of ad networks inferring your interests, your browser should inspect your browsing history and assign you an interest group, called a cohort, and make this information available to websites and ad networks you connect to. They call the process they have devised for finding such an interest group “Federated Learning of Cohorts” or “FLoC”, and epicenter.works explains how it works in the following video (see footnote 1):
In a whitepaper, Google claims that this method is privacy-preserving: because cohorts are chosen so that there is always at least a certain number of users that belong to the same cohort, a certain degree of indistinguishability from other users is preserved (this property is called “k-anonymity”). But do cohorts really leave you with more privacy?
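As a rough illustration of what k-anonymity means here, the sketch below counts how many users share each cohort and keeps only the cohorts with at least k members. The function name `k_anonymous_cohorts`, the user names, the cohort numbers and the value of k are all invented for the example; the article does not describe how Chrome enforces the threshold in practice.

```python
from collections import Counter


def k_anonymous_cohorts(assignments, k):
    """Return the cohorts that are shared by at least k users."""
    sizes = Counter(assignments.values())
    return {cohort for cohort, size in sizes.items() if size >= k}


# Hypothetical assignment of users to cohort numbers.
assignments = {"alice": 17, "bob": 17, "carol": 17, "dave": 42}

safe = k_anonymous_cohorts(assignments, k=3)
print(safe)  # {17}: cohort 42 has a single member and would single out "dave"
```

The guarantee only covers the cohort number by itself. It says nothing about what happens once that number is combined with other information about you.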
Less private, not more
There are three major problems:
Tracking around the web will still be valuable. Google’s method categorises users based on the domain names they have visited, and it does so by randomly dividing the space of all possible browser histories. A cohort number by itself therefore does not say what its members are interested in. To gradually learn which interests a particular cohort stands for, ad networks must combine the cohorts that users report with other data they hold on these users. They will use their existing tracking data to infer these interests, so finding ways to track you around the web without cookies will remain valuable, if only to derive better information about what particular cohorts mean.
Cohorts can be more privacy-invasive than tracking based on cookies. Ad networks can only see that you visit a certain website if their ads are actually embedded on that website. But your web browser can calculate your cohort based on your entire browsing history. This means that telling an ad network your cohort can reveal information about your interests that the network would not otherwise have known. Additionally, certain cohorts could consist primarily of users that have visited a domain name of a website revealing a particularly sensitive interest (for example a website about a particular illness). Google acknowledges this problem in a paper but also admits that in order to solve it (by blocking your browser from reporting potentially sensitive cohorts), they must make a trade-off between your privacy and the utility of their system.
Cohorts can be used to refine already existing tracking techniques. Unfortunately, cookies are not the only way to identify users on the web. Websites and ad networks can also request and combine information such as your screen size and which fonts you have installed, creating a unique “fingerprint” of your browser. The cohort your browser reports provides additional information for fingerprinters, making you more distinguishable, not less.
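The fingerprinting argument can be illustrated with a small sketch: the more attributes an observer can combine, the fewer users match all of them. The four users, their attribute values and the helper `anonymity_set` are entirely made up for illustration and do not come from any real dataset.

```python
def anonymity_set(users, **observed):
    """Return the users whose attributes match everything the observer saw."""
    return [
        name
        for name, attrs in users.items()
        if all(attrs.get(key) == value for key, value in observed.items())
    ]


# Hypothetical population: screen size, installed font set, reported cohort.
users = {
    "u1": {"screen": "1920x1080", "fonts": "A", "cohort": 101},
    "u2": {"screen": "1920x1080", "fonts": "A", "cohort": 202},
    "u3": {"screen": "1920x1080", "fonts": "B", "cohort": 101},
    "u4": {"screen": "1366x768", "fonts": "A", "cohort": 101},
}

without_cohort = anonymity_set(users, screen="1920x1080", fonts="A")
with_cohort = anonymity_set(users, screen="1920x1080", fonts="A", cohort=101)
print(without_cohort)  # ['u1', 'u2']
print(with_cohort)     # ['u1']
```

In this toy population three users share cohort 101, yet adding the cohort to an existing fingerprint is enough to narrow two candidates down to one.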
In summary, Google’s new mechanism may well make you less private, not more private, as you browse the web. It merely repurposes your browser into an additional tool to learn what you like. Instead of the browser working solely in your interest, it will work in the interest of advertisers.
(Contribution by: Benedikt Gollatz, EDRi member, Epicenter.works)
1 For the more mathematically inclined, the method used to find similar browsing histories is repeated calculation of the similarity measure described in section 3 of Moses S. Charikar: Similarity estimation techniques from rounding algorithms (2002). The vector u consists of a “one-hot” encoding of hashes of visited domain names (it is the characteristic vector of the subset of visited domain name hashes in the set of all domain name hashes). The components of the hyperplane normals r are derived from these hashes and the number of the plane, which can be done decentrally in a deterministic manner. The implementation can be found in the Chromium source code repository.
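The calculation described in this footnote can be sketched roughly as follows. This is not the Chromium implementation: the use of SHA-256, the way a hash is scaled into a component of r, the number of bits and the names `component` and `simhash` are all assumptions chosen to keep the example short.

```python
import hashlib


def component(domain, plane):
    """Deterministic pseudo-random component of the normal r for one plane.

    Derived only from the domain name and the plane number, so every
    browser computes the same value without any coordination.
    Assumption: SHA-256 scaled to roughly [-1, 1].
    """
    digest = hashlib.sha256(f"{plane}:{domain}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**63 - 1.0


def simhash(domains, bits=8):
    """One bit per hyperplane: set if the history lies on its non-negative side."""
    cohort = 0
    for plane in range(bits):
        # u is a 0/1 vector, so the dot product r . u is simply the sum of
        # r's components at the positions of the visited domains.
        dot = sum(component(d, plane) for d in sorted(set(domains)))
        if dot >= 0:
            cohort |= 1 << plane
    return cohort


history = ["shoes.example", "news.example", "weather.example"]
print(simhash(history))
```

Histories that share most of their domains produce similar sums and therefore tend to agree on most bits, which is what places similar browsing histories in the same cohort.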