The average internet user visits dozens of websites and hundreds of web pages every day, most of which are recorded in the browser's history. But what if someone took this massive database of visited web pages and cross-referenced its entries? Researchers from Tactical Tech and SHARE Lab collaborated to find out what a person's browsing history reveals about their intentions, desires, needs, and preferences.
A Swiss journalist, referred to as Mr J for the purposes of the research, visited the Tactical Tech office in Berlin in June 2015 and provided a sample of his web history, on which this research was based. By analysing large sets of web addresses (Uniform Resource Locators, or URLs), especially from popular services such as Google Maps, Google Search or YouTube, the researchers were able to build a picture of Mr J’s everyday routine, including his interests and intentions, and even the apartments he rented via Airbnb while travelling abroad. Also, since Facebook has a “real-name policy”, it is quite easy to link a person’s web history to their profile, and to create a social graph of their Facebook friends and connections based on the Facebook URLs they visited.
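To illustrate how much a single URL can give away, here is a minimal sketch in Python. It is not the researchers' actual tooling. The function name `describe_url` and the sample URLs are hypothetical, and the URL shapes it looks for (the `q` parameter of a Google search, the `v` parameter of a YouTube watch page, the `@lat,lon` segment of a Google Maps address) are common patterns, not an exhaustive list.

```python
import re
from urllib.parse import urlparse, parse_qs


def describe_url(url):
    """Return a (kind, value) pair for a few well-known URL shapes, else None.

    Illustrative only: real browsing histories contain many more patterns.
    """
    parts = urlparse(url)
    host = parts.netloc.lower()
    query = parse_qs(parts.query)

    # Google Search keeps the search terms in the "q" parameter.
    if host.startswith("www.google.") and parts.path == "/search" and "q" in query:
        return ("search", query["q"][0])

    # YouTube watch pages carry the video identifier in the "v" parameter.
    if host.endswith("youtube.com") and parts.path == "/watch" and "v" in query:
        return ("video", query["v"][0])

    # Google Maps URLs often embed the map centre as "@lat,lon" in the path.
    if host.startswith("www.google.") and parts.path.startswith("/maps"):
        match = re.search(r"@(-?\d+\.\d+),(-?\d+\.\d+)", parts.path)
        if match:
            return ("location", (float(match.group(1)), float(match.group(2))))

    return None


# Hypothetical history entries, invented for this example.
sample_history = [
    "https://www.google.com/search?q=flights+to+belgrade",
    "https://www.youtube.com/watch?v=abc123XYZ_0",
    "https://www.google.com/maps/@52.5200,13.4050,14z",
]

for entry in sample_history:
    print(describe_url(entry))
```

Run over thousands of history entries and sorted by timestamp, even a simple classifier like this would start to show what a person searched for, watched, and looked up on a map, and when.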
Because the websites Mr J visits contain many trackers, small pieces of code and data that collect behavioural information about users, the experiment also showed which companies extract the most data about Mr J. Unsurprisingly, Google, Facebook and Twitter were among the companies with the largest number of trackers. It was also interesting to “read” sample web pages Mr J visited as a machine would. This is possible with Google’s Cloud Natural Language tool, which is attached to its deep learning platform and can extract information about people, places, events, and much more from text documents, news articles or blog posts. It recognised important events, names, and places based on keywords it picked up from the web pages.
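The tracker comparison can be pictured as a simple tally. The sketch below is only an assumption about how such a count might be done, not the method used in the research. The `TRACKER_OWNERS` table is a tiny illustrative mapping, the helper names are hypothetical, and the list of third-party hosts is invented.

```python
from collections import Counter

# Tiny illustrative mapping from tracker domain to owning company.
# A real analysis would rely on a much larger, maintained list.
TRACKER_OWNERS = {
    "google-analytics.com": "Google",
    "doubleclick.net": "Google",
    "facebook.net": "Facebook",
    "twitter.com": "Twitter",
}


def owner_of(host):
    """Return the company behind a host, or None if it is not in the table."""
    host = host.lower()
    for domain, owner in TRACKER_OWNERS.items():
        # Match the domain itself and any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return owner
    return None


def count_trackers(third_party_hosts):
    """Tally how many observed third-party requests belong to each company."""
    counts = Counter()
    for host in third_party_hosts:
        owner = owner_of(host)
        if owner is not None:
            counts[owner] += 1
    return counts


# Invented third-party hosts requested while loading some visited pages.
observed = [
    "www.google-analytics.com",
    "stats.g.doubleclick.net",
    "connect.facebook.net",
    "platform.twitter.com",
    "example.org",
]

for company, total in count_trackers(observed).most_common():
    print(company, total)
```

Hosts that are not in the table, such as `example.org` here, are simply ignored, so the quality of the result depends entirely on how complete the domain-to-company mapping is.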
All these findings lead to the conclusion that if an actor such as a private company, the state, or law enforcement were to apply these techniques to the web histories of a large segment of the population, it would be a frightening first step towards a “thought police” that arrests individuals suspected of committing a crime in the future.
SHARE Lab: Browsing Histories – Metadata Explorations
(Contribution by Bojan Perkov, EDRi observer SHARE Foundation, Serbia)