ALTwitter: The treasure trove behind 140 characters
One of the main reasons why metadata is used so broadly for surveillance and targeted advertising is that it can capture more dimensions of useful information than the data itself. An ordinary internet user does not notice metadata because it is invisible to the naked eye. Law enforcement agencies and advertisers, on the other hand, use the invisibility of metadata to track, profile and target us.
The ALTwitter platform was initially intended to build Twitter-like profiles of the Members of the European Parliament (MEPs) from their Twitter metadata. However, the insights aggregated from the metadata of all the MEPs can be even more interesting than their individual profiles. We can use these insights to explain how the advertising industry benefits from unregulated metadata, and how everyday internet users become the product for targeted ads.
The size of metadata is small compared to that of the data from which it originates. But the size doesn’t matter at all. Even though the dataset used for the ALTwitter project contains only 617 participants (MEPs) and metadata from a total of approximately 10 000 tweets, this is sufficient to show the power of metadata. Based on the number of tweets, we can conclude, not so surprisingly, that the Member States with more MEPs (such as Germany, France, and Italy) generate the largest number of tweets. The metadata also shows that the MEPs from Italy tweet the most.
We have the limitation of just 140 characters for a tweet. The content of a tweet represents the data and everything else is the metadata – that includes time of the tweet, sources (hardware and software) used for tweeting, geolocation tags, and much more. These small chunks of information are actually more revealing than the data that we post on Twitter.
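The split between data and metadata can be illustrated with a minimal sketch. The record below is a hypothetical example, and its field names (`text`, `created_at`, `source`, `geo`, `lang`) are assumptions chosen for illustration rather than the exact fields used by the ALTwitter project.

```python
# Hypothetical tweet record; the field names are illustrative assumptions.
SAMPLE_TWEET = {
    "text": "Looking forward to today's plenary vote.",
    "created_at": "2017-03-22T09:15:00+00:00",
    "source": "Twitter for iPhone",
    "geo": None,
    "lang": "en",
}


def split_tweet(tweet, data_fields=("text",)):
    """Return (data, metadata): the content fields and everything else."""
    data = {key: value for key, value in tweet.items() if key in data_fields}
    metadata = {key: value for key, value in tweet.items() if key not in data_fields}
    return data, metadata


if __name__ == "__main__":
    data, metadata = split_tweet(SAMPLE_TWEET)
    print("data:", data)
    print("metadata fields:", sorted(metadata))
```

Even in this toy record, a single content field is accompanied by several metadata fields.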
To simplify the explanation of how metadata is used, we focus on how people access Twitter. Grouping the users into “web” and “mobile” is one of the first steps in targeted advertising, because the advertising strategy is completely different for mobile and web users. Even though 80% of tweets made by the population at large originate from a mobile device, our analysis showed that the majority of MEPs’ tweets come from the web and not the mobile service. This shows a big difference between how regular citizens and European Parliamentarians use the service.
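A web/mobile grouping of this kind could be sketched as follows. This is a rough illustration and not the method used for the ALTwitter analysis: the keyword list and the client labels in the example are assumptions, and a real classification would need a curated list of client names.

```python
from collections import Counter

# Assumed keywords that hint at a mobile client; illustrative only.
MOBILE_HINTS = ("iphone", "ipad", "android", "blackberry", "mobile")


def classify_source(source):
    """Map a client label to 'mobile', 'web' or 'other' by keyword."""
    label = source.lower()
    if any(hint in label for hint in MOBILE_HINTS):
        return "mobile"
    if "web" in label:
        return "web"
    return "other"


def platform_share(sources):
    """Return the fraction of tweets per platform group."""
    counts = Counter(classify_source(source) for source in sources)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: count / total for group, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical client labels, one per tweet.
    sample = [
        "Twitter Web Client",
        "Twitter Web Client",
        "Twitter for Android",
        "TweetDeck",
    ]
    print(platform_share(sample))
```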
It is possible to estimate the financial status of users based on the mobile devices they use. Since Apple devices are more expensive than Android and other devices, we could expect that people using Apple products are wealthier. In our case, this assumption could be inaccurate, because the MEPs are well paid, and most are paid the same amount. But there are more advanced indicators. Advertisers rely on consumer analysis, especially by conducting sentiment analysis to predict the financial market. This way the ads can be customised to increase the chances of a specific user clicking on them. These indicators will be further used for credit scoring, the statistical analysis that lenders and financial institutions perform to assess a person’s creditworthiness, which in turn helps determine insurance premiums.
People and companies use Twitter as an advertising forum to divert traffic to their websites, and as a campaigning tool to reach out to a bigger audience. When installing the Twitter app, users have no choice other than to give Twitter access to information about what apps they have installed on their phones. Twitter then allows advertisers to target users based on their installed app categories, which gives an insight into the different types of apps people use, for example entertainment apps.
Metadata from large databases is often stored in a specific structure so that it can be processed efficiently. It is easy for advertisers to de-anonymise the metadata by determining its unique characteristics and matching it with other data. By comparing the devices and the number of tweets by an MEP and matching this with other available data, we can identify the MEPs uniquely.
Most of the apps allow the users to share something on Twitter via their in-app sharing features. This is also true for entertainment apps. There are many examples of interesting uniqueness that can be used to profile you and deliver custom-made ads. Here are a few we discovered, which helped us uniquely identify an MEP:
- There is just one MEP who uses the Vine App, a camera app for making 6-second looping videos on an Android device.
- There is just one MEP who uses a BlackBerry phone, just one who owns a Sony Xperia phone, and just one who uses a Samsung tablet.
- Only one MEP is using a Sports Tracker app.
- Mobile gaming apps are not very popular among the MEPs, except for one who is using Temple Run and one whose metadata reveal the use of Banana Kong.
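Singling out accounts in this way amounts to looking for sources that only one account has ever used. The sketch below shows the idea under stated assumptions: the account names and client labels are invented placeholders, not records from the ALTwitter dataset.

```python
from collections import defaultdict


def unique_sources(observations):
    """Return {source: user} for every source used by exactly one user.

    `observations` is an iterable of (user, source) pairs, one per tweet.
    """
    users_by_source = defaultdict(set)
    for user, source in observations:
        users_by_source[source].add(user)
    return {
        source: next(iter(users))
        for source, users in users_by_source.items()
        if len(users) == 1
    }


if __name__ == "__main__":
    # Hypothetical (account, client) pairs for illustration only.
    sample = [
        ("mep_a", "Twitter Web Client"),
        ("mep_b", "Twitter Web Client"),
        ("mep_c", "Vine for Android"),
        ("mep_c", "Twitter Web Client"),
        ("mep_d", "Sports Tracker"),
        ("mep_d", "Sports Tracker"),
    ]
    for source, user in unique_sources(sample).items():
        print(f"{source!r} is used only by {user}")
```

A source that appears in this result acts as a fingerprint: any record carrying it can be attributed to a single account without looking at the tweet content.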
It is most likely that each MEP has a single password for their official Twitter account and shares it with their team, rather than using Twitter’s multi-user login feature. So, it is possible that one of their interns, or anyone else with the password, used Temple Run or Banana Kong, and not the MEPs themselves. That said, Angelika Niebler, the Vice-Chair of the European Parliament Committee on Industry, Research and Energy (ITRE), has publicly “outed” herself as a Banana Kong aficionado, with a personal best of 90 metres.
Last but not least, advertisers can monetise one’s schedule or calendar by gathering information from the metadata. As presented on the platform with individual metadata-based profiles of the MEPs, one can learn from their Twitter metadata who is active and at what time. Advertisers can target users at the time when they are most active, to increase the chances of them clicking on a promotional ad on Twitter.
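Inferring an activity pattern from timestamps can be sketched in a few lines. The example assumes timestamps are available as ISO 8601 strings, which is a simplification for illustration; the function names are hypothetical and do not come from the ALTwitter platform.

```python
from collections import Counter
from datetime import datetime


def hourly_activity(timestamps):
    """Count tweets per hour of day from ISO 8601 timestamp strings."""
    return Counter(datetime.fromisoformat(stamp).hour for stamp in timestamps)


def most_active_hour(timestamps):
    """Return the busiest hour of day (earliest hour on ties), or None."""
    counts = hourly_activity(timestamps)
    if not counts:
        return None
    return max(counts, key=lambda hour: (counts[hour], -hour))


if __name__ == "__main__":
    # Hypothetical timestamps for one account.
    sample = [
        "2017-03-22T09:15:00+00:00",
        "2017-03-22T09:40:00+00:00",
        "2017-03-23T21:05:00+00:00",
    ]
    print("tweets per hour:", dict(hourly_activity(sample)))
    print("most active hour:", most_active_hour(sample))
```

Hours here are taken as written in each timestamp; comparing accounts across time zones would require converting to a common zone first.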
ALTwitter #hakunametadata: Twitter metadata profiles of the Members of European Parliament
Hakuna Metadata – Let’s have some fun with Sid’s browsing history! (03.05.2017)
Hakuna Metadata – Exploring the browsing history (22.03.2017)
(Contribution by Siddharth Rao, Ford-Mozilla Open Web Fellow, EDRi, and Zarja Protner, EDRi intern)