Increasingly, demands are made that “something be done” about “undesirable” and “harmful” material on the internet: online child abuse images and other criminal pornography, “extremism”, “incitement to violence”, “hate speech” and, more recently, “fake news”. Organisations representing holders of intellectual property (IP) rights similarly demand that measures be taken to prevent the sharing of IP-protected materials online. There is a widespread assumption that the internet giants (Google, Apple, Facebook, Twitter) have the means and resources to meet these demands, and should be forced to do so.
The means put forward to identify such materials all revolve around “automated” or “algorithmic” filtering: computer programs that are supposed to be able to single out illegal or otherwise objectionable content from legal, non-objectionable content while it is being uploaded to the relevant platforms. They work to some extent for material that has already been identified and assessed as illegal, such as previously detected online child abuse material: the known materials are “hashed”, and the hash can be used to recognise those very same pictures when they are uploaded, so that they can then be blocked.
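A minimal sketch of hash matching, assuming an exact SHA-256 digest of the file's bytes. Deployed systems typically use perceptual hashes (Microsoft's PhotoDNA is a well-known example) that tolerate resizing and re-encoding; the blocklist contents and the names `KNOWN_ILLEGAL`, `should_block` and `uploader` are hypothetical, chosen only for illustration.

```python
import hashlib

# Hypothetical blocklist of SHA-256 hex digests for files already assessed
# as illegal. The placeholder byte string stands in for a real image file.
KNOWN_ILLEGAL = {
    hashlib.sha256(b"placeholder bytes of a known image").hexdigest(),
}


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of an uploaded file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def should_block(upload: bytes, uploader: str) -> bool:
    """Block an upload if its digest is on the blocklist.

    The hypothetical `uploader` argument is accepted but deliberately unused:
    a hash match depends only on the bytes, never on who posts them or why.
    """
    return digest(upload) in KNOWN_ILLEGAL


if __name__ == "__main__":
    known = b"placeholder bytes of a known image"
    print(should_block(known, uploader="news-site"))         # True
    print(should_block(known, uploader="extremist-forum"))   # True
    print(should_block(known + b"\x00", uploader="anyone"))  # False: one byte changed
```

The sketch shows both sides of the argument: matching already-identified files is mechanical and reliable, but the decision is identical whether the same picture is posted by a news organisation or by a terrorist group, because context never enters the computation.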
However, it is much more difficult to use such tools to identify text or images that may constitute objectionable or even illegal content, that is, where some judgment is required. The same photo may appear on a site promoted by a terrorist group and on a news organisation's site. In relation to “hate speech”, “incitement to violence” and “fake news”, it is necessary to parse the nuanced meaning of human communication, or to detect the intent or motivation of the speaker, before a judgment can be made.
Even when the standard is clear, the judgment may be difficult. Depictions of people having sex with animals are clearly defined as illegal in UK law. But does this mean all historical or even recent depictions of Leda and the Swan (and other depictions of mythological god/human-beast encounters) must be blocked? (Leda and her swan are currently not blocked from Google searches.)
In relation to IP-protected material, there is the special, major problem of limits to and exceptions from such protection, for example for fair comment or reporting, criticism, parody, caricature and pastiche, or to facilitate access for people with disabilities. The scope and application of those exceptions are difficult to determine in individual cases even for lawyers and courts, and are well beyond the capabilities of so-called “artificial intelligence” and natural language processing (NLP) tools.
Unfortunately, companies keep trying to sell snake-oil tools to governments, such as a tool which, it is claimed, “can detect 94% of Isis propaganda with a 99.99% success rate in tests”, and politicians keep buying such impossible claims.
Now the European Commission is proposing that precisely such tools be used by information society service providers to detect copyright-protected material and “prevent” it from being made available on their sites. Article 13 of the proposed Copyright Directive effectively requires all information society service providers to use such tools (while disingenuously only “suggesting” them as one example of possible measures).
The truth is that for complex, context-dependent assessments, including those relating to copyright, such tools do not work, which makes them fundamentally unsuited to the claimed purpose. They will lead to unacceptably high rates of
- false positives (wrongly blocking entirely legal material, or copyright-protected material that is used legally under an exception),
- or false negatives (failing to detect real illegal or copyright-protected material),
- or both.
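Why “or both” is the usual outcome can be sketched with a score threshold. The scores below are invented for illustration (they do not come from any real tool): they assume a hypothetical classifier that gives each upload a score, and that lawful uses such as parody or quotation overlap in score with infringing copies, because the difference lies in context the tool cannot see.

```python
# Hypothetical classifier scores in [0, 1]; higher means "more likely infringing".
# All numbers are invented for illustration only.
LAWFUL_SCORES = [0.05, 0.10, 0.20, 0.35, 0.55, 0.70]      # e.g. parody, quotation, review
INFRINGING_SCORES = [0.30, 0.50, 0.65, 0.80, 0.90, 0.95]


def error_counts(threshold: float) -> tuple[int, int]:
    """Count errors when every item scoring >= threshold is blocked.

    Returns (false_positives, false_negatives): lawful items wrongly
    blocked, and infringing items wrongly let through.
    """
    false_positives = sum(score >= threshold for score in LAWFUL_SCORES)
    false_negatives = sum(score < threshold for score in INFRINGING_SCORES)
    return false_positives, false_negatives


if __name__ == "__main__":
    for threshold in (0.25, 0.5, 0.75):
        fp, fn = error_counts(threshold)
        print(f"threshold={threshold}: {fp} false positives, {fn} false negatives")
    # threshold=0.25: 3 false positives, 0 false negatives
    # threshold=0.5: 2 false positives, 1 false negatives
    # threshold=0.75: 0 false positives, 3 false negatives
```

Whenever the two groups overlap, no threshold removes both kinds of error: tightening the filter blocks more lawful material, and loosening it lets more infringing material through.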
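The proper statistical assessment of such tools cannot be captured in simple “accuracy” or “success” rates (such as are claimed for the above-mentioned UK tool). In pattern recognition science, outcomes should be measured in terms of precision (the share of flagged items that really are the targeted material) and recall (the share of the targeted material that is actually flagged), rather than “accuracy”, which can look excellent simply because the vast majority of uploads are unobjectionable.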
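A worked example of why a high “accuracy” figure can hide a poor filter. The scenario is hypothetical and does not reconstruct the UK tool's real test data: it assumes 1,000,000 uploads, of which 100 are the targeted material, and a filter that catches 94 of those 100 (a recall of 94%, loosely comparable to a “detects 94%” claim) while wrongly flagging 100 of the 999,900 lawful uploads, i.e. about 0.01% of them.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision and recall from a confusion matrix.

    tp: targeted items blocked        fp: lawful items blocked
    fn: targeted items missed         tn: lawful items let through
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),  # share of blocked items that were actually targeted
        "recall": tp / (tp + fn),     # share of targeted items that were blocked
    }


if __name__ == "__main__":
    # Hypothetical figures, invented for illustration.
    result = metrics(tp=94, fp=100, fn=6, tn=999_800)
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
    # accuracy: 0.9999
    # precision: 0.4845
    # recall: 0.9400
```

Under these assumptions the filter can be advertised as 99.99% accurate, yet more than half of everything it blocks is lawful material. Because the targeted content is rare, even a tiny false-positive rate applied to the huge volume of lawful uploads swamps the correct detections.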
Moreover, such tools can of course be used to detect, and then block, any pre-determined content before it is ever published. They are a gift to any government wanting to suppress the free flow and sharing of information on the internet. Not surprisingly, EDRi calls them “censorship machines”.
By their very nature, automated algorithmic filtering tools perform precisely the kind of “generalised monitoring” of the communications of whole swathes of populations (such as all users of Buzzfeed, Vimeo, Flickr, Facebook or Dropbox, to name but a few) that, according to the Court of Justice of the European Union, violates the very “essence” of the right to private life protected by the EU Charter of Fundamental Rights. Automated algorithmic filtering tools are therefore fundamentally, constitutionally unacceptable and unlawful.
In sum, such tools are both inappropriate and unsuited to their stated aim (they cannot achieve that aim in relation to context-dependent content), and they constitute major and disproportionate, and thus unlawful, interferences with the fundamental rights of the populations against which they are used.
This article summarises a longer paper, released on the EDRi website: https://edri.org/files/copyright/20180213-Korff-GeneralisedMonitoringOnlineContent.pdf (PDF).
Read more:
Home Office unveils AI program to tackle Isis online propaganda (13.02.2018)
Copyright Directive Document Pool
The ghost in the machine
Civil society calls for the deletion of the #censorshipmachine (16.10.2017)
Infographic: Article 13 will harm European businesses, NGOs and consumers who upload and share content online on these services
(Contribution by Douwe Korff, EDRi member Foundation for Information Policy Research – FIPR, United Kingdom)