Gold Standard Corpus Identifies Propaganda in the News via AI

As a key partner of the first global Hack the News Datathon and of the Qatar Computing Research Institute’s (QCRI) Propaganda Analysis Project, A Data Pro is working on creating a Gold Standard Corpus (GSC) that will, in the near future, serve to train a model to identify media propaganda.

Since September 2018, a dedicated team of A Data Pro media analysts has been evaluating articles containing propaganda, annotating them according to a set of 18 predefined propaganda tools, such as repetition, exaggeration, and whataboutism.

The stories selected for the GSC cover some of the most frequently misrepresented and skewed events in recent history, including the Qatar blockade of 2017, the so-called “red scare” in the USA of the 1950s, the Las Vegas shooting of 2017, Russia’s involvement in the US elections, and the Cambridge Analytica scandal.

A Data Pro’s team has so far spent 400 hours annotating, with this phase of the project to be completed by March 2019. Thanks to this effort, guided by QCRI scientists’ domain expertise, a dataset of annotated articles was created especially for Data Science Society’s (DSS) Hack the News Datathon, which takes place between 21 and 27 January 2019 on the DSS platform.


A Data Pro and QCRI’s dataset will be open for use as part of the upcoming datathon. The competition invites the global data science community to explore the pertinent issue of media propaganda and apply their data science skills and knowledge to solve it with the help and supervision of top NLP researchers.

Ultimately, the Hack the News Datathon and the Propaganda Analysis Project aim to create a tool that automatically detects, analyses and organises news articles in terms of their propagandistic content. The resulting system will help organisations, individual readers, and journalists reach informed decisions that are free of media bias.

The best solution submitted at the Datathon will be presented on 29 January, while the results of the Propaganda Analysis Project will be unveiled at a workshop in Hong Kong in November 2019, as part of the annual Conference on Empirical Methods in Natural Language Processing & International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).

You can contribute to the cause of A Data Pro, QCRI and DSS by participating in the Hack the News Datathon or making a donation toward the crowdfunded award. Join the Datathon!
