January 26, 2019, Sofia/Doha — The global Hack the News Datathon kicked off last night, gathering together more than 250 AI and data science academics, professionals and aficionados from over 50 countries to help develop a tool that can automatically identify propaganda in the news (winners to be announced on January 29, 2019). Unlike previous related efforts, it focuses on detecting the use of propaganda and specific propagandistic techniques, thus promoting explainable and accountable AI. The event is co-organized by the Data Science Society and the Qatar Computing Research Institute (QCRI), HBKU, and is hosted onsite in Sofia, Doha, Bangalore, and Riyadh as well as online via a dedicated platform. The effort is part of QCRI’s Tanbih project, which is developed in collaboration with MIT-CSAIL, and aims to uncover stance, bias and propaganda in the news, thus limiting the effect of “fake news”.
Recent years have seen the rise of social media, which have enabled people to share information with a large number of online users, without quality control. On the bright side, this has given the opportunity for everybody to be a content creator and has also enabled much faster information dissemination. On the not-so-bright side, it has made it possible for malicious users to spread misinformation much faster and potentially reach large audiences. In some cases, this included building sophisticated profiles for individual users based on a combination of psychological characteristics, meta-data, demographics, and location, and then micro-targeting them with personalized “fake news” and propaganda campaigns that have been weaponized with the aim to achieve political or financial gains.
“Fukuyama could not have been more wrong when he predicted in 1989 the end of history and the triumph of liberal democracy. Bad actors are using fake news, propaganda, and disinformation to advance dangerous ideologies. Can we use AI to regain trust in journalism and weed out biased and untrustworthy news sources?” asks Viktor Senderov, datathon co-organizer.
Indeed, the spread of false information has become a global phenomenon, and 18 countries around the world have recently reported issues related to disinformation during elections. To get an idea of the scale, 150 million users on Facebook and Instagram saw inflammatory political ads, and Cambridge Analytica had access to the data for 87 million Facebook users. To realize the potential impact, consider the 2016 US Presidential elections and Brexit.
“Nuclear, biological and/or cyber warfare are most commonly cited as our biggest threats. Reality is: we are already in the midst of an information war of Fake News and Opinion Manipulation that is far more dangerous.”, says Prof. Peter Cochrane, OBE, from the University of Suffolk, UK, member of the advisory board of the datathon.
Disinformation comes in different flavors such as “fake news”, bias, and propaganda, with the latter being one of the most dangerous forms. According to the Institute for Propaganda Analysis (active between 1937 and 1942), propaganda is defined as “The expression of an opinion or an action by individuals or groups deliberately designed to influence the opinions or the actions of other individuals or groups with reference to predetermined ends.”
Note that propaganda is related to but still different from “fake news”. While they overlap to a large extent, this overlap is nevertheless only partial. For example, it is possible to have propaganda based on true information (e.g., biased, cherry-picked), as it is possible to have false information that is not propaganda (e.g., no intent, no use of propagandistic techniques).
“As never before, spreading propaganda is at the fingertips of anybody, big or small business. Making people aware of it is crucial to reduce its impact.”, says Dr. Alberto Barrón-Cedeño, datathon co-organizer from the Qatar Computing Research Institute (QCRI), HBKU.
Indeed, an arguably good way to fight propaganda is to educate people to recognize it when they see it, which would help reduce its viral proliferation, e.g., through sharing in social media, but also its impact on users who have already received it. As Goebbels, the Minister of Propaganda of Nazi Germany, put it: “Propaganda becomes ineffective the moment we are aware of it”.
“Having a practical tool that can detect the use of propaganda in the news is important and may impact the way readers consume news in the future.”, adds Dr. Giovanni da San Martino, datathon co-organizer from QCRI.
This is exactly what the first Hack the News Datathon, held on January 21-29, 2019, aims at: it asks participants to develop a tool to detect propaganda in the news. The datathon is co-organized by QCRI and the Data Science Society, a global community of people involved in Data Science, which has been behind several international datathons, including one on Fake News in 2017.
“We hope that this datathon will contribute to breaking the bad practices of how news are presently being produced and consumed. We stand behind the open-source culture, and thus all models developed during the datathon and the dataset will be made publicly available, so that they can be used by anyone interested, including researchers, businesses, and even individual news consumers.” says Sergi Sergiev, founder of the Data Science Society.
Unlike previous related efforts, which have focused on fact-checking claims, rumors or news articles, the datathon aims at spotting the use of propagandistic techniques. Indeed, the use of propaganda in the news is often hard to notice. Yet, things change if we look for specific propagandistic techniques such as name calling, whataboutism, causal oversimplification, loaded language, etc. There are over 60 known propagandistic techniques and the datathon focuses on 18 of them, which are arguably the most frequent and the most impactful ones.
“I expect that this datathon will start a very promising research direction in the fight against propaganda.”, says Dr. Preslav Nakov, datathon co-organizer from QCRI. “Similarly to fighting spam, fighting disinformation is an adversarial problem, where malicious actors constantly change and improve their strategies. Yet, the way they can adapt their message is limited, as effective propaganda requires the use of propagandistic techniques, and this is exactly where we strike.”
Focusing on the individual techniques offers two key advantages: it allows an Artificial Intelligence (AI) system both to explain to the user why an article is deemed potentially propagandistic (thus, enabling explainable and accountable AI) and also to educate users to recognize the use of propagandistic techniques in real-world news articles. It further relieves systems from the need to make a propaganda judgment at the article level, which is hard and potentially subjective as, under most definitions of propaganda, there is a need to prove intent. In contrast, detecting the use of propaganda techniques is much easier and more objective.
“Propaganda in the news is ubiquitous, ranging from blatant to extremely subtle and effective. As it has in the past, it can lead to economic and social disasters. To fight it at scale, algorithms are necessary. Thankfully, the first annotated datasets of propaganda in the news are emerging. I am looking forward to the ingenious Machine Learning models that can reveal patterns useful for automatic detection.”, adds Laura Tolosi-Halacheva, datathon co-organizer.
The core team behind the task definition and the data preparation consists of five people. This includes three scientists from the Qatar Computing Research Institute, HBKU: Dr. Giovanni da San Martino, Dr. Alberto Barrón-Cedeño, and Dr. Preslav Nakov, who have a lot of experience in Natural Language Processing (NLP) and in fighting disinformation, as part of the Tanbih project. It further includes Dr. Laura Tolosi-Halacheva, an NLP researcher with experience in detecting rumors related to Brexit in social media, and Viktor Senderov, a PhD candidate at Naturhistoriska riksmuseet, who is an NLP researcher and a winner of two past datathons.
The international datathon team is further supported by global media and business intelligence provider A Data Pro, which carried out the difficult task of manually annotating propaganda instances within the dataset specifically for the Hack the News Datathon.
The datathon has attracted over 250 participants from more than 50 countries. It is further supported by an advisory board of 15 well-known experts from leading institutions around the world, including MIT, University of Cambridge, University of Michigan at Ann Arbor, University of California at Santa Barbara, University of British Columbia, University of Sheffield, Technische Universität Darmstadt, University of Texas at Arlington, University of Suffolk, Qatar University, Max-Planck Institute for Informatics, ISI Foundation, Amazon and Full Fact.
“Having had the misfortune to be exposed to extensive propaganda during the communist regime in Eastern Europe in the 80s, I have experienced first-hand the disastrous effects it can have. I am very excited to see this effort to address propaganda head-on, which I expect will have long-term positive implications. I am grateful to the datathon organizers and all the participants for all their work to create resources and algorithms that can help us detect (and thus avoid) propaganda in news.”, commented Prof. Rada Mihalcea from the University of Michigan at Ann Arbor, member of the advisory board of the datathon.
On Sunday, 27 January, 40 teams submitted their solutions to the propaganda identification case at the Hack the News Datathon, and 10 of them were selected automatically through a specially developed leaderboard to present at the Finals!
“Fake news and propaganda are threatening the truthfulness of Journalism, and they have become a central problem for all Internet companies. Moreover, it could lead to offline violence and national security concerns. I’m very happy to see QCRI, Datathon organizers, and Dr. Nakov taking the initiatives to tackle this important societal issue. Indeed, natural language processing technologies need to move beyond academia to create positive impacts on the society.”, said Prof. William Wang from the University of California at Santa Barbara, member of the advisory board of the datathon.
The winners of the Datathon will be announced and the best AI solutions for propaganda detection will be presented at the Official Closing of the Hack the News Datathon on 29 January, which will be streamed online.
Data Science Society
The Data Science Society is a global community of people involved in data science that, for four years, has applied the expertise of its members to advance various social and business causes. Among the most significant achievements of this international organization, which has over 1,800 experts and enthusiasts, are the development of tools to predict air pollution in urban environments and the use of self-learning algorithms to identify fake news. The motto of these initiatives is that any social problem can be solved with the help of the whole society when collective efforts are pointed in the right direction. None of these endeavors would have been possible without the support of its partners, including WorldQuant, Telelink, Receipt Bank and A Data Pro.