1. Business Problem Formulation
The current political landscape is shaped by extreme polarization of opinions and by the proliferation of fake news. For example, a recent study published in Science found that rumors and fake news tend to spread six times faster than truthful information. This situation damages the reputation of respectable news outlets and undermines the very foundations of democracy, which needs a free and reliable press to thrive. Therefore, it is in the interest of both the public and the news organizations to be able to detect and fight disinformation in all its forms. While most previous work has focused on “fake news”, here we are interested in propaganda.
Propaganda is the deliberate spreading of ideas, facts, or allegations with the aim of advancing one’s cause or of damaging an opposing cause. While it may include falsehoods, this is not really necessary; rather, propaganda can be seen as a form of extreme bias. Yet, propagandistic news articles usually use certain techniques such as Whataboutism, Red Herring, and Name Calling, among many others. Detecting the use of such techniques can help identify potentially propagandistic articles. Here, we ask you to create a tool that can help with this endeavor.
Note that any news article, even one coming from a reputable source, can reflect the unconscious bias of its author, and thus it could be propagandistic. Therefore, it is important not only (i) to flag the article as a whole, but also (ii) to detect the potentially propagandistic sentences in a news article, and even (iii) to identify the exact type and span of each propagandistic technique used. You will be provided with a set of articles taken from online news sources, as well as with annotations of the span and type of the propagandistic techniques used in each of them. Your task will be to develop a tool to perform tasks (i), (ii), and (iii).
2. Research Problem Specification
Undoubtedly, propaganda detection is a difficult problem. While obvious propaganda might only fool a limited number of people, subtle propaganda can potentially go unnoticed even by the most careful readers. This raises the natural question: How can we expect computers to recognize propaganda if even humans often fail to do so? And how good can AI get at this? In order to shed some light on these questions, we have defined three tasks in increasing order of difficulty.
- Propaganda detection at the article level (PAL). This is the easiest task, albeit not easy in absolute terms. It is a classical supervised document classification problem. You are given a set of news articles, and you have to classify each article into one of two possible classes: “propagandistic article” vs. “non-propagandistic article.”
- Propaganda detection at the sentence level (PSL). This is another classification task, but of different granularity. The objective is to classify each sentence in a news article as either “sentence that contains propaganda” or “sentence that does not contain propaganda.”
- Propaganda type recognition (PTR). This task is similar to the task of Named Entity Recognition, but applied in the propaganda detection setting. The goal is to detect the occurrences and to correctly assign the type of propaganda to text fragments. The types of propaganda you should target and their definitions with examples can be found in the following document: http://propaganda.qcri.org/annotations/definitions.html
Each task is built around a separate annotated dataset, split into Train, Dev, and Test. You should train your model on the Train set. The evaluation during development time should be conducted on the Dev set. The final evaluation will be on the Test set, which will be kept hidden from the participating teams. The evaluation measures for the model performance are defined as follows:
- For PAL and PSL, we will use the F1 score to evaluate the quality of the participating systems. The F1 score is computed as follows: First, we compute the precision P as the ratio of the number of true positives (documents/sentences that the system has labeled as propagandistic and that are indeed such) to the total number of documents/sentences that the system has classified as propagandistic. Second, we compute the recall R as the ratio of the number of true positives to the number of all documents/sentences that are indeed propagandistic. Then, F1 = 2PR / (P + R). This measure takes into consideration the class imbalance in the testing dataset.
- For PTR, we borrow a measure from the task of Named Entity Recognition (NER). NER typically uses a strict F1 score, which only acknowledges a success if the match is exact. This is too harsh as it gives no credit to partial matches, and thus we will use a more relaxed version of macro-averaged F1.
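The F1 computation described above for PAL and PSL can be sketched as follows. This is a minimal illustration, not the official scorer: the function name `precision_recall_f1`, the label strings, and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
def precision_recall_f1(gold, predicted, positive="propaganda"):
    """Return (P, R, F1) for the positive class; label strings are assumed."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # True positives: labeled propagandistic by the system and indeed such.
    true_pos = sum(1 for g, p in zip(gold, predicted)
                   if g == positive and p == positive)
    predicted_pos = sum(1 for p in predicted if p == positive)
    gold_pos = sum(1 for g in gold if g == positive)
    # Assumption: a zero denominator yields 0.0 rather than an error.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = ["propaganda", "propaganda", "propaganda",
        "non-propaganda", "non-propaganda"]
predicted = ["propaganda", "non-propaganda", "propaganda",
             "propaganda", "non-propaganda"]
p, r, f1 = precision_recall_f1(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.667 R=0.667 F1=0.667
```

In this toy run, two of the three predicted positives are correct and two of the three gold positives are found, so P = R = F1 = 2/3.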
3. Data Description
Two sets of documents will be provided: one for PAL and another for PSL and PTR. Below we show a sample snippet of a propagandistic article in order to illustrate the PSL and PTR tasks (for PAL, the same article would appear on a single line, with newlines replaced by two spaces). Line numbers ([n]) have been added for explanatory purposes, but they will not appear in the files. Each article has an id, and in this example, we assume that the article’s id is 123456.
[1] Manchin says Democrats acted like babies at the SOTU (video) Personal Liberty Poll Exercise your right to vote.
[2] Democrat West Virginia Sen. Joe Manchin says his colleagues’ refusal to stand or applaud during President Donald Trump’s State of the Union speech was disrespectful and a signal that the party is more concerned with obstruction than it is with progress.
[3] In a glaring sign of just how stupid and petty things have become in Washington these days, Manchin was invited on Fox News Tuesday morning to discuss how he was one of the only Democrats in the chamber for the State of the Union speech not looking as though Trump killed his grandma.
[4] When others in his party declined to applaud even for the most uncontroversial of the president’s remarks, Manchin did.
[5] He even stood for the president when Trump entered the room, a customary show of respect for the office in which his colleagues declined to participate.
Several propaganda techniques are used in the above article:
- on line 1, the fragment “babies” is an instance of Name_Calling and Labeling;
- on line 2, the fragment “the party is more concerned with obstruction than it is with progress” is an instance of Black_and_White_Fallacy;
- on line 3, the fragment “stupid and petty” is an instance of Loaded_Language;
- on line 3, the fragment “not looking as though Trump killed his grandma” is an instance of Exaggeration and Minimisation;
- on line 3, the fragment “killed his grandma” is an instance of Loaded_Language.
In the next subsections, we explain the data format for each task.
The training data for the PAL task consists of about 60k news articles, annotated as either “propagandistic” or “non-propagandistic.” The annotation was done indirectly using a technique known as distant supervision, i.e., an article is considered propagandistic if it comes from a news outlet that has been labeled as propagandistic by human annotators. For example, all articles from http://Inexistent-Site-Supporting-Brexit.co.uk would be labeled as propagandistic, even though some of them may not be such, which could potentially introduce noise in the dataset. It has been argued elsewhere that models can deal with such noise if the training data is large enough.
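The distant supervision idea above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure actually used to build the dataset: the outlet set, the function name `distant_label`, and the handling of a `www.` prefix are hypothetical, and the only outlet listed is the fictitious example from the text.

```python
from urllib.parse import urlparse

# Hypothetical list of outlets judged propagandistic by human annotators.
# The real list is not part of this description.
PROPAGANDISTIC_OUTLETS = {"inexistent-site-supporting-brexit.co.uk"}


def distant_label(article_url: str) -> str:
    """Label an article by its source outlet, not by its own content."""
    host = urlparse(article_url).hostname or ""  # hostname is lowercased
    if host.startswith("www."):
        host = host[len("www."):]
    if host in PROPAGANDISTIC_OUTLETS:
        return "propaganda"
    return "non-propaganda"


print(distant_label("http://Inexistent-Site-Supporting-Brexit.co.uk/story-1"))
print(distant_label("http://example.org/story-2"))
```

Every article from a listed outlet receives the same label regardless of its text, which is exactly where the label noise mentioned above comes from.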
The input data will be presented as one big tab-separated file, where
- Each line of the file corresponds to a single article.
- Each line break (carriage return) in the original article is converted to two spaces.
- Besides the full text of the article (first column), each line has two additional TAB-separated columns: the unique article id and its gold label, which can be “propaganda”, “non-propaganda”, or “?”. The latter is used for the Dev and Test datasets.
Your system should produce a tab-separated file that contains two columns: [article id] and [propaganda/non-propaganda].
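A minimal sketch of reading the PAL input and writing the PAL output, assuming the column order given above (text, article id, label). The function names are hypothetical, and splitting from the right is an assumption made so that a stray TAB inside the article text would not break parsing.

```python
import io


def read_pal(stream):
    """Yield (text, article_id, label) tuples from tab-separated PAL lines."""
    for line in stream:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Split from the right: the last two columns are the id and the label.
        text, article_id, label = line.rsplit("\t", 2)
        yield text, article_id, label


def write_pal_predictions(predictions, stream):
    """Write one '[article id]<TAB>[label]' line per prediction."""
    for article_id, label in predictions:
        stream.write(f"{article_id}\t{label}\n")


# Toy input: one unlabeled article whose two paragraphs are joined by two spaces.
sample = io.StringIO("First paragraph.  Second paragraph.\t123456\t?\n")
rows = list(read_pal(sample))
text, article_id, label = rows[0]
# Recovering paragraphs is lossy if the original text itself had double spaces.
paragraphs = text.split("  ")

out = io.StringIO()
write_pal_predictions([(article_id, "propaganda")], out)
print(out.getvalue(), end="")
```

The label written here is a placeholder; in practice it would come from the trained classifier.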
The training data for the PSL task consists of 451 news articles with sentence-level annotations. Each sentence is annotated as containing propaganda or not. The dataset is split into Train, Dev and Test parts.
For each document, there will be two files: PSL-1 and PSL-2.
PSL-1: The title of an article will appear in the first line and the contents will appear from the second line onward, one sentence per line. The first sentence, which is on the second line, will have id 1.
PSL-2: In addition, you will be given one tab-separated file for each news article, containing three columns with annotations in the following format:
[article id], [sentence id], [propaganda / non-propaganda / ?]
Your system should generate output in the same format as the tab-separated input file, but replacing “?” with “propaganda” or “non-propaganda”. Here is an example referring to the article above:
123456 1 propaganda
123456 2 propaganda
123456 3 propaganda
123456 4 non-propaganda
123456 5 non-propaganda
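The PSL file handling described above could look roughly like this. It is a sketch under assumptions: the function names, the toy article, and the toy predictor are hypothetical, and the columns are taken to be separated by TAB characters as the description states.

```python
def read_psl_sentences(lines):
    """Split a PSL-1 file into its title and a {sentence_id: sentence} map.

    The title is on the first line; the first sentence (second line) has id 1.
    """
    stripped = [line.rstrip("\r\n") for line in lines]
    title, sentences = stripped[0], stripped[1:]
    return title, {i: s for i, s in enumerate(sentences, start=1)}


def fill_psl_labels(template_lines, predict):
    """Replace each '?' in PSL-2 rows with the label returned by predict()."""
    filled = []
    for line in template_lines:
        article_id, sentence_id, label = line.rstrip("\r\n").split("\t")
        if label == "?":
            label = predict(article_id, int(sentence_id))
        filled.append(f"{article_id}\t{sentence_id}\t{label}")
    return filled


# Hypothetical two-sentence article and a toy predictor that flags sentence 1.
title, sentences = read_psl_sentences(
    ["A title\n", "First sentence.\n", "Second sentence.\n"]
)


def toy_predict(article_id, sentence_id):
    return "propaganda" if sentence_id == 1 else "non-propaganda"


result = fill_psl_labels(["123456\t1\t?\n", "123456\t2\t?\n"], toy_predict)
print("\n".join(result))
```

Rows that already carry a gold label are passed through unchanged, so the same function can be applied to Train files without overwriting annotations.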
The training data for the PTR task consists of the same 451 news articles, but this time the annotations will mark the span and the type of propaganda technique that has been used. As for the other two tasks, the dataset will be split into Train, Dev, and Test parts.
For each document, there will be two files: PTR-1 and PTR-2.
PTR-1: This file will be in the same format as for the PAL task.
PTR-2: In addition, you will be given one tab-separated file for each news article, containing four columns in the following format:
[document id], [starting_span], [ending_span], [label]
where the label is one of the 18 possible propaganda techniques. The starting and ending span values are character-level offsets into the article text.
You are expected to produce your classifications in the same output format as in this second file. Here is an example referring to the article above:
123456 34 40 Name_Calling
123456 34 40 Labeling
123456 299 368 Black-and-White_Fallacy
123456 400 416 Loaded_Language
123456 607 653 Exaggeration
123456 607 653 Minimization
123456 635 653 Loaded_Language
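To make the offsets concrete, the sketch below parses one PTR-2 row and recovers the annotated fragment. It assumes 0-based offsets with an exclusive end, which is consistent with the “babies” example (34 to 40) but is not stated explicitly here; the `Span` type and function names are hypothetical, and columns are taken to be TAB-separated.

```python
from typing import NamedTuple


class Span(NamedTuple):
    article_id: str
    start: int  # character offset of the first character (assumed 0-based)
    end: int    # character offset just past the last character (assumed)
    label: str


def parse_ptr_line(line: str) -> Span:
    """Parse '[document id]<TAB>[start]<TAB>[end]<TAB>[label]'."""
    article_id, start, end, label = line.rstrip("\r\n").split("\t")
    return Span(article_id, int(start), int(end), label)


def fragment(text: str, span: Span) -> str:
    """Return the text covered by the span."""
    return text[span.start:span.end]


# Beginning of the sample article, without the explanatory line number.
article_start = "Manchin says Democrats acted like babies at the SOTU"
span = parse_ptr_line("123456\t34\t40\tName_Calling")
print(fragment(article_start, span))  # babies
```

Checking a few gold spans against the raw text in this way is a quick sanity test that the offsets are being interpreted as intended before any model is trained.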
4. Expected Output
In addition to the output files described above, which will be evaluated by an automatic evaluator (see Model evaluation), the participants are asked to write an article or a Jupyter notebook in the Data Science Society authoring tool to document their solution, which at a minimum should include the following:
- One paragraph description of the approach/implementation
- List of external libraries and resources (if any)
Participants who do not provide any documentation for their solution will not be considered for the final ranking, while participants with very close scores on the test datasets may be reranked based on the quality of their documentation.