Detecting Propaganda on Sentence Level
Team Astea Wombats
mkraeva, backslash, givanov, jorro, apetkov, fire
Source Code: hack-the-news-master
The code repository structure is described in its README.md file. To summarize, it contains all our models, feature transformer functions and our own pipeline for training, testing and validating our models.
The issue of propaganda is among the most pressing dangers of the current political landscape on a world-wide scale. The great amount of accessible information, the diminishing means of checking its veracity in an up-to-date fashion and the accelerating loss of attention span among the general public all add up to a field ripe for misinformation, polarization, tribalization and opinion engineering. Traditional media are at a disadvantage, as the effort historically invested in veracity checking does not pay off economically in the landscape that user-generated media is creating.
Using technological means for detecting misinformation and manipulation is therefore seen as an important field of research and one that aligns closely with the public interest and the advancement of informed democracy. Independent and fact-checked information is crucial to the success of any contemporary society. It is technological evolution that has rapidly contributed to the deterioration of the information landscape, and it is therefore a moral imperative for companies, communities and individuals involved with tech to help combat this perilous development.
It is also important to narrow down particular issues with current media. While misinformation in general is problematic, conscious propaganda and the systematic incitement of particular opinions and emotions in a community deserves a focus on its own. This is because it has historically been an important tool for elites, populists and large-scale economic actors to distance public opinion from their own shortcomings and to distort a correct and informed perception of political reality and hence the democratic process in general.
The data used for this research is provided by the Qatar Computing Research Institute and consists of about 300 articles annotated either by article or at sentence level as propaganda or not. Additional information for the data creation process can be found in .
1. Business Domain Understanding
Propaganda is information that is not objective and is used primarily to influence an audience and push an agenda. Propaganda is the deliberate spreading of ideas, facts, or allegations with the aim of advancing one’s cause or of damaging an opposing cause. There are at least 18 types of propaganda.
In this analysis, we do not take into account the more detailed annotations on propaganda types and instead focus only on the presence of any type of propaganda.
2. Data Understanding
The analyzed training data consists of 293 articles, annotated as propaganda or non-propaganda at sentence level. Of these, 12 articles contain no propaganda examples and 281 articles contain some propaganda sentences.
Excluding empty lines (which are always classified as non-propaganda), the articles contain a total of 14265 sentences, of which 3940 are tagged as containing propaganda.
The median length of sentences containing propaganda is 12, while for sentences without propaganda this value is 9. As the following plot shows, propaganda sentences are longer on average.
We have also analysed the most commonly found words in propagandistic vs non-propagandistic sentences, after removing stop words using the available nltk corpora for English stop words.
This analysis shows that propagandistic sentences commonly mention groups and individuals known to take part in arguments and/or strongly defend their opinions. Examples in the data are the frequent appearances of words such as “Trump”, “god”, “church”, “catholics”, “papa francis”.
3. Data Preparation
Article files were already split with one sentence per line. We removed empty sentences because they introduced even larger class imbalance. Where necessary, we also removed stop words from the sentences – e.g. when extracting word2vec embeddings and when calculating features that count classes of words in the sentences.
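The cleaning step above can be sketched as follows. This is a minimal, self-contained illustration with hypothetical helper names; in the real pipeline the stop-word list comes from the nltk English stop-words corpus, of which only a small excerpt is inlined here.

```python
# Sketch of the sentence cleaning step (hypothetical helper names).
# Small excerpt of the stop-word list; the full list comes from
# nltk.corpus.stopwords in the actual pipeline.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "that"}

def clean_sentences(lines):
    """Drop empty lines; return (kept sentences, tokens without stop words)."""
    kept = [line.strip() for line in lines if line.strip()]
    tokenised = [
        [tok for tok in sent.lower().split() if tok not in STOP_WORDS]
        for sent in kept
    ]
    return kept, tokenised

kept, tokens = clean_sentences(["The senate is in session.", "", "They voted."])
```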
We have invested a lot of effort in extracting low level (e.g. part of speech) and high level (e.g. subjectivity, readability) features.
The longer the sentence, the more complex it is to read. Such sentences can be more confusing for readers and may be intentionally vague.
Counts of different Parts of Speech:
Adjectives and Adverbs
Research has shown that the presence of adjectives and adverbs is usually a good indicator of text subjectivity. In other words, statements that use adjectives like “problematic” and “incredible” might be more likely to convey a subjective point of view than statements that do not include those adjectives.
Proper Nouns and Plural Proper Nouns
Proper nouns may signify various kinds of propaganda. Examples include “appeal to authority”, where popular figures are quoted; slandering a political opponent; flag-waving and the incitement of patriotic feeling, where nations or community groups may be cited with plural proper nouns; appeal to fear; etc.
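The part-of-speech count features can be sketched as below. The (word, tag) pairs would normally come from a POS tagger such as nltk.pos_tag; here the input is pre-tagged with Penn Treebank tags so the example stays self-contained.

```python
from collections import Counter

# Sketch of the POS-count features. The tag groups follow the Penn Treebank
# tagset; the input is pre-tagged rather than run through nltk.pos_tag.
TAG_GROUPS = {
    "adjectives": {"JJ", "JJR", "JJS"},
    "adverbs": {"RB", "RBR", "RBS"},
    "proper_nouns": {"NNP"},
    "plural_proper_nouns": {"NNPS"},
}

def pos_count_features(tagged_sentence):
    """Count how many tokens fall into each tag group."""
    tag_counts = Counter(tag for _word, tag in tagged_sentence)
    return {
        name: sum(tag_counts[t] for t in tags)
        for name, tags in TAG_GROUPS.items()
    }

feats = pos_count_features(
    [("Incredible", "JJ"), ("claims", "NNS"), ("about", "IN"),
     ("Americans", "NNPS"), ("spread", "VBD"), ("rapidly", "RB")]
)
```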
Exclamation marks express strong emotions such as joy, enthusiasm, disbelief, surprise, or urgency. These strong emotions contribute to exaggeration which is common in propaganda texts.
Question marks can be an indicator for questioning the credibility of someone or something in a propaganda text. The text might be conveying doubt to its readers.
Using specific words and phrases with strong emotional implications (either positive or negative) to influence an audience is common in propaganda. We have compiled a list of such phrases and count their frequency in the sentences.
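A minimal sketch of the loaded-phrase frequency feature follows; the phrase list here is a tiny illustrative stand-in, not our actual compiled lexicon.

```python
import re

# Count occurrences of emotionally loaded words/phrases in a sentence.
# LOADED_PHRASES is an illustrative stand-in for the compiled lexicon.
LOADED_PHRASES = ["disaster", "betrayal", "glorious", "enemy of the people"]

def loaded_phrase_count(sentence):
    text = sentence.lower()
    return sum(len(re.findall(re.escape(p), text)) for p in LOADED_PHRASES)

score = loaded_phrase_count("This glorious victory hides a total disaster.")
```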
Anything objective sticks to the facts, but anything subjective has feelings. Usually, subjective means influenced by emotions or opinions. It is common for propaganda texts to be biased and subjective.
We use TextBlob to extract this feature. TextBlob uses a pattern library with a dictionary of words which make the text subjective (e.g. great, awful, etc.). It also accounts for intensifier words like ‘very’ and ‘much’, and polarity changing words like ‘not’. The subjectivity metric varies from 0 to 1, where 0 means that the text is objective and 1 that the text is subjective.
The polarity metric measures how positive or how negative the sentiment of the text is. We use TextBlob to extract it. It is a number between -1 and 1. Some propaganda texts can be extremely positive or negative. In our models, like the stacking ensemble, we rescale the range to [0, 1].
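The rescaling mentioned above is a simple linear map. The TextBlob call itself (`TextBlob(text).sentiment.polarity`) is shown only in a comment so the sketch stays dependency-free.

```python
# TextBlob polarity (TextBlob(text).sentiment.polarity) lies in [-1, 1];
# the stacking ensemble expects features in [0, 1], so we rescale linearly.
def rescale_polarity(polarity):
    """Map a polarity value from [-1, 1] to [0, 1]."""
    return (polarity + 1.0) / 2.0
```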
Emotion features are extracted with the IBM Watson Natural Language Understanding API. They include sadness, joy, fear, disgust, and anger. Propaganda texts can be very emotional; it is common to see a notion of fear or anger in them.
A popular kind of propaganda involves using carefully selected words with ambiguous or confusing meaning. We implement a proxy for this property of the sentence by summing the number of meanings of each word (grouped by part of speech), or of its synonym set (synset), using WordNet data.
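This ambiguity proxy can be sketched as below. In the pipeline the sense counts come from nltk's WordNet interface (e.g. the number of synsets for a word); a tiny hand-made table stands in here so the example runs without the corpus download, and its counts are illustrative.

```python
# Proxy for word ambiguity: sum the number of WordNet senses of each token.
# SENSE_COUNTS is a hand-made, illustrative stand-in for
# len(wordnet.synsets(word)) lookups; unknown words default to one sense.
SENSE_COUNTS = {"bank": 10, "run": 41, "table": 5, "clear": 12}

def ambiguity_score(tokens):
    return sum(SENSE_COUNTS.get(tok.lower(), 1) for tok in tokens)

score = ambiguity_score(["They", "run", "the", "bank"])
```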
We use a selection of popular readability measures, such as SMOG, Flesch-Kincaid, and others.
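As one example, the Flesch-Kincaid grade level can be computed as below. This is a sketch with a naive vowel-group syllable counter; in practice a readability library would be used rather than this hand-rolled version.

```python
import re

# Flesch-Kincaid grade level:
#   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
# with a naive syllable counter (runs of vowels, minimum one per word).
def count_syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(sentences):
    words = [w for s in sentences for w in re.findall(r"[A-Za-z']+", s)]
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) \
        + 11.8 * (syllables / len(words)) - 15.59

grade = flesch_kincaid_grade(["The cat sat on the mat."])
```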
We used TF-IDF for our baseline model and we also combined it with other features to boost model evaluation performance.
Word embeddings help us model the semantic meaning of words.
A baseline model is useful to determine how much a more advanced model can contribute to improving the overall prediction accuracy. For our baseline model, we used a Logistic Regression with TF-IDF vectors as features for the model.
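The baseline can be sketched with scikit-learn as follows; the toy sentences and labels below are illustrative, not taken from the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Sketch of the TF-IDF + Logistic Regression baseline on toy data;
# the real model was trained on the annotated sentences.
train_sentences = [
    "a glorious victory against the corrupt elite",
    "the traitors will destroy our great nation",
    "the committee met on tuesday afternoon",
    "the report was published last week",
]
train_labels = [1, 1, 0, 0]  # 1 = propaganda, 0 = non-propaganda

baseline = make_pipeline(TfidfVectorizer(), LogisticRegression())
baseline.fit(train_sentences, train_labels)
pred = baseline.predict(["a glorious victory for our nation"])[0]
```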
We have used Logistic Regression, SVM, Random Forest and Feed Forward Neural Networks as models with various combinations of the features we have extracted.
We used a pre-trained Word2Vec word embedding model from Google News, loaded with gensim, and averaged the word vectors inside each sentence. This average is then fed into a Logistic Regression. This was the first model that improved on our baseline.
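The averaging step can be sketched as below. The real vectors come from the pretrained Google News Word2Vec model loaded with gensim; a toy 3-dimensional vocabulary stands in here so the example runs without the large download.

```python
import numpy as np

# Sentence representation: the mean of the word vectors of its tokens.
# TOY_VECTORS is an illustrative stand-in for gensim KeyedVectors lookups.
TOY_VECTORS = {
    "war": np.array([0.9, 0.1, 0.0]),
    "peace": np.array([0.1, 0.9, 0.0]),
}

def sentence_vector(tokens, vectors, dim=3):
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim)  # fallback for fully out-of-vocabulary sentences
    return np.mean(known, axis=0)

vec = sentence_vector(["war", "and", "peace"], TOY_VECTORS)
```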
We also tried using a Feed Forward Neural Network instead of Logistic Regression, but didn’t have enough time to make it work.
We trained a Random Forest with simple lexical features like the number of adjectives, adverbs, singular and plural pronouns, question marks, exclamation marks and periods. However, the result was worse than Word2Vec with Logistic Regression.
After that we included the Readability feature alongside the lexical features, and the results were worse.
Finally, we tried a Random Forest with only the Readability features, and the results were the worst.
BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations. BERT obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. We used the BERT uncased pre-trained model (on the 12-layer architecture) and fine-tuned it for our task. The standalone BERT model gave an F1 score for the positive class (‘propaganda’) of around 0.59.
The hyper-parameters used are as follows:
"do_lower_case": True,           # Whether to lower-case the input text; True for uncased models, False for cased models.
"max_seq_length": 64,            # Maximum total input sequence length after WordPiece tokenization; longer sequences are truncated, shorter ones padded.
"train_batch_size": 16,          # Total batch size for training.
"eval_batch_size": 8,            # Total batch size for eval.
"predict_batch_size": 8,         # Total batch size for predict.
"learning_rate": 5e-5,           # The initial learning rate for Adam.
"num_train_epochs": 3.0,         # Total number of training epochs to perform.
"warmup_proportion": 0.1,        # Proportion of training to perform linear learning rate warmup for; 0.1 = 10% of training.
"save_checkpoints_steps": 1000,  # How often to save the model checkpoint.
"iterations_per_loop": 1000,     # How many steps to make in each estimator call.
Our best model is a Stacking ensemble. The stacked models include:
- a model using the TF-IDF features
- a model using word2vec embeddings
- a model using BERT with a softmax classification layer
- a model combining polarity and subjectivity features
- readability features
- lexical features
- emotions features
All models except the one using BERT are Logistic regressions.
BERT was trained for 3 epochs and was plugged in alongside our other hand-crafted features. For our meta-learning model we also used Logistic Regression. The input of the meta model during training is the prediction probabilities of all the base models together with the respective gold labels. You can see the ensemble in
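The meta-learning step can be sketched as follows; the base-model probabilities below are illustrative values, not outputs of our actual base models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stacking sketch: the meta-learner sees one "propaganda" probability per
# base model plus the gold label. Values here are made up for illustration.
base_probs = np.array([
    # [tfidf, word2vec, bert] probabilities per sentence
    [0.9, 0.8, 0.95],
    [0.2, 0.3, 0.10],
    [0.7, 0.6, 0.80],
    [0.1, 0.2, 0.05],
])
gold = np.array([1, 0, 1, 0])

meta = LogisticRegression().fit(base_probs, gold)
pred = meta.predict([[0.85, 0.7, 0.9]])[0]
```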
We use confusion-matrix-based metrics: Accuracy, Precision, Recall, and F1 score.
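These metrics derive directly from the confusion matrix; a minimal sketch computing them for the positive ('propaganda') class:

```python
# Compute accuracy, precision, recall, and F1 for the positive class (1)
# directly from gold labels and predictions.
def evaluation_metrics(gold, pred):
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return accuracy, precision, recall, f1

acc, p, r, f1 = evaluation_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```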
BERT achieved an F1 of 0.59 on the DEV set.
The following table shows the standalone performance of the base models on our validation set. All of the models used a Logistic Regression classifier with L2 penalty and inverse regularization strength C=0.8.
Train set size: 11380
Validation set size: 2807
| Model                     | Accuracy | Precision | Recall | F1    |
| Subjectivity and polarity | .5996    | .2977     | .4574  | .3606 |
Our proposed solution depends on several external resources:
- Emotion features are extracted with the IBM Watson Natural Language Understanding API. You need an IBM Cloud account with an NLU project. Instructions on the required configuration can be found in the repository README.md
- We use pretrained word2vec embeddings from GoogleNews-vectors
- We use several nltk corpora for stopwords, POS tagging and WordNet
Topic Modelling is a technique that could help us identify whether the topic changes from one article sentence to the next, in order to detect the introduction of irrelevant material. This would help us identify the Red Herring propaganda technique.
Sentence location inside article
If we had more time, we would investigate if propaganda sentences occurred in specific locations in the article, e.g. in the beginning, middle or the end.