This article describes our submission to the Hack the News Datathon 2019, which focuses on Task 2, propaganda sentence classification. It outlines our exploratory data analysis, methodology, and future work. Our work revolves around the BERT model, as we believe it offers an excellent language model that is also good at attending to context, which is an important aspect of propaganda detection.


Datathon – HackNews – Solution – DataExploiters

    The article is well structured and convincingly motivates the choice of Task 2 as well as the research steps taken. The results are presented and explained clearly.

    I wonder whether you considered correcting for class imbalance in the dataset. Doing so usually improves F1 by at least 2-3 percent. Have you noticed the predominance of class 0 (non-propaganda)?

      Thank you for your input, Laura. Indeed, as mentioned in the article, 72% of the sentences are non-propaganda and we plan to correct for this by appropriate sampling. This is something we noticed early but failed to account for while the leaderboards were active.

