Detecting propaganda on sentence level

Hi zenpanik! Regarding 4., our best result on the DEV data set was an F1 score of 0.5979, obtained with a stacking ensemble over all our hand-crafted features, TF-IDF, and word2vec, but without BERT. Our final submission for the TEST data set additionally included BERT in the ensemble.
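
In case it helps to see what the F1 number measures, here is a minimal sketch of F1 for the positive class in plain Python. The label strings and the toy gold/predicted lists are made up for illustration and are not our data; I'm also assuming F1 is computed on the propaganda class, so the official scorer may differ in details.

```python
def f1_score(gold, pred, positive="propaganda"):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy sentence-level labels (hypothetical, for illustration only).
gold = ["propaganda", "non-propaganda", "propaganda", "non-propaganda", "propaganda"]
pred = ["propaganda", "propaganda", "non-propaganda", "non-propaganda", "propaganda"]

toy_f1 = f1_score(gold, pred)
print(round(toy_f1, 4))
```

In this toy case there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3 and so is F1.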