Popular comments by mkraeva
I agree on points 1 and 2 – there is definitely a need for more data visualizations than we have here. We could expand our article in that direction after the datathon.
We actually first ran the most common words analysis without removing stop words, but – as expected – the top words in both propaganda and non-propaganda sentences were tokens like “the” and “an”. We decided they could not carry much (if any) predictive power for our task and removed them. It will be interesting to see if other teams used stop words as a predictor and achieved any good results!
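To illustrate the effect described above, here is a minimal sketch of a most-common-words count with and without stop-word removal. It is not the code from our repository: the `top_words` helper, the tiny `STOP_WORDS` set, and the example sentences are all made up for illustration, and a real run would use a fuller stop-word list.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the project).
STOP_WORDS = {"the", "an", "a", "of", "and", "to", "in", "is"}


def top_words(sentences, n=3, remove_stop_words=True):
    """Return the n most frequent lowercase tokens across the given sentences."""
    counts = Counter()
    for sentence in sentences:
        # Simple tokenizer: runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if remove_stop_words:
            tokens = [t for t in tokens if t not in STOP_WORDS]
        counts.update(tokens)
    return counts.most_common(n)


# Made-up example sentences, not taken from the datathon data.
sentences = [
    "The regime crushed the protest",
    "The regime is an enemy of the people",
]

print(top_words(sentences, n=2, remove_stop_words=False))
print(top_words(sentences, n=2, remove_stop_words=True))
```

Without filtering, "the" dominates the ranking; once stop words are dropped, content words such as "regime" move to the top, which is the pattern we saw in both classes.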
Thank you, @alberto!
Regarding the loaded language phrases, we searched the Internet for examples of such expressions. We found several promising resources – there was significant overlap in the listed words, but each article added some new words as well. These are all the lists we used:
The final list of words and phrases is included in the code repository; it can be found in `data/external/loaded_language_phrases.txt`.
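As a rough sketch of how such a list could be turned into a per-sentence feature: the snippet below assumes the file holds one phrase per line (an assumption about the format), and the helpers `load_phrases` and `count_loaded_phrases`, as well as the two example phrases, are hypothetical rather than taken from the repository.

```python
import re


def load_phrases(lines):
    """Normalize an iterable of lines into lowercase phrases, skipping blanks."""
    return [line.strip().lower() for line in lines if line.strip()]


def count_loaded_phrases(sentence, phrases):
    """Count whole-word, case-insensitive occurrences of any phrase in the sentence."""
    text = sentence.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))
        for phrase in phrases
    )


# Hypothetical phrases for illustration; with the real file one would write:
# with open("data/external/loaded_language_phrases.txt", encoding="utf-8") as f:
#     phrases = load_phrases(f)
phrases = load_phrases(["witch hunt\n", "\n", "so-called\n"])

print(count_loaded_phrases("This so-called inquiry is a witch hunt.", phrases))
```

The count (or a simple present/absent flag) could then be fed to a classifier alongside other features.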
As for the exact scores of our separate models, unfortunately we didn’t have time to produce a table with all of them before submitting our article. We’ll try to do so before today’s deadline.