Datathon – HackNews – Solution – PIG (Propaganda Identification Group)

This is very difficult to define because, as always, it depends; in this case it also depends on the difficulty of the problem. The easier a class is to predict, the less data you need. Much more important, however, is to have clean class labels, which often did not seem to be the case in this data set. I would suggest using more annotators and calculating the inter-annotator agreement. It seems to me that different annotators worked on each document separately and that their understanding of the task often differed, which led to noisy labels.
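
To make the inter-annotator agreement suggestion concrete, here is a minimal sketch of Cohen's kappa for two annotators who labelled the same items. Cohen's kappa is only one possible agreement measure and is my choice for illustration; the function name `cohen_kappa` and the example labels are hypothetical and not taken from the data set.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")

    n = len(labels_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the two annotators' label frequencies,
    # summed over all labels (a label missing from one side contributes 0).
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    # Both annotators used a single identical label for everything:
    # kappa is undefined (0/0); this sketch reports 1.0 by convention.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical labels for four items, for illustration only.
annotator_1 = ["propaganda", "propaganda", "neutral", "neutral"]
annotator_2 = ["propaganda", "neutral", "neutral", "neutral"]
print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

A value near 1 indicates strong agreement, while a value near 0 means the annotators agree no more often than chance would predict, which would be a sign of noisy labels. With more than two annotators per item, a multi-rater measure such as Fleiss' kappa or Krippendorff's alpha would be the more appropriate choice.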