Popular comments by jopfeiff
This is very difficult to define because, as always, it depends — in this case also on the difficulty of the problem. The easier a class is to predict, the less data you need. Much more important, however, is having clean class labels, which in this data set often did not seem to be the case. I would suggest using more annotators and calculating the inter-annotator agreement. It seems to me that different annotators worked on each document separately, and the annotators often understood the task differently, which led to noisy labels.
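As a minimal sketch of the agreement check suggested above, Cohen's kappa compares the observed agreement between two annotators against the agreement expected by chance (the labels and annotator data here are hypothetical, purely for illustration):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same documents."""
    n = len(labels_a)
    # Observed agreement: fraction of documents both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, derived from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels from two annotators on six documents.
ann1 = ["pos", "neg", "pos", "pos", "neg", "neg"]
ann2 = ["pos", "neg", "neg", "pos", "neg", "pos"]
print(round(cohens_kappa(ann1, ann2), 3))  # prints 0.333
```

A kappa near 1 means the annotators agree far beyond chance; values much below ~0.6 are usually taken as a sign that the annotation guidelines need tightening.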
The whole repo, which also includes the winning model, can be downloaded here:
The winning model, which was used for both the dev and the test predictions, can be found in `resources/taggers/best_model/*`.
Hello Preslav, I did not upload any data because I was not sure whether I was allowed to upload the datathon data. I will make my whole repository, including the data and embeddings, available in a Google Drive folder.