
Popular comments by alberto

Datathon – HackNews – Solution – PIG (Propaganda Identification Group)

Hi guys. Good work and nice article. I have a question for you:
You mention that one problem for your model was the class imbalance (the fact that some classes have very few instances). What is your frequency threshold? I mean, how many instances of a class do you think you would need in order to come up with a reasonable predictor?
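To make the question concrete, here is a minimal sketch of how such a frequency threshold could be applied; the `min_count=50` value and the label names are hypothetical, not taken from the article, and rare classes might equally be merged into an "other" bucket rather than dropped.

```python
from collections import Counter

def frequent_classes(labels, min_count=50):
    """Keep only classes with at least `min_count` training instances.

    `min_count` is an illustrative threshold, not a value from the article.
    """
    counts = Counter(labels)
    return {c for c, n in counts.items() if n >= min_count}

# Hypothetical class distribution for illustration only.
labels = ["loaded_language"] * 120 + ["name_calling"] * 60 + ["straw_man"] * 5
print(frequent_classes(labels, min_count=50))
```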

Datathon – HackNews – Solution – Leopards

Hi guys. Good work and nice article. I have a few questions for you:
1. You say that you computed word2vec embeddings and added them to represent the sentence. Did you average the vectors of the words, or how did you do the combination?
2. I like that in the evaluation section you tell us the impact of different features/decisions. Nevertheless, you do not provide any numbers that would let us better understand that impact.
3. It would be nice if you packaged the software, rather than just pasting it here (I hope I did not miss the link!)
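For reference, averaging is one common way to combine word2vec vectors into a sentence representation; the sketch below assumes that strategy (the team may have done something different, e.g. summing or TF-IDF weighting), and the toy vectors stand in for a trained model such as gensim's `KeyedVectors`.

```python
import numpy as np

# Toy 2-dimensional lookup standing in for a trained word2vec model.
word_vectors = {
    "the":   np.array([0.1, 0.3]),
    "war":   np.array([0.9, -0.2]),
    "rages": np.array([0.7, 0.4]),
}

def sentence_vector(tokens, vectors, dim=2):
    """Average the vectors of in-vocabulary tokens; zero vector if none match."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

print(sentence_vector(["the", "war", "rages"], word_vectors))
```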

Datathon – HackNews – Solution – FlipFlops

Hi guys. Good work and nice article. I have some questions for you, all regarding task 3:

1. You mention that, due to overlapping labels, you opted for running multiple binary classifiers. Did you consider trying multi-task learning? Any idea what the outcome would have been?
2. You say that you only obtained good models for 2 techniques. May I ask which ones?
3. You report F-measures of 0.25 and 0.35. May I assume these are for the singleton tasks of spotting a propagandistic event and then classifying it with one of the techniques? Otherwise, is there any justification for the huge drop with respect to the test set?
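For readers comparing the reported scores, a minimal sketch of the standard F-measure computation for one of those binary technique classifiers; the confusion counts below are purely illustrative, not taken from the article.

```python
def f1(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: with many false positives and false
# negatives, F1 lands in the 0.25 range mentioned in the comment.
print(round(f1(tp=20, fp=60, fn=60), 2))  # prints 0.25
```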

Detecting propaganda on sentence level

Hi guys. Good work and nice article. I have some questions for you:
1. In the loaded-language subsection you mention that you generated a list of phrases, but you give no further details. Where did you get them from? How did you pick them? Are they part of the release?
2. I appreciate the narrative about the different subsets of representations and learning models, but I miss a table with numbers (you only tell us whether the performance worsened or improved). What are the exact figures? What performance did your baseline or the other methods achieve? I am not sure whether this is supposed to appear in Section 5, but I cannot find it there.
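On the phrase-list question above, the simplest reading is a lexicon lookup over the sentence; the sketch below assumes plain substring matching, and the lexicon entries are hypothetical, since the article does not describe the actual list or matching strategy.

```python
def match_phrases(sentence, lexicon):
    """Return the loaded-language phrases from `lexicon` that occur in
    the sentence (naive lowercase substring matching)."""
    lowered = sentence.lower()
    return [p for p in lexicon if p in lowered]

# Hypothetical lexicon entries for illustration only.
lexicon = ["bloodbath", "regime", "utter disaster"]
print(match_phrases("The regime caused an utter disaster.", lexicon))
```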