Datathon – HackNews – Solution – FlipFlops

Catching fake news and types of propaganda is a highly essential open-source cause to which we would like to contribute. Hence, what follows is our case study for the 2019 Datathon – Hack the News.


10 thoughts on “Datathon – HackNews – Solution – FlipFlops”

  1.

    Very thorough analysis, I like that you looked into all three tasks. The approaches are very reasonable, and I wonder if you have any explanation why your Task1 and Task2 models were not among the top scoring.

    1.

      Hi Laura,

      Thank you for your comment. It’s great to see that you have found time to review it.

      Unfortunately, regarding Task 2, we did not manage to apply the ‘Component 3’ part due to a data-preparation issue on the test set. In the paper we described all the models we had built and trained, but unfortunately we did not apply all of them to the test sample. To our knowledge, this could explain the poor performance on Task 2.

      Thank you for participating in the event as a mentor and expert 🙂

      For us as a team, it would be very beneficial and highly appreciated if you shared your thoughts on improvements and on what you would do differently.


    2.

      Hi! Thanks for the kind words!
      I believe that the reason our model under-performed on Task 2 is that we ran out of time and couldn’t make the standard checks for over-fitting. By the time we had the final ensemble, the DEV set was already offline, so we couldn’t really see its performance on out-of-sample data. This meant that we either had to go with the best single component (evaluated on DEV) or take a leap of faith and submit the ensemble on TEST.
      We decided that whether we win or lose, we’ll do it as a team, so we went with the combined model. Sadly, the ship sank with everyone on board 🙂

  2.

    Hi Laura,

    Thank you for the kind words. I am happy to see that our work is appreciated.

    For Task 1, I am thinking about the following things that could lead to a better-performing model.
    We didn’t play enough with the hyperparameters and the architecture of the neural network.
    For example, the text is padded to the mean number of tokens per article, which means that for half of the articles some information is dropped.
    If the padding parameter is increased, we can feed more information to the neural network.
    On the other hand, LSTMs do not work very well with long sequences, so experimenting with different layers could be beneficial here.
    For example, an attention mechanism added on top of the LSTM layer could be tested.
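    The padding trade-off described above can be sketched in a few lines. This is a hypothetical illustration (the article lengths, percentile choice, and `pad_sequences` helper are assumptions, not our actual preprocessing): padding to the mean length truncates roughly half of the articles, while padding to a higher percentile keeps far more tokens at the cost of longer sequences.

```python
import numpy as np

def pad_sequences(seqs, maxlen, pad_value=0):
    """Right-pad (or truncate) token-id sequences to a fixed length."""
    out = np.full((len(seqs), maxlen), pad_value, dtype=np.int64)
    for i, s in enumerate(seqs):
        out[i, :min(len(s), maxlen)] = s[:maxlen]
    return out

# Hypothetical articles with varying token counts (illustrative numbers only).
rng = np.random.default_rng(0)
lengths = [50, 120, 300, 800, 1500]
articles = [list(rng.integers(1, 1000, size=n)) for n in lengths]
lens = np.array([len(a) for a in articles])

mean_len = int(lens.mean())             # padding here truncates the longer half
p90_len = int(np.percentile(lens, 90))  # a higher percentile keeps far more tokens

X_mean = pad_sequences(articles, mean_len)
X_p90 = pad_sequences(articles, p90_len)
print(X_mean.shape, X_p90.shape)
print(int((lens > mean_len).sum()), "articles truncated at the mean,",
      int((lens > p90_len).sum()), "at the 90th percentile")
```

    The longer the padded sequence, the more the LSTM struggles, which is why an attention layer on top becomes attractive at higher padding lengths.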

    Using different types of text representations, models, and feature engineering should capture different kinds of connections in the texts.
    Similar to the approach we had in Task 2, creating a stacked ensemble should give a better-performing final model.
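    A minimal sketch of the stacking idea, under stated assumptions: the base-model probabilities below are invented for illustration (e.g. an LSTM, a bag-of-words model, and a hand-crafted-features model), and the meta-learner is a tiny logistic regression fitted by gradient descent rather than our actual Task 2 ensemble. In practice the base predictions should be produced out-of-fold to avoid leaking the training labels into the meta-learner.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_meta(X, y, lr=0.5, steps=500):
    """Tiny logistic-regression meta-learner trained by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

# Hypothetical out-of-fold probabilities from three base models (columns),
# one row per article -- illustrative values only.
X_meta = np.array([
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.6],
    [0.7, 0.6, 0.8],
    [0.2, 0.3, 0.1],
    [0.1, 0.2, 0.3],
    [0.3, 0.1, 0.2],
])
y = np.array([1, 1, 1, 0, 0, 0])

w, b = fit_meta(X_meta, y)
preds = (sigmoid(X_meta @ w + b) > 0.5).astype(int)
print(preds)
```

    The meta-learner effectively learns how much to trust each base model, which is where the gain over any single component comes from.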

  3.

    Really great article, great analysis, and great modeling of the task in a way that makes a lot of sense. The features tried should help further research on the problem.

    1.

      Thank you very much for your comment. It’s great to hear that our work could help further the research.
      Regarding the features, we were influenced by your presentation to the DSS community half a year ago. In every datathon we participate in, we try to keep a good balance between performance and interpretability of the models.

  4.

    Hi guys. Good work and nice article. I have some questions for you, all regarding task 3:

    1. You mention that, due to overlapping, you opted for running multiple binary classifiers. Did you consider trying multi-task learning? Any idea what the outcome would have been?
    2. You say that you only obtained good models for 2 techniques. May I ask for which ones?
    3. You report F-measures of 0.25 and 0.35. May I assume this is for the singleton tasks of spotting a propagandistic event and then classifying it with one of the techniques? Otherwise, any justification for the huge drop with respect to the test set?

    1.

      Hi Alberto,

      Thank you for your questions. Our answers are as follows:
      1. We did not consider it. Not sure why; maybe all the chaos and stress of organizing the tasks and quickly starting to produce output blinded us to this option.
      2. They are the two most populated – Loaded_Language and Name_Calling,Labeling.
      3. You are correct, these are on the singleton tasks. When the models are applied one after the other, the performance drops significantly (it is a pure multiplication). Our test-set score was close to the dev and train-dev scores (just a 0.005-point drop). Something we tried was to combine the two approaches: we used the individual models for the top 2 propaganda techniques and combined them with the joint model for the other 16 techniques. Although we got better F1 scores for each propaganda type, the overall score was lower.
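      A quick back-of-the-envelope on the “pure multiplication” point, assuming the two stages err independently (a simplifying assumption for illustration, not our measured end-to-end score):

```python
# Singleton scores reported in the discussion above.
f1_spotting = 0.25   # stage 1: spot a propagandistic fragment
f1_classify = 0.35   # stage 2: label the fragment with a technique

# When stage 2 only ever sees what stage 1 got right, the end-to-end
# score of the chained pipeline is roughly the product of the stages.
f1_pipeline = f1_spotting * f1_classify
print(round(f1_pipeline, 4))
```

      This is why a small improvement in either singleton stage pays off disproportionately in the chained setting.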
