
Datathon 2020 – Article recommendation


6 thoughts on “Datathon 2020 – Article recommendation”

  1. 0

    Is this page final? It feels like a start. It starts very nicely, with great analysis, but then it feels like it was cut off in the middle. Was anything actually implemented? Are there any actual results?

  2. 0

    Very nice ideas.
    It’s a pity you haven’t elaborated on them further, outlining how you would approach the task and maybe how you would overcome the problems you’ve listed.
    For example:
    – Although Bulgarian does not have enough resources or language models, one can use transfer learning from a richer language. Check out Sebastian Ruder’s work on this topic.
    – For the RNN, what features would you use? What new features would you engineer?
    – How would you combine your different approaches?

    It seems you are heading in a great direction and have the right mindset. Please consider communicating more broadly and elaborately in the future.

  3. 0

    Hi, I’ve updated the article with the code and an explanation of the idea, although a bit late. Plots of the model architecture, its training, and how article popularity behaves over time can be uploaded too if needed.

  4. 0

    @preslav – I was late and then struggled with uploading the notebook (I was also not using a notebook for the development).
    @liad –
    1) I think that with so much data, a feasible solution would be to use the tags (because they represent the topic better than the title) and learn embeddings from scratch on the current dataset. I also think the transfer learning idea is possible.
    2) The RNN in the code is an LSTM, with Contextualized Article embeddings as inputs (that is, for the same article the embedding differs over time, because it depends on the article’s current popularity). The sequence is the last N articles read by the user. It can also be viewed as user-preference “sessions”, as they continue over time.
    3) The added section tries to explain it. Basically, we combine popularity with content-based information to form Contextualized Article embeddings. The model then tries to predict the embeddings, minimising the cosine distance to the news actually read and maximising it to other popular news from the same day. So essentially we are learning a user-specific embedding space.
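
    To make points 2) and 3) a bit more concrete, here is a minimal sketch in plain Python. It is not the notebook code: the function names, the way popularity is appended to the content vector, and the hinge-style margin are all assumptions made for illustration, and the LSTM that would produce the predicted embedding is left out. The loss is written so that it is low when the prediction is closer (by cosine similarity) to the article actually read than to other popular articles from the same day.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def contextualize(content_vec, popularity):
    """Hypothetical combination step: append the article's current popularity
    to its content vector, so the same article gets a different embedding
    at different points in time."""
    return list(content_vec) + [float(popularity)]


def contrastive_loss(predicted, read_vec, other_popular_vecs, margin=0.5):
    """Hinge-style loss (an assumed form, not taken from the notebook).

    Each other popular article contributes a penalty unless the prediction's
    similarity to the read article exceeds its similarity to that article
    by at least `margin`. Penalties are averaged over the other articles.
    """
    if not other_popular_vecs:
        return 0.0
    pos = cosine(predicted, read_vec)
    total = 0.0
    for neg_vec in other_popular_vecs:
        total += max(0.0, margin - pos + cosine(predicted, neg_vec))
    return total / len(other_popular_vecs)


# Toy example: two articles with made-up content vectors and popularity values.
read = contextualize([1.0, 0.0], popularity=0.2)
other = contextualize([0.0, 1.0], popularity=0.9)

loss_good = contrastive_loss(read, read, [other])   # prediction matches the read article
loss_bad = contrastive_loss(other, read, [other])   # prediction matches the other article
```

    In a real model, `predicted` would be the LSTM output for the sequence of the last N contextualized embeddings, and the loss would be computed with a tensor library so that gradients can flow back through the network.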
