Case NetInfo/Vesti.bg article recommendation — Team Army of Ones and Zeroes — Datathon 2020

Team Army of Ones and Zeroes' solution to the “NetInfo/Vesti.bg” case for Datathon 2020.

12 thoughts on “Case NetInfo/Vesti.bg article recommendation — Team Army of Ones and Zeroes — Datathon 2020”

    1.

      Hi zenpanik,
      The copy/paste lost the formatting and pictures. We wrote everything in a Google Doc (see below), which has pictures and fixed-width font formatting. But even with the formatting, you are right: Perl is hard to read. We tried to add comments, but the main goal was to make it work, and Perl is very fast and easy for prototyping (at least for old-timers like me :-).
      Here are the links to the formatted Google Doc and the PDF:
      https://docs.google.com/document/d/186Bcv4DbrYLY7m3ZCeGji9QM7DAjm0arY2doV5TTNAs/edit?usp=sharing
      Also in PDF format:
      https://drive.google.com/file/d/1eFiw4lqNtMXAbQZLwhLPBCOyODPQY4oU/view?usp=sharing

  1.

    “Our best idea is to see if there is a way to see if some users strongly prefer some types of articles and maybe the preference will be so strong that they would be way more likely to read the 2nd and 3rd best articles, if their affinity for the 1st best article is low.”

    That’s a nice idea – but how would it be implemented?

    1.

      We didn’t have time to test that. So at this point it is just a hypothesis (guess) that needs to be tested. To be clear, we are somewhat confident that it will NOT help FOR THIS PARTICULAR DATASET. (There are clearly many scenarios where it will.)
      The way to test it would be to erase the last day of data (or the last 2 days, or 3, etc.) and use the erased data for evaluation.
      The only other thing needed is some model for what “types of articles users like”. For that we can use words from the title of the article, or the natural categories that vesti.bg has (like coronavirus, bulgaria, sviat, etc.), or some other way to model user preference.
      With the above two, one can measure the difference in the objective (accurately predicted user article visits for the next 24 hours) with and without the preference model, and see which one is better and by how much (and whether the difference is statistically significant).
      We didn’t have time to do all of that 🙂
      But we did estimate/guess/wager that it won’t significantly improve our prediction score, so we prioritized it lower and ultimately didn’t do it.
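
The test described above could be sketched roughly as follows. This is only an illustration under assumptions, not the team's code: the visit records, the `category_of` mapping, the function names, and the scoring formula (popularity boosted by the user's share of reads in that category) are all hypothetical. It holds out the last day, builds a popularity baseline and a category-aware variant from the remaining data, and counts how many held-out visits each one predicted.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def split_last_days(visits, days=1):
    """Split visits into (train, held_out), holding out the final `days` days."""
    cutoff = max(v["time"] for v in visits) - timedelta(days=days)
    train = [v for v in visits if v["time"] <= cutoff]
    held_out = [v for v in visits if v["time"] > cutoff]
    return train, held_out


def recommend_popular(train, k):
    """Baseline: every user gets the k most visited articles."""
    top = [a for a, _ in Counter(v["article"] for v in train).most_common(k)]
    return lambda user: top


def recommend_by_category(train, k, category_of):
    """Variant: popularity boosted by the user's share of reads per category.

    The boost formula is an assumption made for this sketch.
    """
    popularity = Counter(v["article"] for v in train)
    affinity = defaultdict(Counter)
    for v in train:
        affinity[v["user"]][category_of[v["article"]]] += 1

    def recommend(user):
        prefs = affinity.get(user, Counter())
        total = sum(prefs.values()) or 1  # avoid division by zero for new users

        def score(article):
            return popularity[article] * (1 + prefs[category_of[article]] / total)

        return sorted(popularity, key=score, reverse=True)[:k]

    return recommend


def hit_count(recommend, held_out):
    """Count distinct held-out (user, article) visits that were recommended."""
    actual = {(v["user"], v["article"]) for v in held_out}
    return sum(1 for user, article in actual if article in recommend(user))


# Toy data, invented for illustration only.
category_of = {"a1": "coronavirus", "a2": "sviat"}
day1 = datetime(2020, 5, 1, 8, 0)
day2 = datetime(2020, 5, 2, 12, 0)
visits = [
    {"user": "u1", "article": "a2", "time": day1},
    {"user": "u5", "article": "a2", "time": day1},
    {"user": "u2", "article": "a1", "time": day1},
    {"user": "u3", "article": "a1", "time": day1},
    {"user": "u4", "article": "a1", "time": day1},
    {"user": "u1", "article": "a2", "time": day2},  # held out
    {"user": "u2", "article": "a1", "time": day2},  # held out
]

train, held_out = split_last_days(visits, days=1)
baseline_hits = hit_count(recommend_popular(train, k=1), held_out)
category_hits = hit_count(recommend_by_category(train, k=1, category_of=category_of), held_out)
print("baseline hits:", baseline_hits, "category-aware hits:", category_hits)
```

The sketch only ranks articles already seen in the training window and does not test statistical significance; repeating the split over several held-out days (the last 2 days, 3, etc.) would give paired scores to compare.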

  2.

    I do like your data exploration based on the time of day and on recurrent visits.
    Yet I am missing an exploration by user, or of similar articles read by similar users.

  3.

    Overall, I like this article for the analysis that was done on the data and for pointing to interesting ideas based on the observations. This is how data science works: we have to look at the data! This team is well ahead of any other team on that, and I congratulate them for this! There are tons of nice things, like decay over time, observations of when this happens, how long it lasts, etc.

    They have not advanced as much in formulating how this would be operationalized, e.g., what model to use exactly and what it does: is it collaborative or content-based filtering? And their ideas for testing are too high-level (A/B testing is OK, but how about some evaluation measure too?).

    Yet they do find some very nice insights, which can help future work on this dataset a lot.
