Datathon 2020 Solutions

NewsCo: rapid non-parametric recommender algorithm for NetInfo news articles




8 thoughts on “NewsCo: rapid non-parametric recommender algorithm for NetInfo news articles”

  1.

    Using a graph to visualize the mutual interest of the users in an article is actually quite similar to the collaborative filtering matrix. The ‘huge black hole’ community corresponds to the common interest of most of the users, probably in the COVID-19 articles.

    There are other uses of graphs, specifically graph convolutional networks (GCNs), that could bring you closer to the article recommendation task. Specifically, if both the articles and the users are nodes, and edges signify that a user has read an article, “link prediction” algorithms can quickly predict which articles are going to be the next “best thing”: for instance, by summing the predicted edges between users and a newly published article.
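    The “sum the predicted edges” heuristic can be sketched in a few lines of numpy. Everything here is illustrative: the embeddings are random placeholders standing in for what a trained GCN would produce, and the dot-product scorer is a common but hypothetical choice of edge-prediction function.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy bipartite graph: 5 users, 4 articles. In a real GCN pipeline these
    # embeddings would be learned from the user-article click graph; here they
    # are random placeholders.
    user_emb = rng.normal(size=(5, 16))
    article_emb = rng.normal(size=(4, 16))

    def predicted_edge_scores(new_article_emb, user_emb):
        """Dot-product scores for the hypothetical edges user -> new article."""
        return user_emb @ new_article_emb

    # Forecast the appeal of a newly published article by summing its
    # predicted edges to all users, as the comment above suggests.
    new_article = rng.normal(size=16)
    popularity_forecast = predicted_edge_scores(new_article, user_emb).sum()
    print(popularity_forecast)
    ```

    In a production setting the same scoring step would sit behind a trained link-prediction model; the summation itself stays this simple.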

    Overall very nicely written, and a great, clear, video as well. Great job!

    1.


      Thanks for the feedback!

      We agree that the collaborative filtering matrix and the graph adjacency matrix are essentially the same data structure.
      We also share the idea of formulating this problem as a link prediction problem in graph theory; I have previously done research in this field, especially on co-authorship networks. However, again due to limited time and resources, we decided to apply graph theory only for subset visualization, not for solving the target problem.

      And thanks for the positive evaluation of the video!

  2.

    Well-written article and a good video – nice work!
    I like the fact that you devised and implemented your own entire take on the issue. Here are some questions/observations:
    * It seems to me that you are making some assumptions about the data when defining your statistics (publication time, etc.) that have the potential to greatly affect the result.
    * Do you guard against recommending articles that someone has already read?
    * Your rating definition stresses the use of different bases, but that is a multiplicative constant for all ratings (hence little bearing on comparing different values) versus the same-base formula (and your code seems to use the same base?). Could you expand on this point?

    Best regards,

    1.


      Thanks for the feedback!

      Answering your questions:

      1) All the assumptions made are plausible, and we do not see a potential effect on the result. For instance, publication time is assumed to be the time of the first click on an article; this could be a weak point for some small tabloids, but NetInfo is a large enough organization that the time between publication and the first click should be negligible.
      2) Currently the approach does allow recommending an article the user has already read. We agree that it is a fair point to disable this possibility, and in fact it can easily be done by restricting the pool of ranked articles inside the suggested subtopic to those with no intersection with already-viewed articles. We had in fact already written code for this in the collaborative filtering approach but, in a hurry, forgot to apply it to the final solution.
      3) In this article I unfortunately copied the first version of the code for this particular step, where we had equal bases. The idea behind different bases, however, is better scaling and linearization: equal bases only introduce a scaling factor, while suitably chosen bases guarantee a linear influence of both freshness and popularity, so that the two metrics contribute equally to the final rating. We consider this assumption fair with respect to visitors obtaining both fresh and “on-hype” recommendations.
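      The restriction in point 2 amounts to a set difference over candidate articles. A minimal sketch, using hypothetical `ranked_articles` and `viewed` containers rather than the actual competition code:

      ```python
      # Hypothetical ranked candidates within the suggested subtopic,
      # as (article_id, rating) pairs, best first. Illustrative data only.
      ranked_articles = [("a17", 0.92), ("a03", 0.88), ("a42", 0.75), ("a08", 0.61)]

      # Articles this visitor has already read.
      viewed = {"a03", "a08"}

      def recommend(ranked_articles, viewed, k=3):
          """Top-k recommendations, excluding anything already viewed."""
          fresh = [(aid, r) for aid, r in ranked_articles if aid not in viewed]
          return fresh[:k]

      print(recommend(ranked_articles, viewed))  # [('a17', 0.92), ('a42', 0.75)]
      ```

      Because the candidate list is already ranked, the filter preserves ordering and the slice keeps the top k survivors.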
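      On point 3, the change-of-base identity log_b(x) = ln(x)/ln(b) makes the reviewer's observation concrete: a single shared base only rescales every rating by the same constant (so rankings are unchanged), while a separate base per term reweights freshness against popularity. A minimal sketch, assuming a hypothetical additive rating of the form log_a(freshness) + log_b(popularity), which is not the actual competition formula:

      ```python
      import math

      def rating(freshness, popularity, base_f, base_p):
          # Each log base acts as a 1/ln(base) weight on its term,
          # via the identity log_b(x) = ln(x) / ln(b).
          return math.log(freshness, base_f) + math.log(popularity, base_p)

      # Equal bases: changing the shared base rescales ALL ratings by the
      # same constant, so article rankings are unaffected.
      print(rating(100, 1000, 2, 2) / rating(100, 1000, 10, 10))  # ln(10)/ln(2)

      # Different bases: the relative weight of freshness vs popularity
      # actually shifts, which is the scaling the authors describe.
      print(rating(100, 1000, 2, 10))
      ```

      The first ratio is the same constant for any inputs, which is exactly why equal bases have "little bearing on comparing different values", whereas per-term bases genuinely change the trade-off.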

      Thanks for the positive evaluation of the article and video quality!

  3.

    Hi, Vrategov, Pscience 🙂
    I agree with Liad – your article and video are very informative, so it was easy for me to understand your work. Just to add to the other comments: what is missing for me is the actual content of the articles, which is, in fact, what readers are interested in. Keeping the case formulation in mind, I would focus more on the full text of the articles and on the NLP part of the workflow. Nevertheless, I like the analysis you made in the beginning, the techniques you used, the recommendations at the individual level, etc.

    1.

      Hello, thanks for your positive feedback.

      I often find myself clicking on news because of the title, and we see many news agencies trying to catch visitors' attention through it. The title is meant to convey the main point of the article, and we argue it gives us enough information about a visitor's interests.

      Moreover, we argue that using the article's full text as an input is the wrong model-training strategy. Looking at a tabloid's pages, the maximum information available about an article beyond the title is its subtitle. This means the article's text is a hidden state, and the only reasonable way to use it is a multi-layer model with hidden variables standing for the article's text and for the probability that a user is interested in that text. Taking the article's title as input, the joint distribution over the hidden text layers and the target variable can then be fitted, rather than pretending a hidden variable is a visible one.
