Datathon 2020 Solutions | Recommendation systems

ACES solution to article recommender engine case – provided by NetInfo


7 thoughts on “ACES solution to article recommender engine case – provided by NetInfo”

  1. Hello, very nice work and nice video too.

    1. It seems that the evaluation results are missing – I can see the code, but not the actual output. How well did your approach do, based on the historical data?
    2. I’m a bit concerned about the negative sampling approach – couldn’t it also penalize possible links between users and articles?
    3. Suppose this model goes into ‘production’ and your client now asks you to improve its accuracy even further and bring in more clicks. Which directions would you take to improve it?

    1. Hi, Thank you for the feedback!

      1. The accuracy estimated for 12th May is 39%. You can find the final results for 13th May here – . I can also upload the results for 12th May if you believe they would be useful.
      2. I agree. It could penalize links that are not observed yet but are possible in the future and would otherwise be learned by the collaborative filtering algorithm. I’m not worried about the impact on the content-based features, as in their case the flag is “truly” negative. For the collaborative filtering part, we mitigated the issue by using short embeddings in the embedding layers, so that the model generalizes the behavior as much as possible. We believe that this generalization of behavior at the visitor + article level will mitigate any negative effect on the final ranking: the probability estimate will be biased by the negative sampling, but the ranking of the articles (which is what matters most), given an appropriate embedding length, will not be.
      3. Currently we don’t use the most recently read articles as context, while the literature shows that they carry significant information about the most likely “next best article” to click. This information is traditionally incorporated through a Markov chain relationship or through recurrent neural networks. However, what I want to test is to supply the last 10-20 context articles and 10-20 negatively sampled articles and assess them through a softmax function computed at customer level. This is similar to the approach described in the “chameleon” repository referenced in our article. Another possible improvement could come from the “recency” and “hot news” features that we use. We’ve used a very basic representation of both: for “recency” we could use some of the approaches proposed above, and for “hot news” we could implement features that depend on a longer history.
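
The per-customer softmax idea in point 3 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the team's implementation: the embeddings are made-up 2-d vectors, the user is represented by the plain average of the recently read articles, and the helper names (`mean_vector`, `rank_candidates`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors):
    """Average a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_candidates(context_embs, candidate_embs):
    """Score every candidate against the averaged context of ONE user and
    normalise the scores with a softmax over that user's candidates only."""
    user_vec = mean_vector(context_embs)
    scores = [dot(user_vec, c) for c in candidate_embs]
    return softmax(scores)


# Toy embeddings (hypothetical values, for illustration only).
context = [[1.0, 0.0], [0.8, 0.2]]       # recently read articles
candidates = [
    [0.9, 0.1],                          # index 0: the article actually clicked
    [0.0, 1.0],                          # sampled negative
    [-1.0, 0.0],                         # sampled negative
]
probs = rank_candidates(context, candidates)
loss = -math.log(probs[0])               # cross-entropy of the clicked article
```

Because the softmax is taken over one user's candidate list at a time, only the relative order of the scores within that list matters for the recommendation, which is the property the ranking argument in point 2 relies on.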

  2. Hi All,
    We noticed that the files attached to the article are not the correct ones. There are only 2 of them, while the final output consists of 25 files of ~2 MB size. We don’t want to change the article, as that would move it back to “DRAFT”, so please refer to the results here: when evaluating the model.

    The accuracy of the model on 12th May is 39%. However, we also used this day for training, so the estimate might be biased.

    The accuracy is estimated following the approach discussed in the DSS chat for the case:
    1. First we count the users who interacted with the user-specific “next best article” proposed by the recommendation system (only one article is allowed per user). E.g. if a user has clicked on 6 articles in the next day, we need to predict only one of them to count this user as “1”, regardless of their order in that day.
    2. Then we count the visitors present in both train and test who have interacted with at least one article that is present in the “train” set. E.g. if a user has clicked on 1 article the next day, but this article was not present in the “train” dataset, we ignore that user. However, if a user clicks on 25 articles and 20 of them were present in the “train” set, we count this as “1”.

    Finally we use “1.” as the numerator and “2.” as the denominator to derive the evaluation metric:
    eval_metric = “Count of 1.” / “Count of 2.”
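
The metric described above could be computed along these lines. This is a minimal sketch, not the team's evaluation script: the function name, the argument names and the toy data are all hypothetical, and it assumes that hits are only counted for users who also qualify for the denominator.

```python
def evaluate(recommendations, next_day_clicks, train_users, train_articles):
    """Share of eligible users whose single recommended article was clicked.

    recommendations: dict user -> one recommended article id
    next_day_clicks: dict user -> set of article ids clicked on the test day
    train_users:     set of users seen in the train set
    train_articles:  set of articles seen in the train set
    """
    hits = 0      # "Count of 1."
    eligible = 0  # "Count of 2."
    for user, clicked in next_day_clicks.items():
        if user not in train_users:
            continue  # user must be present in both train and test
        if not (clicked & train_articles):
            continue  # none of the clicked articles was seen in train
        eligible += 1
        # Order of clicks does not matter; one match is enough.
        if recommendations.get(user) in clicked:
            hits += 1
    return hits / eligible if eligible else 0.0


# Toy data (hypothetical ids, for illustration only).
recs = {"u1": "a1", "u2": "a3", "u3": "a2"}
clicks = {
    "u1": {"a1", "a9"},  # eligible, recommendation was clicked
    "u2": {"a2"},        # eligible, recommendation was not clicked
    "u3": {"a9"},        # ignored: clicked article not in train
    "u4": {"a1"},        # ignored: user not in train
}
metric = evaluate(recs, clicks, {"u1", "u2", "u3"}, {"a1", "a2", "a3"})
```

In the toy data two users are eligible and one of them clicked the recommended article, so the metric is one hit out of two eligible users.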

  3. Nice work and nice video.
    A few things I am curious about:
    * Have you looked at how the two parts of your algorithm behave on their own? For example, if two titles are deemed close by the algorithm, are they really close to a human?
    * In the same vein as liad’s third question above: say you had the actual articles. How much of a change to your algorithm would this entail?


    1. Hi, Thank you for the feedback!

      1) If you are referring to the tf-idf + truncated SVD representation of the titles: we checked several cases, and titles whose vectors are similar (by cosine similarity) are indeed similar to a human. As for testing the algorithm using only “Content Based Features” or only “Collaborative Filtering”, we didn’t have time to check this, but some preliminary tests using only Neural Collaborative Filtering weren’t as good, so my intuition is that the main added value currently comes from the “Content Based Features” and the subsequent Neural Network transformations + Negative Sampling.

      2) I would refrain from introducing the text of the articles for the time being, as in my mind other sources of data, such as the N most recent articles or the recency and hot news definitions, would result in a much higher uplift. Our working assumption is that the title of the article is a sufficient summary of its contents, but we haven’t done any analysis to validate that.
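
The title-similarity check mentioned in 1) can be reproduced in spirit with scikit-learn. This is a hedged sketch only: the titles are invented, and the vectorizer settings and `n_components=2` are assumptions chosen for the toy corpus, not the values used in the solution.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up titles: two about football, two about parliament.
titles = [
    "football team wins league final",
    "football team loses league final",
    "parliament votes on budget law",
    "parliament debates budget law",
]

# Sparse tf-idf matrix, one row per title.
tfidf = TfidfVectorizer().fit_transform(titles)

# Compress the tf-idf rows into short dense vectors.
svd = TruncatedSVD(n_components=2, random_state=0)
vectors = svd.fit_transform(tfidf)

# Pairwise cosine similarity between the compressed title vectors.
sim = cosine_similarity(vectors)

for i, title in enumerate(titles):
    # Most similar other title (excluding the title itself).
    best = max((j for j in range(len(titles)) if j != i), key=lambda j: sim[i, j])
    print(f"{title!r} -> {titles[best]!r}")
```

Printing the nearest neighbour of each title is a quick way to eyeball whether titles that are close in vector space also read as close to a human.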

  4. Hi, all,
    Great video – figures & explanations! And the model architecture looks logical and appropriate for the goal.
    About the introduction of negative values, maybe it would be useful first to build a quick & dirty model in order to isolate a set of less likely events and then to assign a negative value to them. Of course, this is again a kind of speculation, but I think it is better than the random approach.
    It is a pity that we can’t see the results easily…

    1. Hi, Thank you for the feedback and for the suggestion!
      I agree that a more targeted approach for creating the negative sample will lead to better results. In this case we just settled for the random approach, as it was easy to implement and was inspired by the chameleon repo referenced in the article.
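
For reference, the random approach discussed in this thread can be sketched as below. It is an illustration under assumptions, not the code from the article: the function name, the `n_per_positive` parameter and the toy ids are hypothetical, and negatives are drawn uniformly from the articles a user has not clicked.

```python
import random


def sample_negatives(observed, all_articles, n_per_positive=1, seed=0):
    """Build (user, article, label) rows: label 1 for observed clicks and
    label 0 for randomly drawn articles the user has NOT clicked."""
    rng = random.Random(seed)
    rows = []
    for user, clicked in observed.items():
        # Candidate negatives: every known article without an observed click.
        unseen = sorted(set(all_articles) - clicked)
        for article in sorted(clicked):
            rows.append((user, article, 1))
        # Never ask for more negatives than there are unseen articles.
        k = min(len(unseen), n_per_positive * len(clicked))
        for article in rng.sample(unseen, k):
            rows.append((user, article, 0))
    return rows


# Toy interactions (hypothetical ids, for illustration only).
observed = {"u1": {"a1", "a2"}, "u2": {"a3"}}
articles = ["a1", "a2", "a3", "a4", "a5"]
rows = sample_negatives(observed, articles)
```

A more targeted variant, as suggested above, would replace the uniform `rng.sample` draw with a draw restricted to articles that a simple baseline model scores as unlikely for that user.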
