Datathon 2020 Solutions – Recommendation Systems

ACES solution to article recommender engine case – provided by NetInfo




The ACES team that worked on the solution is listed in alphabetical order:

Atanas Blagoev ([email protected])

Atanas Panayotov([email protected])

Emil Gyorev ([email protected])

Georgi Buyukliev ([email protected])

Iliana Voynichka ([email protected])

Slav-Konstantin Ivanov ([email protected])

Ventsislav Yordanov ([email protected])


Business Understanding

Although news is one of the most important sources of information today, the information-overload problem makes it increasingly difficult for users to find news they are really interested in. Personalized news recommendation technology has therefore drawn growing attention across many industries. The central task in current personalized news recommendation research is to design algorithms that combine existing personalized recommendation techniques with the unique characteristics of news, delivering both high recommendation quality and good performance.

As outlined in the case documentation, the main goal is to predict the next best article (not topic) for each visitor of the site, a Bulgarian news website.

The benefits of having a well-working recommendations engine include (but are not limited to):

  • Sustainably increasing visitors' length of stay (average session duration)
  • Reducing the bounce rate (the percentage of visitors who abandon the website after viewing just one page)
  • Increasing advertising revenue
  • Improving customer experience and satisfaction by helping readers find what they need more quickly, saving them time and effort

Unlike item recommendation in fields such as e-commerce, tourism, movies, or music, designing and applying personalized news recommendation techniques is more complicated and difficult because of the characteristics of news itself: strong contextual correlation, rapidly changing popularity, strong timeliness, social-impact factors, and the relevance between news items (news articles are not independent).

To address the business need, ACES has prepared a model structured to allow an hourly refresh. This means the algorithm can automatically ingest newly created data (articles and click statistics) and generate an updated set of suggested articles that would appear on the website.

Last but not least, for the purpose of the challenge, the article to be predicted must also come from the provided dataset.

Based on the provided data, it is important to state that the current solution is not a true recommendation engine but rather a prediction engine that identifies the next article read by the customer. Since we do not know what, if anything, was suggested to the reader, we treat every click as entirely the reader's own choice, regardless of other content and suggestions available.

To validate the performance of the model, an evaluation dataset will be provided containing the data for the next day (currently unavailable in the dataset). It will include only users and articles already observed in the training dataset.

Data Understanding

Net Info has provided data with historical visits of articles per user. The data was split into 21 CSV files, each containing 5 columns:

  • User ID (anonymized)
  • Time – timestamp of when the link was opened by the user
  • URL – link to the article on the website
  • Page Title – the title of the article
  • Page Views – number of views accumulated for the period

Data Preparation

The provided data files were concatenated.

The only way to recognize unique articles is to extract the unique ID contained in the URL; otherwise, two articles with the same title could have different URL slugs. By extracting the IDs, we were able to create a number of statistics for the data. Furthermore, from the URL we derived two additional variables: Topic and Subtopic.
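As an illustration of this extraction step, a minimal sketch is shown below. The URL layout assumed here (topic/subtopic/slug ending in a numeric article ID) is a simplification for the example, not the site's exact format:

```python
import re

def parse_url(path: str):
    """Return (article_id, topic, subtopic) extracted from a URL path."""
    parts = path.split("?")[0].strip("/").split("/")  # drop query string, split segments
    match = re.search(r"(\d+)$", parts[-1])           # unique numeric ID at the end of the slug
    article_id = match.group(1) if match else None
    topic = parts[0] if len(parts) > 1 else None
    subtopic = parts[1] if len(parts) > 2 else None
    return article_id, topic, subtopic

print(parse_url("bulgaria/politics/some-article-title-12345"))
# -> ('12345', 'bulgaria', 'politics')
```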

Page Title was cleaned by removing all special symbols (such as !, ", _, etc.).

Unique records were identified by the combination of Page Path and User ID; the dataset was then de-duplicated by this key.
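The cleaning and de-duplication could look roughly like this in pandas (the column names `userID`, `pagePath`, and `pageTitle` are assumptions for the sketch):

```python
import pandas as pd

# Toy frame standing in for the concatenated CSV files
df = pd.DataFrame({
    "userID":    ["u1", "u1", "u2"],
    "pagePath":  ["/news/a-1", "/news/a-1", "/news/b-2"],
    "pageTitle": ['Big "News"! Today_', 'Big "News"! Today_', "Other story"],
})

# Remove special symbols (like !, ", _) from the title and collapse whitespace
df["pageTitle"] = (df["pageTitle"]
                   .str.replace(r"[^\w\s]|_", " ", regex=True)
                   .str.replace(r"\s+", " ", regex=True)
                   .str.strip())

# De-duplicate on the (pagePath, userID) key
df = df.drop_duplicates(subset=["pagePath", "userID"]).reset_index(drop=True)
print(df["pageTitle"].tolist())  # -> ['Big News Today', 'Other story']
```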


Modeling

A Hybrid Deep Learning News Recommender System was used to produce the recommendations. The system can be called a "hybrid" recommendation system because it relies on both content-based and collaborative methods to extract information from the data. The architecture is described below.

All content-based features used within the model are derived prior to training the neural network. We used four groups of such features:

  1. Page Topic features – The page topic is extracted from the pagePath and transformed through a one-hot encoder. We used the 10 most common topics for the analysis, as they cover almost all of the articles in the data (see the graphics in section “Data Preparation”).
  2. Page Title features – The page title is supplied for all articles and can be described as a short description, in Bulgarian, of the article contents. We used a tf-idf transformation to turn the text into a numerical vector. The resulting vectors, although sparse, are still very large, so we applied dimensionality reduction with Truncated SVD, keeping the top 10 components. This is necessary to speed up training and evaluation.
  3. Recency feature – The number of days since the first day the article is seen in the data, divided by the number of days available in the data. We used this simple representation due to the limited time; given more time, we would also like to test the following:
      1. Introducing penalties on the loss function related to the age of the article.
      2. Introducing “Time-Positional Encodings”, inspired by the Transformers framework: a numerical vector unique to each number of days since the article was published. More information is available in the referenced paper.
  4. Popularity feature – The proportion of all clicks from the past day that are related to the currently analyzed article.
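A rough sketch of how these feature groups might be computed is given below. The toy article table is invented for illustration; the write-up keeps the top 10 SVD components, while 2 are used here only because the toy corpus is tiny:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Toy article table; real titles are Bulgarian, so Cyrillic is used here too.
articles = pd.DataFrame({
    "title": ["борисов говори пред парламента", "нов филм тръгва по кината",
              "протест в софия", "времето утре ще е слънчево", "избори наесен"],
    "first_seen_day": [0, 3, 5, 6, 6],
    "clicks_last_day": [120, 30, 40, 200, 10],
})
n_days = 7  # days available in the data

# Title features: tf-idf, then Truncated SVD on the sparse matrix
tfidf = TfidfVectorizer().fit_transform(articles["title"])
title_feats = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Recency: days since the article was first seen, scaled by the window length
articles["recency"] = (n_days - articles["first_seen_day"]) / n_days

# Popularity: share of the past day's clicks going to each article
articles["popularity"] = articles["clicks_last_day"] / articles["clicks_last_day"].sum()

print(title_feats.shape)  # -> (5, 2)
```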

The collaborative filtering algorithm is incorporated into the neural network through neural collaborative filtering. This procedure introduces two embedding layers, one for the visitors and one for the articles, which are trained by the neural network. The dot product of the two embeddings is then computed and supplied to the next layers.
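A minimal Keras sketch of this neural collaborative filtering branch could look as follows. The vocabulary sizes and embedding width are placeholders; the write-up only notes that "short" embeddings were used:

```python
import tensorflow as tf

# Placeholder sizes for illustration
n_users, n_articles, emb_dim = 1000, 500, 8

user_in = tf.keras.Input(shape=(1,), name="user_id")
item_in = tf.keras.Input(shape=(1,), name="article_id")

# One embedding layer per entity, trained jointly with the rest of the network
user_emb = tf.keras.layers.Embedding(n_users, emb_dim)(user_in)
item_emb = tf.keras.layers.Embedding(n_articles, emb_dim)(item_in)

# Dot product of the two embeddings, passed on to the subsequent layers
dot = tf.keras.layers.Dot(axes=2)([user_emb, item_emb])
ncf_out = tf.keras.layers.Flatten()(dot)

ncf = tf.keras.Model([user_in, item_in], ncf_out)
print(ncf.output_shape)  # -> (None, 1)
```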

The content-based features and the output of the neural collaborative filtering algorithm are then supplied to a sequence of Dense layers with a decreasing number of nodes, the final Dense layer having only one node. Finally, an inception-inspired transformation concatenates the output of that final node with the input supplied to the sequence of Dense layers. The concatenated vector is processed through a one-node sigmoid Dense layer to produce the final probability estimate and the log odds used for ranking the articles.
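The head of the network described above might be sketched like this. The input width (10 topic dummies + 10 title components + recency + popularity + the collaborative dot product) and the hidden-layer widths are illustrative assumptions, not the team's actual configuration:

```python
import tensorflow as tf

# Assumed input: content features concatenated with the NCF dot product
inputs = tf.keras.Input(shape=(23,))

# Dense stack with a decreasing number of nodes, ending in a single node
x = tf.keras.layers.Dense(16, activation="relu")(inputs)
x = tf.keras.layers.Dense(8, activation="relu")(x)
x = tf.keras.layers.Dense(1, activation="relu")(x)

# Inception-inspired skip: concatenate the single node with the stack's input
concat = tf.keras.layers.Concatenate()([x, inputs])

# One-node sigmoid layer: its output is the click probability, and its logit
# (log odds) is what ranks the candidate articles.
out = tf.keras.layers.Dense(1, activation="sigmoid")(concat)

head = tf.keras.Model(inputs, out)
print(head.output_shape)  # -> (None, 1)
```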


(Attached result files: test_results_1, test_results_2)

We’ve uploaded the predicted best articles for all customers in the sample.

Used Libraries and Technologies

  • tensorflow==2.2
  • scikit-learn==0.23.0
  • numpy==1.18.4
  • pandas==1.0.3


The chosen architecture is inspired by the following two papers/repos:



7 thoughts on “ACES solution to article recommender engine case – provided by NetInfo”

  1.

    Hello, very nice work and nice video too.

    1. It seems that the evaluation results are missing – I can see the code, but not the actual output. How well did your approach do, based on the historical data?
    2. I’m a bit concerned about the negative sampling approach – couldn’t it also penalize possible links between users and articles?
    3. Say this model goes into ‘production’ and your client now asks you to improve the accuracy even further and bring more clicks. Which directions would you take to improve it?

    1.

      Hi, Thank you for the feedback!
      1. The accuracy estimated for 12th May is 39%. You can find the final results for 13th May here – . I can also upload the results for 12th of May if you believe that they will be useful.
      2. I agree. It could penalize non-observed but possible future links learned through the collaborative filtering algorithm. I'm not worried about the impact on the content-based features, as in their case the flag is “truly” negative. For the collaborative filtering algorithm, to mitigate this issue we used short embeddings in the embedding layers, so that the behavior generalizes as well as possible. We believe this generalization of behavior at visitor + article level will mitigate any negative effect on the final ranking – i.e. the probability estimate will be biased by the negative sampling, but, given an appropriate embedding length, the ranking of the articles (which is what matters most) will not be.
      3. Currently we don't use the most recently read articles for context, while the literature shows that they are a source of significant information about the most likely “next best article” to click. This information is traditionally incorporated through a Markov chain relationship or through recurrent neural networks. What I would like to test is supplying the last 10–20 context articles and 10–20 negative-sampling articles and assessing them through a softmax processed at customer level. This is similar to the approach described in the “chameleon” repository referenced in our article. Another possible improvement could come from the “recency” and “hot news” features. We used very basic representations; for “recency” we could use some of the approaches proposed above, and for “hot news” we could implement features that depend on a longer history.

  2.

    Hi All,
    We noticed that the attached files in the article are not the correct ones. There are only 2, while in fact we have 25 files of ~2 MB each. We don't want to change the article, as that would move it back to “DRAFT”, so please refer to the results here: when evaluating the model.

    The accuracy of the model on 12th May is 39%. However, we also used this day for training, so the estimate might be biased.

    The accuracy is estimated following the approach discussed in the DSS chat for the case:
    1. First we count the total number of users who interacted with the user-specific “next best article” proposed by the recommendation system (only one article is allowed per user). E.g., if a user clicked on 6 articles in the next day, we have to predict just one of them, regardless of their order in that day, for the user to count as “1”.
    2. Then we count the number of visitors present in both train and test that interacted with an article present in the “train” set. E.g., if a user clicked on 1 article the next day, but this article was not present in the “train” dataset, we ignore the user. However, if a user clicked on 25 articles and 20 of them were present in the “train” set, then we count this user as “1”.

    Finally we use “1.” as numerator and “2.” as denominator to derive the evaluation metric:
    eval_metric = “Count of 1.” / “Count of 2.”
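In code, the metric described above could be sketched like this (the pair and prediction data structures are assumptions for illustration):

```python
def eval_metric(train_pairs, test_pairs, predictions):
    """train_pairs/test_pairs: sets of (user, article); predictions: user -> article."""
    train_users = {u for u, _ in train_pairs}
    train_articles = {a for _, a in train_pairs}

    # "2.": users seen in train who interact in test with a train-set article
    eligible = {u for u, a in test_pairs if u in train_users and a in train_articles}

    # "1.": eligible users whose single predicted article was actually clicked
    hits = {u for u, a in test_pairs if u in eligible and predictions.get(u) == a}

    return len(hits) / len(eligible) if eligible else 0.0

train = {("u1", "a1"), ("u2", "a2"), ("u2", "a3")}
test  = {("u1", "a2"), ("u1", "a3"), ("u2", "a1"), ("u3", "a1")}
preds = {"u1": "a3", "u2": "a9"}
print(eval_metric(train, test, preds))  # -> 0.5
```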

  3.

    Nice work and nice video.
    A few things I am curious about:
    * Have you looked at how the two parts of your algorithm behave on their own: I.e. if two titles are deemed close by the algorithm – are they really so to a human?
    * In the same vein as liad's third question above – say you had the actual articles, how much of a change to your algorithm would this entail?


    1.

      Hi, Thank you for the feedback!

      1) If you are referring to the tf-idf + truncated SVD representation of the titles: we checked several cases, and vectors that are close by cosine similarity are indeed similar to a human. As for testing the algorithm using only “Content Based Features” or only “Collaborative Filtering”, we didn't have time to check this, but some preliminary tests using only Neural Collaborative Filtering weren't as good, so my intuition is that the main added value is currently coming from the “Content Based Features”, the subsequent Neural Network transformations, and the Negative Sampling.

      2) I would refrain from introducing the text of the articles for the time being, as in my mind other sources of data, such as the N most recent articles or the recency and hot-news definitions, would result in a much higher uplift. Our working assumption is that the title of the article is a sufficient summary of its contents, but we haven't done any analysis to validate that.

  4.

    Hi, all,
    Great video – figures & explanations! And the model architecture looks logical and appropriate for the goal.
    About the introduction of negative values, maybe it would be useful first to build a quick & dirty model in order to isolate a set of less likely events, and then assign negative values to them. Of course, this is again a kind of speculation, but I think it is better than the random approach.
    It is a pity that we can’t see the results easily…

    1.

      Hi, Thank you for the feedback and for the suggestion!
      I agree that a more targeted approach for creating the negative sample would lead to better results. In this case we just settled for the random approach, as it was easy to implement and was inspired by the chameleon repo referenced in the article.
