In an ever faster-paced world, we want everything to happen quickly and easily. Our attention spans and our frustration fuses are both getting shorter. Nowadays we more or less expect web services to know us intimately and to anticipate our desires without being told. This is where recommendation systems come in. They save us time and make us feel that a web service has exactly what we need. At a time when users have access to millions of news sources, it is vital to guide each user to the ones that interest them most.
The task is to create a model that can recommend the next best article for a user. Besides the user's own reading history, one can also take into account the readership history of each article, thus learning from the behaviour of people with similar interests.
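One simple way to learn from readers with similar interests is to count which article tends to be read right after another, across all users. The sketch below is only an illustration of that idea, not the organisers' method: the function names and the `(user_id, article_id, timestamp)` row layout are assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_transitions(visits):
    """Count how often article B was read right after article A by the same user.

    `visits` is an iterable of (user_id, article_id, timestamp) tuples.
    This layout is an assumption made for illustration.
    """
    by_user = defaultdict(list)
    for user_id, article_id, ts in visits:
        by_user[user_id].append((ts, article_id))

    transitions = defaultdict(Counter)
    for events in by_user.values():
        events.sort()  # chronological order per user
        for (_, prev), (_, nxt) in zip(events, events[1:]):
            if prev != nxt:  # ignore page reloads
                transitions[prev][nxt] += 1
    return by_user, transitions


def recommend(user_id, by_user, transitions, k=2):
    """Return up to k unread articles that most often follow the user's last read."""
    events = by_user.get(user_id, [])
    if not events:
        return []
    seen = {article for _, article in events}
    last_article = events[-1][1]  # events were sorted in build_transitions
    ranked = [a for a, _ in transitions[last_article].most_common() if a not in seen]
    return ranked[:k]


# Toy data: three users read "a" then "b"; two users continue from "b" to "c".
visits = [
    ("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 3),
    ("u2", "a", 1), ("u2", "b", 2),
    ("u3", "a", 1), ("u3", "b", 2), ("u3", "d", 3),
    ("u4", "b", 1), ("u4", "c", 2),
]
by_user, transitions = build_transitions(visits)
print(recommend("u2", by_user, transitions))  # ['c', 'd']
```

A count-based model like this never reads article text and can be rebuilt from scratch quickly, which is why it makes a reasonable baseline before trying anything heavier.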
The task participants should try to solve is, at first glance, easy: to produce an algorithm that, based on the articles a user has already visited on vesti.bg, automatically recommends in real time the next most interesting article for that user on the site.
For this purpose, each team will receive data that is very easy to read (for example URL, ID, timestamp) but very large in volume, on the order of two to three million rows.
Each model will be benchmarked against real data from the following day on how well it recommends the next best article. In addition, bonus points will be given to teams using TensorFlow.
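Teams may want to rehearse this benchmark locally by holding out the last day of their own data. The sketch below assumes rows of `(user_id, article_id, timestamp)` and scores a dictionary of recommendations with a simple hit rate. The exact metric used by the organisers is not stated, so both the metric and the function names here are assumptions.

```python
def split_by_time(visits, cutoff):
    """Split rows into train (timestamp < cutoff) and test (timestamp >= cutoff)."""
    train = [v for v in visits if v[2] < cutoff]
    test = [v for v in visits if v[2] >= cutoff]
    return train, test


def hit_rate(recommendations, test):
    """Share of test users for whom at least one recommended article was read.

    `recommendations` maps user_id -> list of recommended article ids.
    Users that appear in `test` but have no recommendations are skipped.
    """
    read = {}
    for user_id, article_id, _ in test:
        read.setdefault(user_id, set()).add(article_id)

    scored = [u for u in read if u in recommendations]
    if not scored:
        return 0.0
    hits = sum(1 for u in scored if read[u] & set(recommendations[u]))
    return hits / len(scored)


# Toy example: timestamps below 10 are "history", the rest is the held-out day.
visits = [("u1", "a", 1), ("u1", "c", 11), ("u2", "a", 2), ("u2", "b", 12)]
train, test = split_by_time(visits, cutoff=10)
print(hit_rate({"u1": ["c", "d"], "u2": ["x", "y"]}, test))  # 0.5
```

Splitting by time rather than at random keeps the rehearsal honest, because the model never sees visits from the period it is scored on.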
Net Info has provided data with historical visits of articles per user. The data consists of the following fields:
- User ID
- URL
- Timestamp
- Page Title
- Page Views
You can access the full dataset here:
1. Do we need to recommend an article or just a topic?
- The main idea of the case is to predict the next best article (not topic) for the visitor. The article should be from the training dataset. We do not recommend parsing the text of all articles, because the model would be too resource-consuming. We need a fast model that can be refreshed, for example, on an hourly basis.
2. Will the evaluation be on the same users from the training set, with the next article they read on the website, or on a different set of users?
- The model will be evaluated on the same users, using the data for the next time period. The training dataset covers 30 days. The evaluation dataset covers the following 1 day.
3. Would you provide a template for the output format?
- We do not have a specific format for the output; it would be too limiting given all the possible solutions. I expect something like VisitorID, first_best_article, second_best_article for the next 1 day/hour/minute.
4. If you are using Google Colab, this link to the dataset will be useful: colab
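As answer 3 notes, there is no required output format. One possible shape, sketched below, is a CSV file with the three columns named in that answer. The function name and the choice to leave a missing second recommendation empty are assumptions for illustration.

```python
import csv
import io


def write_recommendations(recommendations, out):
    """Write one row per visitor: VisitorID, first_best_article, second_best_article.

    `recommendations` maps visitor id -> list of article ids, best first.
    Missing recommendations are written as empty fields.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["VisitorID", "first_best_article", "second_best_article"])
    for visitor_id, articles in sorted(recommendations.items()):
        first, second = (list(articles) + ["", ""])[:2]
        writer.writerow([visitor_id, first, second])


# Write to an in-memory buffer; pass an open file object to write to disk instead.
buffer = io.StringIO()
write_recommendations({"u1": ["a", "b"], "u2": ["c"]}, buffer)
print(buffer.getvalue())
```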