Popular comments by strabron
Hi, thank you for the feedback and for the suggestion!
I agree that a more targeted approach for creating the negative samples will lead to better results. In this case we settled for the random approach, as it was easy to implement and was inspired by the chameleon repo referenced in the article.
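For readers unfamiliar with the random approach mentioned above, here is a minimal sketch of uniform negative sampling. It is not the code from the article or from the chameleon repo; the function name `sample_negatives`, the parameter `n_neg`, and the toy data are assumptions made only for illustration.

```python
import random

def sample_negatives(user_clicks, all_articles, n_neg=4, seed=0):
    """Build (user, article, label) rows with random negatives.

    user_clicks:  dict user_id -> set of clicked article ids (label 1)
    all_articles: iterable of all known article ids
    n_neg:        negatives drawn per positive (hypothetical default)
    """
    rng = random.Random(seed)
    rows = []
    for user, clicked in user_clicks.items():
        # Candidates are all articles this user has NOT clicked.
        candidates = sorted(set(all_articles) - clicked)
        for article in sorted(clicked):
            rows.append((user, article, 1))
            k = min(n_neg, len(candidates))
            # Uniform sampling without replacement for this positive.
            for negative in rng.sample(candidates, k):
                rows.append((user, negative, 0))
    return rows

# Toy usage with made-up ids.
clicks = {"u1": {"a", "b"}, "u2": {"c"}}
rows = sample_negatives(clicks, ["a", "b", "c", "d", "e"], n_neg=2)
```

A more targeted variant, as suggested in the feedback, would replace the uniform draw with one weighted by, for example, article popularity or recency.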
Hi, thank you for the feedback!
1) If you are referring to the tf-idf + truncated SVD representation of the titles: we checked several cases, and titles whose vectors are close by cosine similarity are indeed similar to a human reader. As for testing the algorithm using only “Content Based Features” or only “Collaborative filtering”, we didn’t have time to check this. However, some preliminary tests using only Neural Collaborative Filtering weren’t as good, so my intuition is that the main added value currently comes from the “Content Based Features” and the subsequent Neural Network transformations, together with Negative Sampling.
2) I would refrain from introducing the text of the articles for the time being, as in my mind other sources of data, such as the N most recent articles or the recency and hot news definitions, would result in a much higher uplift. Our working assumption is that the title of the article is a sufficient summary of its contents, but we haven’t done any analysis to validate that.
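To make the similarity check from point 1) concrete, here is a small sketch of comparing titles by cosine similarity over tf-idf vectors. It is a simplification under stated assumptions: the truncated SVD step is omitted, the tf-idf weighting is one common smoothed variant rather than necessarily the one we used, and the titles and function names are invented for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(titles):
    """Return one sparse tf-idf vector (dict term -> weight) per title."""
    docs = [title.lower().split() for title in titles]
    n_docs = len(docs)
    # Document frequency: in how many titles each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Made-up titles: the first two share terms, the third does not.
titles = [
    "government approves new budget",
    "parliament debates new budget",
    "local team wins football cup",
]
vecs = tfidf_vectors(titles)
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
```

In this toy case the two budget titles get a positive similarity, while the football title shares no terms with the first one and scores zero. The SVD step would additionally allow titles with no shared words to end up close to each other.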
We noticed that the files attached to the article are not the correct ones. Only 2 files are attached, whereas the final results consist of 25 files, ~2 MB in size. We don’t want to change the article, as that would move it back to “DRAFT”, so please refer to the results here when evaluating the model: https://drive.google.com/open?id=1pf00CVy8nqKqpziGZioJTDc-rxm_ahlW
The accuracy of the model on 12th May is 39%. However, we also used this day for training, so the estimate might be biased.
The accuracy is estimated following the approach discussed in the DSS chat for the case:
1. First we count the number of users who interacted with the user-specific “next best article” proposed by the recommendation system (only one article is allowed per user). E.g. if a user has clicked on 6 articles on the next day, we need to have predicted only one of them to count this user as “1”, regardless of the order of the clicks in that day.
2. Then we count the number of visitors present in both train and test who have interacted with an article that is present in the “train” set. E.g. if a user has clicked on 1 article the next day, but this article was not present in the “train” dataset, we ignore that user. However, if a user clicks on 25 articles and 20 of them were present in the “train” set, then we count this user as “1”.
Finally we use “1.” as numerator and “2.” as denominator to derive the evaluation metric:
eval_metric = “Count of 1.” / “Count of 2.”
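The two counts above can be sketched in a few lines. This is only an illustration of the described metric, not the notebook code; the function signature and the toy data are assumptions, and hits are counted only among the users that qualify for the denominator.

```python
def eval_metric(recommended, next_day_clicks, train_users, train_articles):
    """Share of eligible users whose single recommended article was clicked.

    recommended:     dict user_id -> one recommended article id
    next_day_clicks: dict user_id -> set of article ids clicked on the test day
    train_users:     set of user ids seen in the train set
    train_articles:  set of article ids seen in the train set
    """
    hits = 0      # "Count of 1."
    eligible = 0  # "Count of 2."
    for user, clicked in next_day_clicks.items():
        if user not in train_users:
            continue  # user must be present in both train and test
        known = clicked & train_articles
        if not known:
            continue  # none of the clicked articles were in train
        eligible += 1
        if recommended.get(user) in known:
            hits += 1  # counted once per user, regardless of click order
    return hits / eligible if eligible else 0.0

# Toy usage with made-up ids.
score = eval_metric(
    recommended={"u1": "a", "u2": "b", "u3": "a"},
    next_day_clicks={"u1": {"a", "x"}, "u2": {"x"}, "u3": {"b", "c"}, "u4": {"a"}},
    train_users={"u1", "u2", "u3"},
    train_articles={"a", "b", "c"},
)
```

Here `u2` is ignored because the only clicked article was not in train, `u4` is ignored because the user was not in train, and of the remaining two users only `u1` clicked the recommended article, so the metric is 1/2.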
Hi, thank you for the feedback!
1. The accuracy estimated for 12th May is 39%. You can find the final results for 13th May here: https://drive.google.com/open?id=1pf00CVy8nqKqpziGZioJTDc-rxm_ahlW . I can also upload the results for 12th of May if you believe that they will be useful.
2. I agree. It could penalize non-observed but possible future links learned through the collaborative filtering algorithm. I’m not worried about the impact on the content based features, as in their case the flag is “truly” negative. For the collaborative filtering part, we mitigate this issue by using short embeddings in the embedding layers, so that behavior is generalized as much as possible. We believe that this generalization of behavior on visitor + article level will mitigate any negative effect on the final ranking: the probability estimate will be biased by the negative sampling, but the ranking of the articles (which matters most), given an appropriate embedding length, will not be.
3. Currently we don’t use the most recently read articles for context, while the literature shows that they are a source of significant information about the most likely “next best article” to click. This information is traditionally incorporated through a Markov chain relationship or through recurrent neural networks. However, what I want to test is to supply the last 10-20 context articles and 10-20 negative sampling articles and assess them through a softmax function applied on customer level. This is similar to the approach described in the “chameleon” repository referenced in our article. Another possible improvement could come from the “recency” and “hot news” features that we use. We’ve used a very basic representation, while for “recency” we can use some of the approaches proposed by us above and for “hot news” we can implement features that depend on a longer history.
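As a rough illustration of the softmax idea in point 3, the sketch below normalizes the scores of one customer’s candidate articles (one clicked article plus sampled negatives) into probabilities and ranks them. The scores are hypothetical model outputs, the function names are invented, and this is not the chameleon implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rank_candidates(candidate_scores):
    """Rank one customer's candidate articles by softmax probability.

    candidate_scores: dict article_id -> raw model score (hypothetical)
    """
    articles = list(candidate_scores)
    probs = softmax([candidate_scores[a] for a in articles])
    return sorted(zip(articles, probs), key=lambda pair: pair[1], reverse=True)

# Toy usage: one positive candidate and two sampled negatives.
ranked = rank_candidates({"clicked": 2.0, "neg_1": 0.5, "neg_2": -1.0})
```

Because softmax is monotonic in the scores, a bias that shifts all of one customer’s scores by the same amount changes neither the probabilities nor the order of the articles, which is in line with the ranking argument in point 2.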
Thanks for the feedback!
1) You can refer to the notebook 05. Calculate FInal Prediction.ipynb (https://github.com/datasciencesociety/Phoenix/blob/master/!Clean%20Project%20Files/05.%20Calculate%20FInal%20Prediction.ipynb) to see the final assignment of the flags supplied to the leaderboard. We are just searching the space outlined in the XML files for a product. I don’t think that the evaluation approach used in the leaderboard is very good, as the coordinates in the XML files are not standardized: for missing products they cover the full available space, while for available products they cover only the space of the specified product. 🙂 However, as seen in the BoxPlot_Data.zip file, we are correctly identifying almost all of the objects and 100% of the labels: https://github.com/datasciencesociety/Phoenix/blob/master/!Clean%20Project%20Files/BoxPlot_Data.zip
2) For performance we only used the leaderboard, as in Computer Vision the appropriate metric really depends on what you are interested in.