Popular comments by metodinikolov
In general, the provided code would not give predictions for the required dates:
30.01.2018, 06.02.2018, 20.02.2018, 09.03.2018, 18.03.2018
That said, the code is well documented and seems fairly easy to extend to cover the required time period.
It will test for the goodness of the predictions.
A bit more explanatory text on what the `nnetar` function does and how it was used would be beneficial.
The input data used has not been provided, so I am unable to run the code as is.
Here are some of my thoughts:
* I would have liked to see more discussion of what the evaluated model says (where possible). For linear regression in particular, one could examine the coefficients and other statistics. That could have said something about the inclusion of `temp`, `temp_min` and `temp_max` in the model: I am a bit worried that, given the correlation of almost 1 between these three, the model could be overfitted.
* An analysis of the number and extent of outliers could have benefited the work. It might have given ammunition to exclude those data points, thus freeing the algos to better fit the model. Alternatively, a robust regression technique could also have worked.
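
To make the collinearity worry concrete, here is a minimal sketch of a variance inflation factor (VIF) check. It does not use the author's data (which was not provided): the series below are synthetic, the `wind` column and the `vif` helper are hypothetical, and only the names `temp`, `temp_min` and `temp_max` come from the write-up.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic stand-in data (assumption): three near-duplicate temperature series
# plus one unrelated predictor.
rng = np.random.default_rng(0)
temp = rng.normal(15.0, 8.0, size=365)
temp_min = temp - 3.0 + rng.normal(0.0, 0.5, size=365)
temp_max = temp + 3.0 + rng.normal(0.0, 0.5, size=365)
wind = rng.normal(10.0, 3.0, size=365)

X = np.column_stack([temp, temp_min, temp_max, wind])
corr = np.corrcoef(X[:, :3], rowvar=False)  # pairwise correlations of the temperature columns
vifs = vif(X)                               # one VIF per column, same order as X
print(np.round(corr, 3))
print(np.round(vifs, 1))
```

A common rule of thumb treats a VIF above roughly 10 as a sign that a predictor is largely redundant given the others, in which case keeping only one of the three temperature variables may be enough.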
Nice, clear approach: good work.
That said, have you looked at adding interaction terms in the regression (e.g. weather * weekday)? This might necessitate changing the resolution of the data from daily to half-day or even less.
Also, it might benefit the analysis to look into quantile regression.
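
As a rough illustration of what an interaction term looks like in practice, the sketch below fits an ordinary least squares model with a weather-by-weekday product column. Everything in it is an assumption for illustration: the data is synthetic, and the `rain` and `weekend` flags are hypothetical stand-ins for whatever weather and weekday variables the original model uses.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical binary predictors: 1 = rainy, 1 = weekend.
rain = rng.integers(0, 2, size=n).astype(float)
weekend = rng.integers(0, 2, size=n).astype(float)

# Synthetic response in which rain hurts more on weekends
# (true interaction coefficient is -30).
y = (100.0 - 10.0 * rain + 20.0 * weekend
     - 30.0 * rain * weekend + rng.normal(0.0, 5.0, size=n))

# Design matrix: intercept, main effects, and the product (interaction) column.
X = np.column_stack([np.ones(n), rain, weekend, rain * weekend])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(beta, 2))  # order: intercept, rain, weekend, rain:weekend
```

If the last coefficient is clearly different from zero, the effect of the weather depends on the type of day, which a model with main effects only cannot capture.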
Nice work and nice video.
A few things I am curious about:
* Have you looked at how the two parts of your algorithm behave on their own? For instance, if two titles are deemed close by the algorithm, are they really close to a human reader?
* In the same vein as liad’s third question above: say you had the actual articles, how much of a change to your algorithm would this entail?
Well written article and a good video – nice work!
I like the fact that you have devised and implemented your own entire take on the issue. Here are some questions/observations:
* It seems to me that you are making some assumptions about the data when defining your statistics (publication time, etc.) that have the potential to greatly affect the result.
* Do you guard against recommending an article that someone has already read?
* Your rating definition stresses the use of different bases, but a change of base amounts to a multiplicative constant applied to all ratings (hence it has little bearing on comparing different values) relative to a formula with the same base (and your code seems to use the same base?). Could you expand on this point further?
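
A small sketch of the last point, under the assumption that the "bases" in question are logarithm bases (the rating formula itself is not reproduced here, and the `views` counts are made up): switching the base rescales every rating by the same constant, so the ordering of the ratings does not change.

```python
import math

views = [3, 10, 250, 4000]  # hypothetical raw counts, all > 1

ratings_base2 = [math.log(v, 2) for v in views]
ratings_base10 = [math.log(v, 10) for v in views]

# Every ratio equals log2(10): the change of base is one constant factor.
ratios = [a / b for a, b in zip(ratings_base2, ratings_base10)]

def ranking(xs):
    """Indices of xs sorted from lowest to highest value."""
    return sorted(range(len(xs)), key=xs.__getitem__)

same_order = ranking(ratings_base2) == ranking(ratings_base10)
print(ratios, same_order)
```

The base would only matter if different items were rated with different bases, or if the rating were later combined additively with other terms.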