Popular comments by vrategov
Hello, thanks for your positive feedback.
I often find myself clicking on news because of the title, and we see many news agencies trying to catch visitors' attention with it. The title is meant to convey the main point of the article, and we argue that it gives us enough information about the visitor's interests.
Moreover, we argue that using the article's text as an input is the wrong strategy for training the model. On a tabloid's pages, the most information available about an article apart from its title is its subtitle. This means the article's text is a hidden state, and the only reasonable way to use it is a multi-layer model with hidden variables standing for the article's text and for the probability that the user is interested in that text. Taking the article's title as input, the joint distribution over the hidden text layer and the target variable could then be fitted, rather than pretending that a hidden variable is a visible one.
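A minimal sketch of the hidden-variable idea, assuming each title is encoded as a binary bag-of-words vector and the target is a binary click. It simplifies the multi-layer model to a single discrete hidden variable z that stands in for the unseen article text, and fits the joint distribution over z, the title words and the click with EM (a mixture of Bernoulli naive Bayes models). The function names, the number of hidden classes `k` and the smoothing parameter `alpha` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def _log_bernoulli(X, p):
    """Log-likelihood of each binary row of X (n x d) under each row of p (k x d); returns (n, k)."""
    return X @ np.log(p).T + (1 - X) @ np.log(1 - p).T

def fit_latent_click_model(X, y, k=2, n_iter=200, alpha=1.0, seed=0):
    """EM for a model where a hidden class z (proxy for the unseen text) generates title words and click."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Y = y.reshape(-1, 1).astype(float)
    resp = rng.dirichlet(np.ones(k), size=n)  # random initial P(z | title, click)
    for _ in range(n_iter):
        # M-step: smoothed estimates of P(z), P(word | z) and P(click | z)
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = (resp.T @ X + alpha) / (nk[:, None] + 2 * alpha)
        rho = (resp.T @ Y + alpha) / (nk[:, None] + 2 * alpha)
        # E-step: posterior over the hidden class given both title and click
        log_r = np.log(pi) + _log_bernoulli(X, theta) + _log_bernoulli(Y, rho)
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
    return pi, theta, rho.ravel()

def predict_click(X, pi, theta, rho):
    """P(click | title) = sum_z P(z | title) * P(click | z); only the title is observed."""
    log_post = np.log(pi) + _log_bernoulli(X, theta)
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    return post @ rho
```

At prediction time only the title is needed: the hidden class is marginalised out instead of being treated as observed text.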
Thanks for the feedback. Regarding the state of construction, I think this was taken care of in the provided dataset. Every construction type has a default duration during which it is assumed to pollute the air. For example, "small housing" is assumed to affect the air for the next n months after it starts. We were told that all examples in which this duration exceeded our sample period were filtered out of the construction sites dataset.
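A small sketch of how such a duration-based filter might look, assuming each site is a `(type, start date)` pair. The durations in `DEFAULT_MONTHS` are hypothetical placeholders (the real values come with the dataset), and so is the reading that a site is kept only when its whole pollution window fits inside the sample period.

```python
import calendar
from datetime import date

# Hypothetical default pollution durations, in months, per construction type.
DEFAULT_MONTHS = {"small housing": 6, "large housing": 18}

def add_months(d, months):
    """Shift a date by whole months, clamping the day to the length of the target month."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def is_polluting(site_type, start, day):
    """A site affects the air from its start date until its default duration has elapsed."""
    return start <= day < add_months(start, DEFAULT_MONTHS[site_type])

def keep_sites(sites, sample_start, sample_end):
    """Keep only sites whose whole pollution window lies inside the sample period."""
    kept = []
    for site_type, start in sites:
        end = add_months(start, DEFAULT_MONTHS[site_type])
        if sample_start <= start and end <= sample_end:
            kept.append((site_type, start))
    return kept
```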
Thanks for the feedback. I completely agree regarding the color scale.
Hi, there are some typos here.
This is regarding the task of validating whether the citizens' data are valid and trustworthy. We decoded the geohashes to obtain the locations of the citizens' stations, and then calculated the distances between each citizen station and all official stations. We grouped the measurements by date and station and checked whether the daily mean at a particular station was more than 3 times the official mean. If it was, we assumed a measurement error and replaced this value with the official measurement.
The cutoff is somewhat subjective. We considered comparing against a 3-standard-deviation interval, but this approach would not be appropriate because we are not working with a normal distribution (we did not run a formal test, but the values cannot fall below 0, so another distribution, probably gamma, would fit better). The constant 3 is therefore arbitrary.
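A minimal sketch of this validation step in plain Python, assuming citizen rows of `(geohash, date, value)`, official rows of `(station_id, date, value)` and a dict of official station coordinates. Two details are assumptions not stated above: each citizen station is compared with its nearest official station, and distance is the haversine great-circle distance. All function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def decode_geohash(gh):
    """Return the (lat, lon) centre of a geohash cell; bits alternate, starting with longitude."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    is_lon = True
    for ch in gh:
        bits = _BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (bits >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def clean_citizen_data(citizen_rows, official_rows, official_locations, factor=3.0):
    """Daily mean per (geohash, date); replaced by the official mean if it exceeds factor times it."""
    official_daily = defaultdict(list)
    for sid, day, value in official_rows:
        official_daily[(sid, day)].append(value)
    official_mean = {key: mean(vals) for key, vals in official_daily.items()}

    citizen_daily = defaultdict(list)
    for gh, day, value in citizen_rows:
        citizen_daily[(gh, day)].append(value)

    cleaned = {}
    for (gh, day), values in citizen_daily.items():
        loc = decode_geohash(gh)
        nearest = min(official_locations, key=lambda sid: haversine_km(loc, official_locations[sid]))
        daily = mean(values)
        ref = official_mean.get((nearest, day))
        # Treat the day as a measurement error if it is more than `factor` times the official mean.
        cleaned[(gh, day)] = ref if ref is not None and daily > factor * ref else daily
    return cleaned
```

The `factor` argument makes the arbitrary constant 3 explicit, so other cutoffs can be tried without touching the rest of the logic.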