## Popular comments by vrategov

### Datathon Sofia Air Solution – Telelink Case Solution

Hi, there are some typos here.

This is regarding the task of validating whether the citizens' data are valid and trustworthy. We decoded the geohashes to get the locations of the citizens' stations, and then calculated the distances between each such station and all official stations. We grouped the data by date and station and checked whether a station's mean measurement for a day was more than 3 times the official mean measurement. If it was, we assumed a measurement error and replaced that value with the official measurement.

The cutoff is somewhat subjective. We considered comparing against a 3-standard-deviation interval instead, but that approach would not be appropriate because the data are not normally distributed. We did not run any formal tests, but the measurements cannot take values below 0, so another distribution, probably a gamma distribution, should fit better. The constant 3 is therefore arbitrary.
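A minimal sketch of this cleaning step in plain Python (standard library only), for illustration. It assumes each citizen reading arrives as `(date, geohash, value)`, that official stations are given as coordinates plus a daily mean per station, and that each citizen station is compared with its *nearest* official station (the comment only says distances to all official stations were computed). The function names and input layouts are hypothetical, not taken from the original solution.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt
from statistics import mean

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(gh):
    """Return the (lat, lon) centre of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    even = True  # geohash bits alternate, starting with longitude
    for ch in gh.lower():
        bits = _BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, MSB first
            bit = (bits >> shift) & 1
            if even:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            even = not even
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def clean_daily_means(readings, official_sites, official_daily, cutoff=3.0):
    """Hypothetical helper.
    readings: iterable of (date, geohash, value) from citizen stations.
    official_sites: {official_id: (lat, lon)}.
    official_daily: {(date, official_id): daily mean value}.
    Returns {(date, geohash): cleaned daily mean}."""
    # Group citizen readings by (date, station).
    groups = defaultdict(list)
    for date, gh, value in readings:
        groups[(date, gh)].append(value)

    cleaned = {}
    for (date, gh), values in groups.items():
        loc = decode_geohash(gh)
        # Assumption: compare against the nearest official station.
        nearest = min(official_sites, key=lambda oid: haversine_km(loc, official_sites[oid]))
        daily = mean(values)
        official = official_daily.get((date, nearest))
        # Arbitrary cutoff: more than `cutoff` times the official mean is
        # treated as a measurement error and replaced by the official value.
        if official is not None and daily > cutoff * official:
            daily = official
        cleaned[(date, gh)] = daily
    return cleaned
```

For example, a station whose readings average 120 on a day when the nearest official mean is 30 would be replaced with 30, while a day averaging 20 against an official 25 would be kept. Because `cutoff` is a parameter, swapping the fixed factor of 3 for a quantile of a fitted gamma distribution would only change the comparison line.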