### Datathon – Sofia Air 2.0 – Solution – Team Chameleons

Thanks for the feedback. Regarding the state of construction, I think this was already taken care of in the provided dataset. Every construction type has a default duration during which it is assumed to pollute the air. For example, “small housing” is assumed to affect the air for the next n months after starting. We were told that all examples that exceeded this length relative to our sample period were filtered out of the construction sites dataset.

Thanks for the feedback. I completely agree regarding the color scale.

### Datathon Sofia Air Solution – Telelink Case Solution

This is regarding the task of validating whether the citizens’ data are valid and can be trusted. We decoded the geohashes to get the locations of the citizens’ stations. After that, we calculated the distances between each citizen station and all official stations. We grouped by date and station and checked whether the mean measurement for a day at a particular station was more than 3 times the official mean measurement. If it was, we assumed there was a measurement error and replaced the value with the official measurement. The cutoff here is somewhat subjective. We considered comparing against a 3 standard deviation interval, but that approach would not be appropriate because we are not working with a normal distribution (we did not run a formal test, but the values cannot be below 0, so another distribution, probably gamma, should fit better). So the constant 3 is just arbitrary.
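To make the rule above concrete, here is a minimal sketch in plain Python. It is not our actual code. The function names, the input layout (tuples of station, date, value, plus dictionaries of already decoded coordinates) and the sample numbers are all hypothetical. It also assumes that each citizen station is compared with its nearest official station, which is one possible reading of the distance step.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt
from statistics import mean

CUTOFF = 3.0  # the arbitrary ratio discussed above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def nearest_official(citizen_pos, official_pos):
    """Name of the official station closest to one citizen station."""
    return min(
        official_pos,
        key=lambda name: haversine_km(*citizen_pos, *official_pos[name]),
    )


def daily_means(readings):
    """Average (station, date, value) readings per (station, date)."""
    buckets = defaultdict(list)
    for station, date, value in readings:
        buckets[(station, date)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def validate(citizen_readings, citizen_pos, official_readings, official_pos,
             cutoff=CUTOFF):
    """Replace citizen daily means that exceed cutoff times the official mean."""
    official_daily = daily_means(official_readings)
    cleaned = {}
    for (station, date), value in daily_means(citizen_readings).items():
        # Assumption: the reference is the nearest official station.
        ref_station = nearest_official(citizen_pos[station], official_pos)
        ref = official_daily.get((ref_station, date))
        if ref is not None and value > cutoff * ref:
            value = ref  # treated as a measurement error
        cleaned[(station, date)] = value
    return cleaned


# Hypothetical sample data, for illustration only.
citizen_pos = {"c1": (42.70, 23.32)}
official_pos = {"o1": (42.69, 23.33), "o2": (42.60, 23.40)}
citizen_readings = [
    ("c1", "2018-01-01", 100.0),
    ("c1", "2018-01-01", 140.0),
    ("c1", "2018-01-02", 50.0),
]
official_readings = [
    ("o1", "2018-01-01", 30.0),
    ("o1", "2018-01-02", 40.0),
]
cleaned = validate(citizen_readings, citizen_pos, official_readings, official_pos)
```

In this sample, the first day's citizen mean is more than 3 times the official mean, so it is replaced, while the second day's value is kept. Days with no official reading are left unchanged in the sketch, which is a design choice rather than something we decided in the solution.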