| Country of origin? | Bulgaria |
|---|---|
| For how many years have you been experimenting with data? | 2 |

## Popular articles by beartooth

### Monthly Challenge – Sofia Air – Solution – Kung Fu Panda

## Popular comments by beartooth

### Monthly Challenge – Sofia Air – Solution – Jeremy Desir Weber

Great work! I learned a lot from your Python code.

However, I have some remarks/questions.

Did you filter the citizen data on P10 measurement quality? There are a lot of observations above the max value from the official stations.

What is the point in averaging the data per hour for several citizen stations? What do we gain with it?
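To make the first question concrete, here is a minimal sketch of the kind of quality check I have in mind: treat the highest value reported by the official stations as a ceiling and set aside citizen readings above it. The function name and all numbers are made up for illustration; they are not taken from the challenge data or from the solution's code.

```python
def split_by_official_max(citizen_values, official_values):
    """Split citizen readings into plausible and suspect ones.

    A reading is 'suspect' when it exceeds the highest value
    observed at the official stations (an assumed quality rule).
    """
    ceiling = max(official_values)
    plausible = [v for v in citizen_values if v <= ceiling]
    suspect = [v for v in citizen_values if v > ceiling]
    return plausible, suspect


# Hypothetical particulate readings, for illustration only.
official = [12.0, 35.5, 80.2, 140.0]
citizen = [10.0, 55.0, 139.9, 410.0, 1999.9]

plausible, suspect = split_by_official_max(citizen, official)
print(f"kept {len(plausible)} readings, flagged {len(suspect)}")
```

Whether the official maximum is the right ceiling is a judgment call; the point is only that the rule and the number of flagged readings should be stated explicitly.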

### Monthly Challenge – Sofia Air – Solution – Kiwi Team

Great progress so far! As I am not familiar with R, I cannot say anything about the code. The visualizations look great though 🙂

However, I have some questions:

How did you come up with the limits for the temp, pressure and humidity? What about the errors in the P10 measurements?

What is the point of aggregating the data by geo unit? Why don’t we model the data at each sensor location? What do we gain from the aggregation?

### Monthly Challenge – Sofia Air – Solution – Jeremy Desir Weber

One more thing: how did you come up with the limits for the pressure measurements? When I compared the two data sets (official and citizen), I found that 75% of the citizen pressure measurements are below the minimum value in the official data.
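The comparison I describe can be sketched roughly as follows: take the minimum of the official pressure values and compute the share of citizen values that fall below it. The function name and the numbers are hypothetical and chosen only so that the example is easy to follow; they are not the actual challenge data.

```python
def share_below_official_min(citizen_values, official_values):
    """Return the fraction of citizen readings below the official minimum."""
    floor = min(official_values)
    below = sum(1 for v in citizen_values if v < floor)
    return below / len(citizen_values)


# Hypothetical pressure readings, for illustration only.
official_pressure = [940.0, 948.5, 955.0]
citizen_pressure = [930.0, 935.0, 938.5, 950.0]

share = share_below_official_min(citizen_pressure, official_pressure)
print(f"{share:.0%} of citizen readings are below the official minimum")
```

A large share like this may point to a difference in units or sensor calibration rather than to bad observations, which is why the choice of limits matters.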

### Monthly Challenge – Sofia Air – Solution – New!Bees

It would be interesting to know how many observations were removed from the data set based on your criteria. Removed observations will interrupt the time series. How will you deal with this?

How many clusters did you choose to work with?

A map with the clusters would be nice.

### Monthly Challenge – Sofia Air – Solution – Banana

Good that you looked at the consistency of the pressure, temp and humidity, but what about the air pollution measurements?

I don’t think that excluding observations is a good idea, as it means that the time series will no longer be continuous. Did you consider replacing the strange values with the mean or some other assumed value?
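As a rough illustration of the replacement idea, the sketch below keeps every time step and substitutes the mean of the in-range readings for any value outside assumed limits. The function name, the limits and the sample values are all hypothetical; this is one simple option, not a claim about what works best for this data.

```python
from statistics import mean


def replace_outliers_with_mean(values, lower, upper):
    """Replace readings outside [lower, upper] with the mean of the valid ones.

    The series keeps its original length, so the time steps stay continuous.
    """
    valid = [v for v in values if lower <= v <= upper]
    if not valid:
        raise ValueError("no readings fall inside the accepted range")
    fill = mean(valid)
    return [v if lower <= v <= upper else fill for v in values]


# Hypothetical hourly temperature readings with one implausible value.
hourly_temp = [4.0, 5.0, -140.0, 6.0, 5.0]

cleaned = replace_outliers_with_mean(hourly_temp, lower=-30.0, upper=50.0)
print(cleaned)
```

A global mean ignores trend and seasonality, so interpolating between neighbouring valid readings may be a better fit for hourly data; either way the series stays continuous.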

How did you decide on the number of clusters? Why 10?