Popular articles by beartooth
Popular comments by beartooth
Great work! I learned a lot from your Python code.
However, I have some remarks/questions.
Did you filter the citizen data on P10 measurement quality? There are a lot of observations above the max value from the official stations.
What is the point in averaging the data per hour for several citizen stations? What do we gain with it?
Great progress so far! As I am not familiar with R, I cannot say anything about the code. The visualizations look great though 🙂
However, I have some questions:
How did you come up with the limits for the temp, pressure and humidity? What about the errors in the P10 measurements?
What is the point of aggregating the data by geo unit? Why don’t we model the data at each sensor location? What do we gain from the aggregation?
One more thing: how did you come up with the limits for the pressure measurements? When I compared both data sets (official and citizen), I found that 75% of the citizen measurements for pressure are below the min value from the official data.
It would be interesting to know how many observations were removed from the data set based on your criteria. Removed observations will interrupt the time series. How will you deal with this?
How many clusters do you choose to work with?
A map with the clusters would be nice.
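The pressure comparison described above (the share of citizen readings that fall below the official minimum) can be checked in a few lines. This is only a minimal sketch: the function name `share_below` and all the numbers are made up for illustration and are not taken from either data set.

```python
def share_below(values, threshold):
    """Return the fraction of values that are strictly below threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v < threshold) / len(values)


# Hypothetical pressure readings in hPa, invented for this example only.
official_pressure = [1008.0, 1012.0, 1015.0, 1020.0]
citizen_pressure = [940.0, 950.0, 955.0, 1010.0]

# Fraction of citizen readings below the lowest official reading.
fraction = share_below(citizen_pressure, min(official_pressure))
print(f"{fraction:.0%} of citizen readings are below the official minimum")
```

A large fraction would suggest that limits derived from the official data alone would discard most of the citizen pressure readings.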
Good that you looked at the consistency of the pressure, temp and humidity, but what about the air pollution measurements?
I don’t think that excluding observations is a good idea, as it means that the time series will no longer be continuous. Did you consider replacing the strange values with the mean or some other assumed value?
How did you decide on the number of clusters? Why 10?
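To illustrate the suggestion about replacing strange values instead of dropping them, here is a minimal sketch. It assumes simple mean replacement with fixed plausibility limits; the function name `replace_out_of_range`, the limits, and the readings are all hypothetical, and interpolation between neighbouring readings would be an equally valid choice.

```python
from statistics import mean


def replace_out_of_range(values, low, high):
    """Replace readings outside [low, high] with the mean of the in-range readings.

    The series keeps its original length, so the time index stays continuous.
    """
    valid = [v for v in values if low <= v <= high]
    if not valid:
        raise ValueError("no in-range readings to compute a mean from")
    fill = mean(valid)
    return [v if low <= v <= high else fill for v in values]


# Hypothetical hourly temperature readings in deg C; -145.0 is an assumed glitch.
readings = [12.0, 13.0, -145.0, 14.0, 13.0]
cleaned = replace_out_of_range(readings, low=-40.0, high=60.0)
print(cleaned)
```

The trade-off is that a global mean flattens local variation, so for longer gaps a rolling mean or interpolation may distort the series less.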