Popular articles by paspaldzhiev
Popular comments by paspaldzhiev
I am thoroughly amused by your “radial” approach – occasionally, we don’t need fancy packages & many dependencies, just regular old math does the trick. Go big or go home :)!
Good exploratory analysis so far. If I understand correctly, this is performed on the entire pooled dataset?
Would be good to see some quantified indicators of agreement between official/citizen science stations & of the meteo influence on any differences. Also, please do make sure to note how you define “closest to official” – not a trivial issue 🙂
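To make the point concrete, here is a minimal sketch of one way “closest to official” and “agreement” could be pinned down: great-circle (haversine) distance with an explicit cut-off, plus mean bias and RMSE over paired readings. The coordinates, sensor names, the 1 km radius and the function names are all hypothetical, chosen only for illustration, and other definitions (e.g. same street type or elevation) may suit the data better.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_sensors(official, sensors, max_km=1.0):
    """Return (sensor_id, distance_km) pairs within max_km, closest first."""
    hits = []
    for sensor_id, (lat, lon) in sensors.items():
        d = haversine_km(official[0], official[1], lat, lon)
        if d <= max_km:
            hits.append((sensor_id, d))
    return sorted(hits, key=lambda pair: pair[1])

def agreement(official_vals, sensor_vals):
    """Mean bias (sensor minus official) and RMSE over paired readings."""
    diffs = [s - o for o, s in zip(official_vals, sensor_vals)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse

# Hypothetical coordinates, for illustration only.
official_station = (42.690, 23.335)
citizen_sensors = {
    "sensor_a": (42.691, 23.336),
    "sensor_b": (42.700, 23.400),
}
close = nearest_sensors(official_station, citizen_sensors, max_km=1.0)
```

Whatever definition is used, stating the radius (or the number of neighbours) explicitly makes the comparison reproducible.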
Hi! Good to see your progress, looking forward to seeing the end-result. Some comments in the interim:
– In the 1st table (official stations #observations), note that the gaps in Mladost and Orlov Most are due to the fact that the latter station is currently out of operation. Mladost is the station that replaced it. Hence, some of the ‘data gaps’ you see are not gaps per se.
– Good to focus on data quality and gaps – make sure to show this in the final write-up!
– When filling missing values for citizen science stations – there are perhaps some stations which lack many measurements. How do you fill missing values there? Or perhaps I am misunderstanding and you are simply filling in the missing timestamps with NaNs?
@all I do not completely understand this part:
Check if the PM10 from the sensor is no more than 3 times bigger than the official one:
if it falls within the limit – take the data as valid;
if not – replace with the official stations.
What do you mean by “replace”?
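To illustrate why the wording matters, here is a minimal sketch of two possible readings of the quoted rule. This is my guess at the intent, not the team's actual code; the function name, the `replace` flag and the sample values are hypothetical.

```python
def apply_ratio_rule(sensor_pm10, official_pm10, max_ratio=3.0, replace=True):
    """Keep sensor readings that are at most max_ratio times the official value.

    Readings above the limit are either overwritten with the official value
    (replace=True) or marked as missing with None (replace=False).
    """
    cleaned = []
    for s, o in zip(sensor_pm10, official_pm10):
        if s <= max_ratio * o:
            cleaned.append(s)      # within the limit: treat as valid
        elif replace:
            cleaned.append(o)      # reading A: overwrite with the official value
        else:
            cleaned.append(None)   # reading B: flag as missing instead
    return cleaned

# Hypothetical paired hourly PM10 values (sensor vs. nearest official station).
overwritten = apply_ratio_rule([20.0, 100.0], [10.0, 10.0])
flagged = apply_ratio_rule([20.0, 100.0], [10.0, 10.0], replace=False)
```

Reading A pulls the citizen data towards the official series wherever they disagree most, which would inflate any later agreement statistics; reading B keeps the two sources independent.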
@all would be good to see some plots of e.g. the distribution of the changes before/after. Also, would be good to see a comparison of official stations w/ some citizen science neighbours. Some of the differences between stations in space may be due to heterogeneous conditions (e.g. different sources & intensity of pollution)?
Good focus on data quality so far and cool use of maps for exploratory analysis. Well-spotted on missing values in the ‘official’ (EEA) dataset!
For data enriching, did you have to deal with cases where there are large chunks of missing hours/days – e.g. how would you fill missing values between 2PM and 8PM when all values are missing? Does your approach aim to handle this?
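As one possible way of handling this, here is a minimal sketch that linearly interpolates only short gaps and leaves long ones missing. It is not necessarily what the team did; the function name and the 3-hour limit are assumptions for illustration.

```python
def interpolate_short_gaps(values, max_gap=3):
    """Linearly fill runs of None no longer than max_gap; leave longer runs."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # first index after the gap
        gap_len = end - start
        # Fill only interior gaps (known value on both sides) that are short enough.
        if start > 0 and end < n and gap_len <= max_gap:
            left, right = filled[start - 1], filled[end]
            step = (right - left) / (gap_len + 1)
            for k in range(gap_len):
                filled[start + k] = left + step * (k + 1)
    return filled

# Hypothetical hourly series: a 1-hour gap is filled, a 4-hour gap is left alone.
short = interpolate_short_gaps([10.0, None, 30.0])
long_gap = interpolate_short_gaps([1.0, None, None, None, None, 6.0])
```

With hourly data, a 2PM to 8PM outage would exceed the limit and stay missing, which seems safer than inventing an afternoon's worth of values.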
Good notes on future improvements to be made. Esp. like the potential use of Google’s traffic data – too often folk focus on reinventing the wheel with primary measurements, when more and more often these days, our benevolent corporate overlords have already done the legwork for us!