14 thoughts on “Monthly Challenge – Sofia Air – Solution – Jeremy Desir Weber”


      Thank you for reading. My line numbers do not match yours, but I think you are referring to the warnings raised by the distribution plots. I am still wondering how to fix them ^^


      Thank you very much for your input, Martin! I'll have a look at how sensitive the clustering is to the distance computation method. Also, I hope you're OK with my reusing a bit of your code for the world map data viz part. It was well designed and I wanted a clean, quick solution, but I will adapt it ASAP to fit my purposes. Best regards!
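      A minimal sketch of why the distance choice can matter when clustering stations by location, assuming the coordinates are plain latitude/longitude in degrees (the distance method actually used in the solution is not shown here). At Sofia's latitude, a naive Euclidean distance on raw degrees treats a 0.1° step east and a 0.1° step north as equal, whereas the great-circle (haversine) distance makes the eastward step roughly a quarter shorter. The points below are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def euclidean_deg(lat1, lon1, lat2, lon2):
    """Naive Euclidean distance on raw degrees (ignores that meridians converge)."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


# Hypothetical points around Sofia (~42.7 N, 23.3 E): one step north, one step east.
centre = (42.70, 23.30)
north = (42.80, 23.30)
east = (42.70, 23.40)

print(euclidean_deg(*centre, *north), euclidean_deg(*centre, *east))  # both ~0.1 degrees
print(haversine_km(*centre, *north), haversine_km(*centre, *east))    # roughly 11.1 km vs 8.2 km
```

      If a clustering step (e.g. K-Means) runs on raw degrees, this east/west compression can change which stations end up grouped together, which is one concrete thing a sensitivity check could look at.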


    Great work! I learned a lot from your Python code.
    However, I have some remarks/questions.
    Did you filter the citizen data on PM10 measurement quality? There are a lot of observations above the maximum value recorded by the official stations.
    What is the point of averaging the data per hour across several citizen stations? What do we gain from it?
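    A minimal sketch of the check raised above, flagging rather than dropping citizen PM10 readings that exceed the official maximum, so that genuine peaks stay available for modelling. The `Reading` record, its fields and all values are hypothetical; with the real data this would more likely be a pandas filter on the actual columns.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    station_id: str  # hypothetical citizen sensor identifier
    hour: str        # hypothetical hourly timestamp, e.g. "2018-01-01T10"
    pm10: float      # PM10 concentration in µg/m³


def flag_above_official_max(citizen, official_pm10):
    """Pair each citizen reading with True if it exceeds the official maximum.

    Nothing is dropped: the caller decides whether flagged readings are
    sensor errors or genuine (rare) pollution peaks.
    """
    official_max = max(official_pm10)
    return [(r, r.pm10 > official_max) for r in citizen]


# Hypothetical data for illustration only.
official = [35.0, 120.0, 310.0, 95.0]
citizen = [
    Reading("s1", "2018-01-01T10", 80.0),
    Reading("s2", "2018-01-01T10", 999.0),
    Reading("s3", "2018-01-01T11", 150.0),
]

flags = flag_above_official_max(citizen, official)
share_flagged = sum(is_above for _, is_above in flags) / len(flags)
print(f"{share_flagged:.0%} of citizen readings exceed the official maximum")
```

    Keeping a flag instead of deleting rows lets the same dataset be used both with and without the suspicious values, so their effect on the forecast can be compared.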


      Thanks for your message 🙂 You are right, I could have filtered the PM10 measurements against the official stations, but I was not sure whether these extreme values were outliers to remove or carried critical information we should keep to properly reflect the situations we most want to forecast, i.e. rare PM concentration peaks. Secondly, averaging the hourly data of the citizen stations within a given geo-unit (or cluster) could synthesize the information and smooth the signal. To be honest, I do not know yet how relevant it is, but assuming it makes sense, DBA (DTW Barycenter Averaging) handles the centroid-update step of K-Means more appropriately for time series than the Euclidean (point-wise) mean does.
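      A minimal, from-scratch sketch of DBA (DTW Barycenter Averaging) next to the plain arithmetic mean that standard K-Means uses in its update step. It assumes each series is one hourly PM10 profile for a station in the same cluster; the toy profiles are hypothetical. For real data, tslearn's `TimeSeriesKMeans(metric="dtw")` computes its barycenters with DBA and is likely preferable to this quadratic-time illustration.

```python
def dtw_path(a, b):
    """Optimal DTW warping path between sequences a and b (squared-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cost of the best alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min(
            ((i - 1, j - 1), (i - 1, j), (i, j - 1)),
            key=lambda ij: cost[ij[0]][ij[1]],
        )
    return path[::-1]


def dba_update(average, series):
    """One DBA iteration: align every series to the current average, then
    replace each point of the average by the mean of the points aligned to it."""
    buckets = [[] for _ in average]
    for s in series:
        for i, j in dtw_path(average, s):
            buckets[i].append(s[j])
    return [sum(b) / len(b) for b in buckets]


def dba(series, n_iter=10):
    """DTW Barycenter Averaging, initialised (simply) with the first series."""
    average = list(series[0])
    for _ in range(n_iter):
        average = dba_update(average, series)
    return average


def arithmetic_mean(series):
    """Point-wise mean, i.e. the Euclidean K-Means centroid update."""
    return [sum(col) / len(col) for col in zip(*series)]


# Hypothetical hourly PM10 profiles with the same peak shifted by one hour.
profiles = [
    [20.0, 20.0, 90.0, 20.0, 20.0],
    [20.0, 90.0, 20.0, 20.0, 20.0],
]
print(arithmetic_mean(profiles))  # peak flattened to 55.0 over two hours
print(dba(profiles))              # single peak kept at 90.0
```

      With two peaks shifted by one hour, the arithmetic mean spreads and halves the peak, whereas DBA keeps a single peak at full height, which is the property that matters when rare concentration peaks are the forecasting target.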


    One more thing: how did you come up with the limits for the pressure measurements? When I compared both data sets (official and citizen), I found that 75% of the citizen pressure measurements are below the minimum value in the official data.


      Thank you for this relevant question. I also compared the official meteo data to the citizen data and found a similar discrepancy in pressure. After a more careful reading of the meteo documentation, I found it was due to the official data providing sea-level-adjusted pressure. Once you apply this transformation to the citizen data, taking elevation and temperature into account in the pressure derivation, the adjusted values fall nicely within the range of the official data. This was not obvious at all, and I think these measurements are now properly comparable, especially across two different elevation levels (see the updated notebook).
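      A minimal sketch of the sea-level reduction described above, assuming station pressure in hPa, elevation in metres and air temperature in °C. It uses a common standard-atmosphere approximation; the exact formula used by the meteo provider may differ, and the readings, elevation and official range below are hypothetical.

```python
def sea_level_pressure(p_station_hpa, elevation_m, temp_c):
    """Reduce station pressure (hPa) to mean-sea-level pressure (hPa).

    Standard-atmosphere approximation: 0.0065 K/m temperature lapse rate and
    exponent 5.257 (= g*M / (R*L)). Assumed here, not taken from the meteo docs.
    """
    lapse = 0.0065 * elevation_m
    return p_station_hpa * (1 - lapse / (temp_c + lapse + 273.15)) ** -5.257


def share_within(values, lo, hi):
    """Fraction of values inside the closed interval [lo, hi]."""
    return sum(lo <= v <= hi for v in values) / len(values)


# Hypothetical citizen readings from a sensor at ~600 m, at 10 °C.
station_hpa = [940.0, 945.0, 950.0]
official_min, official_max = 995.0, 1035.0  # hypothetical official sea-level range

adjusted = [sea_level_pressure(p, elevation_m=600, temp_c=10) for p in station_hpa]
print(share_within(station_hpa, official_min, official_max))  # raw: 0.0
print(share_within(adjusted, official_min, official_max))     # adjusted: 1.0
```

      Raw readings at roughly 600 m sit well below typical sea-level values, which would explain why most citizen measurements fall under the official minimum; after the reduction they land at roughly 1010 to 1021 hPa.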
