Popular comments by jeremy

Monthly Challenge – Sofia Air – Solution – Jeremy Desir Weber

Thank you for this relevant question. I also compared the official meteo data to the citizen data and found a similar discrepancy in pressure. After a more careful reading of the meteo documentation, it turned out to be due to the sea-level-adjusted pressure they provide. Once you apply this transformation, which takes elevation and temperature into account in the pressure derivation, the adjusted values fall nicely within the extrema of the official data. This was not obvious at all, and I think the measurements are now properly comparable, especially across two different elevation levels (see the updated notebook).
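For readers who want to try the adjustment, here is a minimal sketch of one widely used approximation for reducing station pressure to sea level. It is not necessarily the formula from the official meteo documentation, which is not quoted in this comment. The function name and the example values are made up for illustration.

```python
def sea_level_pressure(station_pressure_hpa, elevation_m, temperature_c):
    """Reduce a station pressure reading to sea level.

    Uses a common approximation with a standard lapse rate of
    0.0065 K/m. This is an assumption, not the official formula.
    """
    lapse = 0.0065 * elevation_m
    factor = 1.0 - lapse / (temperature_c + lapse + 273.15)
    return station_pressure_hpa * factor ** -5.257


# Illustrative values only: a sensor at 550 m reading 950 hPa at 10 °C.
adjusted = sea_level_pressure(950.0, 550.0, 10.0)
print(round(adjusted, 1))
```

At zero elevation the function returns the input pressure unchanged, which is a quick sanity check on the formula.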

Monthly Challenge – Sofia Air – Solution – Jeremy Desir Weber

Thanks for your message 🙂 You are right, I could have filtered the PM10 measurements against the official stations, but I was not sure whether these extreme values were outliers to remove or carried critical information that we should keep in order to properly reflect the most important situations we wish to forecast, i.e. rare PM concentration peaks. Secondly, averaging the data per hour for citizen stations in a given geo-unit (or cluster) could synthesize the information and smooth the signal. To be honest, I do not know yet how relevant this is, but assuming it makes sense, DBA (DTW Barycenter Averaging) handles the centroid update step in K-Means more appropriately for time series than averaging under the Euclidean metric.
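The hourly averaging idea can be sketched as follows. This is only the plain arithmetic mean per cluster and hour, not DBA, and the tuple layout, cluster labels and readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_cluster_mean(readings):
    """Average readings per (cluster, hour).

    `readings` is an iterable of (cluster_id, timestamp, value) tuples;
    this layout is an assumption made for the example.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for cluster_id, ts, value in readings:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        acc = sums[(cluster_id, hour)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Made-up PM10 readings from two hypothetical clusters.
readings = [
    ("A", datetime(2018, 1, 1, 8, 5), 40.0),
    ("A", datetime(2018, 1, 1, 8, 35), 60.0),
    ("A", datetime(2018, 1, 1, 9, 10), 30.0),
    ("B", datetime(2018, 1, 1, 8, 20), 100.0),
]
hourly = hourly_cluster_mean(readings)
```

Note that a mean like this flattens short peaks, which is exactly the trade-off raised above about keeping rare PM concentration peaks.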

Data Exploration, Observations, Planning

Hello Kams, thanks for sharing this extensive progress of yours! Your initial analysis is amazingly well detailed. I’ll mainly focus on your latest update: have you tried simple methods as a benchmark? What common metric would you use to assess prediction performance? I am deeply intrigued by your use of such sophisticated methods, RF and LSTM, the latter being particularly interesting for time series modeling. It is surprising to get such constant predictions from RF (this might be a hyperparameter issue). Are you sure the NN’s loss converged properly? The shape of your loss curve looks like underfitting. Best
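As an example of what a simple benchmark with a common metric could look like, here is a minimal sketch of a persistence forecast (predict the previous value) scored with MAE and RMSE. The series is made up, and these two metrics are only one possible choice.

```python
import math


def persistence_forecast(series):
    """Naive baseline: the forecast for step t is the value at t - 1."""
    return series[:-1]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Made-up hourly PM10 values.
series = [10.0, 12.0, 11.0, 15.0]
predicted = persistence_forecast(series)  # forecasts for steps 1..3
actual = series[1:]                       # observed values for steps 1..3

baseline_mae = mae(actual, predicted)
baseline_rmse = rmse(actual, predicted)
```

Scoring the RF and the LSTM with the same two functions on the same test period would show whether they actually beat this baseline.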

Monthly Challenge – Sofia Air – Solution – Kiwi Team

Hello Kiwi, congratulations on this relevant work so far! I particularly appreciate the data viz and the explanations surrounding your code. From my understanding, you removed the 2017 stations not present in 2018? This is indeed interesting, as it avoids dropping most of the 2018 data, which would happen if the opposite were done (removing the 2018 stations not present in 2017); I am still hesitating about what is best on this point. I am not sure I understand what your localizeErrors function does, or how to interpret the subsequent figures. Looking forward to your updates for week 3! Best

The pumpkins

Hello pumpkins, congratulations on the great work so far. It is nice that you tried an alternative clustering method with k-means, and I particularly appreciated the map provided in the doc file. My concerns are the following: 1/ How did you end up with 417 citizen stations out of 1265? Personally, I get 372 stations common to 2017 and 2018, out of 383 in 2017 and 1253 in 2018. 2/ Could you please add visualizations, data set headers and/or some explanation around each step of your code, as it is hard to follow your process? Best
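To make the station counts easier to compare, here is a minimal sketch of how the common stations can be counted with a set intersection. The station identifiers are hypothetical placeholders, not the real geohashes from the data set.

```python
def common_stations(ids_2017, ids_2018):
    """Return the station IDs that appear in both years."""
    return set(ids_2017) & set(ids_2018)


# Hypothetical station identifiers for each year.
ids_2017 = ["s1", "s2", "s3", "s4"]
ids_2018 = ["s2", "s3", "s4", "s5", "s6"]

common = common_stations(ids_2017, ids_2018)
only_2017 = set(ids_2017) - common  # would be dropped from 2017
only_2018 = set(ids_2018) - common  # would be dropped from 2018

print(len(common), len(only_2017), len(only_2018))
```

Printing the three counts at each filtering step would make it clear where figures such as 417 or 372 come from.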