15 thoughts on “Monthly Challenge – Sofia Air – Solution – New!Bees”
You should upload the code here in a Jupyter notebook.
Will do 🙂
Your assignments to peer review (and give feedback below the corresponding articles) for week 1 of the Monthly challenge are the following teams:
https://www.datasciencesociety.net/sofia-air-quality-eda-exploratory-data-analysis/
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-kiwi-team/
https://www.datasciencesociety.net/sofia-air-week-1/
Brilliant work 🙂
The idea of relative mapping of the coordinates of Sofia never occurred to me!
Also, why would it... since I'm just a beginner :p
Thank you!
Overall, good work. I liked the way you presented your approach methodically and commented on the thinking behind your decisions. Like @everyoneishigh, I liked how you plotted the Sofia topo data, established a grid, and then filtered out the locations.
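For readers trying to picture that grid-and-filter step, here is a minimal sketch, assuming station coordinates sit in `lat`/`lon` columns; the bounding box and cell size are illustrative guesses, not the team's actual values.

```python
# Sketch: keep station locations inside a rough Sofia bounding box and snap
# them to a regular grid. Column names, box limits and cell size are assumed.
import pandas as pd

stations = pd.DataFrame({
    "station_id": [101, 102, 103],
    "lat": [42.70, 42.65, 43.20],   # station 103 lies well outside Sofia
    "lon": [23.32, 23.38, 24.10],
})

lat_min, lat_max = 42.55, 42.80     # assumed Sofia bounding box
lon_min, lon_max = 23.20, 23.50

inside = stations[
    stations["lat"].between(lat_min, lat_max)
    & stations["lon"].between(lon_min, lon_max)
]

cell = 0.01  # grid cell size in degrees (assumed)
gridded = inside.assign(
    grid_lat=(inside["lat"] // cell) * cell,
    grid_lon=(inside["lon"] // cell) * cell,
)
print(gridded)
```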
From the business side, I would have liked to see the assumptions you are making about this project. I have often found that people arrive at a project with different assumptions; writing yours down clearly gives others an opportunity to disagree with or correct them.
Is the assumption that the official data are very reliable a correct one (even though it is accepted as the EU norm)? Is there a process by which data quality and data gaps are vetted at those stations? That is something for you to explore and report on as you look at the AirBG.info mission and more data.
You chose to leave out the data available via the API. I also thought your choice of excluding 2017 data from stations that had no 2018 data was an interesting one. It may be statistically insignificant, but just because citizen data recorders that reported previously are not reporting now does not mean they reported incorrectly. That said, given how little of this data there is, it may not be worth the effort, and using the more recent data from the API instead may be the better option.
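To make the point about the excluded sensors concrete, a quick way to list the citizen sensors that reported in 2017 but never in 2018 might look like the sketch below; the `sensor_id`/`timestamp` column names and the toy data are assumptions, not the team's schema.

```python
# Sketch: find sensors whose observations stop in 2017, i.e. the ones that
# were excluded. The tiny DataFrame and its column names are illustrative.
import pandas as pd

obs = pd.DataFrame({
    "sensor_id": [1, 1, 2, 3],
    "timestamp": pd.to_datetime(
        ["2017-05-01", "2018-05-01", "2017-06-01", "2018-06-01"]
    ),
})

per_sensor = (
    obs.assign(year=obs["timestamp"].dt.year)
       .groupby("sensor_id")["year"]
       .agg(["min", "max"])
)

# Sensors last seen in 2017 (sensor 2 in this toy example)
only_2017 = per_sensor[per_sensor["max"] == 2017]
print(only_2017)
```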
Overall, good stuff – thanks for helping me learn.
Some of the metadata you are looking for is available in a roundabout way if you look at the official meteorological data. These metadata files include hyperlinks to more granular definitions.
Thank you for your feedback! We agree that it is very important to explain every step thoroughly and we will try to do so with our next update. Good luck!
Pandas, you have done great work! You have very bright ideas and imagination, and it seems that you execute everything perfectly – especially the effort on your graphs, such as “All stations in Bulgaria from the “unofficial” datasets”. We think that your further steps will impress us all even more, so keep up the good work 🙂
Banana team
Thank you 🙂
Your assignments to peer review (and give feedback below the corresponding articles) for week 2 of the Monthly challenge are the following teams:
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-tomunichandback/
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-sky_data/
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-banana/
It is interesting to understand how many observations were removed from the data set based on your criteria. Removed observations will interrupt the time series. How will you deal with this?
How many clusters do you choose to work with?
A map with the clusters would be nice.
Hi, thank you for your feedback 🙂 Removed observations do indeed disrupt the time series, which is why we are going to interpolate the missing observations in week 3. We are also going to change part of our strategy and use the observations from the official stations instead of the meteorology dataset, but all of this is about to happen this upcoming week. So stay tuned 😉
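For anyone curious what that interpolation step could look like, here is a minimal sketch with pandas, assuming an hourly PM10 series indexed by timestamp; the values, frequency and gap limit are illustrative, not the team's final choice.

```python
# Sketch: rebuild a complete hourly index and fill the gaps left by removed
# observations with time-based linear interpolation. The data are made up.
import pandas as pd

idx = pd.date_range("2018-01-01 00:00", periods=6, freq="H")
p1 = pd.Series([30.0, 33.0, 38.0, 45.0, 40.0, 38.0], index=idx)

# Drop two rows to mimic removed observations, then restore the hourly index
p1_with_gaps = p1.drop(idx[[1, 2]])
p1_filled = (
    p1_with_gaps.asfreq("H")                  # reinsert missing timestamps as NaN
                .interpolate(method="time",   # linear in time
                             limit=6)         # avoid bridging very long gaps
)
print(p1_filled)
```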
Your assignments to peer review (and give feedback below the corresponding articles) for week 3 of the Monthly challenge are the following teams:
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-lone-fighter/
https://www.datasciencesociety.net/air-quality-week-1/
https://www.datasciencesociety.net/air-sofia-pollution-case/
Your assignments to peer review (and give feedback below the corresponding articles) for week 4 of the Monthly challenge are the following teams:
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-kung-fu-panda/
https://www.datasciencesociety.net/monthly-challenge-sofia-air-solution-jeremy-desir-weber/
https://www.datasciencesociety.net/the-pumpkins/
We can see that the tasks are completed in a very structured and interesting way. The explanations make it easy to understand what you have done. Great job! 🙂 – Team Yagoda