Prediction systems

Monthly Challenge – Sofia Air – Solution – Banana


11 thoughts on “Monthly Challenge – Sofia Air – Solution – Banana”

  1.

    Please include your code as selectable text. The platform has great capabilities for rendering Jupyter notebooks, or you could at least paste the code as text in a quoted field or something similar. Using images to show code is very lame.

  2.

    Pumpkin team: First, we would like to congratulate you, Banana team, on your great first article! The first two parts, on business and data understanding, are a great introduction to the goal of the Monthly Challenge and to the main aspects of the issue we are dealing with: air pollution in the capital of Bulgaria. We appreciate the simple and clear operations in your code, because they make it easy for others to follow your steps and logic, and they help those who do not fully understand the functions used. It is obvious that you have put in a lot of effort and managed to apply a wide variety of R commands. We have no suggestions for improvement at this stage, but we would like to ask about one of the filters you intend to apply: why do you want to filter to the stations that are missing in 2018, and why did you check whether some are missing in both 2017 and 2018? Our approach, for example, is to look at the latest data (excluding the stations that do not appear in 2018), because to make a forecast we want to use the latest information. Maybe you meant the same, but we just want to double-check. 🙂

  3.

    Great work! From the writing, I infer you are a local team in Sofia. I appreciate your passion for the problem. I loved that you chose to keep all the data. It is so important, IMHO, not to delete data too early.

    I am not familiar with R, but the explanation assumes the reader knows it, so your points about time data seemed specific to your tool set.

    Good work overall.

  4.

    Good that you looked at the consistency of the pressure, temperature and humidity, but what about the air pollution measurements?
    I don’t think that excluding observations is a good idea, as it means the time series will no longer be continuous. Did you consider replacing the strange values with the mean or some other assumed value?
    How did you decide on the number of clusters? Why 10?

  5.

    Hi team,
    You have done an excellent job this week! There is no doubt that your work is great, and I think all of us, or at least I, are going to learn from you. What I like about your article is that it is very easy to read, the code stands out clearly from the text, and the graphs present the results concisely and give a clear idea of what is going on. Very interactive. Furthermore, you clearly have attention to detail and a different, out-of-the-box way of thinking.
