Datathons Solutions

Predicting weather disruption of public transport

Kindly evaluate the PDF; it is my final solution.
My Jupyter notebook was rejected because its size (2.1 MB) exceeded the 2.0 MB limit.


7 thoughts on “Predicting weather disruption of public transport”

  1. 0
    votes

    A good approach to the topic, but I would suggest some changes in focus:
    1. Weather data cannot be treated like a lab environment that is separated from the rest of the world; the butterfly effect does exist here (a change of weather somewhere else will affect the weather at this location). The time spent building weather forecast models (temperature, pressure, humidity) could be better spent on other areas, and weather forecast data could be taken from some “official” sites/locations.
    2. Daily seasonality is not taken into account. Traffic is not evenly distributed during the day; there are peak and off-peak periods, which may also skew the analysis of the weather’s influence. I would suggest using ACF/PACF to find (partial) correlations at smaller lags rather than using pure ARIMA (which will be OK for longer seasonalities)…
    3. Focus on what happens with traffic in peak/off-peak periods under different weather conditions.
    Step 3 should have been the focus of your effort because it would yield the most value from the model.
    I like how you started, but the last mile, the conclusion, is missing. We know that time was limited, so better organization next time 😉
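To illustrate point 2, here is a minimal sketch of how a sample autocorrelation function can expose a daily cycle in hourly data. It is not taken from the submission: the `hourly_delay` series is synthetic and the function names are hypothetical. Only the ACF is computed by hand; a PACF would normally come from a statistics library such as statsmodels.

```python
import math


def autocorrelation(series, max_lag):
    """Return the sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    acf = []
    for lag in range(max_lag + 1):
        num = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        acf.append(num / denom)
    return acf


# Synthetic (hypothetical) hourly delay in minutes: a clean 24-hour cycle over 14 days.
hourly_delay = [5.0 + 3.0 * math.sin(2 * math.pi * h / 24) for h in range(24 * 14)]

acf = autocorrelation(hourly_delay, max_lag=48)

# Look for the strongest positive correlation away from the trivial short lags.
seasonal_lag = max(range(12, 37), key=lambda k: acf[k])
print("strongest lag between 12 and 36 hours:", seasonal_lag)
```

On real traffic data the peak would be less clean, but a pronounced spike near lag 24 is the kind of evidence that a daily seasonal term (or separate peak/off-peak models) is needed.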

    1. 0
      votes

      Also, I would advise using some additional datasets that were not part of the initial dataset, such as hourly aggregated traffic estimates provided by some navigation applications, because that can additionally help with model precision. We all know that bus drivers should be professionals, but the majority of “normal” non-bus drivers are not, and they are heavily affected by distracting sensory inputs (thunderstorms, rain, people cutting in, or even forgetting how to drive when weather conditions change). I’m adding this last point about additional datasets for all teams focusing on this problem, because no one even considered it, and it is something you can always do on any project: don’t focus only on internal/provided data, but find something to augment it 😉

        1. 0
          votes

          You did a good job by yourself in such a short timespan. I would only advise teaming up with others, because that way you can learn faster… not just solve cases by yourself. At one of the first datathons two or three years ago, we got an interesting result: some of the randomly formed teams had a really good track record and collaboration even though no one knew each other, or maybe despite it. I’m glad that I had a chance to read and see your approach to this problem.

  2. 0
    votes

    Hi, Kabir 🙂 You did a lot of work here!
    I have some comments on it:
    – I like that you paid attention to the values of the dependent variable and used sampling techniques to avoid imbalance, which may cause trouble later.
    – I also like that you use multistage modelling (label 1, 2, …); it would be good to write more about this approach.
    – It would be great if you also used the public transport data, as the goal is actually related to it…
    – You mentioned that, because you don’t use NN, the data does not need to be standardized. At the same time, you use kNN. Keeping in mind that kNN calculates distances, it would be useful to normalize the variables in some way…
    – The tables showing the models’ accuracy: it is a bit difficult to compare the models. I would put a single measure for all models in a separate table, then draw conclusions measure by measure, and finally make an overall conclusion. Just a suggestion… 🙂
    – The plots with forecasts: to assess the models’ accuracy, I would forecast the weather variables for the last 7 historical days and then compare them with their observed values…
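The last suggestion (hold out the final 7 historical days, forecast them, and compare with the observed values) could be sketched roughly as follows. Everything here is an assumption made for illustration: the `daily_temp` values are invented, and `weekly_naive_forecast` is only a placeholder standing in for whichever model is being evaluated.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def weekly_naive_forecast(history, horizon):
    """Placeholder model (assumption): repeat the last 7 observed values."""
    last_week = history[-7:]
    return [last_week[i % 7] for i in range(horizon)]


def backtest_last_days(series, holdout_days, forecast_fn):
    """Hide the last `holdout_days` points, forecast them, and score the result."""
    train, test = series[:-holdout_days], series[-holdout_days:]
    predicted = forecast_fn(train, holdout_days)
    return mean_absolute_error(test, predicted), predicted


# Invented daily temperatures (degrees C) for three weeks.
daily_temp = [10, 12, 11, 13, 15, 14, 12,
              11, 13, 12, 14, 16, 15, 13,
              12, 14, 13, 15, 17, 16, 14]

mae, predicted = backtest_last_days(daily_temp, 7, weekly_naive_forecast)
print("MAE over the held-out week:", mae)
```

Running the same backtest for each candidate model and each weather variable gives one comparable number per model, which also fits the earlier suggestion of a single-measure comparison table.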

    1. 0
      votes

      @alex-efremov, yes, you are right. KNN uses distance metrics and requires normalized features. After I decided not to use NN, I should’ve removed KNN too; I was in a rush. Thanks!
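A small sketch of why scaling matters for kNN, assuming two hypothetical features on very different scales (pressure in hPa and humidity as a fraction). The values are invented and z-score standardization is just one possible choice; the point is only that the nearest neighbour can change once the features are put on a common scale.

```python
import math


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def nearest_neighbor(rows, query_index):
    """Index of the row closest (Euclidean distance) to rows[query_index]."""
    query = rows[query_index]
    candidates = [i for i in range(len(rows)) if i != query_index]
    return min(candidates, key=lambda i: math.dist(query, rows[i]))


# Invented observations: (pressure in hPa, relative humidity as a fraction).
observations = [
    (1000.0, 0.20),  # query point
    (1001.0, 0.90),  # similar pressure, very different humidity
    (1010.0, 0.21),  # different pressure, almost identical humidity
    (1030.0, 0.60),
]

raw_nn = nearest_neighbor(observations, 0)
scaled_nn = nearest_neighbor(standardize(observations), 0)
print("nearest neighbour on raw features:", raw_nn)
print("nearest neighbour on standardized features:", scaled_nn)
```

On the raw values the pressure column dominates the distance, so humidity is effectively ignored; after standardization both features contribute comparably.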
