
Weather-proof Mobility

2
votes

7 thoughts on “Weather-proof Mobility”

  1. 0
    votes

    A good approach to the topic. Nice and clean definition of the business understanding. The data analysis was performed with the correct focus, omitting unnecessary segments that would complicate the solution. What is missing is a final model or conclusion stating what can be observed and used as an outcome of the whole “project”.
    Also, I would suggest focusing on a shorter time-series segment (daily patterns) to identify peak/off-peak periods and how they interact with TBB in this case, together with weather conditions. We all know that bus drivers should be professionals, but the majority of “normal” non-bus drivers are not, and they are heavily affected by distracting sensory inputs (thunderstorms, rain, people cutting in, or even forgetting how to drive when weather conditions change).
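
To illustrate the peak/off-peak idea, here is a minimal sketch on synthetic hourly data. Everything in it is an assumption for illustration: the variable names (`hours`, `delay`), the rush-hour windows, and the choice of the 75th percentile as the cut-off are hypothetical and not taken from the team's solution.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: 30 days of hourly observations (hypothetical, not the TBB data).
hours = np.tile(np.arange(24), 30)
rush = np.isin(hours, [7, 8, 9, 16, 17, 18])         # assumed rush-hour windows
delay = np.where(rush, 10.0, 3.0) + rng.normal(0, 1.0, hours.size)

# Average delay for each hour of the day.
hourly_mean = np.array([delay[hours == h].mean() for h in range(24)])

# Label an hour as "peak" if its mean delay is in the top quarter of hours.
threshold = np.quantile(hourly_mean, 0.75)
peak_hours = np.flatnonzero(hourly_mean >= threshold)

print("peak hours:", peak_hours.tolist())
```

Once hours are labelled this way, the weather effect can be estimated separately for peak and off-peak periods instead of on daily aggregates.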

    1. 0
      votes

      Also, I would advise using some additional datasets that were not part of the initial dataset, such as aggregated daily traffic estimates on an hourly basis provided by some navigation applications, because that can additionally help with model precision. I’m adding this last sentence about additional datasets for all teams focusing on this problem, because no one even considered it and it is something you can always do on any project: focus not only on internal/provided data but find something to augment it 😉

  2. 0
    votes

    Hi, romina & bogomil
    It’s great that you played with the data about public transport… And the documentation is well structured.
    Here are my comments about the periodic component in the TBB variables. You could take this “seasonal” component into account: model it, remove it from the data, account for the weather effect, make a forecast, and finally reintroduce the periodic component. The reason for splitting the periodic and non-periodic parts (and using different models for them) is that different factors cause them to appear in the data…
    Another approach is to add more variables (day of the week) in an attempt to represent the periodicity…
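
The split described above could look roughly like the following sketch. It assumes daily data with a weekly cycle and models the periodic part as simple per-weekday means; the data are synthetic and the names (`delay`, `rain`, `deseasonalize`, `forecast`) are hypothetical. The weekday means are removed from the weather variable as well, so the estimated slope matches what a regression with day-of-week dummies would give.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily data: 20 full weeks (hypothetical, not the TBB data).
n_days = 140
dow = np.arange(n_days) % 7                                # day-of-week index 0..6
weekly = np.array([5.0, 4.0, 4.0, 4.5, 6.0, 2.0, 1.0])     # assumed weekly profile
rain = rng.gamma(shape=1.0, scale=2.0, size=n_days)        # assumed rainfall
delay = weekly[dow] + 0.8 * rain + rng.normal(0, 0.3, n_days)

def deseasonalize(x, dow):
    """Return per-weekday means and x with its weekday mean removed."""
    profile = np.array([x[dow == d].mean() for d in range(7)])
    return profile, x - profile[dow]

# 1) Model the periodic component and remove it from both series.
delay_profile, delay_resid = deseasonalize(delay, dow)
rain_profile, rain_resid = deseasonalize(rain, dow)

# 2) Weather effect: least-squares slope on the deseasonalized series
#    (no intercept needed, both residual series have mean zero).
beta = float(rain_resid @ delay_resid / (rain_resid @ rain_resid))

# 3) Forecast the non-periodic part, then add the periodic component back.
def forecast(dow_new, rain_new):
    dow_new = np.asarray(dow_new)
    rain_new = np.asarray(rain_new, dtype=float)
    return delay_profile[dow_new] + beta * (rain_new - rain_profile[dow_new])

print("estimated rain effect:", round(beta, 3))
print("forecast for a rainy Saturday:", forecast([5], [10.0]))
```

A richer periodic model (Fourier terms, STL decomposition) could replace the weekday means without changing the overall structure.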

  3. 0
    votes

    Nice, clear approach: good work.

    That said, have you looked at adding interaction terms in the regression (i.e. weather * weekday)? This might necessitate changing the resolution of the data from daily to half-day or even less.
    Also, it might benefit the analysis to look into quantile regression.
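
As a sketch of what an interaction term means here, the example below fits an ordinary least-squares model with a `rain * weekend` column on synthetic data. The data-generating coefficients and the variable names are assumptions made up for the illustration; quantile regression is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (hypothetical): rain raises delays less on weekends.
n = 400
rain = rng.gamma(1.0, 2.0, n)
weekend = (np.arange(n) % 7 >= 5).astype(float)
y = (3.0 + 1.0 * rain - 1.5 * weekend
     - 0.6 * rain * weekend + rng.normal(0, 0.5, n))

# Design matrix: intercept, main effects, and the interaction column.
X = np.column_stack([np.ones(n), rain, weekend, rain * weekend])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Classical OLS standard errors and t-statistics.
resid = y - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = coef / se

for name, c, t in zip(["intercept", "rain", "weekend", "rain:weekend"], coef, t_stats):
    print(f"{name:>13}: coef={c: .3f}  t={t: .1f}")
```

The coefficient on `rain:weekend` is the difference between the weekend and weekday rain slopes, so a value near zero means the weather effect does not depend on the day type.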

    Best regards,

    1. 0
      votes

      Thanks for the comment!

      We did explore some models with interaction terms, as we could imagine that different social behavior on weekends vs weekdays could change the way weather influences traffic. From the models that we tried out, we did not discover an interaction effect worth publishing. Indeed, increasing the granularity of the model may very well reveal such effects.

  4. 0
    votes

    Hi, again 🙂
    I like your work very much: the considerations related to the data, the interpretation of the outliers, the conclusions and also the good business understanding. 🙂
    I have some comments & questions about the final model: looking at the p-values, you included many non-significant factors in the model, or I misunderstood something. To reduce the possibility of overfitting, I would remove some of them so that only significant factors remain. And to check the model for overfitting, we should compare R2, adj. R2, RMSE, etc., for both the train and test samples. With the cross-validation you did, we should also do this for the averaged measures of model quality. Also, by using linear regression we impose a particular hypothesis about the type of relation between the factors and the dependent variable. So it would be good to check other models as well, especially non-parametric ones. Nevertheless, I really like what you have done.
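
The train/test comparison mentioned above can be sketched as follows. This is an illustration on synthetic data, with hypothetical helper names (`fit_ols`, `metrics`, `cross_validate`), comparing a small model against one padded with irrelevant factors; it is not the team's model.

```python
import numpy as np

def fit_ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def metrics(X, y, coef):
    """R2, adjusted R2 and RMSE; X is assumed to include the intercept column."""
    n, p = X.shape
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    return {"r2": r2, "adj_r2": adj_r2, "rmse": (ss_res / n) ** 0.5}

def average(scores):
    return {m: float(np.mean([s[m] for s in scores])) for m in scores[0]}

def cross_validate(X, y, k=5, seed=0):
    """Average train and test metrics over k folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    train_scores, test_scores = [], []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = fit_ols(X[train_idx], y[train_idx])
        train_scores.append(metrics(X[train_idx], y[train_idx], coef))
        test_scores.append(metrics(X[test_idx], y[test_idx], coef))
    return average(train_scores), average(test_scores)

# Synthetic data: 2 real factors plus 30 irrelevant ones (all hypothetical).
rng = np.random.default_rng(2)
n = 200
signal = rng.normal(size=(n, 2))
noise = rng.normal(size=(n, 30))
y = 1.0 + 2.0 * signal[:, 0] - 1.0 * signal[:, 1] + rng.normal(0, 1.0, n)
X_small = np.column_stack([np.ones(n), signal])
X_big = np.column_stack([np.ones(n), signal, noise])

train_small, test_small = cross_validate(X_small, y)
train_big, test_big = cross_validate(X_big, y)
print("small model  train:", train_small, " test:", test_small)
print("big model    train:", train_big, " test:", test_big)
```

A large gap between the averaged train and test values for the bigger model is the overfitting signal; a small gap for the reduced model suggests the dropped factors were not contributing.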

    1. 0
      votes

      Hello Alex, thanks for the detailed feedback!

      You are right about the non-significant variables. Regarding model 3, we have 2 categorical variables with multiple levels. Some levels of both variables have p<0.05, therefore we kept them in the model we evaluated. In models 1 and 2, we kept the summary with all coefficients for transparency, but as you pointed out, they should be omitted for the final interpretation of the model.

      Indeed, we didn't comment on the potential violation of assumptions and alternative modeling options. Non-parametric models, non-linear models and time series regression models are all valid extensions for adequately describing the relationship between weather and traffic disruptions.
