
Datathon2020 – Predicting weather disruption of public transport – provided by Ernst and Young


Data Use Case Introduction

Climate change is projected to increase the frequency and intensity of some extreme weather events. These events are likely to damage transportation infrastructure, disrupt public transport, and raise the risk of delays and failures, as storms, flooding and higher temperatures affect the reliability and capacity of public transportation.

The Goal

Can you analyze the weather data to predict public transport service disruption in Dubai? How can we plan for less disruption in the wake of severe weather conditions, make better use of the emergency management plan, and keep providing uninterrupted services and products to citizens?

About the Case Providers

Ernst & Young is one of the world’s leading professional services organizations, helping companies across the globe to identify and capitalize on business opportunities. We deliver the value that clients care about; we provide ideas and solutions tailored to meet clients’ needs; and we produce tangible results. Ernst & Young’s depth and breadth of service and our global reach mean that we have the resources to serve any client, anywhere in the world.



Research Problem

Public transit is a critical component of a smart and connected community. As such, citizens expect and require accurate, real-time information about the arrivals and departures of transportation assets. Extreme climatic events frequently cause operational delays that cascade through a succession of public transport services. Predictive analytics has the potential to improve public transport by analyzing historical weather data and real-time transit data to predict possible disruptions under severe weather conditions. Potential solutions include, but are not limited to, service continuity and optimization, the development of new products, and a better understanding of people’s behavior.

The Data

Download weather data in JSON format from the link:

Weather data


  • city_name City name
  • lat Geographical coordinates of the location (latitude)
  • lon Geographical coordinates of the location (longitude)
  • main
    • main.temp Temperature
    • main.feels_like This temperature parameter accounts for the human perception of weather
    • main.pressure Atmospheric pressure (on the sea level), hPa
    • main.humidity Humidity, %
    • main.temp_min Minimum temperature at the moment. This is a deviation from the current temperature that is possible for large cities and geographically extensive megalopolises (use this parameter optionally).
    • main.temp_max Maximum temperature at the moment. This is a deviation from the current temperature that is possible for large cities and geographically extensive megalopolises (use this parameter optionally).
  • wind
    • wind.speed Wind speed. Unit Default: meter/sec
    • wind.deg Wind direction, degrees (meteorological)
  • clouds
    • clouds.all Cloudiness, %
  • rain
    • rain.1h Rain volume for the last hour, mm
    • rain.3h Rain volume for the last 3 hours, mm
  • snow
    • snow.1h Snow volume for the last hour, mm (in liquid state)
    • snow.3h Snow volume for the last 3 hours, mm (in liquid state)
  • weather (more info Weather condition codes)
    • weather.id Weather condition id
    • weather.main Group of weather parameters (Rain, Snow, Extreme etc.)
    • weather.description Weather condition within the group
    • weather.icon Weather icon id
  • dt Time of data calculation, unix, UTC
  • dt_iso Date and time in UTC format
  • timezone Shift in seconds from UTC
  • Bus Schedule – Static GTFS Data
  • Real time Transit – Real time GTFS Data
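The field list above can be turned into one flat row per observation, which is usually easier to model with. The following is a minimal sketch, assuming the JSON nests fields the way the dotted names suggest (for example `main.temp` stored as `record["main"]["temp"]`), that `weather` is a list of condition objects, and that `rain` and `snow` are simply absent when there was no precipitation. The function name `flatten_record` and all sample values are hypothetical and should be checked against the downloaded file.

```python
import json
from datetime import datetime, timedelta, timezone


def flatten_record(rec):
    """Flatten one nested weather record into a single-level dict."""
    main = rec.get("main") or {}
    wind = rec.get("wind") or {}
    clouds = rec.get("clouds") or {}
    # Assumption: rain/snow keys are missing when nothing fell, so default to 0.0.
    rain = rec.get("rain") or {}
    snow = rec.get("snow") or {}
    # Assumption: "weather" is a list of conditions; keep only the first one.
    first = (rec.get("weather") or [{}])[0]

    time_utc = datetime.fromtimestamp(rec["dt"], tz=timezone.utc)
    # "timezone" is the shift in seconds from UTC, used here to get local time.
    local_time = time_utc + timedelta(seconds=rec.get("timezone", 0))

    return {
        "city_name": rec.get("city_name"),
        "time_utc": time_utc,
        "local_hour": local_time.hour,
        "temp": main.get("temp"),
        "feels_like": main.get("feels_like"),
        "pressure": main.get("pressure"),
        "humidity": main.get("humidity"),
        "wind_speed": wind.get("speed"),
        "wind_deg": wind.get("deg"),
        "clouds_all": clouds.get("all"),
        "rain_1h": rain.get("1h", 0.0),
        "rain_3h": rain.get("3h", 0.0),
        "snow_1h": snow.get("1h", 0.0),
        "snow_3h": snow.get("3h", 0.0),
        "weather_id": first.get("id"),
        "weather_main": first.get("main"),
    }


# Illustrative record with made-up values, not taken from the real dataset.
sample = json.loads("""
{
  "city_name": "Dubai", "lat": 25.07, "lon": 55.17,
  "main": {"temp": 22.4, "feels_like": 21.9, "pressure": 1016,
           "humidity": 60, "temp_min": 21.0, "temp_max": 24.0},
  "wind": {"speed": 3.1, "deg": 300},
  "clouds": {"all": 0},
  "weather": [{"id": 800, "main": "Clear",
               "description": "sky is clear", "icon": "01n"}],
  "dt": 1577836800, "timezone": 14400
}
""")

row = flatten_record(sample)
print(row["time_utc"].isoformat(), row["local_hour"], row["rain_1h"])
```

Defaulting missing precipitation to 0.0 is a modeling choice; if the file marks missing measurements differently, it may be safer to keep them as `None`.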

Dubai Pulse:

The Dubai Pulse is the next-generation digital operating system designed to benefit individuals, companies, the private sector and the government. This unique platform will offer city services, City Data (Big Data) Orchestration and Digital Identity, as well as Cloud and IoT capabilities. Smart Dubai’s strategic partners, including Dubai Government entities, may leverage Dubai Pulse and benefit from the available open and shared data.

GTFS (optional)

The General Transit Feed Specification (GTFS) is a standard data specification that allows public transit agencies to publish their transit data in a format that can be consumed by a wide variety of software applications. Today, the GTFS data format is used by thousands of public transport providers.

GTFS Data Structure can be found at:

Participants can find alternate sources for the Dubai GTFS data and complement it with the other provided datasets.
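For teams that do obtain a static GTFS feed, the schedule is a set of comma-separated text files such as `stop_times.txt`. One detail that is easy to miss is that GTFS times are measured from the start of the service day and may exceed 24:00:00 for trips that run past midnight, so they cannot be parsed as ordinary clock times. The sketch below shows one way to read them with the standard library; the helper names and the sample rows are hypothetical, while the column names follow the GTFS specification.

```python
import csv
import io


def gtfs_time_to_seconds(value):
    """Convert a GTFS HH:MM:SS string to seconds since the service day began.

    Hours may be 24 or more for trips that continue after midnight.
    """
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def load_stop_times(text):
    """Parse the contents of stop_times.txt into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        rows.append({
            "trip_id": record["trip_id"],
            "stop_id": record["stop_id"],
            "stop_sequence": int(record["stop_sequence"]),
            "arrival_s": gtfs_time_to_seconds(record["arrival_time"]),
            "departure_s": gtfs_time_to_seconds(record["departure_time"]),
        })
    return rows


# Made-up sample rows; a real feed would be read from the unzipped GTFS files.
SAMPLE = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,23:55:00,23:55:30,S1,1
T1,24:10:00,24:10:30,S2,2
"""

stop_times = load_stop_times(SAMPLE)
for st in stop_times:
    print(st["trip_id"], st["stop_id"], st["arrival_s"])
```

Keeping times as seconds since the start of the service day makes it straightforward to compute scheduled travel time between consecutive stops, which can later be compared with observed real-time arrivals.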

Expected Output

The main deliverable from each team is a written article presenting its results from the Datathon. The jury will evaluate it, and it should show how well the team has done the job.

Given the limited time and resources typical of Big Data Analytics projects, it is essential to follow a time-tested methodology proven across many projects: CRISP-DM. You could read more at

The organizing team has tried to do most of the work on phases “1. Business Understanding” and “2. Data Understanding”, while teams are expected to focus on phases 3, 4 and 5 (i.e. “Data Preparation”, “Modeling” and “Evaluation”) to achieve the best results in phase “5. Evaluation”.
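As a rough illustration of how phases 3 to 5 might fit together for this case, the sketch below aligns delay observations with hourly weather rows, labels an hour as disrupted when the mean delay passes a threshold, and scores a simple rain-based rule with precision and recall. Everything here is an assumption made for illustration: the delay observations, the 300-second disruption threshold, the 1.0 mm rain rule and all function names are hypothetical, and a real submission would replace the rule with a trained model.

```python
from collections import defaultdict

HOUR = 3600
DISRUPTION_DELAY_S = 300   # hypothetical: mean delay that counts as a disruption
RAIN_THRESHOLD_MM = 1.0    # hypothetical: baseline rule, "rain predicts disruption"


def hour_bucket(unix_ts):
    """Round a unix timestamp down to the start of its hour."""
    return unix_ts - unix_ts % HOUR


def join_delays_to_weather(weather_rows, delay_obs):
    """Data preparation: attach the mean observed delay to each weather hour."""
    by_hour = defaultdict(list)
    for ts, delay_s in delay_obs:
        by_hour[hour_bucket(ts)].append(delay_s)
    joined = []
    for w in weather_rows:
        delays = by_hour.get(hour_bucket(w["dt"]))
        if delays:  # keep only hours that have transit observations
            joined.append({**w, "mean_delay_s": sum(delays) / len(delays)})
    return joined


def precision_recall(y_true, y_pred):
    """Evaluation: precision and recall for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up hourly weather rows and (timestamp, delay in seconds) observations.
weather = [
    {"dt": 1577836800, "rain_1h": 0.0, "wind_speed": 3.1},
    {"dt": 1577840400, "rain_1h": 6.2, "wind_speed": 9.4},
    {"dt": 1577844000, "rain_1h": 0.4, "wind_speed": 4.0},
]
delays = [
    (1577836900, 30), (1577838000, 60),
    (1577840500, 600), (1577841000, 480),
    (1577844100, 420),
]

rows = join_delays_to_weather(weather, delays)
y_true = [r["mean_delay_s"] >= DISRUPTION_DELAY_S for r in rows]
# Modeling: a one-feature threshold rule standing in for a real model.
y_pred = [r["rain_1h"] >= RAIN_THRESHOLD_MM for r in rows]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data the rule misses the third hour, where delays are high despite little rain, which is the kind of case that motivates adding more weather features and transit context during modeling.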

Phase “6. Deployment” mostly stays in the hands of the case-providing companies, as we aim to continue the process after the event.
