Datathon 2020 – Predicting weather disruption of public transport – provided by Ernst & Young
This project was inspired by the business case of the Data Science Society Global 2020 Hackathon, hosted from May 15–17, 2020.
Click here for details about the business case and the data dictionary.
Data Sources
The datasets used in this project were provided by the organizers; the additional external data sourced were obtained here.
The analysis for this project follows the CRISP-DM pipeline, whose stages are:
- Business Understanding
- Data Understanding
- Data Preparation
- Data Modelling
- Results
- Deployment - Storytelling
Business Understanding
The aim of the project is to predict public transport service disruption in Dubai using weather data analysis.
- Goal: Can you analyze the weather data to predict public transport service disruption in Dubai? How can we plan for less disruption in the wake of severe weather conditions and leverage the emergency management plan while providing uninterrupted services and products to citizens?
Data Understanding and Data Preprocessing
This stage involves loading the data and performing the necessary cleaning, preprocessing and feature engineering to prepare it for analysis and modelling.
- Importing Necessary Libraries
In [56]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
plt.style.use('ggplot')
import plotly.graph_objects as go
- Loading the datasets into a dataframe
In [939]:
data = pd.read_json('Dubai+Weather_20180101_20200316.txt')
transport = pd.DataFrame(data=None, columns=['year','month','transport_type','trips'])
for i in os.listdir('Transport'):
    month_data = pd.read_csv("Transport/" + i)
    transport = pd.concat([transport, month_data], axis=0)
In [940]:
data.shape
Out[940]:
In [941]:
data.tail(3)
Out[941]:
- Data Preprocessing and Data Cleaning
In [942]:
transport.info()
In [943]:
transport.reset_index(inplace=True)
transport.drop('index',axis=1,inplace=True)
In [944]:
transport.head()
Out[944]:
- Transforming the date to pandas datetime format
- Dropping columns with constant labels such as city_name and timezone
In [945]:
data.drop(['city_name','timezone','dt_iso'],axis=1,inplace=True)
In [946]:
def convert_time(timestamp):
    # convert a Unix timestamp (seconds) to a formatted datetime string
    return dt.datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
In [947]:
data['dt'] = data['dt'].apply(convert_time)
data['dt'] = pd.to_datetime(data['dt'])
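As an aside, the same conversion could be done in a single vectorized call; a minimal sketch (note that unit='s' yields UTC, whereas fromtimestamp uses the machine's local timezone):
# vectorized alternative to convert_time, assuming 'dt' still holds Unix seconds
# data['dt'] = pd.to_datetime(data['dt'], unit='s')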
In [948]:
data.head(2)
Out[948]:
In [949]:
transport.head(2)
Out[949]:
Feature Engineering
- Using the date column to engineer new datetime features such as Month, Year and Weekday
In [950]:
data['month'] = data['dt'].dt.month
data['year'] = data['dt'].dt.year
data['weekdays'] = data['dt'].dt.weekday
In [951]:
data.head()
Out[951]:
- Creating an id in the data to be used to map the Transport data using a create-id function (a sketch follows below)
- Transforming the Main, Wind, Clouds, weather and rain columns to extract the details into a proper format for analysis
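The create-id helper itself is not shown in this excerpt; a minimal sketch of the idea, assuming the mapping key is simply the year and month shared by both datasets (the helper name and key format are illustrative):

def create_id(row):
    # hypothetical helper: build a 'YYYY-MM' key common to the weather and transport tables
    return f"{int(row['year'])}-{int(row['month']):02d}"

data['id'] = data.apply(create_id, axis=1)
transport['id'] = transport.apply(create_id, axis=1)
# the monthly transport trips could then be mapped onto the weather data, e.g.:
# data = data.merge(transport, on='id', how='left')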
In [952]:
data['main'].iloc[0]
Out[952]:
In [953]:
main = data['main'].astype(str).str.strip('{}').str.split(', ', expand=True)
wind = data['wind'].astype(str).str.strip('{}').str.split(', ', expand=True)
weather = data['weather'].astype(str).str.strip('{}').str.split(', ', expand=True)
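Since these columns come straight from pd.read_json, they may still hold dict objects rather than plain strings; in that case a more robust way to expand them, without any string parsing, is a sketch like the following (the resulting column names assume OpenWeather-style keys):
# expand dict-valued columns directly instead of parsing their string form
main_alt = pd.json_normalize(data['main'].tolist())      # e.g. temp, temp_min, temp_max, feels_like, pressure, humidity
wind_alt = pd.json_normalize(data['wind'].tolist())      # e.g. speed, deg
clouds_alt = pd.json_normalize(data['clouds'].tolist())  # e.g. all (cloud cover in %)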
- Renaming the Columns
In [954]:
main.columns = ['main_temp','Temp_min','Temp_max','Feels_like','Pressure','Humidity']
wind.columns = ['Speed','Deg']
weather.columns = ['id','Main','Description','icon','5','6','7','8']
weather.drop(['5','6','7','8','id'],axis=1,inplace=True)
In [955]:
data.head()
Out[955]:
In [956]:
data.drop(['main','wind','weather'],axis=1,inplace=True)
data = pd.concat([data,main,wind,weather],axis=1)
In [957]:
data.head(1)
Out[957]:
In [958]:
data['clouds'] = data['clouds'].astype(str).str.strip('{}').apply(lambda x:x.split(": ")[-1])
In [960]:
def replace_nan(data):
    # flag rain records: 0 if the entry is missing, 1 if a rain record exists
    if pd.isna(data):
        return 0
    else:
        return 1
In [961]:
data.rain = data.rain.apply(replace_nan)
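The same flag can also be produced in one vectorized line instead of applying a helper row by row (an equivalent alternative for the raw column):
# 1 where a rain record exists, 0 where it is missing
# data['rain'] = data['rain'].notna().astype(int)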
In [962]:
data.rain.value_counts()
Out[962]:
In [963]:
data.head(1)
Out[963]:
In [964]:
cols1 = ['main_temp', 'Feels_like', 'Speed']
cols2 = ['Temp_min', 'Temp_max', 'Pressure', 'Humidity', 'Deg']
for column in cols1:
    data[column] = data[column].str.extract(r'(\d+\.\d+)', expand=False)
for column in cols2:
    data[column] = data[column].str.extract(r'(\d+)', expand=False)
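For illustration, the first pattern keeps only the decimal number from a string such as "'temp': 295.45" (a hypothetical value):
# pd.Series(["'temp': 295.45"]).str.extract(r'(\d+\.\d+)', expand=False)  # -> "295.45"
The second pattern does the same for integer-valued fields such as pressure and humidity.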
In [965]:
data[data['main_temp'].isnull()].head(2)
Out[965]:
In [966]:
data['main_temp'].fillna(0,inplace=True)
In [968]:
data.head()
Out[968]:
In [969]:
def temp_aver(temp):
    # average the three temperature readings; main_temp was filled with 0 where missing,
    # so fall back to the min/max average in that case
    main_temp = temp['main_temp']
    temp_min = temp['Temp_min']
    temp_max = temp['Temp_max']
    if main_temp == 0:
        return (temp_min + temp_max) / 2
    return (temp_min + temp_max + main_temp) / 3
In [971]:
data[['main_temp','Temp_min','Temp_max']] = data[['main_temp','Temp_min','Temp_max']].astype('float')
data['temp_average'] =data[['main_temp','Temp_min','Temp_max']].apply(temp_aver,axis=1)
In [973]:
data.drop(['main_temp','Temp_min','Temp_max','dt','icon','Description'],axis=1,inplace=True)
In [974]:
data.head(2)
Out[974]:
In [976]:
# keep only the weather condition label (drop the residual "'main':" key and quotes)
data['Main'] = data['Main'].str.replace("'main':", "").str.strip(" '")
In [977]:
data.head()
Out[977]:
In [978]:
data.Main.value_counts()
Out[978]:
In [981]:
data.temp_average
Out[981]:
In [989]:
#Checking for missing information
data.isnull().mean()*100
Out[989]:
In [990]:
data.info()
- Filling missing information: since both features with missing values have less than 5% missing, we will fill them with the mean
In [991]:
data['Speed'] = data['Speed'].astype('float')
data['Feels_like'] = data['Feels_like'].astype('float')
data['Speed'].fillna(data['Speed'].mean(),inplace=True)
data['Feels_like'].fillna(data['Feels_like'].mean(),inplace=True)
In [992]:
data.isnull().sum()
Out[992]:
Exploratory Data Analysis
After cleaning and preparing the data, Exploratory Data Analysis (EDA) is performed to gather insights that guide the modelling and help improve model performance.
In [993]:
data.head()
Out[993]:
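As a starting point for the EDA, a minimal sketch of a first look, assuming the engineered columns above (month, temp_average and the 0/1 rain flag):
# quick EDA sketch: monthly averages of the engineered weather features
monthly = data.groupby('month')[['temp_average', 'rain']].mean()
fig, ax = plt.subplots(1, 2, figsize=(12, 4))
monthly['temp_average'].plot(kind='bar', ax=ax[0], title='Average temperature by month')
monthly['rain'].plot(kind='bar', ax=ax[1], title='Share of records with rain by month')
plt.tight_layout()
plt.show()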
In [ ]: