The weather dataset provided has been preprocessed, and the traffic data has been appended to it after its own preprocessing. The aim is to find the dates available in both datasets and run a predictive analysis on the combined traffic and weather data, so that if future weather conditions are given, or predicted by time series analysis, public transport disruption can be predicted with machine learning models.
(Just a small try by an undergrad engineering student. Hope you like it.)
semifinal qualification progress:
Datasets for public transport (RTA Dubai Pulse) were downloaded for the dates they share with the weather dataset (01-Jan-2018 to 31-Mar-2018). They were preprocessed to get the hourly traffic flow for each day in that range, and these hourly flows were appended to the weather dataset.
The goal is to analyse how traffic increases or decreases under various weather conditions by applying machine learning and time series models.
python code:
Weather dataset preprocessing
Keeping only the features needed to predict how traffic is affected by various weather conditions.
Future aim: apply machine learning models and time series analysis.
Out[2]:
| | city_name | lat | lon | main/temp | main/temp_min | main/temp_max | main/feels_like | main/pressure | main/humidity | wind/speed | ... | weather/0/icon | dt | dt_iso | timezone | rain/1h | weather/1/id | weather/1/main | weather/1/description | weather/1/icon | rain/3h |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Dubai | 25.07501 | 55.188761 | 14.99 | 13.0 | 18.00 | 13.70 | 1015 | 87 | 3.1 | ... | 01n | 1514764800 | 2018-01-01 00:00:00 +0000 UTC | 14400 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Dubai | 25.07501 | 55.188761 | 14.63 | 13.0 | 17.00 | 13.91 | 1015 | 93 | 2.6 | ... | 01n | 1514768400 | 2018-01-01 01:00:00 +0000 UTC | 14400 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Dubai | 25.07501 | 55.188761 | 14.03 | 12.0 | 17.00 | 13.89 | 1016 | 93 | 1.5 | ... | 01n | 1514772000 | 2018-01-01 02:00:00 +0000 UTC | 14400 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Dubai | 25.07501 | 55.188761 | 13.78 | 12.0 | 17.00 | 13.14 | 1016 | 93 | 2.1 | ... | 50n | 1514775600 | 2018-01-01 03:00:00 +0000 UTC | 14400 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Dubai | 25.07501 | 55.188761 | 14.28 | 12.0 | 18.00 | 13.45 | 1017 | 93 | 2.6 | ... | 50d | 1514779200 | 2018-01-01 04:00:00 +0000 UTC | 14400 | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19339 | Dubai | 25.07501 | 55.188761 | 22.85 | 21.0 | 25.45 | 22.19 | 1015 | 64 | 3.6 | ... | 01n | 1584385200 | 2020-03-16 19:00:00 +0000 UTC | 14400 | NaN | NaN | NaN | NaN | NaN | NaN |
| 19340 | Dubai | 25.07501 | 55.188761 | 22.35 | 21.0 | 24.00 | 21.17 | 1015 | 68 | 4.6 | ... | 01n | 1584388800 | 2020-03-16 20:00:00 +0000 UTC | 14400 | NaN | NaN | NaN | NaN | NaN | NaN |
| 19341 | Dubai | 25.07501 | 55.188761 | 21.52 | 20.0 | 23.36 | 21.43 | 1015 | 72 | 3.1 | ... | 01n | 1584392400 | 2020-03-16 21:00:00 +0000 UTC | 14400 | NaN | NaN | NaN | NaN | NaN | NaN |
| 19342 | Dubai | 25.07501 | 55.188761 | 21.04 | 19.0 | 23.36 | 21.19 | 1014 | 77 | 3.1 | ... | 01n | 1584396000 | 2020-03-16 22:00:00 +0000 UTC | 14400 | NaN | NaN | NaN | NaN | NaN | NaN |
| 19343 | Dubai | 25.07501 | 55.188761 | 20.31 | 18.0 | 23.28 | 19.83 | 1014 | 77 | 3.6 | ... | 01n | 1584399600 | 2020-03-16 23:00:00 +0000 UTC | 14400 | NaN | NaN | NaN | NaN | NaN | NaN |
19344 rows × 25 columns
Out[3]:
| | lat | lon | main/temp | main/temp_min | main/temp_max | main/feels_like | main/pressure | main/humidity | wind/speed | wind/deg | clouds/all | weather/0/id | dt | timezone | rain/1h | weather/1/id | rain/3h |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 1.934400e+04 | 1.934400e+04 | 19344.000000 | 19344.000000 | 19344.000000 | 19344.000000 | 19344.000000 | 19344.000000 | 19344.000000 | 19344.000000 | 19344.000000 | 19344.000000 | 1.934400e+04 | 19344.0 | 28.000000 | 3.000000 | 85.000000 |
| mean | 2.507501e+01 | 5.518876e+01 | 28.102823 | 26.661868 | 29.810532 | 27.684793 | 1009.416098 | 52.495089 | 3.879056 | 188.379239 | 13.751964 | 791.048904 | 1.549582e+09 | 14400.0 | 0.415357 | 674.333333 | 0.828706 |
| std | 1.291090e-11 | 2.732107e-11 | 7.329419 | 7.580049 | 7.240840 | 8.309911 | 8.017003 | 21.660800 | 2.098738 | 106.258945 | 26.479664 | 41.348948 | 2.010339e+07 | 0.0 | 0.456707 | 150.111070 | 0.615405 |
| min | 2.507501e+01 | 5.518876e+01 | 10.890000 | 7.000000 | 12.000000 | 6.340000 | 972.000000 | 4.000000 | 0.300000 | 0.000000 | 0.000000 | 200.000000 | 1.514765e+09 | 14400.0 | 0.110000 | 501.000000 | 0.130000 |
| 25% | 2.507501e+01 | 5.518876e+01 | 22.030000 | 20.920000 | 23.840000 | 20.750000 | 1003.000000 | 35.000000 | 2.315000 | 100.000000 | 0.000000 | 800.000000 | 1.532174e+09 | 14400.0 | 0.167500 | 631.000000 | 0.380000 |
| 50% | 2.507501e+01 | 5.518876e+01 | 28.060000 | 26.670000 | 30.000000 | 27.315000 | 1011.000000 | 53.000000 | 3.600000 | 180.000000 | 1.000000 | 800.000000 | 1.549582e+09 | 14400.0 | 0.215000 | 761.000000 | 1.000000 |
| 75% | 2.507501e+01 | 5.518876e+01 | 33.880000 | 32.810000 | 35.122500 | 34.890000 | 1016.000000 | 69.000000 | 5.100000 | 290.000000 | 19.000000 | 800.000000 | 1.566991e+09 | 14400.0 | 0.400000 | 761.000000 | 1.000000 |
| max | 2.507501e+01 | 5.518876e+01 | 45.940000 | 45.360000 | 48.000000 | 47.890000 | 1026.000000 | 100.000000 | 14.900000 | 360.000000 | 100.000000 | 804.000000 | 1.584400e+09 | 14400.0 | 2.030000 | 761.000000 | 3.810000 |
Out[4]:
city_name 19344
lat 19344
lon 19344
main/temp 19344
main/temp_min 19344
main/temp_max 19344
main/feels_like 19344
main/pressure 19344
main/humidity 19344
wind/speed 19344
wind/deg 19344
clouds/all 19344
weather/0/id 19344
weather/0/main 19344
weather/0/description 19344
weather/0/icon 19344
dt 19344
dt_iso 19344
timezone 19344
rain/1h 28
weather/1/id 3
weather/1/main 3
weather/1/description 3
weather/1/icon 3
rain/3h 85
dtype: int64
Out[8]:
| | dt_iso | temp | temp_feels_like | main_pressure | humidity | wind_speed | wind_deg | clouds | rain_1h | rain_3h | weather_main | weather_description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2018-01-01 00:00:00 | 14.99 | 13.70 | 1015 | 87 | 3.1 | 150 | 1 | 0.0 | 0.0 | Clear | sky is clear |
| 1 | 2018-01-01 01:00:00 | 14.63 | 13.91 | 1015 | 93 | 2.6 | 150 | 1 | 0.0 | 0.0 | Clear | sky is clear |
| 2 | 2018-01-01 02:00:00 | 14.03 | 13.89 | 1016 | 93 | 1.5 | 150 | 1 | 0.0 | 0.0 | Clear | sky is clear |
| 3 | 2018-01-01 03:00:00 | 13.78 | 13.14 | 1016 | 93 | 2.1 | 180 | 1 | 0.0 | 0.0 | Mist | mist |
| 4 | 2018-01-01 04:00:00 | 14.28 | 13.45 | 1017 | 93 | 2.6 | 160 | 1 | 0.0 | 0.0 | Mist | mist |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19339 | 2020-03-16 19:00:00 | 22.85 | 22.19 | 1015 | 64 | 3.6 | 50 | 0 | 0.0 | 0.0 | Clear | sky is clear |
| 19340 | 2020-03-16 20:00:00 | 22.35 | 21.17 | 1015 | 68 | 4.6 | 60 | 0 | 0.0 | 0.0 | Clear | sky is clear |
| 19341 | 2020-03-16 21:00:00 | 21.52 | 21.43 | 1015 | 72 | 3.1 | 60 | 0 | 0.0 | 0.0 | Clear | sky is clear |
| 19342 | 2020-03-16 22:00:00 | 21.04 | 21.19 | 1014 | 77 | 3.1 | 70 | 0 | 0.0 | 0.0 | Clear | sky is clear |
| 19343 | 2020-03-16 23:00:00 | 20.31 | 19.83 | 1014 | 77 | 3.6 | 60 | 0 | 0.0 | 0.0 | Clear | sky is clear |
19344 rows × 12 columns
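The step from the 25-column export to the 12-column table above could look roughly like the sketch below. The original notebook code is not shown in this article, so the column pairing in `RENAME`, the helper name `reduce_weather`, and the choice to treat a missing rain reading as 0.0 are all assumptions inferred from the two tables.

```python
import pandas as pd

# Raw export name -> short name used in the reduced table.
# This pairing is inferred from the tables above (an assumption).
RENAME = {
    "dt_iso": "dt_iso",
    "main/temp": "temp",
    "main/feels_like": "temp_feels_like",
    "main/pressure": "main_pressure",
    "main/humidity": "humidity",
    "wind/speed": "wind_speed",
    "wind/deg": "wind_deg",
    "clouds/all": "clouds",
    "rain/1h": "rain_1h",
    "rain/3h": "rain_3h",
    "weather/0/main": "weather_main",
    "weather/0/description": "weather_description",
}


def reduce_weather(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep the weather features of interest and tidy them for later joins."""
    out = raw[list(RENAME)].rename(columns=RENAME).copy()
    # Strip the " +0000 UTC" suffix so the timestamps parse cleanly.
    stamps = out["dt_iso"].str.replace(" +0000 UTC", "", regex=False)
    out["dt_iso"] = pd.to_datetime(stamps, format="%Y-%m-%d %H:%M:%S")
    # A missing rain reading is treated as "no rain" (an assumption).
    out[["rain_1h", "rain_3h"]] = out[["rain_1h", "rain_3h"]].fillna(0.0)
    return out


# Two rows copied from the tables above; city_name and lat are dropped.
raw = pd.DataFrame({
    "city_name": ["Dubai", "Dubai"],
    "lat": [25.07501, 25.07501],
    "dt_iso": ["2018-01-01 00:00:00 +0000 UTC", "2018-01-01 03:00:00 +0000 UTC"],
    "main/temp": [14.99, 13.78],
    "main/feels_like": [13.70, 13.14],
    "main/pressure": [1015, 1016],
    "main/humidity": [87, 93],
    "wind/speed": [3.1, 2.1],
    "wind/deg": [150, 180],
    "clouds/all": [1, 1],
    "rain/1h": [float("nan"), float("nan")],
    "rain/3h": [float("nan"), float("nan")],
    "weather/0/main": ["Clear", "Mist"],
    "weather/0/description": ["sky is clear", "mist"],
})
reduced = reduce_weather(raw)
print(reduced)
```

Parsing `dt_iso` into real timestamps at this stage makes it easier to line the weather rows up with hourly traffic counts later.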
Traffic data Preprocessing
I downloaded the public transport datasets (RTA Dubai Pulse) for the dates shared with the weather dataset (01-Jan-2018 to 31-Mar-2018), preprocessed them to get the hourly traffic flow for each day in that range, and appended these hourly flows to the weather dataset.
Future aim: to analyse how traffic increases or decreases under various weather conditions using machine learning models.
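A minimal sketch of how one daily ridership file might be turned into 24 hourly counts is given here. The real column names of the `bus_ridership_*.csv` files are not shown in this article, so `txn_time` and `route` are hypothetical, as is the helper name `hourly_counts`; the sketch also assumes each row is one check-in. It is written so that it should not trigger the `SettingWithCopyWarning` and `DtypeWarning` messages that appear in the output that follows: the filtered frame is copied before it is modified, and `low_memory=False` lets pandas infer each column type from the whole file.

```python
import io

import pandas as pd


def hourly_counts(csv_source, time_col="txn_time"):
    """Count rows per hour of day (0..23) in one daily ridership file.

    `time_col` is a hypothetical column name; adjust it to the real file.
    """
    # low_memory=False reads the file in one pass, so a column with mixed
    # values gets a single inferred dtype instead of a DtypeWarning.
    df = pd.read_csv(csv_source, low_memory=False)
    # .copy() gives an independent frame, so the assignment below does not
    # raise SettingWithCopyWarning.
    trips = df[df[time_col].notna()].copy()
    trips[time_col] = pd.to_datetime(trips[time_col])
    per_hour = trips.groupby(trips[time_col].dt.hour).size()
    # Hours with no rows still get an explicit zero.
    return per_hour.reindex(range(24), fill_value=0)


# Tiny in-memory stand-in for one daily file (illustrative values only).
sample = io.StringIO(
    "txn_time,route\n"
    "2018-03-01 00:05:00,8\n"
    "2018-03-01 00:40:00,X28\n"
    "2018-03-01 07:15:00,8\n"
)
counts = hourly_counts(sample)
print(counts)
```

Reindexing to `range(24)` keeps every day at exactly 24 values, which matches the shape of the per-file output and makes days with missing hours easy to spot.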
Accessing file:bus_ridership_2018-03-01_00-00-00.csv
C:\Users\Taha\anaconda3\lib\site-packages\ipykernel_launcher.py:12: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
if sys.path[0] == '':
C:\Users\Taha\anaconda3\lib\site-packages\pandas\core\frame.py:3997: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
errors=errors,
C:\Users\Taha\anaconda3\lib\site-packages\ipykernel_launcher.py:14: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
9191
2798
647
462
898
9169
38606
67596
73220
49366
31752
28982
29140
31227
34172
36461
44691
60996
75503
71249
56532
44094
36042
21997
Accessing file:bus_ridership_2018-03-02_00-00-00.csv
9135
2267
928
689
772
3380
12366
19534
27079
30441
35311
38338
36236
34513
42622
46964
49466
53371
54603
52645
47327
43425
36686
22007
Accessing file:bus_ridership_2018-03-03_00-00-00.csv
5908
1303
855
638
992
6965
24315
45150
53888
44934
32168
29534
30453
32576
33725
33755
40109
50116
58855
51977
42437
35127
29168
15921
Accessing file:bus_ridership_2018-03-04_00-00-00.csv
5235
1225
704
446
977
9787
39560
69791
74657
50226
32799
29745
29987
31251
30820
32879
42009
59589
74186
66900
49134
36553
28826
15496
Accessing file:bus_ridership_2018-03-05_00-00-00.csv
C:\Users\Taha\anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3063: DtypeWarning: Columns (3) have mixed types.Specify dtype option on import or set low_memory=False.
interactivity=interactivity, compiler=compiler, result=result)
5040
1224
503
467
980
9293
39603
68373
75173
49792
32187
28566
31701
34231
34431
34149
41091
59585
74783
65631
47630
36266
28674
15773
Accessing file:bus_ridership_2018-03-06_00-00-00.csv
5370
1204
592
410
969
9455
39291
69299
74194
49324
31834
28698
29344
30333
30128
32121
40753
59310
75111
65596
47820
35111
28854
15484
Accessing file:bus_ridership_2018-03-07_00-00-00.csv
5250
1303
639
473
999
9361
39648
68649
74203
50642
31784
28511
28832
29611
30068
31746
39516
59037
75056
64705
47434
35446
28682
15593
Accessing file:bus_ridership_2018-03-08_00-00-00.csv
9421
2659
587
481
996
9049
38997
67938
72984
50263
31565
28840
29062
31429
33659
36037
44125
60117
76562
70345
56569
44714
34910
21285
Accessing file:bus_ridership_2018-03-09_00-00-00.csv
10175
2567
979
741
817
3348
12266
19279
26007
30188
33828
36286
33962
32987
41190
46026
49833
53017
56040
54775
50146
45843
38698
24205
Accessing file:bus_ridership_2018-03-10_00-00-00.csv
15
11
944
740
978
6853
24327
44966
53045
43190
30840
27332
27187
27996
27403
25660
27188
29789
26717
17994
9317
4562
1802
404
Accessing file:bus_ridership_2018-03-11_00-00-00.csv
5122
1172
0
12
22
170
630
1257
1714
1786
1551
2134
2731
3453
4247
5300
9316
21436
38543
41944
36271
30241
26295
14739
Accessing file:bus_ridership_2018-03-12_00-00-00.csv
5135
1156
556
410
923
9409
40170
69586
75011
50401
32731
28939
28733
30557
29978
32311
40697
58456
75005
65223
47564
35112
28013
15600
Accessing file:bus_ridership_2018-03-13_00-00-00.csv
5383
1193
598
506
856
9645
40557
69861
74568
51856
32860
29798
29061
29622
29650
32046
40369
58116
74616
65800
47630
35271
28397
16034
Accessing file:bus_ridership_2018-03-14_00-00-00.csv
5181
1181
560
446
1006
9468
39966
66915
71386
51677
32438
29004
29228
30181
29852
31364
39807
58005
74563
65760
46206
35296
28431
15557
Accessing file:bus_ridership_2018-03-15_00-00-00.csv
9798
2662
623
428
952
9097
39360
68083
73596
50275
30771
28590
28860
31497
33082
35769
44098
60061
75441
72034
57098
43496
35691
21486
Accessing file:bus_ridership_2018-03-16_00-00-00.csv
10088
2710
1033
680
789
3463
12806
20187
26558
30801
34102
36868
33733
33805
40743
45678
49866
53171
57066
55622
50567
45694
38855
24017
Accessing file:bus_ridership_2018-03-17_00-00-00.csv
5755
1343
987
694
886
7025
24932
44608
53622
44222
31669
29662
29516
32044
32724
32697
39331
48432
57315
50443
40761
33198
27572
15404
Accessing file:bus_ridership_2018-03-18_00-00-00.csv
37
0
635
448
1024
9518
39672
69095
74064
48424
32528
27369
27470
26978
25627
25599
27432
32316
34365
22044
10446
6058
2148
575
Accessing file:bus_ridership_2018-03-19_00-00-00.csv
12
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
4
4
24
13
10
0
7
Accessing file:bus_ridership_2018-03-20_00-00-00.csv
5
0
0
0
1
13
99
157
156
135
34
23
33
23
26
51
49
111
257
284
161
143
123
70
Accessing file:bus_ridership_2018-03-21_00-00-00.csv
5245
1233
0
0
6
190
703
1379
1580
1279
1132
2084
2275
3317
4272
5104
8152
17606
35351
43421
37568
30357
26171
14970
Accessing file:bus_ridership_2018-03-22_00-00-00.csv
426
289
557
423
1051
9368
38895
66969
71208
48471
30517
26802
26926
28383
28349
29349
35751
43469
43377
25690
11806
5206
2527
1073
Accessing file:bus_ridership_2018-03-23_00-00-00.csv
9296
2184
177
101
175
130
109
407
545
725
1189
1850
2304
3423
6341
10366
14658
18747
24989
30789
33949
36011
33514
22033
Accessing file:bus_ridership_2018-03-24_00-00-00.csv
59
4
779
606
902
6268
22662
40534
48931
40127
27851
25022
24495
25944
24709
23735
24623
27330
26580
17902
9665
4749
1571
266
Accessing file:bus_ridership_2018-03-25_00-00-00.csv
4841
1283
0
0
21
105
839
1355
1790
1899
2038
2451
2900
3616
4618
6173
11723
23099
39728
41195
35423
30904
26027
14492
Accessing file:bus_ridership_2018-03-26_00-00-00.csv
5446
1362
577
419
1026
9043
38385
69558
74533
49427
31945
28807
28823
29638
28786
31063
39156
57402
74066
63146
46965
34263
28302
15482
Accessing file:bus_ridership_2018-03-27_00-00-00.csv
5217
1314
613
407
1058
8936
37986
69164
73865
50050
32154
28974
28069
29450
28748
30706
39313
57766
75062
63824
46809
35232
29444
15523
Accessing file:bus_ridership_2018-03-28_00-00-00.csv
5552
1248
567
429
796
8985
38145
69193
73674
50330
31773
28411
28928
29404
28284
30168
39044
56837
72358
64103
47820
35854
29526
16802
Accessing file:bus_ridership_2018-03-29_00-00-00.csv
9274
2670
553
435
922
8538
36631
67421
72673
49360
32011
28766
28878
30585
32071
34678
43323
61262
75312
70337
58606
44131
34960
21794
Accessing file:bus_ridership_2018-03-30_00-00-00.csv
10622
2506
941
683
708
3346
12541
21380
27436
30922
33861
34819
31094
32187
38606
41197
46633
54360
53340
52670
50334
43545
37627
25036
Accessing file:bus_ridership_2018-03-31_00-00-00 .csv
6658
1814
997
639
1029
6896
24430
44110
54957
44820
31796
28666
28609
31067
31982
32548
38633
50905
58619
53090
42159
34960
30798
19848
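Appending the hourly counts to the weather table could be done with a join on the timestamp, sketched here under stated assumptions. The column name `bus_riders`, the helper `attach_traffic`, and the toy values are illustrative and not taken from the original notebook. An inner join keeps only the hours present in both tables, which is one way to get the intersection dates. The weather timestamps are in UTC and the `timezone` column shows an offset of 14400 seconds (4 hours); whether the ridership files use local time is not stated in the article, so the shift is left as an optional parameter.

```python
import pandas as pd


def attach_traffic(weather, traffic, weather_offset=pd.Timedelta(0)):
    """Inner-join hourly traffic counts onto weather rows by timestamp.

    weather_offset shifts the weather timestamps first, for example
    pd.Timedelta(hours=4) if the traffic data turns out to be in local time.
    """
    shifted = weather.assign(dt_iso=weather["dt_iso"] + weather_offset)
    # validate= raises an error if either side has duplicate timestamps.
    return shifted.merge(traffic, on="dt_iso", how="inner", validate="one_to_one")


# Toy frames: three weather hours and three traffic hours, two in common.
weather = pd.DataFrame({
    "dt_iso": pd.to_datetime([
        "2018-03-01 00:00:00", "2018-03-01 01:00:00", "2018-03-01 02:00:00",
    ]),
    "temp": [20.0, 19.5, 19.0],  # illustrative values
})
traffic = pd.DataFrame({
    "dt_iso": pd.to_datetime([
        "2018-03-01 01:00:00", "2018-03-01 02:00:00", "2018-03-01 03:00:00",
    ]),
    "bus_riders": [2798, 647, 462],  # illustrative values
})
merged = attach_traffic(weather, traffic)
print(merged)
```

The merged table has one row per shared hour with both the weather features and the traffic count, which is the shape a regression or time series model would need as input.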
6 thoughts on “Weather Disruption of Public Transport Analysis Using Python”
The same segments repeat across the article.
The article focuses entirely on data analysis; what is missing are the models that align the two datasets and find correlation and causality in the data.
I know that time was short, so I would recommend teaming with someone else next time so that work can be split among team members.
Also, I would advise using some additional datasets which were not part of the initial dataset, like aggregated daily traffic estimates on an hourly basis provided by some navigation applications, because that can additionally help with model precision. We all know that bus drivers should be professionals, but the majority of "normal" non-bus drivers are not, and they are heavily impacted by distracting sensor inputs (thunderstorm, rain, people cutting in, or even forgetting how to drive when weather conditions change). I'm adding my last sentence about additional datasets to all teams focusing on this problem because no one even considered it, and it is something you can always do on any project: focus not on internal/provided data but find something to augment it.
My aim was to combine the hourly traffic analysis (which I got by processing the Dubai traffic datasets) with the weather data and later apply machine learning and time series models to it, but time fell too short for me, since this was my first time ever.
Anyway, I enjoyed the journey and yes, lesson learnt: always team up!
The long rails of numbers you see in the middle of my article are the hourly distribution of traffic for each day of each month (even though the intersection dates between the weather data and the traffic data on the Dubai Pulse site covered only 3 months).
Hi taha-junaid3000, the approach was good and I could read through the long rails of numbers, but when I talk about the same segments repeating, I mean the page itself. If you search (find on page) for the chapter “Traffic data Preprocessing”, you will find it three times with exactly the same text, or at least that is how I see it in my browser.
Hi, taha-junaid3000
tomislavk is right... Splitting the work with someone would help you to achieve better results and to learn much more while collaborating with others.
Keeping in mind your work I would focus more on the analysis and conclusions regarding the data quality, variables for modelling, etc. This would be helpful to make next steps.