Prediction systems

Datathon Telenor Solution – Analysing and Predicting Delays in Mobile Data Connectivity

“You know nothing, Jon Snow…”
is what Ygritte yells to Jon.

Our situation was much the same: we knew nothing about the Telenor case until we had seen the dataset.

When we first heard about the Datathon, as beginners we were very excited to take part in it.
Once we finally received the dataset, our first challenge was importing it into our programming environment.
The dataset is around 4 GB, so the import took some time and put us in this situation:

“What we don’t know is what usually gets us killed.” – Petyr Baelish

We quote this line to express how we felt: we did not know what was in the dataset, but we wanted to explore it.
At last, we were ready to analyse the question: What do Game of Thrones and Telecoms Have in Common?

On first going through the dataset, we noticed that the Telenor data contains 16 exciting columns and 30,091,754 rows.

Our first pass showed how complicated the data is; it has many interesting aspects, which we explored through Exploratory Data Analysis (EDA).

Our main challenge was to predict the fails for the next four days.
First, we carried out the exploratory data analysis:
(i) Top 10 ravens with fails :
RAVEN_NAME
Brass raven Birdy
Brown raven Ruby
Yellow raven Rio
Blue raven Axel
Razzle Dazzle Rose raven Cleo
Cadmium Red raven Bubba
Vain And Lazy raven Polly
Fearful Carrion raven Gizmo
Blast Off Bronze raven Zazu
Loving raven Maxwell

(ii) Top 10 ravens without fails:

RAVEN_NAME
Metallic Sunburst raven Polly
Green Sheen raven Azul
Less Combative raven Zazu
Weak raven Buddy
Copper raven Tweety
Spectral Yellow raven Zazu
Mythical raven Tiki
Cyber Grape raven Faith
Mysterious And Venerable raven Bubba
Shadow Blue raven Sammy

(iii) The family with most fails :

FAMILY_NAME
Targaryen

(iv) The family with least fails :

FAMILY_NAME
Baelish

(v) The family member with most fails :

MEMBER_NAME
Petyr Baelish

(vi) The family member with least fails :

MEMBER_NAME
Euron

After the EDA, we needed to predict the delays in mobile data connectivity for the next four days. For this we used time series analysis.

We used three algorithms: ARIMA, the Simple Exponential Model, and Recurrent Neural Networks (RNN).

We fitted an ARIMA model and predicted the failures for four days, then fitted a second model using the Simple Exponential Model.

We also used a Recurrent Neural Network to predict the failures.

After fitting the three models, we evaluated them by splitting the data into train and test sets.

We compared the models using the Root Mean Square Error (RMSE); the model with the lowest RMSE was taken as the best fit.

On the mobile failure dataset, the RNN had the lowest RMSE.

So the RNN was taken as the best-fit model for predicting the next four days of mobile data delays.

Based on the RNN, the predicted delays for the next four days
are 973776, 973725, 973674, and 973623 for 5, 6, 7, and 8 August 2018 respectively.


TEAM NAME: DATA_TITANS

TEAM MEMBERS: M.HEMANTH KUMAR, A.PAVAN SHANKAR, B.MANOHAR, V. LITHIN CHOWDARY, E.V.S.SAI RAM

PROBLEM STATEMENT: What do Game of Thrones and Telecoms Have in Common?

 

Data Preparation:

At first we received about 700 MB of zipped data, which expanded to approximately 4 GB when extracted. We imported the data into Python, carried out EDA (Exploratory Data Analysis), and answered the six given questions. We observed that some records have zero delays. We put these zero-delay records into a new dataset called data_least, which contains the ravens with the fewest delays, and put the remaining data into a new dataset called data_most. From data_most we considered the records where delays are more than one.
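The split described above can be sketched in plain Python. This is only an illustration, not the team's code: the column name `delays` is a hypothetical stand-in (the article does not name the delay column), and the sample rows are made up.

```python
def split_by_delay(rows, delay_key="delays"):
    """Split rows into (data_least, data_most) by their delay count.

    `delay_key` is a hypothetical column name; the article does not name it.
    """
    data_least = []  # rows with zero delays
    data_most = []   # all remaining rows
    for row in rows:
        if int(row[delay_key]) == 0:
            data_least.append(row)
        else:
            data_most.append(row)
    return data_least, data_most


# Illustrative rows only; the raven names and values are made up.
sample_rows = [
    {"RAVEN_NAME": "raven A", "delays": "0"},
    {"RAVEN_NAME": "raven B", "delays": "3"},
    {"RAVEN_NAME": "raven C", "delays": "0"},
]
data_least, data_most = split_by_delay(sample_rows)
print(len(data_least), len(data_most))  # prints: 2 1
```

Because the function only looks at one row at a time, the same logic could be applied while streaming a large file row by row instead of loading all 4 GB at once.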

Using the cleaned data, we found the solutions to the six questions. After that, we carried out the time series analysis of mobile data delays.

We grouped the data by DATETIME and found the count of delays for every date. We exported the result to a CSV file and carried out the time series analysis in R. In R, we imported the packages useful for time series analysis, fitted the models, and chose the best-fit model based on its RMSE value.
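A minimal sketch of the daily aggregation and CSV export follows. It assumes that DATETIME is a string in the form `YYYY-MM-DD HH:MM:SS` and that each row carries a numeric `delays` value that is summed per day; both the timestamp format and the `delays` column name are assumptions, not details taken from the dataset.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def daily_delay_counts(rows, time_key="DATETIME", delay_key="delays",
                       fmt="%Y-%m-%d %H:%M:%S"):
    """Sum the delays for each calendar date, returned in date order."""
    counts = Counter()
    for row in rows:
        day = datetime.strptime(row[time_key], fmt).date()
        counts[day] += int(row[delay_key])
    return dict(sorted(counts.items()))


def write_daily_counts(counts, handle):
    """Write one (date, delays) line per day to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["date", "delays"])
    for day, total in counts.items():
        writer.writerow([day.isoformat(), total])


# Illustrative rows only; the timestamps and values are made up.
sample_rows = [
    {"DATETIME": "2018-07-05 12:00:00", "delays": "2"},
    {"DATETIME": "2018-07-05 18:15:00", "delays": "1"},
    {"DATETIME": "2018-07-06 09:30:00", "delays": "4"},
]
buffer = io.StringIO()
write_daily_counts(daily_delay_counts(sample_rows), buffer)
print(buffer.getvalue())
```

The resulting file has one row per day, which is the shape a time series tool expects as input.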

From this we came to understand that a RAVEN stands for a TOWER: the raven is the channel for making voice calls and using data. To share information from one person or point to another, we need some channel or network to carry it. In this dataset we found 7847 unique RAVEN NAMES. The networks are 2G, 3G, and 4G.

(i) Top 10 ravens with fails :

For this question, we split the data into two groups. Rows with 0 delays were taken as having the fewest failures; the other group, with delays of more than 1, was treated as delays or failures. We then counted the rows with delays for each raven. This approach produced the top 10 list given above.
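The counting step could look roughly like this. It is a sketch under assumptions: `delays` is a hypothetical column name, a row counts as a fail whenever its delay value is non-zero, and the sample rows are made up.

```python
from collections import Counter


def top_ravens_by_fails(rows, n=10, name_key="RAVEN_NAME", delay_key="delays"):
    """Return the n ravens with the most rows that have a non-zero delay."""
    fail_rows = Counter()
    for row in rows:
        if int(row[delay_key]) > 0:
            fail_rows[row[name_key]] += 1
    return fail_rows.most_common(n)


# Illustrative rows only; the raven names and values are made up.
sample_rows = [
    {"RAVEN_NAME": "raven A", "delays": "2"},
    {"RAVEN_NAME": "raven B", "delays": "1"},
    {"RAVEN_NAME": "raven A", "delays": "5"},
    {"RAVEN_NAME": "raven C", "delays": "0"},
]
print(top_ravens_by_fails(sample_rows, n=2))
```

`Counter.most_common` sorts by count in descending order, so the first entries are the ravens with the most fails.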

 

(ii) Top 10 ravens without fails :

We divided the data into data_least and data_most and took data_least, which consists of the rows with the fewest delays. From it we found the top 10 ravens without fails.

(iii) The family with most fails :

To find the family with the most fails, we took the data_most group and found the family with the most fails. Here we got the Targaryen family.

(iv) The family with least fails :

To find the family with the least fails, we took the data_most group and found the family with the fewest fails. Here we got the Baelish family.

(v) The family member with most fails :

We took the data_most group and found the family member with the most fails. Here we got Petyr Baelish.

(vi) The family member with least fails :

To find the family member with the least fails, we grouped the data by family member. Here we got Euron as the member with the least fails.
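Questions (iii) to (vi) all follow the same group-then-compare pattern, which can be sketched as below. The `delays` column name and the sample values are hypothetical, and summing delays per group is an assumption about how fails were totalled.

```python
from collections import defaultdict


def fails_by_group(rows, group_key, delay_key="delays"):
    """Total the delays for each distinct value of group_key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += int(row[delay_key])
    return dict(totals)


def most_and_least(totals):
    """Return the (group with most fails, group with least fails)."""
    most = max(totals, key=totals.get)
    least = min(totals, key=totals.get)
    return most, least


# Illustrative rows only; the names and values are made up.
sample_rows = [
    {"FAMILY_NAME": "family A", "MEMBER_NAME": "member 1", "delays": "7"},
    {"FAMILY_NAME": "family A", "MEMBER_NAME": "member 2", "delays": "2"},
    {"FAMILY_NAME": "family B", "MEMBER_NAME": "member 3", "delays": "4"},
]
print(most_and_least(fails_by_group(sample_rows, "FAMILY_NAME")))
print(most_and_least(fails_by_group(sample_rows, "MEMBER_NAME")))
```

Passing `FAMILY_NAME` answers the family questions and passing `MEMBER_NAME` answers the member questions, so one helper covers all four.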

 

After the EDA, we needed to predict the delays in mobile data connectivity for the next four days. For this we used time series analysis.

In the time series analysis we used three algorithms: ARIMA, the Simple Exponential Model, and Recurrent Neural Networks.

We first trained the three models and then chose the best-fit model based on RMSE.

Time Series Analysis By using ARIMA:

We used the ARIMA algorithm to predict the next four days. It is represented below in a graph.

 

Evaluating the Model:

The dataset gives 31 records, one for each of 31 days. We used 27 records as train data and 4 records as test data.

We evaluated the model using RMSE (Root Mean Square Error).
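The 27/4 split and the RMSE calculation can be sketched as follows. The function names are illustrative, and the series used here is a placeholder sequence rather than the real daily counts.

```python
import math


def train_test_split_series(series, test_size=4):
    """Keep the time order: the last test_size points form the test set."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def rmse(actual, predicted):
    """Root Mean Square Error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Placeholder series of 31 daily values (not the real data).
series = list(range(100, 131))
train, test = train_test_split_series(series, test_size=4)
print(len(train), len(test))  # prints: 27 4

# A naive baseline that repeats the last training value for every test day.
naive_forecast = [train[-1]] * len(test)
print(rmse(test, naive_forecast))
```

The split is not shuffled, because a time series model must be tested on days that come after the days it was trained on. A lower RMSE means the forecasts were closer to the held-out values.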

 

 

Time Series Analysis By using Simple Exponential Model:
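As a rough illustration, and assuming that the Simple Exponential Model refers to simple exponential smoothing, the idea can be written in a few lines of Python. The team's own analysis was done in R; the smoothing factor `alpha` and the series below are placeholder values.

```python
def smoothed_level(series, alpha):
    """Final smoothed level: each step blends the new value with the old level."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    level = series[0]  # start from the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def forecast_flat(series, alpha, horizon=4):
    """Simple exponential smoothing forecasts the same level for every step."""
    return [smoothed_level(series, alpha)] * horizon


# Placeholder values, not the real daily delay counts.
print(forecast_flat([10, 20, 30], alpha=0.5, horizon=4))
# prints: [22.5, 22.5, 22.5, 22.5]
```

A larger `alpha` gives more weight to recent days. Because this method has no trend or seasonal term, its four-day forecast is a flat line.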

 

Evaluating the Model:

Time Series Analysis By using Recurrent Neural Networks(RNN):

Evaluating the Model:

EVALUATING THE BEST FIT MODEL:

Root Mean Square for ARIMA:

Root Mean Square for Recurrent Neural Network:

 

Root Mean Square for Simple Exponential Model:

Hence we consider the RNN (Recurrent Neural Network) the best-fit model for predicting the delay values for the four days.

Code :

The attachment below contains the Python code for the Exploratory Data Analysis and the R Markdown file for the Time Series Analysis.

RMD files

 

Additional analysis at 15-minute intervals for the first 3 days:

In the above graph we can see that failures peak at 12:00. The same observation holds for each of the first three days at the same time.

There is a high failure rate at 12:00 and 18:00, and the same pattern repeats across the first three days.

Analysis at 15-minute intervals for 3 random days:

If we take three random days from the entire data, there are more failures at the specific times of 12:00 and 18:00.

We observe a similar pattern in the data on all days.
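One way to check for such a repeating daily pattern is to floor every timestamp to its 15-minute slot and total the failures per time of day. The sketch below assumes a `YYYY-MM-DD HH:MM:SS` timestamp format and a hypothetical `delays` column; the sample rows are made up.

```python
from collections import Counter
from datetime import datetime


def floor_to_quarter_hour(ts):
    """Round a datetime down to the start of its 15-minute slot."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def failures_by_time_of_day(rows, time_key="DATETIME", delay_key="delays",
                            fmt="%Y-%m-%d %H:%M:%S"):
    """Total the delays per 15-minute slot, keyed as 'HH:MM' across all days."""
    counts = Counter()
    for row in rows:
        ts = datetime.strptime(row[time_key], fmt)
        slot = floor_to_quarter_hour(ts).strftime("%H:%M")
        counts[slot] += int(row[delay_key])
    return counts


# Illustrative rows only; the timestamps and values are made up.
sample_rows = [
    {"DATETIME": "2018-07-05 12:07:00", "delays": "5"},
    {"DATETIME": "2018-07-06 12:14:59", "delays": "2"},
    {"DATETIME": "2018-07-05 18:15:00", "delays": "4"},
    {"DATETIME": "2018-07-05 09:44:00", "delays": "1"},
]
print(failures_by_time_of_day(sample_rows).most_common(2))
```

Keying the slots by time of day rather than by full timestamp folds all days on top of each other, so a peak that recurs at the same time each day stands out.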

In conclusion, what Telenor and Game of Thrones have in common is this:
in GOT, ravens are used as messengers and mediators, while
telecoms use cell towers as their ravens.

Based on the RNN, the predicted delays for the next four days
are 973776, 973725, 973674, and 973623 for 6, 7, 8, and 9 August 2018 respectively.


9 thoughts on “Datathon Telenor Solution – Analysing and Predicting Delays in Mobile Data Connectivity”

  1. Team, for any “scientific” article, and for anybody who has access to the data, the whole process should be repeatable, i.e. anyone should be able to take your code/work and get the same end results. The good part is that you have already presented a lot of the code you used; the bad part is that it is either an attachment (so the process is not clearly visible) or embedded as a picture. If you are able, I would like you to add it as a textual part of your article for easier verification.

    1. Thank you for your suggestion, sir.
      We have updated our article with screenshots containing the code, added some additional analysis of the data, and predicted the future failures for the given four days.

  2. Looks very well done and detailed. I recommend adding some more info about why RNN is better in this case than the other methods, and maybe adding some results from the other methods to the article as well.

    1. Thank you, sir.
      We predicted using three algorithms: ARIMA, the Simple Exponential Model, and RNN.
      We observed the root mean square error for the three algorithms. Our data covers 31 days of mobile data delays. We divided it into 27 days of train data and 4 days of test data. We forecast the time series with the three algorithms and compared the RMSE values. RNN gave us the lowest root mean square error, so we chose RNN over the other algorithms for this data.
      Moreover, a Recurrent Neural Network is a deep learning algorithm that gives pretty good results for sequential time series analysis.

    1. Thank you for the compliment, sir.
      We had a very good experience analysing the Telenor data,
      we are pleased to hear from you,
      and we look forward to having this kind of interaction with you again.
