Datathons Solutions

Datathon Telenor Solution – Game of Prediction (GoP)

The objective of this analysis is to find the ravens that are not reaching their destination on time. This kind of analysis helps us scrutinize and understand the towers (ravens) that require our utmost attention, so that we can address the causes that play a major role in the delays.
The dataset describes the network between the towers (ravens). Land-based communication happens with the help of signals.
A cellular network or mobile network is a communication network where the last link is wireless. This wireless transmission is done by a tower, which comprises a transmitter and a receiver. The channel carries both data and voice transmission.
Every cellular network has a different set of frequencies to avoid overlap and interference. Despite many precautions in maintaining the setup, a few parameters still affect the transmission. These parameters can be classified as:
  • Infrastructure
  • Interference between the frequencies
  • Climatic conditions
  • External Factors (Predators etc.)
Our first approach is to create a “Decision Model” that adds value to the business and helps improve the communication.
The tools that we are using for prediction are:
1. Visual analysis using different plots
2. ARMA (Auto-Regressive Moving Average) model
This Decision Model will help us forecast the failure rate of the ravens for the next 4-7 days.

The Telenor Case_TCM

Objective:

  • The objective of this analysis is to predict, for the upcoming 4 days, the ravens that will not reach their destination on time.
  • This kind of exploration helps us scrutinize and understand the towers that require our attention, and identify the causes that play a major role in the interruptions.

Let’s start by understanding the dataset.

Introduction:

The dataset describes the communication between the towers (ravens). Land-based communication happens with the help of signals. A cellular network or mobile network is a communication network where the last link is wireless. This wireless transmission is done by a tower, which comprises a transmitter and a receiver. The channel carries both data and voice transmission.

Every cellular network has a different set of frequencies to avoid overlap and interference.

Despite many precautions in maintaining the setup, a few parameters still affect the transmission. These parameters can be classified as:

  • Infrastructure
  • Interference between the different frequencies
  • Climatic conditions
  • External Factors (Predators etc.)

 

 

 

 

  1. Business Understanding

 

In this step we need to analyse the dataset provided to us and derive a “Decision Model”. Before moving on to the business understanding, let’s get a good hold on what a “Decision Model” is.

 

A decision model manages the basic business logic that leads to visible conclusions and outcomes. The raw input given to the decision model should contain all the facts and the logic on which the model will work. Ultimately, it is the logic that drives the entire process. This logic gives the model its shape and leads to the conclusion, in order to provide two things:

 

  • Meaning to the Decision Model
  • Value to the Business

Having understood the “Decision Model”, let’s see why we need that concept in this analysis.

The major reason that led us towards a “Decision Model” is the need to understand the background of the dataset.

Background

This data is derived from the total number of times a raven was sent for communication purposes but was not able to reach the destination on time. Hence, there was an urgent need for an effective plan to put an end to this communication crisis.

  2. Data Understanding

In this step we cover the following points:

  • Initial Data Collection: The data covers about a month, from 6 July 2018 to 4 August 2018.
  • Familiarity with the Data: The dataset consists of columns that describe the delays due to which communication failed or was late, and that require our utmost attention.

Understanding of the Columns in the Dataset:

  • “DATETIME”: This column records the date and time of each transmission.
  • “Raven Name”: This column records where the transmission originates. The data passes through several phases, such as obtaining the IP address for the website name via the Domain Name System (DNS). After this, a connection request is sent from the sender to the receiver. When the acceptance acknowledgement is sent from the server to the client, the connection is successfully set up. Delays can happen in this process.

We have divided the delays into three categories:

  • DNS Delay: The DNS delay mainly concerns the local internet service provider (ISP). The ISP caches the DNS records queried through its servers. A delay at the ISP prevents us from viewing the website; this is known as DNS propagation delay. We have two columns for these delays:
      • “FIRST_DNS_RESPONSE_SUCCESS_D”
      • “DNS_RESPONSE_SUCCESS_DELAY”
  • Connection Delay: The delay in establishing the connection. A TCP packet is retransmitted up to a maximum number of times in the established state before the connection gives up. The default value is 15, which corresponds to a duration of 13-30 minutes, depending on the retransmission timeout. The columns are:
      • “TCP_SETUP_TOTAL_DELAY”
      • “FIRST_TCP_RESPONSE_SUCCESS_D”
      • “TCP_CONNECT_DELAY”
      • “SYN_SYN_DELAY”
  • Page Delay: The columns are:
      • “FIRST_GET_RESPONSE_SUCCESS_D”
      • “PAGE_CONTENT_DOWNLOAD_TOTAL_D”
      • “PAGE_SR_DELAYS”
      • “PAGE_BROWSING_DELAYS”
      • “PAGE_BROWSING_DELAY”

SR => Selective repeat. Selective repeat attempts to retransmit only those packets that are lost (due to errors).
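The 13-30 minute figure quoted for the connection delay can be sanity-checked with a little arithmetic. The sketch below assumes Linux-style exponential backoff, where the retransmission timeout (RTO) doubles after every attempt up to a cap. The 200 ms starting RTO and the 120 s cap are assumptions made for illustration; they do not come from the dataset.

```python
def total_retransmission_wait(retries=15, initial_rto=0.2, max_rto=120.0):
    """Sum the timeouts for the original send plus `retries` retransmissions.

    Assumes the RTO doubles after each timeout and is capped at `max_rto`
    (both values are illustrative assumptions).
    """
    total = 0.0
    rto = initial_rto
    for _ in range(retries + 1):  # original send + retransmissions
        total += rto
        rto = min(rto * 2, max_rto)
    return total

seconds = total_retransmission_wait()
print(f"{seconds:.1f} s = {seconds / 60:.1f} min")
```

Under these assumptions the total is roughly 15 minutes, which falls inside the quoted range; a larger starting RTO pushes the total higher.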

 

  3. Data Preparation

 

We filtered the data according to the columns described above.

The dataset contains approximately 10,819 rows whose value is zero. We exclude those rows in order to get appropriate values for our analysis and forecast. Moreover, the dataset has one column that holds both the date and the time. From that column we extracted the date and the time, which we use in the time series analysis ahead. The data is already clean and does not require further cleaning.
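As a minimal sketch of this preparation step, the snippet below drops the zero rows and splits the timestamp into date and time. It assumes that a "zero" row is one where every delay column is 0, and that DATETIME looks like `2018-07-06 08:15:00`; both are assumptions, and the sample rows are invented.

```python
from datetime import datetime

# Hypothetical sample rows: column names follow the dataset description,
# but the values and the timestamp format are made up for illustration.
rows = [
    {"DATETIME": "2018-07-06 08:15:00", "DNS_RESPONSE_SUCCESS_DELAY": 12, "TCP_CONNECT_DELAY": 30},
    {"DATETIME": "2018-07-06 09:00:00", "DNS_RESPONSE_SUCCESS_DELAY": 0, "TCP_CONNECT_DELAY": 0},
    {"DATETIME": "2018-07-07 23:45:00", "DNS_RESPONSE_SUCCESS_DELAY": 0, "TCP_CONNECT_DELAY": 7},
]
DELAY_COLUMNS = ["DNS_RESPONSE_SUCCESS_DELAY", "TCP_CONNECT_DELAY"]

def prepare(rows, delay_columns, fmt="%Y-%m-%d %H:%M:%S"):
    """Drop rows whose delay columns are all zero and split DATETIME."""
    prepared = []
    for row in rows:
        if all(row[col] == 0 for col in delay_columns):
            continue  # no recorded delay: excluded from the analysis
        stamp = datetime.strptime(row["DATETIME"], fmt)
        prepared.append({**row, "DATE": stamp.date(), "TIME": stamp.time()})
    return prepared

clean = prepare(rows, DELAY_COLUMNS)
print(len(clean), clean[0]["DATE"], clean[0]["TIME"])
```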

 

  4. Modelling

The model that we use to analyse the failure rate of the ravens is the Auto-Regressive Integrated Moving Average (ARIMA) model. It helps us forecast the parameters that lead to the delay or failure rate of the ravens.

We opted for this model because ARIMA models are among the most widely used approaches to time series forecasting. ARIMA models aim to describe the autocorrelations in the data.

  5. Data Analysis

In this section our major focus is on analysing the data provided to us from a basic business perspective. Before we proceed any further, we need to be clear about the basic set of questions that can be answered with this data.

TOP TEN RAVENS WITH FAILS

  • Here we found the ten ravens that took the most time to reach their destination. We considered only those delay columns that contain non-zero values.

Please find below the result:

Raven Name | Total Delays
Sole Musical raven Azul | 41114
Withered raven Mo | 39447
Biggest Strongest raven Wilbur | 36498
Loving raven Maxwell | 31787
Cadmium Red raven Destiny | 30820
Beautiful And Saucy raven Boo-boo | 29989
Big Dip O’ Ruby raven Bibi | 28054
Bittersweet Shimmer raven Chip | 26605
Purple raven Phoenix | 26024
Small Gregarious raven Paco | 24380
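A ranking like the one above could be produced by grouping the records and summing the delay columns. The sketch below is one possible way to do it, not necessarily the grouping used for the table: it assumes "Total Delays" is the sum of all delay values per raven, and the column name `RAVEN_NAME` and the sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: names and numbers are invented for illustration.
records = [
    {"RAVEN_NAME": "raven A", "TCP_CONNECT_DELAY": 10, "PAGE_SR_DELAYS": 5},
    {"RAVEN_NAME": "raven B", "TCP_CONNECT_DELAY": 0, "PAGE_SR_DELAYS": 2},
    {"RAVEN_NAME": "raven A", "TCP_CONNECT_DELAY": 3, "PAGE_SR_DELAYS": 0},
    {"RAVEN_NAME": "raven C", "TCP_CONNECT_DELAY": 20, "PAGE_SR_DELAYS": 1},
]

def top_delays(records, key_columns, delay_columns, n=10):
    """Sum all delay columns per group and return the n largest totals."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[col] for col in key_columns)
        totals[key] += sum(rec[col] for col in delay_columns)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

ranking = top_delays(records, ["RAVEN_NAME"], ["TCP_CONNECT_DELAY", "PAGE_SR_DELAYS"])
for key, total in ranking:
    print(*key, total)
```

Passing several key columns (for example a member column and a network column) gives the member and network-wise breakdowns in the same way, and sorting in ascending order gives the "least fails" rankings.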

 

TOP TEN RAVENS WITHOUT FAILS

  • Here we found the ten ravens that reached their destination on time. We considered those delay columns in which the values are zero.

Please find below the result:

Raven Name | Total No Delays
Metallic Sunburst raven Polly | 297
Green Sheen raven Azul | 211
Less Combative raven Zazu | 191
Weak raven Buddy | 188
Copper raven Tweety | 179
Spectral Yellow raven Zazu | 1481
Mythical raven Tiki | 116
Cyber Grape raven Faith | 104
Mysterious And Venerable raven Bubba | 98
Shadow Blue raven Sammy | 95

 

FAMILY WITH MOST FAILS

  • Here we found the family whose ravens most often did not reach the destination on time.

Please find below the result:

Family Name | Max Delays
Targerian | 487966

 

  • Family and Network-wise most fails

FAMILY WITH LEAST FAILS

  • The family whose ravens had the minimum delay.

Please find below the result:

Family Name | Max Delays
Lannister | 212823
  • Family and Network-wise least fails

FAMILY MEMBERS WITH MOST FAILS

  • Here we found the family members, across all regions, whose ravens most often did not reach the destination on time.

Please find below the result:

MEMBER_NAME | Most Delays
Petyr Baelish | 2742866
Deanerys | 2380853
Theon | 2073554
Maester Aemon | 2065623
Maester Kerwin | 1374452
Eddard | 1303839
Aeron | 1255917
Robb | 1140542
Rheagar | 1119002
Viserys | 1114054

 

 

 

  • Family member and Network-wise most fails

Please find below the result:

MEMBER_NAME | NETWORK | MostDelays
Petyr Baelish | 2g | 1335556
Petyr Baelish | 4g | 1269052
Deanerys | 2g | 1226143
Maester Aemon | 2g | 1042310
Theon | 2g | 1038493
Deanerys | 4g | 1021780
Maester Aemon | 4g | 907730
Theon | 4g | 900776
Maester Kerwin | 2g | 687469
Eddard | 2g | 653142

 

FAMILY MEMBERS WITH LEAST FAILS

  • Here we found the family members whose ravens reached the destination with the fewest fails.

Please find below the result:

MEMBER_NAME | LEAST_DELAYS
Jamie | 761604
Aegon | 756639
Kevan | 745294
Tywin | 739867
Yara/Asha | 704214
Benjen | 633369
Lancel | 597992
Joanna | 581775
Sansa | 559570
Euron | 491209

 

 

  • Family member and Network-wise least fails:

Here we consider the family member and the network (2G, 3G, 4G).

Please find below the result:

MEMBER_NAME | NETWORK | LEAST_DELAYS
Aerys – The mad king | 3g | 38746
Yara/Asha | 3g | 38577
Benjen | 3g | 38329
Aegon | 3g | 36563
Sansa | 3g | 36132
Maester Pyelle | 3g | 35122
Joanna | 3g | 31529
Lancel | 3g | 29921
Tywin | 3g | 29137
Euron | 3g | 25487

 

  6. Modelling

In this step we considered two models based on time series analysis:

  • Failure rate of the ravens for the next four days

  • For the entire dataset, considering all the columns, we predicted the failure rate for the upcoming four days.

 


 

While preparing both models, we used the original DATETIME column, because separating the date and the time resulted in a loss of information.

We checked the residuals and the AIC values to select the best possible models. The model with the lowest AIC value was taken as the best ARIMA model.
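To illustrate the idea of AIC-based selection and a four-step forecast, here is a minimal, dependency-free sketch. It is not the model from this article: it fits only an AR(1) model, the simplest autoregressive member of the ARIMA family, by least squares, and compares it against a mean-only baseline using the Gaussian AIC (up to an additive constant). The daily failure counts are invented.

```python
import math

# Hypothetical daily failure counts; the real series would come from the dataset.
series = [120, 132, 128, 141, 150, 147, 158, 163, 160, 172, 181, 178]

def gaussian_aic(rss, n, k):
    """AIC for Gaussian errors, up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

def fit_ar1(y):
    """Least-squares fit of y[t] = c + phi * y[t-1]."""
    x, t = y[:-1], y[1:]
    n = len(t)
    mean_x, mean_t = sum(x) / n, sum(t) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_t) for a, b in zip(x, t))
    phi = sxy / sxx
    c = mean_t - phi * mean_x
    rss = sum((b - c - phi * a) ** 2 for a, b in zip(x, t))
    return c, phi, rss, n

def forecast_ar1(last_value, c, phi, steps=4):
    """Iterate the fitted recursion to forecast `steps` periods ahead."""
    out = []
    for _ in range(steps):
        last_value = c + phi * last_value
        out.append(last_value)
    return out

c, phi, rss_ar1, n = fit_ar1(series)

# Baseline on the same n targets: predict the mean every day (one parameter).
targets = series[1:]
mean_t = sum(targets) / n
rss_mean = sum((v - mean_t) ** 2 for v in targets)

aic_ar1 = gaussian_aic(rss_ar1, n, k=2)
aic_mean = gaussian_aic(rss_mean, n, k=1)
best = "AR(1)" if aic_ar1 < aic_mean else "mean-only"
print(best, [round(v, 1) for v in forecast_ar1(series[-1], c, phi)])
```

A full analysis would normally rely on a time series library to search over ARIMA orders and to inspect the residuals; the point here is only that the candidate with the lowest AIC on the same data is the one kept.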


 

There are 7848 unique ravens. We can build an ARIMA model for each of them, so that failure information can be forecast for each raven.

 

  7. Evaluation

We used the time series model to forecast the delays of the ravens. This prediction will help us make the ravens fulfil their tasks in a proper and more efficient manner.

With the help of this analysis we were able to understand the trend of the disruptions in the raven communications. The conclusions are:

  • From 8 AM to 11 PM, communication through the ravens takes a major hit.
  • The maximum downward trend was observed on the 23rd.

From this we can analyse the downward trend, which helps us identify the major reasons for the failures. With proper study, we can try to reduce the factors that drive this trend to its peak and avoid as much of the impact as possible.
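An hour-of-day pattern like the one described above can be checked by counting delayed transmissions per hour. This is a minimal sketch with invented timestamps; the timestamp format is an assumption.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps of delayed transmissions (format is an assumption).
delayed = [
    "2018-07-23 08:05:00",
    "2018-07-23 08:40:00",
    "2018-07-23 14:10:00",
    "2018-07-24 22:30:00",
    "2018-07-24 03:15:00",
]

def failures_by_hour(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count delayed transmissions for each hour of the day (0-23)."""
    return Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)

by_hour = failures_by_hour(delayed)
# Share of delays that fall between 8 AM (inclusive) and 11 PM (exclusive).
busy = sum(count for hour, count in by_hour.items() if 8 <= hour < 23)
print(by_hour.most_common(3), busy / len(delayed))
```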

 


3 thoughts on “Datathon Telenor Solution – Game of Prediction (GoP)”

  1. Team, please describe the process of how you prepared the data and built and evaluated the model(s) for prediction. For any “scientific” article, and for anybody who has access to the data, the whole process should be repeatable, i.e. anyone should be able to take your code/work and get the same end results. Also, the focus of this case is “The main task is to predict the fails in the next four days (on both files).”, which is clearly stated in the case description, so please focus on adding this information to your article (more than just the sentence “Model is giving better prediction for 1 day”).

  2. Good description of the data understanding. Some recommendations: in the TOP Ravens tasks it would be good to describe how you grouped the delays. It would also be good to have some more information about why ARIMA is better in this case than other methods, and maybe to add some results from other methods to the article.

  3. At the beginning of the document you attached a zip file with code, which may be overlooked. I would suggest moving it to the end of the document to make it more visible. But overall, nice exploration and clean process.
