Datathon Telenor Solution – Game of Prediction (GoP)

The objective of this analysis is to identify the ravens that are not reaching their destination on time. This kind of analysis helps us scrutinize and understand the towers (ravens) that require our utmost attention, so that we can address the causes that play a major role in the delays.
The dataset describes the network between the towers (ravens). Land-based communication happens with the help of signals.
A cellular network or mobile network is a communication network where the last link is wireless. This wireless transmission is done by a tower, which comprises a transmitter and a receiver. The channel carries both data and voice transmission.
Every cellular network uses a different set of frequencies to avoid overlap and interference. Despite the many precautions taken to maintain the setup, a few parameters still affect the transmission. These parameters can be classified as:
• Infrastructure
• Interference between the frequencies
• Climatic conditions
• External Factors (Predators etc.)
Our first approach is therefore to create a “Decision Model” that adds value to our business and helps improve the communication.
The tools we are using for prediction are:
1. Visual analysis using different plots
2. ARMA (Auto-Regressive Moving Average) model
This Decision Model will help us forecast the failure rate of the ravens for the next 4-7 days.

The Telenor Case_TCM

Objective:

• The objective of this analysis is to make predictions for the upcoming 4 days, based on the behaviour of the ravens that are not reaching their destination on time.
• This kind of exploration helps us scrutinize and understand the towers that require our attention, and identify the causes that play a major role in the interruptions.

Let’s start by understanding the dataset.

Introduction:

The dataset describes the communication between the towers (ravens). Land-based communication happens with the help of signals. A cellular network or mobile network is a communication network where the last link is wireless. This wireless transmission is done by a tower, which comprises a transmitter and a receiver. The channel carries both data and voice transmission.

Every cellular network uses a different set of frequencies to avoid overlap and interference.

Despite the many precautions taken to maintain the setup, a few parameters still affect the transmission. Those parameters can be classified as:

• Infrastructure
• Interference between the different frequencies
• Climatic conditions
• External Factors (Predators etc.)

Here we need to analyse the dataset provided to us and derive a “Decision Model”. Before moving on to the business understanding, let’s get a good hold on what a “Decision Model” is.

A decision model manages the basic business logic that leads to visible conclusions and outcomes. The raw input given to the decision model should contain all the facts and the logic on which the model will work. Ultimately, it is the logic that drives the entire process. This logic gives the model its shape and leads to the conclusion, in order to provide two things:

• Meaning to the Decision Model

Having understood the “Decision Model”, let’s see why we need this concept in our analysis.

The main reason that led us to the “Decision Model” is the need to understand the background of the dataset.

Background

This data is derived from the total number of times a raven was sent to deliver a communication but was not able to reach its destination on time. Hence, there was an urgent need for an effective plan to end this communication crisis.

1. Data Understanding

Under this we cover the points below:

• Initial Data Collection: The data covers about a month, from 6th July 2018 to 4th August 2018.
• Familiarity with the Data: The dataset consists of columns that describe the delays because of which the communication failed or was delayed, and which require our utmost attention.

Understanding of the Columns in the Dataset:

• “DATETIME”: This column records the date and time of each transmission.
• “Raven Name”: This column records where the transmission originates. The data passes through several phases, such as obtaining the IP address for the website name through the Domain Name System (DNS). After this, a connection request is sent from the sender to the receiver. When the acceptance acknowledgement is sent from the server to the client, the connection is successfully set up. Delays can happen in this process.

We have divided the delays into three categories:

• DNS Delay:

The DNS delay mainly concerns the local internet service provider (ISP). The ISP caches the DNS record queried through the server. A delay at the ISP will prevent us from viewing the website. This is known as DNS propagation delay. We have two columns for these delays:

• “FIRST_DNS_RESPONSE_SUCCESS_D”
• “DNS_RESPONSE_SUCCESS_DELAY”
• Connection Delay:

The delay in establishing the connection. A relevant parameter is the maximum number of times a TCP packet is retransmitted in the established state before giving up. The default value is 15, which corresponds to a duration of 13-30 minutes, depending on the retransmission timeout.
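The relation between 15 retransmissions and a duration in the 13-30 minute range can be checked with a short calculation. The sketch below is only an illustration: it assumes Linux-style defaults (a retransmission timeout that starts at 200 ms, doubles after every timeout and is capped at 120 s), none of which are stated in the dataset description.

```python
def total_retransmission_timeout(retries=15, rto_min=0.2, rto_max=120.0):
    """Sum the waits for the first transmission plus `retries` retransmissions.

    Assumption: the retransmission timeout (RTO) starts at `rto_min` seconds,
    doubles after every timeout, and is capped at `rto_max` seconds.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += min(rto, rto_max)
        rto *= 2
    return total


seconds = total_retransmission_timeout()
print(f"{seconds:.1f} s = {seconds / 60:.1f} min")  # 924.6 s = 15.4 min
```

Under these assumptions the lower bound is about 15 minutes; a larger measured round-trip time raises the starting timeout and pushes the total towards the upper end of the range.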

• “TCP_SETUP_TOTAL_DELAY”
• “FIRST_TCP_RESPONSE_SUCCESS_D”
• “TCP_CONNECT_DELAY”
• “SYN_SYN_DELAY”
• Page Delay:
• “FIRST_GET_RESPONSE_SUCCESS_D”
• “PAGE_SR_DELAYS”
• “PAGE_BROWSING_DELAYS”
• “PAGE_BROWSING_DELAY”
• SR => Selective repeat

Selective repeat attempts to retransmit only those packets that are lost (due to errors).

1. Data Preparation

We have filtered the data according to the columns mentioned in the dataset.

The dataset contains approximately 10,819 rows whose value is zero. We exclude those rows in order to get appropriate values for our analysis and forecast. Moreover, the dataset has one column that contains both the date and the time. From that column we have extracted the date and the time, which we use in the time series analysis ahead. Apart from this, the data is already clean and does not require further cleaning.
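The two preparation steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the analysis. The sample rows, the timestamp format and the reading of “zero rows” as rows in which every delay column is zero are all assumptions.

```python
from datetime import datetime

# Hypothetical sample rows. The column names follow the dataset description,
# but the values and the timestamp format are assumptions.
rows = [
    {"DATETIME": "2018-07-06 08:15:00", "DNS_RESPONSE_SUCCESS_DELAY": 12, "TCP_CONNECT_DELAY": 30},
    {"DATETIME": "2018-07-06 09:40:00", "DNS_RESPONSE_SUCCESS_DELAY": 0, "TCP_CONNECT_DELAY": 0},
    {"DATETIME": "2018-07-07 23:05:00", "DNS_RESPONSE_SUCCESS_DELAY": 0, "TCP_CONNECT_DELAY": 45},
]
DELAY_COLUMNS = ["DNS_RESPONSE_SUCCESS_DELAY", "TCP_CONNECT_DELAY"]


def prepare(rows, delay_columns, fmt="%Y-%m-%d %H:%M:%S"):
    """Drop rows whose delay columns are all zero and split DATETIME."""
    prepared = []
    for row in rows:
        if all(row[col] == 0 for col in delay_columns):
            continue  # no delay recorded in any column
        stamp = datetime.strptime(row["DATETIME"], fmt)
        prepared.append({**row, "DATE": stamp.date(), "TIME": stamp.time()})
    return prepared


clean = prepare(rows, DELAY_COLUMNS)
print(len(clean), "rows kept")
```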

1. Modelling

The model we use to analyse the failure rate of the ravens is the Auto-Regressive Integrated Moving Average (ARIMA) model. It helps us forecast the parameters that lead to the delay or failure rate of the ravens.

We opted for this model because ARIMA models are among the most widely used approaches to time series forecasting. ARIMA models aim to describe the autocorrelations in the data.
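To make the notion of autocorrelation concrete, the sketch below computes the sample autocorrelation of a series at a given lag, using only the Python standard library. The daily delay totals are hypothetical values chosen for illustration; they are not taken from the dataset.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance / variance


# Hypothetical daily delay totals, for illustration only.
daily_delays = [120, 135, 150, 160, 158, 170, 185, 190, 200, 210]
for lag in (1, 2, 3):
    print(lag, round(autocorrelation(daily_delays, lag), 3))
```

A value close to 1 at lag 1 indicates that a day's delay is strongly related to the previous day's delay, which is the kind of structure an ARIMA model tries to capture.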

1. Data Analysis

In this section our major focus is on analysing the data provided to us from a basic business perspective. Before we proceed any further, we need to be clear about the basic set of questions that can be answered with this data.

TOP TEN RAVEN WITH FAILS

• Here we found the top ten ravens that took the longest to reach their destination. We considered only those delay columns that contain values.

Raven Name                           Total Delays
Sole Musical raven Azul              41114
Withered raven Mo                    39447
Biggest Strongest raven Wilbur       36498
Loving raven Maxwell                 31787
Cadmium Red raven Destiny            30820
Beautiful And Saucy raven Boo-boo    29989
Big Dip O’ Ruby raven Bibi           28054
Bittersweet Shimmer raven Chip       26605
Purple raven Phoenix                 26024
Small Gregarious raven Paco          24380
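A ranking of this kind can be produced by summing the delay columns per raven and keeping the largest totals. The sketch below shows one possible way to do it with the standard library. The records are hypothetical, only three of the delay columns are used, and `top_ravens` is an illustrative helper name.

```python
from collections import Counter

# A subset of the delay columns named in the dataset description.
DELAY_COLUMNS = ["DNS_RESPONSE_SUCCESS_DELAY", "TCP_CONNECT_DELAY", "PAGE_BROWSING_DELAY"]

# Hypothetical records; raven names and values are made up for illustration.
records = [
    {"Raven Name": "raven A", "DNS_RESPONSE_SUCCESS_DELAY": 5, "TCP_CONNECT_DELAY": 10, "PAGE_BROWSING_DELAY": 0},
    {"Raven Name": "raven B", "DNS_RESPONSE_SUCCESS_DELAY": 20, "TCP_CONNECT_DELAY": 20, "PAGE_BROWSING_DELAY": 0},
    {"Raven Name": "raven A", "DNS_RESPONSE_SUCCESS_DELAY": 0, "TCP_CONNECT_DELAY": 0, "PAGE_BROWSING_DELAY": 30},
    {"Raven Name": "raven C", "DNS_RESPONSE_SUCCESS_DELAY": 2, "TCP_CONNECT_DELAY": 5, "PAGE_BROWSING_DELAY": 0},
]


def top_ravens(records, delay_columns, n=10):
    """Return the n ravens with the largest summed delay, largest first."""
    totals = Counter()
    for record in records:
        totals[record["Raven Name"]] += sum(record[col] for col in delay_columns)
    return totals.most_common(n)


top = top_ravens(records, DELAY_COLUMNS, n=2)
print(top)  # [('raven A', 45), ('raven B', 40)]
```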

TOP TEN RAVEN WITHOUT FAILS

• Here we found the top ten ravens that reached their destination on time. We considered those delay columns whose values are zero.

Raven Name                              Total No Delays
Metallic Sunburst raven Polly           297
Green Sheen raven Azul                  211
Less Combative raven Zazu               191
Weak raven Buddy                        188
Copper raven Tweety                     179
Spectral Yellow raven Zazu              1481
Mythical raven Tiki                     116
Cyber Grape raven Faith                 104
Mysterious And Venerable raven Bubba    98
Shadow Blue raven Sammy                 95

FAMILY WITH MOST FAILS

• Here we found the family whose ravens most often did not reach their destination on time.

Family Name    Max Delays
Targerian      487966

• Family and Network-wise most fails

FAMILY WITH LEAST FAILS

• The family whose ravens were sent with the minimum delay.

Family Name    Max Delays
Lannister      212823
• Family and Network-wise least fails

FAMILY MEMBERS WITH MOST FAILS

• Here we found the top family members, across all regions, whose ravens did not reach their destination on time.

MEMBER_NAME       Most Delays
Petyr Baelish     2742866
Deanery           2380853
Theon             2073554
Maester Aemon     2065623
Maester Kerwin    1374452
Eddard            1303839
Aeron             1255917
Robb              1140542
Rheagar           1119002
Viserys           1114054

• Family member and Network-wise most fails

MEMBER_NAME       NETWORK    MostDelays
Petyr Baelish     2g         1335556
Petyr Baelish     4g         1269052
Deanerys          2g         1226143
Maester Aemon     2g         1042310
Theon             2g         1038493
Deanerys          4g         1021780
Maester Aemon     4g         907730
Theon             4g         900776
Maester Kerwin    2g         687469
Eddard            2g         653142

FAMILY MEMBER WITH LEAST FAIL

• Here we found the family members whose ravens reached their destination with the fewest fails.

MEMBER_NAME    LEAST_DELAYS
Jamie          761604
Aegon          756639
Kevan          745294
Tywin          739867
Yara/Asha      704214
Benjen         633369
Lancel         597992
Joanna         581775
Sansa          559570
Euron          491209

• Family member and Network-wise least fails:

Here we consider both the family member and the network (2G, 3G, 4G).

MEMBER_NAME             NETWORK    LEAST_DELAYS
Aerys – The mad king    3g         38746
Yara/Asha               3g         38577
Benjen                  3g         38329
Aegon                   3g         36563
Sansa                   3g         36132
Maester Pyelle          3g         35122
Joanna                  3g         31529
Lancel                  3g         29921
Tywin                   3g         29137
Euron                   3g         25487
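The member and network-wise breakdown amounts to grouping on two keys at once. A minimal sketch of that grouping is given below. The rows are hypothetical (member, network, delay) triples, and the single delay value per row is an assumption made to keep the example short.

```python
import heapq
from collections import defaultdict

# Hypothetical rows: (MEMBER_NAME, NETWORK, delay). Values are made up.
rows = [
    ("member A", "2g", 50),
    ("member A", "3g", 5),
    ("member B", "3g", 8),
    ("member A", "3g", 4),
    ("member B", "4g", 30),
]


def delays_by_member_and_network(rows):
    """Sum the delay for every (member, network) pair."""
    totals = defaultdict(int)
    for member, network, delay in rows:
        totals[(member, network)] += delay
    return totals


totals = delays_by_member_and_network(rows)
# The three pairs with the smallest total delay, smallest first.
least = heapq.nsmallest(3, totals.items(), key=lambda item: item[1])
for (member, network), delay in least:
    print(member, network, delay)
```

Replacing `heapq.nsmallest` with `heapq.nlargest` gives the corresponding "most fails" view.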

1. Modelling

Here we have considered two models based on time series analysis:

• Failure rate of the ravens for the next four days

• For the entire dataset, considering all the columns, we have predicted the failure rate for the upcoming four days.

While preparing both models, we used the original DATETIME column, because separating the date and the time resulted in a loss of information.

We checked the residuals and the AIC value to select the best possible models. The model with the lowest AIC value was taken as the best ARIMA model.
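The idea of picking the model with the lowest AIC can be illustrated without a forecasting library. The sketch below is a simplified stand-in, not an ARIMA fit: it compares a constant-mean model with a first-order autoregressive model, AR(1), fitted by least squares, and uses the Gaussian AIC up to an additive constant. The series is hypothetical.

```python
import math


def fit_mean(series):
    """Constant model: predict every value with the mean. Returns (RSS, k)."""
    target = series[1:]  # same targets as the AR(1) fit, so the AICs are comparable
    mean = sum(target) / len(target)
    return sum((y - mean) ** 2 for y in target), 1


def fit_ar1(series):
    """AR(1): regress each value on the previous one. Returns (RSS, k)."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return rss, 2


def aic(rss, n_obs, n_params):
    """Gaussian AIC up to an additive constant: n * ln(RSS / n) + 2k."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


# Hypothetical daily delay totals, for illustration only.
series = [120, 135, 150, 160, 158, 170, 185, 190, 200, 210]
n_obs = len(series) - 1
scores = {
    "mean": aic(*fit_mean(series)[:1], n_obs, fit_mean(series)[1]),
    "AR(1)": aic(*fit_ar1(series)[:1], n_obs, fit_ar1(series)[1]),
}
best = min(scores, key=scores.get)
print(best, scores)
```

The same selection rule applies when the candidates are ARIMA models of different orders: each candidate is fitted, its AIC is recorded, and the lowest value wins, provided the residuals also look like noise.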

There are 7848 unique ravens; we can build an ARIMA model for each of them so that failure information can be forecast for each of them.
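Building one model per raven is essentially a loop over per-raven series. The sketch below shows the shape of such a loop. It uses a least-squares AR(1) as a lightweight stand-in for ARIMA, and the raven names, series values and helper names are all hypothetical.

```python
def fit_ar1(series):
    """Least-squares AR(1) fit. Returns (intercept, slope)."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def forecast(series, steps=4):
    """Iterate the fitted AR(1) recursion `steps` times from the last value."""
    intercept, slope = fit_ar1(series)
    last, predictions = series[-1], []
    for _ in range(steps):
        last = intercept + slope * last
        predictions.append(last)
    return predictions


# Hypothetical daily failure counts per raven, for illustration only.
series_by_raven = {
    "raven A": [12, 15, 14, 18, 17, 21, 20],
    "raven B": [40, 38, 35, 36, 33, 31, 30],
}
forecasts = {name: forecast(values) for name, values in series_by_raven.items()}
for name, values in forecasts.items():
    print(name, [round(v, 1) for v in values])
```

With 7848 ravens the dictionary comprehension stays the same; only the fitting routine would be replaced by the ARIMA fit used in the analysis.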

1. Evaluation

We have used the time series model to forecast the delays of the ravens. This prediction will help us make the ravens fulfil their tasks in a proper and more efficient manner.

With the help of this analysis we were able to understand the trend of the disruptions in the raven communications. The conclusions are listed below:

• From 8 AM to 11 PM we are facing a major hit on our communication through the ravens.
• The maximum downward trend was observed on the 23rd.

From this we can examine the downward trend, which can help us analyse the major reasons for the failures. With proper study we can try to reduce the major factors that drive this trend to its peak and cause the hit that we want to avoid as far as possible.
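An hour-of-day pattern such as the 8 AM to 11 PM window can be found by totalling the delays per hour. A minimal sketch is given below. The timestamps, their format and the delay values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (DATETIME, delay) pairs; the timestamp format is an assumption.
records = [
    ("2018-07-23 08:10:00", 40),
    ("2018-07-23 08:50:00", 25),
    ("2018-07-23 03:00:00", 2),
    ("2018-07-23 22:30:00", 60),
]


def delay_by_hour(records, fmt="%Y-%m-%d %H:%M:%S"):
    """Total delay per hour of day, ordered by hour."""
    totals = defaultdict(int)
    for stamp, delay in records:
        totals[datetime.strptime(stamp, fmt).hour] += delay
    return dict(sorted(totals.items()))


hourly = delay_by_hour(records)
busiest_hour = max(hourly, key=hourly.get)
print(hourly, busiest_hour)  # {3: 2, 8: 65, 22: 60} 8
```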