Objective:
- The objective of this analysis is to predict, for the upcoming four days, the failures of ravens that do not reach their destination on time.
- This exploration helps us identify the towers that require our attention and explain the factors that play a major role in the interruptions.
Let’s start by understanding the dataset.
Introduction:
The dataset describes the communication between towers (ravens). Land-based communication happens with the help of signals. A cellular network, or mobile network, is a communication network where the last link is wireless. This wireless transmission is handled by a tower, which comprises a transmitter and a receiver. The channel carries both data and voice.
Every cellular network uses a different set of frequencies to avoid overlap and interference.
Despite many precautions taken to maintain the setup, a few parameters still affect the transmission. They can be classified as:
- Infrastructure
- Interference between the different frequencies
- Climatic conditions
- External Factors (Predators etc.)
- Business Understanding
Here we need to analyse the dataset provided to us and derive a “Decision Model”. Before moving on to the business understanding, let’s get a good hold on what a “Decision Model” is.
A decision model manages the basic business logic that leads to visible conclusions and outcomes. The raw input given to the decision model should contain all the facts and logic on which the model will work. Ultimately, it is the logic that drives the entire process. This logic gives the model its shape and leads to the conclusion, providing two things:
- Meaning to the Decision Model
- Value to the Business
Having understood the “Decision Model”, let’s see why we need the concept in this analysis.
The main reason that led us to the “Decision Model” is the need to understand the background of the dataset.
Background
This data records the number of times a raven was sent to communicate but was not able to reach the destination on time. Hence, there was an urgent need for an effective plan to end this communication crisis.
- Data Understanding
Under this we cover the following points:
- Initial Data Collection: The data covers about a month, from 6th July 2018 to 4th Aug 2018.
- Familiarity with the Data: The dataset consists of columns that describe the delays because of which communication failed or was delayed, and that require our utmost attention.
Understanding of the Columns in the Dataset:
- “DATETIME”: This column records the date and time of each transmission.
- “Raven Name”: This column records where the transmission originates. The data passes through several phases, such as obtaining the IP address for the website name from the Domain Name System. After this, the connection request is sent from the sender to the receiver. When the acceptance acknowledgement is sent from the server to the client, the connection is successfully set up. Delays can happen in this process.
We have divided the delays into three categories:
- DNS Delay:
The DNS delay mainly concerns the local internet service provider (ISP). The ISP caches the DNS record queried through the server. A delay at the ISP prevents us from viewing the website. This is known as DNS propagation delay. We have two columns for these delays:
- “FIRST_DNS_RESPONSE_SUCCESS_D”:
- “DNS_RESPONSE_SUCCESS_DELAY”:
- Connection Delay:
The delay in establishing the connection. On Linux, the maximum number of times a TCP packet is retransmitted in the established state before giving up defaults to 15, which corresponds to a duration of roughly 13 to 30 minutes, depending on the retransmission timeout.
- “TCP_SETUP_TOTAL_DELAY”
- “FIRST_TCP_RESPONSE_SUCCESS_D”
- “TCP_CONNECT_DELAY”
- “SYN_SYN_DELAY”
- Page Delay:
- “FIRST_GET_RESPONSE_SUCCESS_D”
- “PAGE_CONTENT_DOWNLOAD_TOTAL_D”
- “PAGE_SR_DELAYS”
- “PAGE_BROWSING_DELAYS”
- “PAGE_BROWSING_DELAY”
- SR => Selective repeat: selective repeat attempts to retransmit only those packets that are lost (due to errors).
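The 13-30 minute range quoted under Connection Delay can be checked with a little arithmetic. The sketch below assumes Linux-style exponential backoff: the retransmission timeout (RTO) doubles after every unanswered packet and is capped at a maximum. The starting value of 0.2 s and the cap of 120 s are assumptions based on common Linux defaults; real connections start from a measured RTO, which is why the total varies.

```python
def total_retransmission_time(retries=15, initial_rto=0.2, max_rto=120.0):
    """Seconds spent waiting before an established connection is given up.

    Assumes the timeout doubles after each unanswered packet and is capped
    at max_rto. The original send plus `retries` retransmissions gives
    retries + 1 waiting periods.
    """
    total = 0.0
    rto = initial_rto
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, max_rto)
    return total


if __name__ == "__main__":
    for start in (0.2, 1.0):
        seconds = total_retransmission_time(initial_rto=start)
        print(f"initial RTO {start} s -> {seconds / 60:.1f} minutes")
```

With the assumed 0.2 s starting value the total comes to 924.6 seconds, a little over 15 minutes; a larger starting timeout moves the total further into the quoted range.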
- Data Preparation
We have filtered the data according to the columns that are mentioned in the dataset.
The dataset contains approximately 10,819 rows whose values are zero. We exclude those rows to get appropriate values for our analysis and forecast. The dataset also has one column that holds both the date and the time. From that column we extracted the date and the time, which we use in the time series analysis ahead. The data is already clean and does not require further cleaning.
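The two preparation steps described above can be sketched in plain Python. This is a minimal illustration, not the code used for the analysis: it assumes a row counts as “zero” when every delay column is zero, and it assumes timestamps in the form `2018-07-06 08:15:00`. The helper name `prepare` and the output keys `DATE` and `TIME` are hypothetical.

```python
from datetime import datetime

# Delay columns named in the dataset description, grouped by category.
DELAY_COLUMNS = {
    "dns": ["FIRST_DNS_RESPONSE_SUCCESS_D", "DNS_RESPONSE_SUCCESS_DELAY"],
    "connection": ["TCP_SETUP_TOTAL_DELAY", "FIRST_TCP_RESPONSE_SUCCESS_D",
                   "TCP_CONNECT_DELAY", "SYN_SYN_DELAY"],
    "page": ["FIRST_GET_RESPONSE_SUCCESS_D", "PAGE_CONTENT_DOWNLOAD_TOTAL_D",
             "PAGE_SR_DELAYS", "PAGE_BROWSING_DELAYS", "PAGE_BROWSING_DELAY"],
}
ALL_DELAY_COLUMNS = [col for cols in DELAY_COLUMNS.values() for col in cols]

# Assumption: the DATETIME column uses this layout.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def prepare(rows):
    """Drop rows whose delay columns are all zero and split DATETIME."""
    prepared = []
    for row in rows:
        # Missing or empty cells are treated as zero delay.
        delays = [float(row.get(col) or 0) for col in ALL_DELAY_COLUMNS]
        if not any(delays):
            continue  # every delay is zero: excluded from the analysis
        stamp = datetime.strptime(row["DATETIME"], DATETIME_FORMAT)
        prepared.append({**row, "DATE": stamp.date(), "TIME": stamp.time()})
    return prepared


if __name__ == "__main__":
    sample = [  # made-up rows for illustration only
        {"DATETIME": "2018-07-06 08:15:00", "Raven Name": "A", "TCP_CONNECT_DELAY": "12"},
        {"DATETIME": "2018-07-06 09:00:00", "Raven Name": "B", "TCP_CONNECT_DELAY": "0"},
    ]
    for row in prepare(sample):
        print(row["Raven Name"], row["DATE"], row["TIME"])
```

Rows are plain dictionaries here, so the same function works on the output of `csv.DictReader`.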
- Modelling
The model that we use to analyse the failure rate of the ravens is the Autoregressive Integrated Moving Average (ARIMA) model. It helps us forecast the parameters that led to the delay or failure of the ravens.
We opted for this model because ARIMA models are among the most widely used approaches to time series forecasting. ARIMA models aim to describe the autocorrelations in the data.
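To make “autocorrelations in the data” concrete: the sample autocorrelation at lag k measures how strongly a series correlates with a copy of itself shifted by k steps. The sketch below uses the usual sample estimator; the failure counts in the usage example are made up for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Products of deviations between each point and the point `lag` steps later.
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


if __name__ == "__main__":
    daily_failures = [120, 135, 128, 150, 161, 149, 170, 182]  # hypothetical
    for k in range(1, 4):
        print(f"lag {k}: {autocorrelation(daily_failures, k):.3f}")
```

Values close to 1 or -1 at small lags suggest that recent observations carry information about the next ones, which is the structure an ARIMA model tries to capture.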
- Data Analysis
In this section our main focus is on analysing the data provided to us from a basic business perspective. Before we proceed any further, we need to be clear about the basic set of questions that this data can answer.
TOP TEN RAVENS WITH FAILS
- Here we found the ten ravens that took the longest to reach their destination. We considered only the delay columns that contain non-zero values.
Please find below the result:
Raven Name | Total Delays |
Sole Musical raven Azul | 41114 |
Withered raven Mo | 39447 |
Biggest Strongest raven Wilbur | 36498 |
Loving raven Maxwell | 31787 |
Cadmium Red raven Destiny | 30820 |
Beautiful And Saucy raven Boo-boo | 29989 |
Big Dip O’ Ruby raven Bibi | 28054 |
Bittersweet Shimmer raven Chip | 26605 |
Purple raven Phoenix | 26024 |
Small Gregarious raven Paco | 24380 |
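The write-up does not state how the totals in this table were computed. One plausible reading, sketched below, is that the delay columns are summed per raven and the ten largest totals are kept; whether the values were summed or the delayed transmissions were counted is an assumption here. The function name `top_ravens` is hypothetical.

```python
import heapq
from collections import defaultdict


def top_ravens(rows, delay_columns, n=10):
    """Return the n (raven, total delay) pairs with the largest totals."""
    totals = defaultdict(float)
    for row in rows:
        # Missing or empty cells are treated as zero delay.
        totals[row["Raven Name"]] += sum(
            float(row.get(col) or 0) for col in delay_columns
        )
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = [  # made-up rows for illustration only
        {"Raven Name": "A", "TCP_CONNECT_DELAY": "5", "SYN_SYN_DELAY": "7"},
        {"Raven Name": "B", "TCP_CONNECT_DELAY": "20"},
        {"Raven Name": "A", "TCP_CONNECT_DELAY": "3"},
    ]
    for name, total in top_ravens(sample, ["TCP_CONNECT_DELAY", "SYN_SYN_DELAY"], n=2):
        print(name, total)
```

Using a different key, for example the pair of member name and network, turns the same pattern into a family-wise or network-wise ranking.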
TOP TEN RAVENS WITHOUT FAILS
- Here we found the ten ravens that reached their destination on time most often. We considered the delay columns whose values are zero.
Please find below the result:
Raven Name | Total No Delays |
Metallic Sunburst raven Polly | 297 |
Green Sheen raven Azul | 211 |
Less Combative raven Zazu | 191 |
Weak raven Buddy | 188 |
Copper raven Tweety | 179 |
Spectral Yellow raven Zazu | 1481 |
Mythical raven Tiki | 116 |
Cyber Grape raven Faith | 104 |
Mysterious And Venerable raven Bubba | 98 |
Shadow Blue raven Sammy | 95 |
FAMILY WITH MOST FAILS
- Here we found the family whose ravens most often failed to reach the destination on time.
Please find below the result:
Family Name | Max Delays |
Targerian | 487966 |
- Family and Network-wise most fails
FAMILY WITH LEAST FAILS
- Here we found the family whose ravens were sent with the minimum delay.
Please find below the result:
Family Name | Max Delays |
Lannister | 212823 |
- Family and Network-wise least fails
FAMILY MEMBERS WITH MOST FAILS
- Here we found the family members, across all regions, who sent ravens that did not reach the destination on time.
Please find below the result:
MEMBER_NAME | Most Delays |
Petyr Baelish | 2742866 |
Deanerys | 2380853 |
Theon | 2073554 |
Maester Aemon | 2065623 |
Maester Kerwin | 1374452 |
Eddard | 1303839 |
Aeron | 1255917 |
Robb | 1140542 |
Rheagar | 1119002 |
Viserys | 1114054 |
- Family member and Network-wise most fails
Please find below the result:
MEMBER_NAME | NETWORK | MostDelays |
Petyr Baelish | 2g | 1335556 |
Petyr Baelish | 4g | 1269052 |
Deanerys | 2g | 1226143 |
Maester Aemon | 2g | 1042310 |
Theon | 2g | 1038493 |
Deanerys | 4g | 1021780 |
Maester Aemon | 4g | 907730 |
Theon | 4g | 900776 |
Maester Kerwin | 2g | 687469 |
Eddard | 2g | 653142 |
FAMILY MEMBER WITH LEAST FAIL
- Here we found the family members whose ravens reached the destination with the fewest fails.
Please find below the result:
MEMBER_NAME | LEAST_DELAYS |
Jamie | 761604 |
Aegon | 756639 |
Kevan | 745294 |
Tywin | 739867 |
Yara/Asha | 704214 |
Benjen | 633369 |
Lancel | 597992 |
Joanna | 581775 |
Sansa | 559570 |
Euron | 491209 |
- Family member and Network-wise least fails:
Here we consider both the family member and the network (2G, 3G, 4G).
Please find below the result:
MEMBER_NAME | NETWORK | LEAST_DELAYS |
Aerys – The mad king | 3g | 38746 |
Yara/Asha | 3g | 38577 |
Benjen | 3g | 38329 |
Aegon | 3g | 36563 |
Sansa | 3g | 36132 |
Maester Pyelle | 3g | 35122 |
Joanna | 3g | 31529 |
Lancel | 3g | 29921 |
Tywin | 3g | 29137 |
Euron | 3g | 25487 |
- Modelling
In this we have considered two models based on the time series analysis:
- Failure Rate of the Raven for next four days
- For the entire dataset, considering all the columns, we predicted the failure rate for the upcoming four days.
While preparing both models, we used the original DATETIME column, because separating the date and time resulted in a loss of information.
We checked the residuals and the AIC value to choose among candidate models. The model with the lowest AIC value was taken as the best ARIMA model.
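Selection by lowest AIC can be sketched as follows. The sketch uses the least-squares form of AIC, n·ln(RSS/n) + 2k, which assumes Gaussian residuals and drops an additive constant; statistical libraries usually compute AIC from the full likelihood, so absolute values will differ. The model labels and residuals in the usage example are made up.

```python
import math


def aic_from_residuals(residuals, n_params):
    """AIC up to an additive constant, assuming Gaussian residuals."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    if rss == 0:
        raise ValueError("perfect fit: AIC is undefined (log of zero)")
    return n * math.log(rss / n) + 2 * n_params


def pick_best(candidates):
    """candidates maps a model label to (residuals, n_params)."""
    scores = {
        label: aic_from_residuals(residuals, k)
        for label, (residuals, k) in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    candidates = {  # hypothetical residuals from two fitted models
        "ARIMA(1,1,0)": ([1.2, -0.8, 0.5, -1.1, 0.9, -0.4], 2),
        "ARIMA(2,1,1)": ([1.1, -0.8, 0.5, -1.0, 0.9, -0.4], 4),
    }
    best, scores = pick_best(candidates)
    print(best, scores)
```

The penalty term 2k is what stops the comparison from always favouring the model with more parameters. AIC values are only comparable between models fitted to the same data with the same order of differencing.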
There are 7848 unique ravens. We can build an ARIMA model for each of them so that failure information can be forecast for each raven.
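A per-raven four-day forecast could look like the sketch below. It is not the model used in this analysis: the order (1,1,0) is an assumption, since the chosen order is not stated, and the model is fitted by ordinary least squares on the differenced series to keep the example dependency-free. In practice a statistics library would handle estimation and order selection. The function names and the sample counts are hypothetical.

```python
def fit_ar1(diffs):
    """Least-squares fit of d_t = c + phi * d_{t-1} + error."""
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    if n < 2:
        raise ValueError("need at least three differences to fit AR(1)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        return mean_y, 0.0  # constant differences: no AR term identifiable
    phi = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return mean_y - phi * mean_x, phi


def forecast_arima_110(series, steps=4):
    """Forecast `steps` values with an ARIMA(1,1,0) model with drift."""
    diffs = [b - a for a, b in zip(series, series[1:])]  # d = 1 differencing
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # AR(1) step on the differences
        level += last_diff               # undo the differencing
        forecasts.append(level)
    return forecasts


def forecast_per_raven(daily_failures, steps=4):
    """daily_failures maps a raven name to its list of daily failure counts."""
    return {
        name: forecast_arima_110(series, steps)
        for name, series in daily_failures.items()
    }


if __name__ == "__main__":
    history = {  # made-up daily failure counts for two ravens
        "raven A": [12, 15, 14, 18, 21, 19, 24, 26],
        "raven B": [40, 38, 41, 37, 36, 39, 35, 34],
    }
    for name, values in forecast_per_raven(history).items():
        print(name, [round(v, 1) for v in values])
```

Fitting one small model per raven keeps each forecast independent, at the cost of 7848 separate fits; ravens with very short histories would need to be skipped or pooled.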
- Evaluation
We have used a time series model to forecast the delays of the ravens. This prediction will help the ravens fulfil their tasks in a more efficient manner.
With the help of this analysis we were able to understand the trend of the disruptions in raven communications. The conclusions are listed below:
- From 8 AM until 11 PM we face a major hit on our communication through the ravens.
- The maximum downward trend was observed on 23rd July.
This downward trend can help us analyse the major reasons for the failures. With proper study we can try to reduce the major factors that drive this trend to its peak and cause the hit that we need to avoid as far as possible.
3 thoughts on “Datathon Telenor Solution – Game of Prediction (GoP)”
Team, please describe the process by which you prepared the data and built and evaluated the model(s) for prediction. For any “scientific” article, and for anybody who has access to the data, the whole process should be repeatable, i.e. anyone should be able to take your code/work and get the same end results. Also, the focus of this case is “The main task is to predict the fails in the next four days (on both files).”, which is clearly stated in the case description, so please add this information to your article (more than just the sentence “Model is giving better prediction for 1 day”).
Good description of the data understanding. Some recommendations: in the TOP Ravens tasks it would be good to describe how you grouped the delays. It would also be good to have more information on why ARIMA is better in this case than other methods, and maybe to add some results from other methods to the article.
At the beginning of the document you attached a zip file with code, which may be overlooked. I would suggest moving it to the end of the document to make it more visible. But overall, nice exploration and clean process.