The Telenor Case – What do Game of Thrones and Telecoms Have in Common?

**The Telenor Case**. The dataset records mobile data failures over one month. It describes RAVENS communication and the network (2G/3G/4G) over which the information was shared, along with who initiated the communication, given as family and member name (example: Lannister, Tyrion). It also contains several types of failure, each recorded as a different type of delay.

**1. BUSINESS UNDERSTANDING:**

**2. DATA UNDERSTANDING:**

Data understanding starts with an initial analysis of the data in order to become familiar with its content, identify data quality problems, discover first insights, and detect interesting subsets that may reveal hidden information and support the predictions required.

CODE FOR IMPORTING THE DATA INTO PYTHON:

    import pandas as pd

    telenor = pd.read_csv("data.csv", delimiter=';')
    telenor
    telenor.info()

OUTPUT:

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 30091754 entries, 0 to 30091753
    Data columns (total 16 columns):
    DATETIME                         object
    RAVEN_NAME                       object
    FAMILY_NAME                      object
    MEMBER_NAME                      object
    NETWORK                          object
    FIRST_GET_RESPONSE_SUCCESS_D     int64
    PAGE_BROWSING_DELAY              int64
    TCP_SETUP_TOTAL_DELAY            int64
    PAGE_CONTENT_DOWNLOAD_TOTAL_D    int64
    FIRST_DNS_RESPONSE_SUCCESS_D     int64
    DNS_RESPONSE_SUCCESS_DELAY       int64
    FIRST_TCP_RESPONSE_SUCCESS_D     int64
    PAGE_SR_DELAYS                   int64
    SYN_SYN_DELAY                    int64
    TCP_CONNECT_DELAY                int64
    PAGE_BROWSING_DELAYS             int64
    dtypes: int64(11), object(5)
    memory usage: 3.6+ GB

TO LIST THE COLUMNS THE DATASET CONTAINS:

    telenor.columns

**3. DATA ANALYSIS:**

There are **7847** unique **RAVEN NAMES**. The different networks are **2G, 3G, 4G**. The **5** unique **family names** are **Targerian, Greyjoy, Stark, Lannister, Baelish**, and there are **28** unique **member names**. The **DATETIME** ranges from **26/07/2018 – 05/08/2018**.
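Counts like these can be reproduced with a few pandas calls. The snippet below is only a sketch on a tiny made-up sample (the raven names, the member name "Arya" and the timestamps are invented for illustration), and it assumes that DATETIME is stored as day/month/year followed by a 24-hour time, which the article does not state.

```python
import pandas as pd

# Tiny synthetic stand-in for the real 30-million-row file (values are made up).
sample = pd.DataFrame({
    "DATETIME": ["26/07/2018 10:00:00", "27/07/2018 11:30:00", "05/08/2018 23:59:59"],
    "RAVEN_NAME": ["raven_a", "raven_b", "raven_a"],
    "FAMILY_NAME": ["Stark", "Lannister", "Stark"],
    "MEMBER_NAME": ["Arya", "Tyrion", "Arya"],
    "NETWORK": ["2G", "4G", "3G"],
})

# Assumption: timestamps use the layout dd/mm/yyyy HH:MM:SS.
sample["DATETIME"] = pd.to_datetime(sample["DATETIME"], format="%d/%m/%Y %H:%M:%S")

# Number of distinct values in each categorical column.
unique_counts = sample[["RAVEN_NAME", "FAMILY_NAME", "MEMBER_NAME", "NETWORK"]].nunique()

# Earliest and latest timestamp in the data.
date_range = (sample["DATETIME"].min(), sample["DATETIME"].max())

print(unique_counts)
print(date_range)
```

On the full dataset the same two calls would be applied to `telenor` instead of `sample`.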

**4. EVALUATION:**

**PROBLEM – (i) Top 10 ravens with fails :**

**SOLUTION :**

```
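# Count failure records per raven in the high-delay subset
top_10_raven_fails = telenor_most_delays[['RAVEN_NAME','DATETIME']].groupby('RAVEN_NAME').count()
top_10_raven_fails = top_10_raven_fails.rename(columns = {'DATETIME' :'FAILURES'})
top_10_raven_fails.sort_values('FAILURES',ascending=[False]).head(10)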
```

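**EXPLANATION :** To find the top 10 RAVENS with failures (and, in the next problem, those without failures), we first divided the data into two categories, telenor_most_delays and telenor_least_delays. We then grouped the telenor_most_delays category by RAVEN_NAME, counted the failures for each raven, and sorted the counts to get the top 10.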
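The article does not show how the two categories were built, so the following is only a sketch of one possible approach. The helper names `split_by_delay` and `top_n_by_count`, the choice of `PAGE_BROWSING_DELAY` as the splitting column, and the threshold of 500 are all assumptions made for illustration, not the authors' method.

```python
import pandas as pd

def split_by_delay(df, column, threshold):
    """Split rows into (most_delays, least_delays) around a hypothetical threshold."""
    most = df[df[column] > threshold]
    least = df[df[column] <= threshold]
    return most, least

def top_n_by_count(df, key, n=10):
    """Count rows per key and return the n largest counts in a FAILURES column."""
    counts = df.groupby(key).size().rename("FAILURES")
    return counts.sort_values(ascending=False).head(n).to_frame()

# Made-up toy data; the real dataset has about 30 million rows.
toy = pd.DataFrame({
    "RAVEN_NAME": ["r1", "r1", "r2", "r3", "r3", "r3"],
    "PAGE_BROWSING_DELAY": [900, 50, 700, 800, 950, 10],
})

# Assumed rule: a delay above 500 counts as a failure.
telenor_most_delays, telenor_least_delays = split_by_delay(
    toy, "PAGE_BROWSING_DELAY", threshold=500
)
top = top_n_by_count(telenor_most_delays, "RAVEN_NAME")
print(top)
```

The same `top_n_by_count` call with `FAMILY_NAME` or `MEMBER_NAME` as the key would cover the family and member questions.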

**PROBLEM – (ii) Top 10 ravens without fails :**

**SOLUTION :**

```
top_10_raven_no_fails = telenor_least_delays[['RAVEN_NAME','DATETIME']].groupby('RAVEN_NAME').count()
top_10_raven_no_fails = top_10_raven_no_fails.rename(columns={'DATETIME':'FAILURES'})
top_10_raven_no_fails.sort_values('FAILURES',ascending=[False]).head(10)
```

**EXPLANATION :** To find the top 10 ravens without fails, we grouped the telenor_least_delays category by RAVEN_NAME, counted the records for each RAVEN_NAME, and then sorted the counts to get the top 10.

**PROBLEM – (iii) The family with most fails :**

**SOLUTION:**

```
top_10_family_most_fails = telenor_most_delays[['FAMILY_NAME','DATETIME']].groupby('FAMILY_NAME').count()
top_10_family_most_fails = top_10_family_most_fails.rename(columns={'DATETIME':'FAILURES'})
top_10_family_most_fails.sort_values('FAILURES',ascending=[False]).head(1)
```

**PROBLEM – (iv) The family with least fails :**

**SOLUTION:**

```
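# Count failure records per family in the high-delay subset
top_10_family_least_fails = telenor_most_delays[['FAMILY_NAME','DATETIME']].groupby('FAMILY_NAME').count()
top_10_family_least_fails = top_10_family_least_fails.rename(columns={'DATETIME':'FAILURES'})
# Sort ascending so that the family with the fewest failures comes first
top_10_family_least_fails.sort_values('FAILURES',ascending=[True]).head(1)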
```

**EXPLANATION :** To find the family with the least fails, we grouped the telenor_most_delays category by FAMILY_NAME, counted the failures for each FAMILY_NAME, and then sorted the counts to find the family with the fewest.

**PROBLEM – (v) The family member with most fails :**

**SOLUTION:**

```
top_10_members_most_fails = telenor_most_delays[['MEMBER_NAME','DATETIME']].groupby('MEMBER_NAME').count()
top_10_members_most_fails = top_10_members_most_fails.rename(columns={'DATETIME':'FAILURES'})
top_10_members_most_fails.sort_values('FAILURES',ascending=[False]).head(1)
```

**EXPLANATION :** To find the family member with the most fails, we grouped the telenor_most_delays category by MEMBER_NAME, counted the failures for each MEMBER_NAME, and then sorted the counts to find the member with the most.

**PROBLEM – (vi) The family member with least fails :**

**SOLUTION:**

```
top_10_members_least_fails = telenor_most_delays[['MEMBER_NAME','DATETIME']].groupby('MEMBER_NAME').count()
top_10_members_least_fails = top_10_members_least_fails.rename(columns={'DATETIME':'FAILURES'})
# Sort ascending so that the member with the fewest failures comes first
top_10_members_least_fails.sort_values('FAILURES',ascending=[True]).head(1)
```

**EXPLANATION :** To find the family member with the least fails, we grouped the telenor_most_delays category by MEMBER_NAME, counted the failures for each MEMBER_NAME, and then sorted the counts to find the member with the fewest.

**5. MODELLING – TIME SERIES ANALYSIS:**

**TIME SERIES ANALYSIS USING ARIMA:**

**Introduction to ARIMA:**

**SOLUTION:**

    library(forecast)

    fit1 <- auto.arima(data_new$total_failures)
    forecast(fit1, 4)
    summary(fit1)
    plot(forecast(fit1))

**TIME SERIES ANALYSIS USING SIMPLE EXPONENTIAL METHOD:**

**INTRODUCTION TO SIMPLE EXPONENTIAL METHOD:**

Unlike a moving average, which places equal weight on the last *k* values, exponential smoothing allows for weighted averages where greater weight can be placed on recent observations and lesser weight on older observations. Exponential smoothing methods are intuitive, computationally efficient, and generally applicable to a wide range of time series.
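The weighting idea can be sketched in a few lines of plain Python. This is a minimal illustration of simple exponential smoothing only: the function name, the toy observations, the smoothing factor `alpha=0.5` and the choice to start the level at the first observation are all assumptions for the example, and it is not the fitting procedure used by the R functions in this article.

```python
def simple_exponential_smoothing(series, alpha):
    """Return smoothed levels l_t = alpha * y_t + (1 - alpha) * l_{t-1}.

    Assumption: the initial level is set to the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    levels = [series[0]]
    for y in series[1:]:
        # Recent observations get weight alpha; older history is decayed by (1 - alpha).
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels

observations = [10.0, 12.0, 11.0, 15.0]  # made-up values
levels = simple_exponential_smoothing(observations, alpha=0.5)

# With simple exponential smoothing, every future forecast equals the last level.
forecast = levels[-1]
print(levels, forecast)
```

A larger `alpha` reacts faster to recent changes, while a smaller one gives a smoother series that relies more on older observations.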

**SOLUTION:**

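The `holt()` function from the R forecast package fits Holt's linear trend method, that is, exponential smoothing with an added trend component; `ses()` is the function for simple exponential smoothing without a trend.

    fit2 <- holt(data_new$total_failures)
    accuracy(fit2)

OUTPUT:

                       ME     RMSE      MAE        MPE     MAPE      MASE      ACF1
    Training set 503.3918 15844.26 12202.75 0.02569473 1.258304 0.8891327 0.3502734

**PREDICTION OF FUTURE FOUR VALUES BY USING TIME SERIES MODEL:**

    forecast(fit2, 4)
    plot(forecast(fit2, 4))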

**TIME SERIES ANALYSIS USING RECURRENT NEURAL NETWORK:**

**INTRODUCTION TO RECURRENT NEURAL NETWORK:**

**SOLUTION:**

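The `nnetar()` function from the R forecast package fits a feed-forward neural network autoregression on lagged values of the series; this is the model referred to here as the RNN.

    fit5 <- nnetar(data_new$total_failures)
    plot(forecast(fit5, h=4))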

**6. DEPLOYMENT : PREDICTING THE BEST TIME SERIES MODEL :**

**ROOT MEAN SQUARE ERROR FOR ARIMA**

    a1 = 975227-974967
    a2 = 970960-971685
    a3 = 958250-971685
    a4 = 946177-971685
    arima = (a1**2+a2**2+a3**2+a4**2)/4   # mean of the squared errors
    arima

OUTPUT VALUE : **207937629** (this is the mean squared error; its square root, the RMSE, is about 14420)

**ROOT MEAN SQUARE ERROR FOR THE RNN**

    r1 = 975227-972724
    r2 = 970960-966955
    r3 = 958250-965626
    r4 = 946177-965540
    rnn = (r1**2+r2**2+r3**2+r4**2)/4   # mean of the squared errors
    rnn

OUTPUT VALUE : **112909045** (this is the mean squared error; its square root, the RMSE, is about 10626)

**ROOT MEAN SQUARE ERROR FOR THE SIMPLE EXPONENTIAL**

    e1 = 975227-973776
    e2 = 970960-973725
    e3 = 958250-973674
    e4 = 946177-973623
    exponential = (e1**2+e2**2+e3**2+e4**2)/4   # mean of the squared errors, using all four terms
    exponential

OUTPUT VALUE : **250233329.5** (this is the mean squared error; its square root, the RMSE, is about 15819). Squaring e3 twice in place of e4 gives 121387545 instead.

To predict the next four values we fitted three different models. We derived a dataset named fail_data, which contains DATE, TIME and FAILURES. For prediction we derived predict_data, consisting of DATE and FAILURES, from the TELENOR data and applied all three models to it. From predict_data and fail_data we calculated the RMSE values and the next four values, for the dates **06/08/2018, 07/08/2018, 08/08/2018, 09/08/2018**.
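A daily series with DATE and FAILURES columns, like the predict_data described above, could be derived from per-event timestamps roughly as follows. This is a sketch on made-up rows; the timestamp layout (dd/mm/yyyy HH:MM:SS) and the idea that each row counts as one failure are assumptions, since the article does not show this step.

```python
import pandas as pd

# Made-up failure events; in the article these would come from the TELENOR data.
events = pd.DataFrame({
    "DATETIME": ["04/08/2018 09:15:00", "04/08/2018 21:40:00", "05/08/2018 00:05:00"],
})

# Assumption: timestamps use the layout dd/mm/yyyy HH:MM:SS.
events["DATETIME"] = pd.to_datetime(events["DATETIME"], format="%d/%m/%Y %H:%M:%S")

# Count events per calendar day and expose the result as DATE / FAILURES columns.
daily = (
    events.groupby(events["DATETIME"].dt.date)
          .size()
          .rename("FAILURES")
          .rename_axis("DATE")
          .reset_index()
)
print(daily)
```

The resulting FAILURES column is the kind of univariate series that a forecasting model can then be fitted to.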

(i) The future four values of ARIMA model are **974967,971685,971685,971685**.

(ii) The future four values of SIMPLE EXPONENTIAL model are **973776,973725,973674,973623.**

(iii) The future four values of RECURRENT NEURAL NETWORK model are **972724,966955,965626,965540.**

After predicting the next four values, we computed the error of each model and plotted the forecasts of each model. The figures below are the mean of the squared errors, that is, the RMSE (root mean square error) before the square root is taken; taking the square root does not change the ranking of the models:

(i) The value for the ARIMA model is **207937629** (RMSE about 14420).

(ii) The value for the SIMPLE EXPONENTIAL model is **250233329.5** (RMSE about 15819) when all four error terms are used; the figure 121387545 results if the third error is squared twice in place of the fourth.

(iii) The value for the RECURRENT NEURAL NETWORK model is **112909045** (RMSE about 10626).
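The error figures can be checked with a short Python sketch. The forecasts are the values listed in this section; the `actual` list is assumed to hold the observed values, since those numbers appear as the first term of every error in the R calculations. The names `mse`, `scores` and `best` are illustrative.

```python
import math

# Assumed observed values (first term in each error of the R calculations).
actual = [975227, 970960, 958250, 946177]

# Four-step forecasts reported for each model.
forecasts = {
    "ARIMA": [974967, 971685, 971685, 971685],
    "Exponential smoothing": [973776, 973725, 973674, 973623],
    "Neural network": [972724, 966955, 965626, 965540],
}

def mse(y_true, y_pred):
    """Mean of the squared differences between observed and forecast values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

scores = {name: mse(actual, pred) for name, pred in forecasts.items()}

# RMSE is the square root of the mean squared error.
rmse = {name: math.sqrt(value) for name, value in scores.items()}

# The model with the smallest RMSE also has the smallest mean squared error.
best = min(rmse, key=rmse.get)

for name in forecasts:
    print(f"{name}: MSE={scores[name]:.1f}, RMSE={rmse[name]:.1f}")
print("Lowest error:", best)
```

Using all four error terms, exponential smoothing has the largest error of the three models, and the neural network model still has the smallest.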

**7. CONCLUSION:** Considering the root mean square error of these algorithms, we found RNN (Recurrent Neural Networks) to be the best algorithm for predicting the next four days, because it has the lowest root mean square error of the three models. So, based on the RNN algorithm, we have predicted the delays for the next four days from the given TELENOR dataset.

**FINAL SUBMISSION – Includes the Python and R Markdown code for problem solving, analyzing and predicting with the time series models.**

## 3 thoughts on “Datathon Telenor Solution – WILDLINGS ANALYSIS ON TELENOR – GAME OF THRONES”

Great work, guys!

I like how you approached it using different methods. What approaches would you recommend to remove seasonality from time-series?

Thanks for your appreciation, jury members. The approach we would recommend for removing seasonality from a time series is seasonal ARIMA (autoregressive integrated moving average) models. Seasonal differencing is a crude form of additive seasonal adjustment: the “index” which is subtracted from each value of the time series is simply the value that was observed in the same season one year earlier. Seasonal Autoregressive Integrated Moving Average (SARIMA) models can satisfactorily describe time series that exhibit non-stationary behaviors both within and across seasons.

Finally, we conclude that RNN is the best time series model for this dataset because it gave the most accurate result, and it can be used for deep learning analysis.