1. Business Problem Formulation The current political landscape is shaped by extreme polarization of opinions and by the proliferation of fake news. For example, a recent study published in Science has found that rumors and fake news tend to spread six times faster than truthful information. This situation both damages the reputation of respectable news outlets and […]
Kaufland-Case 1. Business Understanding Industrial vibration analysis is a measurement tool used to identify, predict, and prevent failures. Implementing vibration analysis on the machines will improve their reliability and lead to better machine efficiency and reduced downtime by eliminating mechanical or electrical failures. Vibration analysis is used to identify faults in machinery, plan machinery […]
In :
import s3fs
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import numpy as np
import pywt

In :
fs = s3fs.S3FileSystem(anon=True)
fs.ls('datacases/datathon-2018-2/')

Out:
['datacases/datathon-2018-2/kaufland',
 'datacases/datathon-2018-2/nsi',
 'datacases/datathon-2018-2/ontotext',
 'datacases/datathon-2018-2/telelink',
 'datacases/datathon-2018-2/telenor']

In :
fs.ls('datacases/datathon-2018-2/kaufland')

Out:
['datacases/datathon-2018-2/kaufland/20180820_Kaufland_case_IoT_and_predictive_maintenance_events.xlsx',
 'datacases/datathon-2018-2/kaufland/20180920_Kaufland_case_IoT_and_predictive_maintenance.csv',
 'datacases/datathon-2018-2/kaufland/sample_Kaufland_case_IoT_and_predictive_maintenance.csv']

Events

In :
with fs.open('datacases/datathon-2018-2/kaufland/20180820_Kaufland_case_IoT_and_predictive_maintenance_events.xlsx', 'rb') as f:
    df_events = pd.read_excel(f)

In :
df_events

Out: […]
Predicting Household Budgets
Authors: SoRd1, Jack, pr0faka, Kolio
Team: Pigeons
Statistics is the painful elaboration of the obvious. Hello everyone 🙂 We all hope that you had a great time during the Datathon, because we did. We are working on the case from NSI – to predict the household expenditures per group for the years in which […]
The National Statistical Institute of Bulgaria (NSI) conducts an annual Household Budget Survey (HBS) to obtain reliable and scientifically founded data on the income, expenditure, consumption and other elements of the population's living standard, as well as on the changes that have occurred over the years. To reduce the cost of carrying out the survey, NSI is considering changing the periodicity of the Household Budget Survey from yearly to once every five years. Hence, we are building a model that predicts household expenditure for the next four years using a linear regression model and time series analysis. The algorithms we rely on are linear regression and the autoregressive integrated moving average (ARIMA) model. So let's not waste any time and get on with it!
Business Understanding In Sofia, air pollution norms were exceeded 70 times in the heating period from October 2017 to March 2018, citizens’ initiative AirBG.info says. The day with the worst air pollution in Sofia was January 27, when the norm was exceeded six times over. Things got so out of control that even the […]
Telelink Case Solution Team Dimas The Team Members – apetkov – desinik – rdimitrov – melania-berbatova – vrategov Github Repo: https://github.com/Bugzey/Team-Midas Workflow The main workflow happens over at our github page. You can read the latest version of this article here: https://github.com/Bugzey/Team-Midas/blob/master/7.%20Documentation/Doc_010%20Documentation.md ## Content 0. Data We were given the following 4 datasets: Air Tube-20180928T185037Z-001.zip […]
1. Business Understanding Particulate matter is considered the air pollutant of greatest concern to the health of the urban population. Research has shown that exposure to PM can lead to increased days lost from work or school, emergency room visits, hospital stays, and deaths. Both short and long-term exposures to PM can lead to […]
It is a well-known fact that Exploratory Data Analysis is a cornerstone of data analysis.
Our analysis of the data shows that Brass raven Birdy failed the most and Metallic Sunburst raven Polly is the most successful raven. The Targaryen family has the most raven fails, whereas the Baelish family has the fewest. Among individual family members, Petyr Baelish has the highest failure rate and Euron the lowest.
An ARIMA model is used to predict the number of failures for the next four days.
“You know nothing, Jon Snow…”
is what Ygritte yells to Jon.
Our situation was the same: we knew nothing about the Telenor case until we had seen the dataset.
When we first heard about the Datathon as beginners, we were very excited to take part in it.
Finally we received our datasets, and with them came our first challenge: importing the dataset into our programming platforms.
We faced some hurdles here because the dataset is around 4 GB in size, so importing it took some time and put us in this situation:
“What we don’t know is what usually gets us killed.” – Petyr Baelish
We quote the line above to express how we felt: we did not know what was in the dataset, but we wanted to explore it.
At last, we were ready to analyze the case: What do Game of Thrones and Telecoms Have in Common?
When we first went through the dataset, we noticed that the Telenor data contains 16 columns and 30091754 rows.
After this first pass, we realized how complicated the data is. It contains many interesting aspects, which we examined through Exploratory Data Analysis.
Our main challenge is to predict the fails in the next four days.
First, we carried out the exploratory data analysis.
(i) Top 10 ravens with fails :
Brass raven Birdy
Brown raven Ruby
Yellow raven Rio
Blue raven Axel
Razzle Dazzle Rose raven Cleo
Cadmium Red raven Bubba
Vain And Lazy raven Polly
Fearful Carrion raven Gizmo
Blast Off Bronze raven Zazu
Loving raven Maxwell
(ii) Top 10 ravens without fails:
Metallic Sunburst raven Polly
Green Sheen raven Azul
Less Combative raven Zazu
Weak raven Buddy
Copper raven Tweety
Spectral Yellow raven Zazu
Mythical raven Tiki
Cyber Grape raven Faith
Mysterious And Venerable raven Bubba
Shadow Blue raven Sammy
(iii) The family with most fails :
(iv) The family with least fails :
(v) The family member with most fails :
(vi) The family member with least fails :
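The rankings above come down to counting failed deliveries per raven, per family, or per family member and then sorting the counts. The following is a minimal Python sketch of that idea, not the code we actually ran (our EDA was done in R). The record layout, the names, and the rows are made up for illustration and are not taken from the Telenor dataset.

```python
from collections import Counter

# Hypothetical records: (raven_name, family, failed).
# These rows are invented for illustration only.
records = [
    ("Raven A", "Family X", True),
    ("Raven A", "Family X", True),
    ("Raven B", "Family Y", True),
    ("Raven B", "Family Y", False),
    ("Raven C", "Family X", True),
    ("Raven D", "Family Y", False),
]


def count_fails(rows, key_index):
    """Count failed deliveries grouped by the column at key_index.

    Every key starts at zero so that entities without any fail
    still show up when looking for the lowest counts.
    """
    counts = Counter({row[key_index]: 0 for row in rows})
    for row in rows:
        if row[2]:  # third field marks a failed delivery
            counts[row[key_index]] += 1
    return counts


fails_by_raven = count_fails(records, 0)
fails_by_family = count_fails(records, 1)

# Highest counts first ("top 10 ravens with fails").
top_ravens = fails_by_raven.most_common(10)

# Lowest counts first ("top 10 ravens without fails").
least_ravens = sorted(fails_by_raven.items(), key=lambda item: item[1])[:10]

print(top_ravens)
print(least_ravens)
print(fails_by_family.most_common())
```

On a dataset of roughly 30 million rows the same counting would normally be done in a streaming or chunked fashion rather than on an in-memory list.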
After the EDA, we need to predict the delays in mobile data connectivity for the next four days. For this we use time series analysis.
For the time series analysis we used three algorithms: ARIMA, Simple Exponential Smoothing, and Recurrent Neural Networks.
We fitted one model with ARIMA to predict the failures for the four days, and fitted a second model using Simple Exponential Smoothing.
We also used a Recurrent Neural Network to predict the failures.
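To make the simplest of the three approaches concrete, here is a minimal pure-Python sketch of simple exponential smoothing with a four-step forecast. It is an illustration under stated assumptions, not our fitted model: the smoothing factor `alpha`, the helper names, and the daily counts are all hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the final smoothed level of a series.

    Recurrence: level_t = alpha * x_t + (1 - alpha) * level_{t-1},
    initialised with the first observation.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def forecast(series, alpha, horizon):
    """Simple exponential smoothing gives a flat forecast:
    every future step equals the last smoothed level."""
    level = simple_exponential_smoothing(series, alpha)
    return [level] * horizon


# Made-up daily fail counts, for illustration only.
daily_fails = [100.0, 110.0, 105.0, 120.0]
next_four = forecast(daily_fails, alpha=0.5, horizon=4)
print(next_four)
```

Because the forecast is flat, this method cannot reproduce a trend over the forecast horizon, which is one reason to compare it against ARIMA and a recurrent network.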
After fitting the three models with the three different algorithms, we evaluated them by splitting the data into train and test sets.
We compared the models using the Root Mean Square Error (RMSE): the model with the lowest RMSE is taken as the best-fit model.
In this case, on the mobile failure dataset, the RNN (Recurrent Neural Network) has the lowest RMSE.
So the RNN is taken as the best-fit model to predict the next four days of mobile data delays.
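The selection procedure described above (hold out the end of the series, forecast it with each model, keep the model with the lowest RMSE) can be sketched as follows. This is a hedged illustration: the two baseline forecasters are simple stand-ins for ARIMA, exponential smoothing and the RNN, and the series is made up.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def train_test_split(series, test_size):
    """Chronological split: the last test_size points form the test set."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


# Made-up daily values, for illustration only.
series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]
train, test = train_test_split(series, test_size=4)

# Hypothetical stand-in models; each maps (train, horizon) to a forecast.
models = {
    "last_value": lambda tr, h: [tr[-1]] * h,
    "train_mean": lambda tr, h: [sum(tr) / len(tr)] * h,
}

# Score every candidate on the held-out data and keep the lowest RMSE.
scores = {name: rmse(test, model(train, len(test))) for name, model in models.items()}
best_model = min(scores, key=scores.get)

print(scores)
print(best_model)
```

The split is chronological rather than random because a time series model must be judged on data that comes after its training period.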
Based on the RNN model, the predicted delays for the next four days are 973776, 973725, 973674 and 973623 for 5, 6, 7 and 8 August 2018, respectively.
Problem statement: This dataset concerns a time series analysis of the failure rate of ravens carrying messages from King’s Landing to the North. The case study is an analogy between Telenor telecommunications and Game of Thrones. Because of the obstacles that cause failures, various techniques and schemes are employed in the planning, design and optimization of raven networks to combat these propagation effects.
We used RStudio for the Exploratory Data Analysis.
From the tasks given to us, we concluded that:
1. Brass raven Birdy has been delayed the most times, followed by Brown raven Ruby and Yellow raven Rio,
while Metallic Sunburst raven Polly has been delayed the fewest times, followed by Green Sheen raven Azul and Less Combative raven Zazu.
2. The family with the most fails is Targaryen, while the family with the fewest fails is Lannister.
3. The family member with the most fails is Petyr Baelish, and the one with the fewest fails is Euron.
We carried out further analysis to predict the fails for the next four days using time series analysis.