Datathon Sofia Air Solution – Air station measurement bias correction using Pearson correlation coefficient

Posted in Datathons Solutions (5 comments)

This article aims to improve the estimation of PM10 pollution levels. Sofia has several air pollution measurement stations. They measure PM10, i.e. particulate matter in the air with an aerodynamic diameter of 10 micrometers or less.

The measurement stations fall into two categories: official stations and citizen stations. The official stations provide reliable measurements because they are better monitored and documented. The downside is that there are only five of them, and they are all concentrated in a single region. The citizen stations are devices mounted on people's homes or properties that measure PM10 particles. Together they form a whole network: they are numerous and provide good coverage of the city. The problem is that their measurements are biased by many local factors. Therefore, the measurements from the citizen stations are less reliable than those from the official stations, but on the upside there are many more of them.

In this article we define a method to reduce the bias of the measurements from the citizen stations.
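The excerpt does not spell out the method, so the following is only a minimal sketch of one way a Pearson-based correction could work. It assumes PM10 readings from a citizen station and a nearby official station, aligned on the same timestamps. The correlation threshold `min_r=0.7` and the linear rescaling onto the official scale are our own assumptions, not the authors' procedure.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def fit_linear_correction(citizen, official):
    """Least-squares fit of official ~ slope * citizen + intercept."""
    n = len(citizen)
    mc, mo = sum(citizen) / n, sum(official) / n
    sxx = sum((c - mc) ** 2 for c in citizen)
    sxy = sum((c - mc) * (o - mo) for c, o in zip(citizen, official))
    slope = sxy / sxx
    return slope, mo - slope * mc


def correct_station(citizen, official, min_r=0.7):
    """Rescale a citizen series onto the official scale if the two correlate well.

    Assumption (ours, hypothetical): stations with Pearson r below `min_r`
    are treated as too unreliable to correct and come back as None.
    """
    r = pearson_r(citizen, official)
    if r < min_r:
        return None, r
    slope, intercept = fit_linear_correction(citizen, official)
    return [slope * c + intercept for c in citizen], r
```

Here `correct_station(citizen, official)` returns the rescaled citizen series together with its correlation. Stations whose readings do not track the official station closely are left out rather than corrected, which is one simple way of using the correlation coefficient as a reliability filter.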

Datathon Kaufland Solution – Predictive Maintenance Based on Sensor Data for Forklifts

Posted in Prediction systems (2 comments)

Kaufland-Case. 1. Business Understanding: Industrial vibration analysis is a measurement tool used to identify, predict, and prevent failures. Implementing vibration analysis on the machines will improve their reliability and lead to better machine efficiency and reduced downtime by preventing mechanical or electrical failures. Vibration analysis is used to identify faults in machinery, plan machinery […]
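As an illustration of the kind of signal feature vibration analysis relies on, here is a minimal sketch assuming raw accelerometer samples from a forklift sensor. It computes the root-mean-square (RMS) amplitude over fixed-size, non-overlapping windows and flags windows above a threshold. The window size and threshold are hypothetical placeholders that would need calibrating on real data; this is not the team's predictive model.

```python
from math import sqrt


def window_rms(samples, window):
    """RMS amplitude of each non-overlapping window (a trailing partial window is dropped)."""
    if window <= 0:
        raise ValueError("window must be positive")
    rms_values = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms_values.append(sqrt(sum(s * s for s in chunk) / window))
    return rms_values


def flag_anomalies(samples, window, threshold):
    """Indices of windows whose RMS exceeds a (hypothetical) alarm threshold."""
    return [i for i, r in enumerate(window_rms(samples, window)) if r > threshold]
```

A rising RMS level is a common early sign of wear such as imbalance or loose parts, which is why it is a typical first feature before moving on to frequency-domain analysis.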

Datathon NSI Solution – The curious case of ‘Household Budget Survey (HBS)’

Posted in Prediction systems (6 comments)

The National Statistical Institute of Bulgaria (NSI) conducts an annual Household Budget Survey (HBS) to obtain reliable, scientifically founded data on the income, expenditure, consumption and other elements of the population's living standard, as well as on how these have changed over the years. To optimize the cost of carrying out the survey, NSI is considering changing its periodicity from yearly to once every five years. Hence we are creating a model that predicts household expenditure for the next four years using time series methods: a linear regression model and an Autoregressive Integrated Moving Average (ARIMA) model. So let's not waste any time and move on with it!
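A minimal sketch of the linear regression part, assuming one aggregate expenditure figure per survey year: fit expenditure = intercept + slope · year by ordinary least squares and extrapolate four years ahead. The function names and the toy numbers in the usage note are illustrative only, and the ARIMA part is not shown.

```python
def fit_trend(years, values):
    """Ordinary least-squares line through (year, value) pairs; returns (intercept, slope)."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (year, value) pairs")
    mean_year, mean_value = sum(years) / n, sum(values) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (v - mean_value) for y, v in zip(years, values))
    slope = sxy / sxx
    return mean_value - slope * mean_year, slope


def forecast_trend(years, values, horizon=4):
    """Extrapolate the fitted line `horizon` years past the last observed year."""
    intercept, slope = fit_trend(years, values)
    last = max(years)
    return {last + h: intercept + slope * (last + h) for h in range(1, horizon + 1)}
```

For example, with toy values `forecast_trend([2014, 2015, 2016, 2017, 2018], [100, 110, 120, 130, 140])` returns 150.0, 160.0, 170.0 and 180.0 for 2019 to 2022. A straight-line trend ignores year-to-year shocks, which is one reason to compare it with an ARIMA model.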

Datathon Sofia Air Solution – The Telelink Case handled by the Urban air quality Gurus!

Posted in Datathons Solutions (4 comments)

1. Business Understanding: Particulate matter (PM) is considered the air pollutant of greatest concern to the health of the urban population. Research has shown that exposure to PM can lead to more days lost from work or school, emergency room visits, hospital stays, and deaths. Both short- and long-term exposures to PM can lead to […]

Datathon Sofia Air Solution – Telelink Case Solution

Posted in Prediction systems (5 comments)

Telelink Case Solution, Team Dimas
Team members: apetkov, desinik, rdimitrov, melania-berbatova, vrategov
GitHub repo:
Workflow: The main workflow happens over at our GitHub page. You can read the latest version of this article here:
Content
0. Data
We were given the following 4 datasets: Air […]

Datathon Telenor Solution – Ravens for Communication

Posted in Datathons Solutions (1 comment)

It is well known that Exploratory Data Analysis (EDA) is the cornerstone of data analysis.
Our analysis of the data shows that Brass Raven Birdy has the most failures and Metallic Raven Sunburst Polly is the most successful raven. Also, the Targeryan family has the most raven failures, whereas the Baelish family has the fewest; within the Baelish family, Peter Baelish has the highest failure rate and Euron the lowest.
An ARIMA model is used to predict the number of failures for the next 4 days.
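The post does not give the ARIMA order. As a hedged illustration of what the "I" (integrated) and "AR" (autoregressive) parts do, here is a minimal ARIMA(1,1,0) forecaster without a constant term, fitted by conditional least squares in pure Python. The order (1,1,0) is our assumption; in practice a library implementation would be used and the order chosen from the data.

```python
def arima110_forecast(series, steps=4):
    """Forecast `steps` values with an ARIMA(1,1,0) model (no constant).

    1. Difference once: d_t = y_t - y_{t-1}        (the 'I' part, d = 1)
    2. Fit d_t = phi * d_{t-1} by least squares     (the 'AR' part, p = 1)
    3. Forecast the differences recursively and add them back onto the last level.
    """
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    # pairs (d_{t-1}, d_t) for the least-squares estimate of phi
    num = sum(prev * cur for prev, cur in zip(diffs[:-1], diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    phi = num / den if den else 0.0
    forecasts, level, diff = [], series[-1], diffs[-1]
    for _ in range(steps):
        diff = phi * diff          # next predicted change
        level = level + diff       # undo the differencing
        forecasts.append(level)
    return forecasts
```

When the daily changes shrink over time the estimated phi is below 1 and the forecast levels off; when the changes are steady the forecast continues the trend.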

Datathon Telenor Solution – Analysing and Predicting Delays in Mobile Data Connectivity

Posted in Prediction systems (9 comments)

“You know nothing, Jon Snow…”
is what Ygritte yells at Jon.

Our situation was the same: we knew nothing about the Telenor case until we saw the dataset.

When we, as beginners, first heard about the Datathon, we were very excited to take part in it.
Finally we received our datasets, and our first challenge was importing them into our programming platforms.
The dataset is around 4 GB, so importing it took some time and put us in the following situation:

“What we don’t know is what usually gets us killed.” – Petyr Baelish

We quote this line to express how we felt: we didn’t know what was in the dataset, but we wanted to explore it.
At last, we were ready to analyze: what do Game of Thrones and telecoms have in common?

When we first went through the dataset, we noticed that the Telenor data contains 16 exciting columns and 30,091,754 rows.

Our first pass showed how complicated the data is: it contains many interesting aspects, which we explored through Exploratory Data Analysis.

Our main challenge is to predict the fails in the next four days.
First, we performed Exploratory Data Analysis.
(i) Top 10 ravens with the most fails:
Brass raven Birdy
Brown raven Ruby
Yellow raven Rio
Blue raven Axel
Razzle Dazzle Rose raven Cleo
Cadmium Red raven Bubba
Vain And Lazy raven Polly
Fearful Carrion raven Gizmo
Blast Off Bronze raven Zazu
Loving raven Maxwell

(ii) Top 10 ravens with the fewest fails:

Metallic Sunburst raven Polly
Green Sheen raven Azul
Less Combative raven Zazu
Weak raven Buddy
Copper raven Tweety
Spectral Yellow raven Zazu
Mythical raven Tiki
Cyber Grape raven Faith
Mysterious And Venerable raven Bubba
Shadow Blue raven Sammy

(iii) The family with the most fails:


(iv) The family with the fewest fails:


(v) The family member with the most fails:

Petyr Baelish

(vi) The family member with the fewest fails:

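Since the file is about 4 GB, one way to compute rankings like the ones above without loading everything into memory is to stream it row by row and keep running counts. The sketch below assumes a CSV with hypothetical columns `raven`, `family`, `member` and a 0/1 `failed` flag; the real dataset's 16 column names are not given here, so they would need adjusting.

```python
import csv
from collections import Counter


def count_fails(lines, key):
    """Stream CSV rows and count failures per value of column `key`.

    Assumes hypothetical columns `raven`, `family`, `member` and a 0/1 `failed`
    flag. Rows are read one at a time, so memory grows with the number of
    distinct keys, not with the file size.
    """
    fails = Counter()
    for row in csv.DictReader(lines):
        # += on a Counter also stores keys with 0 fails, so they can rank last
        fails[row[key]] += int(row["failed"] == "1")
    return fails


def most_fails(fails, n=10):
    """The n keys with the highest failure counts."""
    return fails.most_common(n)


def fewest_fails(fails, n=10):
    """The n keys with the lowest failure counts (including zero)."""
    return sorted(fails.items(), key=lambda kv: kv[1])[:n]
```

Usage, with a hypothetical file name: open the file with `open("telenor.csv", newline="")`, pass the file object to `count_fails(f, "raven")`, and call `most_fails` or `fewest_fails` on the result. The same pass can be repeated with `"family"` or `"member"` as the key.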

After the EDA, we need to predict the delays in mobile data connectivity for the next four days. To do so, we use time series analysis.

We used three time series algorithms: ARIMA, Simple Exponential Smoothing and Recurrent Neural Networks (RNN).

We fitted an ARIMA model to predict the failures for the next four days, and fitted a second model using Simple Exponential Smoothing.
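Simple exponential smoothing keeps a single level that is updated as level = alpha · y_t + (1 − alpha) · previous level, and its forecast for every future step is the final level, so a four-day forecast from it is flat. Below is a minimal sketch; `alpha=0.5` is an arbitrary choice here, whereas a library would normally estimate alpha by minimizing the in-sample forecast error.

```python
def ses_forecast(series, alpha=0.5, steps=4):
    """Simple exponential smoothing: return a flat `steps`-ahead forecast."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    for y in series[1:]:
        # weighted average of the newest observation and the previous level
        level = alpha * y + (1 - alpha) * level
    return [level] * steps
```

A larger alpha reacts faster to recent days; with `alpha=1` the forecast is simply the last observed value.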

We also used a Recurrent Neural Network to predict the failures.

To evaluate the three models, we split the data into training and test sets.

We compared the models using the Root Mean Square Error (RMSE) on the test set and took the model with the lowest RMSE as the best fit.
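The evaluation step can be sketched as follows, assuming a chronological split (the last `test_size` points are held out, since shuffling a time series would leak future information into training) and forecasters that take a training series and a horizon. The two candidate models here, a naive last-value forecast and a mean forecast, are hypothetical stand-ins for illustration; the team's ARIMA, SES and RNN models are not reproduced.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def select_best(series, models, test_size=4):
    """Hold out the last `test_size` points and pick the model with the lowest RMSE."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must leave some training data")
    train, test = series[:-test_size], series[-test_size:]
    scores = {name: rmse(test, forecast(train, len(test))) for name, forecast in models.items()}
    return min(scores, key=scores.get), scores


# Hypothetical stand-in forecasters: (training series, horizon) -> predictions
def naive(train, horizon):
    return [train[-1]] * horizon


def mean_forecast(train, horizon):
    return [sum(train) / len(train)] * horizon
```

For example, `select_best(daily_fails, {"naive": naive, "mean": mean_forecast})` returns the winning model's name and all RMSE scores, where `daily_fails` is a hypothetical list of daily failure counts.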

On the mobile failure dataset, the RNN had the lowest RMSE.

We therefore chose the RNN as the best-fit model to predict the next four days of mobile data delays.

The RNN predicts 973776, 973725, 973674 and 973623 delays for 5, 6, 7 and 8 August 2018, respectively.

Datathon Telenor Solution – Analysis Of Mobile Data Connectivity Delays

Posted in Datathons Solutions

Problem statement: This dataset concerns time series analysis of the failure rate of ravens sending messages from King’s Landing to the North. The case study is an analogy between Telenor telecommunications and Game of Thrones. Because obstacles cause these failures, various techniques and schemes are employed in the planning, design and optimization of raven networks to combat these propagation effects.

We used RStudio for Exploratory Data Analysis.
Following the tasks given to us, we concluded that:
1. Brass Raven Birdy was delayed the most times, followed by Brown raven Ruby and Yellow raven Rio,
while Metallic Sunburst Raven Polly was delayed the fewest times, followed by Green Sheen raven Azul and Less combative raven Zazu.
2. The family with the most fails is Targerian, while the family with the fewest is Lannister.
3. The family member with the most fails is Petyr Baelish, and the one with the fewest is Euron.
We then predicted the fails for the next four days using time series analysis.