Datathon Sofia Air Solution – Air station measurement bias correction using Pearson correlation coefficient

Posted in Datathons Solutions | 5 Comments

This article aims to improve the estimates of measured PM10 pollution. Sofia has several air pollution measurement stations. They measure PM10, which is airborne particulate matter with a diameter of 10 micrometers or less.

The measurement stations fall into two categories: official stations and citizen stations. The official stations provide reliable measurements because they are better monitored and documented. The downside is that there are only 5 of them, all concentrated in a single region. The citizen stations are devices mounted on people's homes or properties that measure PM10 particles. They form a whole network that is large and covers the city well. The problem is that their measurements are biased by many local factors. The measurements from the citizen stations are therefore less reliable than those from the official stations, but far more numerous.

In this article we define a method to reduce the bias of the measurements from the citizen stations.
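
The excerpt does not spell out the method, so the following is only a minimal sketch of one way a Pearson-based correction could work. It assumes that a citizen station is paired with an official station, that the two series are aligned in time, and that a simple linear mapping is applied when the correlation is high. The function names, the 0.8 threshold and the readings are all hypothetical and are not taken from the article.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def fit_linear_correction(citizen, official):
    """Least-squares slope and intercept mapping citizen readings to official ones."""
    n = len(citizen)
    mean_c = sum(citizen) / n
    mean_o = sum(official) / n
    slope = (sum((c - mean_c) * (o - mean_o) for c, o in zip(citizen, official))
             / sum((c - mean_c) ** 2 for c in citizen))
    intercept = mean_o - slope * mean_c
    return slope, intercept

# Made-up PM10 readings: the citizen sensor here reads 2 * official + 5.
official = [20.0, 35.0, 50.0, 42.0, 28.0]
citizen = [2 * o + 5 for o in official]

r = pearson(citizen, official)
slope, intercept = fit_linear_correction(citizen, official)

# Hypothetical rule: only correct stations that track the official one closely.
R_THRESHOLD = 0.8
if r >= R_THRESHOLD:
    corrected = [slope * c + intercept for c in citizen]
else:
    corrected = list(citizen)
```

With real data the correlation would be below 1, and the threshold decides whether a citizen station tracks the official one well enough for a linear correction to be meaningful.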

Sofia Air Quality EDA (Exploratory Data Analysis)

Posted in Datathons Solutions

Hi everybody, I have put together a Python Jupyter Notebook with some data explorations and maps. Feel free to take a look and comment. In the repo you will see a couple of HTML files of the maps in case you are not able to re-run the code (if you don’t use Python). You can download the […]

Monthly Challenge – Sofia Air – Solution – Jacob Avila

Posted in Prediction systems

Preliminary Analysis: Since the objective is to forecast air quality for the next 24 hours per station, the first step should be understanding the citizen science air quality measurements, grouping them by station and summarizing them by day. This task involves inspection and pre-processing in order to find missing data, outliers and […]

October Data Science Monthly Challenge

Posted in Prediction systems

Why should you join the Data Science Monthly Challenge and what can you expect?

The Data Science Monthly Challenge provides an exceptional opportunity for participants to be involved in finding a solution to a real data science problem step by step. The gradual approach to advanced business problems gives participants a chance to become familiar in depth with each of the important steps to consider when developing an effective, high-quality data science project.

Last but not least, the monthly challenge is an excellent opportunity for data enthusiasts to prepare for the Global Datathon organized by the Data Science Society, where time is constrained and the level of competition is much higher. The skills and deeper understanding acquired during the monthly challenges will play a key role and serve as a competitive advantage for teams in large-scale events such as the Global Datathons. The monthly challenge can also be inspiring for those with a more competitive attitude, because each article will be open to voting and peer-to-peer reviews, and each week the best-voted articles in progress will be uploaded to the News section of the site.

So, what are you waiting for? 🙂
Register now for the learning challenge before 15th of Oct at


Posted in Prediction systems | 3 Comments

Cell phones have become a necessity for many people throughout the world. The ability to keep in touch with family and business associates and to access email are only a few of the reasons for their increasing importance. Today’s technically advanced cell phones can not only place and receive calls but also store data, take pictures, and even serve as walkie-talkies, to name just a few of the available options.
The dataset for The Telenor Case – What do Game of Thrones and Telecoms Have in Common? contains data on delays in networks (RAVENS). The delays cover 26/07/2018 – 05/08/2018. Each RAVEN_NAME represents a tower. There are 7847 unique RAVEN_NAMES across different networks such as 2G/3G/4G, and 5 unique families.
To provide the best solution to the business problem, we work in two steps: (i) data analysis and coding in Python and (ii) time series model building in RStudio.
In the data analysis we counted the delays (failures) of the RAVENS. We also found the top 10 RAVENS with and without fails, and identified the family names and member names with the most and fewest fails in the networks.
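
As a rough illustration of this counting step, the sketch below totals fails per raven and ranks them. Only the RAVEN_NAME field comes from the dataset description. The record layout, the raven names and the fail counts are made up for the example.

```python
from collections import Counter

# Hypothetical records: (RAVEN_NAME, number of fails in one observation).
records = [
    ("RAVEN_A", 3),
    ("RAVEN_B", 0),
    ("RAVEN_A", 2),
    ("RAVEN_C", 7),
    ("RAVEN_B", 0),
]

# Total fails per raven; ravens with zero fails stay in the counter.
fails = Counter()
for name, n_fails in records:
    fails[name] += n_fails

# Top 10 ravens by total fails (fewer are returned if fewer exist).
top_10 = fails.most_common(10)

# Ravens that never failed.
without_fails = sorted(name for name, total in fails.items() if total == 0)
```

The same grouping could be applied to family or member names by swapping the key used in the counter.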
Prediction and forecasting are done with time series models. As the name suggests, time series modelling works on time-indexed data (years, days, hours, minutes) to derive hidden insights that support informed decision making. Time series models are especially useful when the data is serially correlated. To predict four days ahead, we split the mobile data into train and test sets. We carried out the time series analysis using ARIMA, simple exponential smoothing and recurrent neural networks (RNN).
Finally, comparing the root mean square error of these algorithms, we found the RNN to be the best at predicting the coming days. We used the RNN to predict the delays for the next four days, and plotted graphs of the time series models for all the algorithms.
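
The model comparison can be sketched as follows. This is not the authors' code: it assumes the last four days are held out as the test set, and it compares only simple exponential smoothing against a plain mean baseline, since ARIMA and RNN need external libraries. The daily totals, the smoothing factor of 0.5 and the function names are hypothetical.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean square error between two equally long series."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def ses_forecast(train, alpha, horizon):
    """Simple exponential smoothing: a flat forecast at the last smoothed level."""
    level = train[0]
    for value in train[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon

def mean_forecast(train, horizon):
    """Baseline: forecast the training mean for every future day."""
    return [sum(train) / len(train)] * horizon

# Made-up daily fail totals for 11 days (26/07 to 05/08).
daily_fails = [120, 130, 125, 140, 150, 155, 160, 158, 162, 165, 170]

# Hold out the last four days as the test set.
HORIZON = 4
train, test = daily_fails[:-HORIZON], daily_fails[-HORIZON:]

scores = {
    "ses": rmse(test, ses_forecast(train, alpha=0.5, horizon=HORIZON)),
    "mean": rmse(test, mean_forecast(train, HORIZON)),
}

# The model with the lowest RMSE on the held-out days is selected.
best_model = min(scores, key=scores.get)
```

Any other forecaster that returns four values can be added to the `scores` dictionary and compared the same way.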

Datathon Telenor Solution – Game of Prediction (GoP)

Posted in Datathons Solutions | 3 Comments

The objective of this analysis is to find the ravens that are not reaching their destination on time. This kind of analysis helps us identify the towers (ravens) that require the most attention, so that the main causes of the delays can be addressed.
The data set describes the networks between the towers (ravens). Land-based communication happens with the help of signals.
A cellular network or mobile network is a communication network where the last link is wireless. This wireless transmission is done by a tower, which comprises a transmitter and a receiver. The channel carries both data and voice transmission.
Every cellular network has a different set of frequencies to avoid overlap and interference. Despite many precautions in maintaining the setup, a few parameters still affect the transmission. They can be classified as:
- Infrastructure
- Interference between the frequencies
- Climatic conditions
- External factors (predators etc.)
Our first approach is to create a “Decision Model” that adds value to our business and helps improve communication.
The tools we are using for prediction are:
1. Visual analysis using different plots
2. ARMA (Auto-Regressive Moving-Average) model
This Decision Model will help us forecast the failure rate of the ravens for the next 4-7 days.
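
To make the forecasting idea concrete, here is a minimal sketch of an AR(1) model, which is the simplest special case of ARMA (one autoregressive term, no moving-average term). It is not the team's model: a real analysis would more likely fit a full ARMA with a statistics library. The series values and function names are hypothetical.

```python
def fit_ar1(series):
    """Estimate the mean and AR(1) coefficient phi by least squares."""
    mean = sum(series) / len(series)
    centered = [v - mean for v in series]
    numerator = sum(centered[t] * centered[t - 1] for t in range(1, len(centered)))
    denominator = sum(v * v for v in centered[:-1])
    # A constant series has no variation to model, so fall back to phi = 0.
    phi = numerator / denominator if denominator != 0 else 0.0
    return mean, phi

def forecast_ar1(series, mean, phi, horizon):
    """Iterate the AR(1) recursion forward from the last observed value."""
    deviation = series[-1] - mean
    forecasts = []
    for _ in range(horizon):
        deviation = phi * deviation
        forecasts.append(mean + deviation)
    return forecasts

# Made-up daily failure rates (percent) for one raven.
failure_rate = [10, 12, 11, 13, 12, 14, 13, 15]

mean, phi = fit_ar1(failure_rate)
next_4_days = forecast_ar1(failure_rate, mean, phi, horizon=4)
```

When the absolute value of `phi` is below 1, the forecasts decay toward the series mean as the horizon grows, which is one reason short horizons such as 4-7 days suit this kind of model better than long ones.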

Datathon Telenor Solution – Exploratory Data & Predictive Analytics – Analogy of Game of Thrones with Telenor Telecommunications

Posted in Datathons Solutions | 1 Comment

-> This dataset concerns time series analysis of the failure rate of RAVENS sending messages from King's Landing to the North.
It draws an analogy between Telenor communication and Game of Thrones.
-> Sending ravens is one of the most fundamental parameters in mobile communications engineering.
For land-based mobile communications, the received raven variation is primarily the result of multipath fading caused by obstacles such as buildings (or clutter) or terrain irregularities; the distance between link end points; predatory animals; and interference among multiple transmissions, for example wars.
This inevitable raven variation is the cause of communication dropping, one of the most significant quality-of-service measures in operative communication. For this reason, various techniques and schemes are employed in the planning, design and optimization of raven networks to combat these propagation effects.
This normally covers the physical configuration of the network, which includes all aspects of network infrastructure deployment such as locations of base nests, additional food, sometimes guards, etc.
A typical example of these schemes and techniques is the use of models for flight prediction based on measured data.
Based on one month of data on flight fails, the participants have to perform a time series analysis and predict the future number of fails.

Datathon Air Sofia Solution – Team Teljapenosss

Posted in Prediction systems | 3 Comments

Team Teljapenosss. Team Members: Jalapeno (Nasiba Zokirova). Team Mentor: petya-par. Business Understanding: The levels of air pollution allegedly caused by solid fuel heating and motor vehicle traffic are ever-growing in the City of Sofia. The primary economic impact for the City of Sofia was a ruling by the European Court of […]