Hack the News Datathon Case – Propaganda Detection

Posted in NLP

1. Business Problem Formulation The current political landscape is shaped by extreme polarization of opinions and by the proliferation of fake news. For example, a recent study published in Science has found that rumors and fake news tend to spread six times faster than truthful information. This situation both damages the reputation of respectable news outlets and […]

Datathon Kaufland Solution – Predictive Maintenance Based on Sensor Data for Forklifts

Posted in Prediction systems, 1 Comment

Kaufland-Case 1. Business Understanding Industrial vibration analysis is a measurement tool used to identify, predict, and prevent failures. Implementing vibration analysis on the machines improves their reliability and efficiency and reduces downtime by eliminating mechanical or electrical failures. Vibration analysis is used to identify faults in machinery, plan machinery […]

Datathon Kaufland Solution – Kaufland case – Team3

Posted in Datathons Solutions, 1 Comment

In [1]:
import s3fs
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import numpy as np
import pywt

In [2]:
fs = s3fs.S3FileSystem(anon=True)
fs.ls('datacases/datathon-2018-2/')

Out[2]:
['datacases/datathon-2018-2/kaufland',
 'datacases/datathon-2018-2/nsi',
 'datacases/datathon-2018-2/ontotext',
 'datacases/datathon-2018-2/telelink',
 'datacases/datathon-2018-2/telenor']

In [3]:
fs.ls('datacases/datathon-2018-2/kaufland')

Out[3]:
['datacases/datathon-2018-2/kaufland/20180820_Kaufland_case_IoT_and_predictive_maintenance_events.xlsx',
 'datacases/datathon-2018-2/kaufland/20180920_Kaufland_case_IoT_and_predictive_maintenance.csv',
 'datacases/datathon-2018-2/kaufland/sample_Kaufland_case_IoT_and_predictive_maintenance.csv']

Events

In [4]:
with fs.open('datacases/datathon-2018-2/kaufland/20180820_Kaufland_case_IoT_and_predictive_maintenance_events.xlsx', 'rb') as f:
    df_events = pd.read_excel(f)

In [5]:
df_events

Out[5]:
[…]

Datathon NSI Solution – Predicting Household Budgets

Posted in Datathons Solutions, 2 Comments

Predicting Household Budgets
Authors: SoRd1, Jack, pr0faka, Kolio
Team: Pigeons
Statistics is the painful elaboration of the obvious. Hello everyone 🙂 We all hope that you had a great time during the Datathon, because we did. We are working on the case from NSI – to predict the household expenditures per group for the years in which […]

Datathon NSI Solution – The curious case of ‘Household Budget Survey(HBS)’

Posted in Prediction systems, 6 Comments

The National Statistical Institute of Bulgaria (NSI) conducts an annual Household Budget Survey (HBS) to obtain reliable, scientifically founded data on the income, expenditure, consumption, and other elements of the population's living standard, as well as on how these have changed over the years. To reduce the cost of carrying out the survey, NSI is considering changing its periodicity from yearly to once every five years. We are therefore building a model that predicts household expenditure for the next four years, using linear regression and a time series model, the Autoregressive Integrated Moving Average (ARIMA). So let's not waste any time and get on with it!
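To make the linear regression part of this idea concrete, here is a minimal sketch, assuming a single expenditure series with one value per year. The numbers are made up for illustration and are not NSI data; the function names `fit_linear_trend` and `forecast` are hypothetical and not taken from the team's solution.

```python
# Illustrative only: the yearly values below are invented, not NSI data.

def fit_linear_trend(years, values):
    """Ordinary least squares fit of values = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast(years, values, horizon=4):
    """Extrapolate the fitted trend for the `horizon` years after the last survey."""
    intercept, slope = fit_linear_trend(years, values)
    last_year = years[-1]
    return {
        last_year + h: intercept + slope * (last_year + h)
        for h in range(1, horizon + 1)
    }


years = [2012, 2013, 2014, 2015, 2016]
expenditure = [100.0, 104.0, 108.0, 112.0, 116.0]  # hypothetical index values

predictions = forecast(years, expenditure)
for year, value in predictions.items():
    print(year, round(value, 2))
```

A straight-line trend like this is only a baseline; in a real setting it would be fitted per expenditure group and checked against held-out years.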

Datathon Sofia Air Solution – Telelink Case Solution

Posted in Prediction systems, 5 Comments

Telelink Case Solution
Team Dimas
The Team Members: apetkov, desinik, rdimitrov, melania-berbatova, vrategov
Github Repo: https://github.com/Bugzey/Team-Midas
Workflow: The main workflow happens over at our github page. You can read the latest version of this article here: https://github.com/Bugzey/Team-Midas/blob/master/7.%20Documentation/Doc_010%20Documentation.md
Content
0. Data
We were given the following 4 datasets: Air Tube-20180928T185037Z-001.zip […]

Datathon Sofia Air Solution – The Telelink Case handled by the Urban air quality Gurus!

Posted in Datathons Solutions, 4 Comments

1. Business Understanding Particulate matter is considered the air pollutant of greatest concern to the health of the urban population. Research has shown that exposure to PM can lead to increased days lost from work or school, emergency room visits, hospital stays, and deaths. Both short and long-term exposures to PM can lead to […]


Posted in Prediction systems, 3 Comments

Cell phones have become a necessity for many people throughout the world. Keeping in touch with family and business associates and having access to email are only a few of the reasons for their increasing importance. Today's technically advanced cell phones can not only place and receive calls but also store data, take pictures, and even serve as walkie-talkies, to name just a few of the available options.
The dataset for the Telenor case, "What do Game of Thrones and Telecoms Have in Common?", contains data on delays in networks (RAVENS). The delays span 26/07/2018 to 05/08/2018. Each RAVEN_NAME represents a tower. There are 7847 unique RAVEN_NAMES across different networks such as 2G, 3G, and 4G, and there are 5 unique families.
To provide the best solution to the business problem, we work in two steps: (i) data analysis and coding in Python and (ii) time series model building in RStudio.
In the data analysis we counted the delays (failures) of the RAVENS, found the top 10 RAVENS with and without fails, and identified the family names and member names with the most and the fewest failures.
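The counting described above can be sketched roughly as follows. This is a minimal illustration, assuming each record carries a raven name, a family, a member, and a failure flag; only RAVEN_NAME is named in the text, so the record layout, the sample values, and the helper `failure_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical records: (raven_name, family, member, failed).
# The values are placeholders, not rows from the Telenor dataset.
records = [
    ("raven_a", "family_1", "member_x", True),
    ("raven_a", "family_1", "member_x", True),
    ("raven_b", "family_1", "member_y", False),
    ("raven_c", "family_2", "member_z", True),
    ("raven_b", "family_1", "member_y", False),
    ("raven_d", "family_2", "member_z", False),
]


def failure_counts(records, key_index):
    """Count failures per key; keys that never failed are kept with a count of 0."""
    counts = Counter()
    for record in records:
        counts[record[key_index]] += 1 if record[3] else 0
    return counts


raven_fails = failure_counts(records, key_index=0)
family_fails = failure_counts(records, key_index=1)

top_failing_ravens = raven_fails.most_common(10)
ravens_without_fails = sorted(name for name, n in raven_fails.items() if n == 0)
family_with_most_fails = max(family_fails, key=family_fails.get)
family_with_least_fails = min(family_fails, key=family_fails.get)

print(top_failing_ravens)
print(ravens_without_fails)
print(family_with_most_fails, family_with_least_fails)
```

The same helper works for member names by pointing `key_index` at the member field.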
Prediction and forecasting are done by building time series models. As the name suggests, a time series model works on time-indexed data (years, days, hours, minutes) to derive hidden insights that support informed decisions. Such models are especially useful when the data is serially correlated. To predict four days ahead, we divided the data into train and test sets. We ran the time series analysis using ARIMA, simple exponential smoothing, and recurrent neural networks (RNN).
Finally, comparing the root mean square error of these algorithms, we found the RNN to be the best at predicting the coming days. We then used the RNN to predict the delays for the next four days, and plotted graphs of the time series models for all the algorithms.
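The model comparison can be illustrated with a small sketch: hold out the last four days, forecast them with each candidate, and rank the candidates by root mean square error. The daily counts are invented, and the two candidates here (a naive last-value forecast and simple exponential smoothing with an assumed `alpha` of 0.5) are lightweight stand-ins, not the ARIMA and RNN models the team actually fitted.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def naive_forecast(train, horizon):
    """Repeat the last observed value for every future step."""
    return [train[-1]] * horizon


def ses_forecast(train, horizon, alpha=0.5):
    """Simple exponential smoothing: a flat forecast at the final smoothed level."""
    level = train[0]
    for value in train[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


# Hypothetical daily failure counts for an 11-day window.
daily_failures = [12, 15, 11, 14, 13, 16, 12, 14, 13, 15, 14]
train, test = daily_failures[:-4], daily_failures[-4:]

candidates = {
    "naive": naive_forecast,
    "simple exponential smoothing": ses_forecast,
}
scores = {name: rmse(test, fn(train, len(test))) for name, fn in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: RMSE = {score:.3f}")
print("best:", best)
```

Any other model can be added to `candidates` as long as it takes the training series and a horizon and returns that many predictions.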

Datathon Telenor Solution – Ravens for Communication

Posted in Datathons Solutions, 1 Comment

It is well known that exploratory data analysis is the cornerstone of data analysis.
The analysis shows that Brass Raven Birdy failed the most and that Metallic Raven Sunburst Polly is the most successful raven. The Targeryan family has the most raven fails, whereas the Baelish family has the fewest; within the Baelish family, Peter Baelish has the highest failure rate and Euron the lowest.
An ARIMA model is used to predict the number of failures for the next 4 days.
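As a rough idea of what a four-day forecast looks like, the sketch below uses the simplest member of the ARIMA family, ARIMA(0,1,0) with drift (a random walk with drift): difference the series once, estimate the drift as the mean difference, and project it forward. This is an assumption-laden illustration with invented counts, not the model order or the data used in the solution, and the function name is hypothetical.

```python
def random_walk_with_drift_forecast(series, horizon=4):
    """ARIMA(0,1,0) with drift: last value plus h times the mean first difference."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    drift = sum(diffs) / len(diffs)
    return [series[-1] + drift * h for h in range(1, horizon + 1)]


# Hypothetical daily failure counts.
failures = [10, 12, 11, 13, 14, 13, 15]

next_four_days = random_walk_with_drift_forecast(failures, horizon=4)
print([round(value, 2) for value in next_four_days])
```

A full ARIMA fit would also estimate autoregressive and moving-average terms, typically with a dedicated time series library.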