This article describes our submission for the Hack the News Datathon 2019, which focuses on Task 2, propaganda sentence classification. It outlines our exploratory data analysis, methodology and future work. Our work revolves around the BERT model, as we believe it offers an excellent language model that is also good at attending to context, an important aspect of propaganda detection.
1. Business Problem Formulation The current political landscape is shaped by extreme polarization of opinions and by the proliferation of fake news. For example, a recent study published in Science has found that rumors and fake news tend to spread six times faster than truthful information. This situation both damages the reputation of respectable news outlets and […]
Kaufland-Case 1. Business Understanding Industrial vibration analysis is a measurement tool used to identify, predict, and prevent failures. Implementing vibration analysis on the machines will improve their reliability, leading to better machine efficiency and reduced downtime by eliminating mechanical or electrical failures. Vibration analysis is used to identify faults in machinery, plan machinery […]
In :
import s3fs
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import numpy as np
import pywt

In :
fs = s3fs.S3FileSystem(anon=True)
fs.ls('datacases/datathon-2018-2/')

Out:
['datacases/datathon-2018-2/kaufland',
 'datacases/datathon-2018-2/nsi',
 'datacases/datathon-2018-2/ontotext',
 'datacases/datathon-2018-2/telelink',
 'datacases/datathon-2018-2/telenor']

In :
fs.ls('datacases/datathon-2018-2/kaufland')

Out:
['datacases/datathon-2018-2/kaufland/20180820_Kaufland_case_IoT_and_predictive_maintenance_events.xlsx',
 'datacases/datathon-2018-2/kaufland/20180920_Kaufland_case_IoT_and_predictive_maintenance.csv',
 'datacases/datathon-2018-2/kaufland/sample_Kaufland_case_IoT_and_predictive_maintenance.csv']

Events

In :
with fs.open('datacases/datathon-2018-2/kaufland/20180820_Kaufland_case_IoT_and_predictive_maintenance_events.xlsx', 'rb') as f:
    df_events = pd.read_excel(f)

In :
df_events

Out: […]
Predicting Household Budgets
Authors: SoRd1, Jack, pr0faka, Kolio
Team: Pigeons
"Statistics is the painful elaboration of the obvious."
Hello everyone 🙂 We all hope that you had a great time during the Datathon, because we did. We are working on the case from NSI: to predict the household expenditures per group for the years in which […]
The National Statistical Institute of Bulgaria (NSI) conducts an annual Household Budget Survey (HBS) to obtain reliable and scientifically founded data on the income, expenditure, consumption and other elements of the population's living standard, as well as on how these have changed over the years. To reduce the cost of carrying out the survey, NSI is considering changing its periodicity from yearly to once every five years. Hence, we are building a model that predicts household expenditure for the four years between surveys. The algorithms we rely on are linear regression and the autoregressive integrated moving average (ARIMA) time series model. So let's not waste any time and get on with it!
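To make the linear regression part of this approach concrete, here is a minimal sketch that fits a straight-line trend to yearly expenditure and extrapolates it four years ahead. The function names and the numbers are made up for illustration; they are not the NSI data or the team's actual code, and a real model would probably use more features than the year alone.

```python
def fit_linear_trend(years, values):
    """Ordinary least squares fit of: value = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_next_years(years, values, horizon=4):
    """Extrapolate the fitted trend for `horizon` years after the last survey."""
    intercept, slope = fit_linear_trend(years, values)
    last_year = years[-1]
    return {
        last_year + h: intercept + slope * (last_year + h)
        for h in range(1, horizon + 1)
    }


# Hypothetical yearly expenditure for one expenditure group (illustrative only).
years = [2013, 2014, 2015, 2016, 2017]
food_spend = [1500.0, 1540.0, 1580.0, 1620.0, 1660.0]

forecast = forecast_next_years(years, food_spend)
for year, value in sorted(forecast.items()):
    print(year, round(value, 2))
```

Fitting one trend per expenditure group keeps the sketch simple; it ignores any correlation between groups, which a fuller model might want to capture.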
Business Understanding In Sofia, air pollution norms were exceeded 70 times in the heating period from October 2017 to March 2018, citizens’ initiative AirBG.info says. The day with the worst air pollution in Sofia was January 27, when the norm was exceeded six times over. Things got so out of control that even the […]
Telelink Case Solution
Team Dimas
The Team Members
– apetkov
– desinik
– rdimitrov
– melania-berbatova
– vrategov
GitHub Repo: https://github.com/Bugzey/Team-Midas
Workflow
The main workflow happens over at our GitHub page. You can read the latest version of this article here: https://github.com/Bugzey/Team-Midas/blob/master/7.%20Documentation/Doc_010%20Documentation.md
## Content
0. Data
We were given the following 4 datasets: Air Tube-20180928T185037Z-001.zip […]
1. Business Understanding Particulate matter is considered the air pollutant of greatest concern to the health of the urban population. Research has shown that exposure to PM can lead to increased days lost from work or school, emergency room visits, hospital stays, and deaths. Both short and long-term exposures to PM can lead to […]
It is well known that exploratory data analysis is a cornerstone of data analysis.
Analysis of the data shows that Brass Raven Birdy is the raven with the most failures and Metallic Raven Sunburst Polly is the most successful raven. The Targeryan family has the most raven failures, whereas the Baelish family has the fewest. Within the Baelish family, Peter Baelish has the highest failure rate and Euron has the fewest failures.
An ARIMA model is used to predict the number of failures for the next 4 days.
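The excerpt does not say which ARIMA order or library was used, so the following is only a sketch of the idea under stated assumptions: a hand-rolled ARIMA(1,1,0) without a constant term, estimated by conditional least squares on the differenced series, with made-up daily failure counts. In practice one would more likely rely on a tested library implementation and choose the order from the data.

```python
def fit_arima_110(series):
    """Estimate the AR coefficient of an ARIMA(1,1,0) model without constant.

    The series is differenced once, then phi is fitted by least squares on
    diff[t] = phi * diff[t-1] + noise.
    """
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    phi = num / den if den else 0.0
    return phi, diffs


def forecast_arima_110(series, steps=4):
    """Forecast `steps` values ahead by iterating the fitted recursion."""
    phi, diffs = fit_arima_110(series)
    level = series[-1]
    last_diff = diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff  # expected next change
        level += last_diff           # undo the differencing
        forecasts.append(level)
    return forecasts


# Hypothetical daily failure counts (illustrative only, not the case data).
daily_failures = [10, 18, 22, 24, 25]
next_four_days = forecast_arima_110(daily_failures, steps=4)
print(next_four_days)
```

Because the model works on day-to-day changes, the forecasts level off as the predicted changes shrink; forecasts of counts would normally be rounded before being reported.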