Sofia Air Pollution Case Team BG-USA: Kristiyan Vachev – Bulgaria, Sergey Vichev – Bulgaria, Stefan Panev – Bulgaria, Georgi Kirilov – Bulgaria, Mike Lane – USA. Data Preparation. Geocoding the construction data: The original source file can be found here. This is very similar to geocoding as proposed in the original documentation […]
Sofia is a city with significant concentrations of particulate matter less than 10 micrometers in diameter (PM10). A high concentration of PM10 is disruptive to life and the climate. The purpose of this project is to predict the concentration of PM10 on a particular day given the climatic conditions. Such predictions are important for making policies that reduce pollution in the city. Our contribution is a random forest regressor that achieves this purpose with 70 to 80% accuracy.
Introduction to NLP
Natural Language Processing (NLP) is the field of computer science concerned with developing algorithms for the analysis of human languages. Artificial intelligence approaches (e.g., machine learning) have been used to solve many NLP tasks, such as parsing, POS tagging, named entity recognition, word sense disambiguation, document classification, machine translation, textual entailment, question answering, and summarization. Natural languages are notoriously difficult for machines to understand and model, mostly because of ambiguity (e.g., humor, sarcasm, puns), lack of clear structure, and diversity (e.g., models for English are not directly applicable to Chinese). Even so, in recent years we have witnessed rapid progress in NLP thanks to deep learning models, which are becoming more and more complex and increasingly able to capture the subtleties of human languages.
Team Members: Tariq Alhindi (email@example.com), Christopher Hidey (firstname.lastname@example.org), Tuhin Chakrabarty (email@example.com). Business Understanding: Automatic detection of propaganda is essential for building tools that help people navigate the web with more awareness of the deliberate or unintentional messages in what they read. Data Understanding: 50,000 articles for task 1; 21,000 sentences for task 2. Data Preparation […]
Dina Zaychik, dzay, firstname.lastname@example.org; Sergey Sedov, Sianur, email@example.com. Task 1. The hypothesis is that propaganda versus non-propaganda at the article level can be detected using distributional semantics features. We therefore performed thorough preprocessing, removing URLs, hashtags, unusual symbols, unusual article beginnings, non-English first paragraphs (using the open-source langid package), and short texts. After that we trained a fastText supervised model (the […]
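The cleaning steps described in this excerpt could look roughly like the sketch below. It is not the team's code: the regular expressions, the `MIN_WORDS` threshold, and the function names `clean_text` and `preprocess` are all assumptions made for illustration. The language-identification step (langid) and the removal of unusual article beginnings are left out, since the excerpt does not say how they were done.

```python
import re

# All patterns below are illustrative assumptions, not the team's actual rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
# Keep letters, digits, whitespace and basic punctuation; treat the rest as "unusual".
UNUSUAL_RE = re.compile(r"[^A-Za-z0-9\s.,;:!?'()-]")
WHITESPACE_RE = re.compile(r"\s+")

MIN_WORDS = 5  # hypothetical cutoff for "short texts"


def clean_text(text):
    """Strip URLs, hashtags and unusual symbols, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = UNUSUAL_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def preprocess(articles, min_words=MIN_WORDS):
    """Clean every article and drop those shorter than min_words words."""
    cleaned = (clean_text(article) for article in articles)
    return [text for text in cleaned if len(text.split()) >= min_words]
```

Calling `preprocess` on a list of raw article strings returns the cleaned texts that pass the length filter; these would then be written out in whatever format the downstream classifier expects.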
The team considered the following properties of the data when designing the solution:
Repetition of text
Length of words
Lexical analysis of words
Frequency of words
Trigrams and bigrams of words
Sentiments conveyed by the
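Several of the properties listed above (repetition, word length, word frequency, bigrams and trigrams) can be computed with the standard library alone. The sketch below is only an illustration under assumptions: the tokenizer, the `repetition_ratio` definition, and the function names are hypothetical, and lexical analysis and sentiment are omitted because the excerpt does not specify how they were measured.

```python
import re
from collections import Counter

# Simple lowercase word tokenizer; an assumption, not the team's tokenizer.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def ngrams(tokens, n):
    """Return all consecutive n-token tuples (empty if there are fewer than n tokens)."""
    return list(zip(*(tokens[i:] for i in range(n))))


def surface_features(text, top_k=3):
    """Compute a few surface statistics of a text."""
    tokens = tokenize(text)
    n_tokens = len(tokens)
    word_counts = Counter(tokens)
    return {
        "n_tokens": n_tokens,
        # Average number of characters per token.
        "mean_word_length": sum(map(len, tokens)) / n_tokens if n_tokens else 0.0,
        # Share of tokens that repeat an earlier token (hypothetical definition).
        "repetition_ratio": 1 - len(word_counts) / n_tokens if n_tokens else 0.0,
        "top_words": word_counts.most_common(top_k),
        "top_bigrams": Counter(ngrams(tokens, 2)).most_common(top_k),
        "top_trigrams": Counter(ngrams(tokens, 3)).most_common(top_k),
    }
```

The resulting dictionary could be turned into a numeric feature vector for a classical classifier, or simply inspected to compare propaganda and non-propaganda texts.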
The main modeling approach included:
LSTM (long short-term memory) with embeddings from fastText.
Using Bidirectional LSTMs and trainable embeddings initialized with GloVe for propaganda detection at the article level
Abstract. This notebook classifies news articles into two classes, propaganda and non-propaganda. Three types of models are compared: a Naive Bayes classifier, a linear support vector classifier (Linear SVC), and a recurrent neural network. The Linear SVC shows the best results. The neural network comes close, while the Naive Bayes classifier predicts only one class. The following packages have been used: […]
This work proposes a solution to the HackTheNewsHackathon tasks. Binary classification into two classes, “propaganda” and “non-propaganda”, was chosen as the main problem. It is solved with the open-source library DeepPavlov using an ensemble of several different models, including sklearn models, a shallow-and-wide convolutional model, attention-based bidirectional LSTM and GRU models, and capsule networks.