This is the Leopards team’s submission for the Propaganda Detection datathon. Key finding: the best-performing classifier is logistic regression operating on Word2vec representations of the sentences, combined with several engineered features such as the proportion of sentiment-bearing words in the sentence.
Propaganda is a form of communication aimed at influencing the attitude of a community toward some cause or position. It often presents facts selectively to encourage a particular synthesis. Such disinformation damages the reputation of respectable news outlets and organisations, and is bad for business. The objective of the hackathon is to distinguish propaganda from non-propaganda news and to develop a model that supports this task. Further objectives of this work include detecting propagandist phrases and identifying the type of propaganda used. The algorithms we draw on are Passive Aggressive, Multilayer Perceptron, Logistic Regression, AdaBoost, Decision Tree, Random Forest, KNN, SVM and Naive Bayes, applied to detect potentially propagandistic and non-propagandistic sentences in a news article. For evaluation we calculate the F1 score, which accounts for the class imbalance in the test dataset. The best model is then used to detect propagandist and non-propagandist articles and phrases, and to classify the type of propaganda.
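The pipeline described above can be sketched as follows. This is a minimal illustration, not the team's actual code: TF-IDF stands in for the Word2vec sentence vectors to keep the example dependency-free, and the sentiment lexicon and toy sentences are invented for demonstration.

```python
# Hedged sketch: binary propaganda classifier in the spirit of the approach above.
# TF-IDF substitutes for Word2vec sentence vectors (assumption, for portability);
# the sentiment lexicon and sentences below are illustrative, not datathon data.
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Tiny hypothetical lexicon of sentiment-bearing words
SENTIMENT_WORDS = {"evil", "glorious", "traitor", "hero", "disaster", "corrupt"}

def sentiment_proportion(sentence):
    """Engineered feature: share of sentiment-bearing words in the sentence."""
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0
    return sum(t in SENTIMENT_WORDS for t in tokens) / len(tokens)

train_sents = [
    "the corrupt traitor betrayed our glorious nation",
    "officials met on tuesday to discuss the budget",
    "only a hero can save us from this disaster",
    "the report summarises quarterly rainfall data",
]
train_labels = [1, 0, 1, 0]  # 1 = propaganda, 0 = non-propaganda

# Sentence vectors plus the engineered sentiment feature, stacked column-wise
vec = TfidfVectorizer()
X_text = vec.fit_transform(train_sents)
X_extra = csr_matrix([[sentiment_proportion(s)] for s in train_sents])
X = hstack([X_text, X_extra])

clf = LogisticRegression().fit(X, train_labels)
preds = clf.predict(X)
f1 = f1_score(train_labels, preds)
print("train F1:", f1)
```

In a real run the sentence vectors would come from a Word2vec model averaged over tokens, and the F1 score would be computed on a held-out test set rather than the training sentences.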
Every day we come across fancy jargon like data science, machine learning, artificial intelligence, computer vision, NLP, etc. You may have wondered why terms like data science and AI appear together in the names of research institutes such as the Alan Turing Institute for Data Science and Artificial Intelligence. Do these two terms mean the same thing, or not? If they mean the same, why not merge them into a single term; if not, why not use two distinct names instead of placing them alongside one another?
Over the years, AI has developed from a theoretical concept into an established technology used across many fields, from self-driving cars to complex medical procedures, welfare, and beyond. Artificial intelligence was long viewed as a complex undertaking reserved for computer geniuses and nerds. However, in […]
Preliminary Analysis Since the objective is to predict the air quality forecast for the next 24 hours per station, the first step should be understanding the citizen-science air quality measurements: grouping them by station and summarising them by day. This inspection and pre-processing step serves to find missing data, outliers and […]
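The grouping and daily summarisation described above might look like the following sketch. The column names (`station`, `time`, `p1`) and the outlier threshold are assumptions for illustration, not the actual dataset schema.

```python
# Hedged sketch of the first step: group citizen-science air quality readings
# by station and summarise them per day, flagging missing values and outliers.
# Column names and the 300 threshold are illustrative assumptions.
import pandas as pd
import numpy as np

raw = pd.DataFrame({
    "station": ["A", "A", "A", "B", "B"],
    "time": pd.to_datetime([
        "2018-01-01 03:00", "2018-01-01 15:00", "2018-01-02 09:00",
        "2018-01-01 12:00", "2018-01-02 12:00",
    ]),
    "p1": [22.0, 480.0, 25.0, np.nan, 30.0],  # PM10-style readings; 480 is an outlier
})

# Treat implausibly high readings as missing (threshold chosen for illustration)
raw.loc[raw["p1"] > 300, "p1"] = np.nan

# Daily summary per station: mean reading and count of valid measurements,
# which exposes days with missing data
daily = (
    raw.set_index("time")
       .groupby("station")["p1"]
       .resample("D")
       .agg(["mean", "count"])
       .reset_index()
)
print(daily)
```

Days where `count` is zero mark gaps that need imputation or exclusion before any 24-hour forecast is attempted.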
— Team Teljapenosss Team Members — Jalapeno (Nasiba Zokirova) Team Mentor: petya-par Business Understanding The levels of air pollution, allegedly caused by solid-fuel heating and motor vehicle traffic, are ever growing in the City of Sofia. The primary economic impact for the City of Sofia was a ruling by the European Court of […]
In this article, the mentors give some preliminary guidelines, advice, and suggestions to the participants for the case. Every mentor should write their name and chat name at the beginning of their texts so that there are no mix-ups with the other mentors. By the rules, it is essential to follow the CRISP-DM methodology (http://www.sv-europe.com/crisp-dm-methodology/). The DSS […]
The Kaufland Case poses an interesting Predictive Maintenance challenge. First, make sure that you understand what the goals and deliverables are. This is perhaps the most important step in the entire Data Science process. It’s crucial for the business value of the result and it ensures that you spend the little time you have on […]
The project builds a model, based on data provided by the World Health Organization (WHO), to predict life expectancy in years for different countries. The data covers the period from 2000 to 2015 and originates from here: https://www.kaggle.com/kumarajarshi/life-expectancy-who/data The resulting algorithms have been tested on whether they maintain their accuracy when predicting life expectancy for data they haven’t been trained on. Four algorithms have been used:
Linear Regression with Polynomial features
Decision Tree Regression
Random Forest Regression
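A held-out evaluation of the regressors listed above could be sketched as follows. Synthetic features stand in for the WHO life-expectancy table, which is not reproduced here; the train/test split mirrors the stated goal of checking accuracy on data the models were not trained on.

```python
# Hedged sketch: compare the listed regressors on held-out data.
# The synthetic X/y below are stand-ins for the WHO dataset (assumption).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))                # stand-ins for country indicators
y = 50 + 2 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0, 1, 300)  # "life expectancy"

# Hold out a test set the models never see during training
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Linear Regression (polynomial features)": make_pipeline(
        PolynomialFeatures(degree=2), LinearRegression()
    ),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "Random Forest Regression": RandomForestRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = r2_score(y_te, model.predict(X_te))
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

With the real WHO table, the same loop applies after the usual cleaning and feature selection; R² on the held-out split is what shows whether each model generalises.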