Introduction to NLP
Natural Language Processing (NLP) is the field of computer science concerned with developing algorithms for the analysis of human languages. Artificial intelligence approaches (e.g., machine learning) have been used to solve many NLP tasks, such as parsing, part-of-speech (POS) tagging, named entity recognition, word sense disambiguation, document classification, machine translation, textual entailment, question answering, and summarization. Natural languages are notoriously difficult for machines to understand and model, mostly because of ambiguity (e.g., humor, sarcasm, puns), lack of clear structure, and diversity (e.g., models for English are not directly applicable to Chinese). Even so, recent years have seen rapid progress in NLP, driven by deep learning models that are becoming increasingly complex and better able to capture the subtleties of human languages.
Propaganda is a form of communication aimed at influencing the attitude of a community toward some cause or position. It often presents facts selectively to encourage a particular synthesis. Such disinformation damages the reputation of respectable news outlets and organisations, and it is bad for business. The objective of the Hackathon is to distinguish propaganda from non-propaganda news and to develop a model that can help with this effort. The other objectives of this work include detecting propagandist phrases and identifying the type of propaganda used. The algorithms we use are Passive Aggressive, Multilayer Perceptron, Logistic Regression, AdaBoost, Decision Tree, Random Forest, KNN, SVM and Naive Bayes, applied to detect potentially propagandistic and non-propagandistic sentences in a news article. For the evaluation, we calculate the F1 score, which reflects performance under the class imbalance in the testing dataset better than plain accuracy does. We used the best-performing model to detect propagandist and non-propagandist articles and phrases, and to identify the type of propaganda.
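To illustrate why F1 is preferred over accuracy on imbalanced data, here is a minimal sketch in plain Python. The label strings and the toy predictions are assumptions made up for illustration; they are not taken from the Hackathon dataset.

```python
def f1_score(y_true, y_pred, positive="propaganda"):
    """F1 score for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and/or recall is zero (or undefined),
        # so we report 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced test set: 2 propaganda items, 8 non-propaganda items.
y_true = ["propaganda"] * 2 + ["non-propaganda"] * 8

# A useless model that never predicts the minority class.
always_negative = ["non-propaganda"] * 10

accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
f1 = f1_score(y_true, always_negative)

print(f"accuracy={accuracy:.2f}  f1={f1:.2f}")
```

In this toy case the model that never flags propaganda still reaches 80% accuracy, while its F1 score for the propaganda class is 0, which is why F1 is the more informative measure when one class is rare.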
News is the lifeline of human society. It highlights important events and influences public opinion like no other tool. But with the advent of electronic media, the sheer amount of news being churned out, and the current political climate, it is hard to tell genuine news from propaganda. This is where intelligent systems that can classify news articles and text fragments as propagandistic or non-propagandistic come into play. This Datathon is focused on developing such a system using various algorithms and methods. The levels of the challenge are:
A system that is able to classify whether a news article is propaganda or not.
A system that is able to classify whether a sentence in an article is propaganda or not.
A system that is able to intelligently classify the propaganda technique used in a news piece.
In recent years, deceptive content such as fake news and fake reviews, also known as opinion spam, has increasingly become a danger for online users. Fake reviews have affected consumers and stores alike. Furthermore, the problem of fake news gained attention in 2016, especially in the aftermath of that year's U.S. presidential election. Fake reviews and fake news are closely related phenomena, as both consist of writing and spreading false information or beliefs. The opinion spam problem was formulated for the first time a few years ago, but it has quickly become a growing research area due to the abundance of user-generated content. It is now easy for anyone to write fake reviews or fake news on the web. The biggest challenge is the lack of an efficient way to tell the difference between a real review and a fake one; even humans are often unable to tell the difference. We implement seven machine learning classification techniques here.
Because of the extreme divergence of social discussions in the political space, rumours and fake news are spreading like an inferno, making it difficult for any reader to tell them apart from the truth.
What are we going to achieve?
To detect propaganda at the article level and the sentence level, and to recognize its type.
Using a supervised machine learning technique, a model shall be created to identify and flag false news propaganda.
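As a rough illustration of what such a supervised model could look like, the sketch below implements a small multinomial Naive Bayes text classifier with add-one smoothing in plain Python. The class name, the whitespace tokenizer, and the four training sentences are assumptions invented for this example; a real system would be trained on the labeled competition data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayesSketch:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)    # log prior
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue                         # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy training data, for illustration only.
train_texts = [
    "they are traitors who will destroy our great nation",
    "only a fool would trust these corrupt enemies",
    "the council approved the budget on tuesday",
    "the report was published by the ministry on monday",
]
train_labels = ["propaganda", "propaganda", "non-propaganda", "non-propaganda"]

model = NaiveBayesSketch().fit(train_texts, train_labels)
print(model.predict("these traitors will destroy the nation"))
print(model.predict("the ministry published the budget on monday"))
```

With so little data the model only picks up on word overlap, so the sketch shows the training and prediction flow, not realistic accuracy.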
The word propaganda designates any attempt to influence the opinions or actions of others to some predetermined end by appealing to their emotions or prejudices or by distorting the facts. We are fooled by propaganda chiefly because it appeals to our emotions rather than to our reason. It makes us believe and do things we would not otherwise believe or do. And since it appeals more to our emotions, we often don't recognize it when we see it.
The current political landscape is shaped by extreme polarization of opinions and by the proliferation of fake news.
Studies and surveys have found that rumours and fake news tend to spread six times faster than truthful information. This situation damages the reputation of respectable news outlets and also undermines the very foundations of democracy, which needs a free and reliable press to thrive. Therefore, it is in the interest of the public as well as of news organizations to be able to detect and fight disinformation in all its forms.
Here, we are trying to create a tool that can help identify propagandistic articles with the help of Predictive Analytics.
The main objectives are:
(i) to flag the article as a whole
(ii) to detect the potentially propagandistic sentences in a news article
(iii) to identify the exact type and span of use of propagandistic techniques
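One possible way to represent the output of objective (iii), and to relate it to the sentence-level flags of objective (ii), is sketched below. The `TechniqueSpan` class, the character-offset convention, the one-sentence-per-line layout, and the technique label are all hypothetical choices for illustration, not a format prescribed by the task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechniqueSpan:
    start: int       # character offset where the span begins (inclusive)
    end: int         # character offset where the span ends (exclusive)
    technique: str   # label of the propaganda technique (hypothetical names)


def sentence_flags(article, spans):
    """Flag each line of `article` that overlaps at least one span.

    Assumes one sentence per line, separated by a single newline character.
    """
    flags = []
    offset = 0
    for line in article.split("\n"):
        line_start, line_end = offset, offset + len(line)
        flags.append(any(s.start < line_end and s.end > line_start for s in spans))
        offset = line_end + 1  # skip the newline separator
    return flags


# Made-up two-sentence article, for illustration only.
article = "The council met on Monday.\nThese traitors must be stopped."
start = article.index("traitors")
spans = [TechniqueSpan(start, start + len("traitors"), "example_technique")]

print(article[spans[0].start:spans[0].end])
print(sentence_flags(article, spans))
```

Storing spans as character offsets keeps the exact location of each technique, and the sentence-level flags can then be derived from the spans when needed.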
Team Name: Data Titans
Team Members: M.HEMANTH KUMAR, A.PAVAN SHANKAR, B.MANOHAR, V. LITHIN CHOWDARY, E.V.S.SAI RAM
PROBLEM STATEMENT: Hack the news whether it is propaganda or Non-Propaganda
INTRODUCTION: Propaganda is a view which can mislead us to certain false assumptions, so here we got a chance to identify the Propaganda in the […]