Misleading Beliefs
GROUP NAME – Path Finders
PARTICIPANTS – 1. Abhinav Gaharwar 2. Sanjeev Biswas
3. Gauranga Mallick 4. Dhirendra Mohan Jha
WEAPON – R, Python
LIBRARIES USED – pandas, numpy, matplotlib, sklearn, etc.
1. BUSINESS UNDERSTANDING
Most people of our generation now get at least some of their news from social media. Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of “fake news”, i.e., low-quality news with intentionally false information.
Fake news articles are intentionally fabricated to deceive and can be proven false. Fake news and the spread of misinformation can have serious real-world consequences. One of the main reasons for generating fake news is economic gain, earned either through more clicks or by producing paid fake content for parties who want more clicks. Another common reason is to deceive users or create political bias among them in order to win more supporters.
There are two aspects of news on social media: traditional news shared on social media, and social media as a source of news. Traditional media houses sometimes use the second aspect to generate news articles. Both aspects can, intentionally or unintentionally, cause fake news to spread even further, manipulating mass ideology.
2. DATA UNDERSTANDING
The problem statement comes in three stages. The first task is to predict the intent of an article, that is, whether it is propaganda or not. The second task is to predict the agenda of the individual statements in the articles, and the third task is to predict the kind of propaganda used.
The data set was provided in text format, which we converted to CSV so that it could be processed in Python, the programming language we used for the analysis.
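A minimal sketch of such a conversion with pandas is given here. It assumes that each line of the raw text file holds the article text, the news number, and the news type separated by tabs; the separator, the column names, and the sample rows are assumptions made for illustration and are not taken from the actual data set.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the raw text file.
# The tab separator is an assumption about the file layout.
raw_text = (
    "First article text\t1001\tnon-propaganda\n"
    "Second article text\t1002\tpropaganda\n"
)

# Illustrative column names; the raw file is assumed to have no header row.
columns = ["article", "news_number", "news_type"]

# Parse the tab-separated text and attach the column names.
df = pd.read_csv(io.StringIO(raw_text), sep="\t", header=None, names=columns)

# Serialise the table as CSV text (pass a file path to to_csv to save it to disk).
csv_text = df.to_csv(index=False)
print(csv_text)
```

With a real file, the `io.StringIO(raw_text)` argument would be replaced by the path to the text file.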
The data set contains the news article, the news number, and the news type, that is, whether the article is non-propaganda or carries some agenda.
Task 1 had approximately 36,000 rows and 3 columns; the predictors we used were the news statements and the news number.
Task 2 had approximately 15,170 rows and 3 columns; here we had to take each statement of a given article and understand its agenda.
We applied different machine learning algorithms to classify the data and kept the model that predicted the news type with the best accuracy.
3. MODELLING
Exploratory Data Analysis
This is a word cloud, which gives a better understanding of the data and of the words used most often.
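A word cloud sizes each word by how often it occurs, so the underlying computation is a word-frequency count. The sketch below shows that count using only the Python standard library; the sample texts, the stop-word list, and the function name `word_frequencies` are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter

# Hypothetical sample texts; in practice these would be the article column.
articles = [
    "The government announced a new policy today.",
    "Critics say the new policy is a disaster for the nation.",
]

# A tiny illustrative stop-word list (an assumption, not the project's list).
stop_words = {"the", "a", "is", "for", "say", "to", "of", "and"}


def word_frequencies(texts, stop_words):
    """Count lowercase word occurrences across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


freqs = word_frequencies(articles, stop_words)
print(freqs.most_common(3))
```

The resulting counts could then be handed to a word-cloud plotting tool, which draws the most frequent words in the largest type.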
TASK 1
Problem statement:
Given a news article, you are required to build an intelligent system that is able to detect whether the article is propagandist or not.
Data set used.
Steps involved:
1. Data preprocessing
We removed the rows with null values, as they amounted to a very low percentage of the complete data.
2. Data extraction
3. Splitting the data into train and test sets
4. Modelling (implementing the best model)
5. Classification report
The accuracy score of the logistic regression model is approximately 95%.
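The steps above can be sketched with pandas and sklearn as follows. This is an illustration under stated assumptions, not the exact pipeline used in the project: the toy data, the column names, the TF-IDF features, the 75/25 split, and the `max_iter` setting are all assumptions. Only the choice of logistic regression comes from the report.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; the real Task 1 data set has about 36,000 rows.
data = pd.DataFrame({
    "article": [
        "They are traitors who want to destroy our great nation",
        "Only a fool would trust those corrupt liars in power",
        "Wake up before the enemy takes everything you love",
        "Real patriots know the shameful truth they are hiding",
        "These criminals will ruin the country unless we stop them now",
        "Everyone knows the disgraceful elite hates ordinary people",
        "The council approved the budget for road repairs on Monday",
        "Rainfall this month was slightly above the seasonal average",
        "The university will open a new library building next year",
        "The central bank kept interest rates unchanged this quarter",
        "Local officials reported that the bridge reopened to traffic",
        "The company published its annual earnings report on Friday",
        None,  # a row with a missing article, dropped during preprocessing
    ],
    "news_type": ["propaganda"] * 6 + ["non-propaganda"] * 6 + ["propaganda"],
})

# Data preprocessing: drop rows with missing values.
data = data.dropna(subset=["article", "news_type"])

# Train/test split, stratified so both classes appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    data["article"],
    data["news_type"],
    test_size=0.25,
    random_state=42,
    stratify=data["news_type"],
)

# Feature extraction (TF-IDF, an assumption) and logistic regression in one pipeline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Classification report on the held-out test set.
predictions = model.predict(X_test)
report = classification_report(y_test, predictions, zero_division=0)
print(report)
```

On a toy sample this small the scores are not meaningful; the sketch only shows how the steps fit together.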
TASK 2
Given a news article, you are required to build an intelligent system that is able to detect whether each of its sentences is propagandist or not. A sentence is considered propagandist if it contains at least one out of eighteen propagandist techniques (http://propaganda.qcri.org/annotations/definitions.html).
Data Set used –
Steps involved:
1. Data preprocessing
2. Splitting the data into train and test sets
3. Modelling (implementing the best model)
4. Report of the model
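Because Task 2 works at the sentence level, each article has to be available as one row per sentence before the steps above can be applied. The sketch below shows one simple way to build such rows. The punctuation-based splitting rule and the function names `split_into_sentences` and `build_sentence_rows` are hypothetical, and the data set may already have been supplied in sentence form.

```python
import re


def split_into_sentences(article_text):
    """Split text after '.', '!' or '?' followed by whitespace (a simple heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", article_text.strip())
    return [part for part in parts if part]


def build_sentence_rows(article_id, article_text):
    """Return one (article_id, sentence_index, sentence) row per sentence."""
    sentences = split_into_sentences(article_text)
    return [
        (article_id, index, sentence)
        for index, sentence in enumerate(sentences, start=1)
    ]


# Hypothetical article used only to demonstrate the row layout.
article = (
    "Our enemies are everywhere! The meeting starts at noon. "
    "Will you stand by and watch?"
)
rows = build_sentence_rows(1001, article)
for row in rows:
    print(row)
```

Each row can then be given a propagandist or non-propagandist label and passed through the same kind of preprocessing, splitting, and modelling used for whole articles.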
4. EVALUATION
Computational linguistics can aid in identifying fake news automatically at a rate well above chance. The proposed linguistics-driven approach suggests that, to differentiate between fake and genuine content, it is worthwhile to look at the lexical, syntactic, and semantic levels of the news item in question.
With the increasing popularity of social media, more and more people consume news from social media instead of traditional news media. However, social media has also been used to spread fake news, which has strong negative impacts on individual users and on broader society. In this article, we explored the fake news problem by reviewing existing literature in two phases: characterization and detection. In the characterization phase, we introduced the basic concepts and principles of fake news in both traditional media and social media. In the detection phase, we reviewed existing fake news detection approaches from a data mining perspective, including feature extraction and model construction. We also discussed the data sets, evaluation metrics, and promising future directions in fake news detection research, as well as ways to extend the field to other applications.
Fake news can be accurately identified using machine learning methods.
There is enough evidence that fake news is not too difficult to detect, at least in some selected domains. However, it is difficult to say with confidence how far the results of this experiment apply to real-world news. We hope to broaden the scope of our data evaluation and to apply our method more generally in the future.