Datathon – HackNews – Solution – Path Finders

In recent years, deceptive content such as fake news and fake reviews, also known as opinion spam, has become an increasing danger for online users. Fake reviews have affected consumers and stores alike. The problem of fake news gained wide attention in 2016, especially in the aftermath of the U.S. presidential election. Fake reviews and fake news are closely related phenomena, as both consist of writing and spreading false information or beliefs. The opinion spam problem was first formulated only a few years ago, but it has quickly become a growing research area because of the abundance of user-generated content. It is now easy for anyone to write fake reviews or fake news on the web. The biggest challenge is the lack of an efficient way to tell a real review from a fake one; even humans are often unable to tell the difference. We implement 7 machine learning classification techniques here.


Misleading Beliefs

GROUP NAME – Path Finders

PARTICIPANTS –

1. Abhinav Gaharwar ([email protected])
2. Sanjeev Biswas ([email protected])
3. Gauranga Mallick ([email protected])
4. Dhirendra Mohan Jha ([email protected])


WEAPON – R, Python

LIBRARY USED – pandas, numpy, matplotlib, sklearn, etc.


Most people of our generation now get at least some of their news from social media. Using social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low-quality news with intentionally false information.

Fake news articles are intentionally fabricated to be deceptive and can be proven false. Fake news and the spread of misinformation can have serious real-world consequences. One of the main reasons for generating fake news is economic gain, which comes from attracting more clicks or producing paid fake content for parties who want more clicks. Another common reason is to deceive users or create political bias among them in order to win more supporters.


There are two aspects of news on social media: traditional news shared on social media, and social media as a source of news. The second aspect is sometimes used by traditional media houses to generate news articles. Both aspects, intentionally or unintentionally, can cause fake news to spread even further, manipulating mass ideology.



The problem statement comes in three stages. The first task is to predict whether an article is propaganda or not. The second task is to predict the agenda of the individual statements of the articles, and the third task is to predict the propaganda used in them.

The data set was provided in text format, which we had to convert into CSV so that it could be processed in Python, the programming language we used to unravel the mysteries within it.
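The conversion from text to CSV can be sketched as follows. This is only a minimal sketch: it assumes the raw file is tab-separated with one record per line in the order text, news number, label. The real file layout, the column names, and the label strings shown here are assumptions made for illustration.

```python
import csv
import io

# Hypothetical column names; the layout of the original file is an assumption.
COLUMNS = ["text", "article_id", "label"]


def text_to_csv(raw_lines, out_file):
    """Write tab-separated lines (text, article id, label) as CSV rows.

    Returns the number of data rows written. Lines that do not contain
    exactly three tab-separated fields are skipped.
    """
    writer = csv.writer(out_file)
    writer.writerow(COLUMNS)
    written = 0
    for line in raw_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(COLUMNS):
            continue  # malformed line: skip it rather than guess its meaning
        writer.writerow(fields)
        written += 1
    return written


# Toy input standing in for the real text file.
sample = [
    "Officials confirmed the budget, as expected.\t101\tnon-propaganda\n",
    "They will destroy everything you love!\t102\tpropaganda\n",
    "a broken line without tabs\n",
]
buffer = io.StringIO()
rows_written = text_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

Using the `csv` module rather than joining fields by hand means that commas inside an article are quoted correctly in the output.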

The data set contains the news article, the news number, and the news type, that is, whether it is a non-propaganda article or, on the flip side, has some agenda.

Task 1 had approximately 36,000 rows and 3 columns; the predictors we used are the news statements and the news number.

Task 2 had approximately 15,170 rows and 3 columns; here we had to take each statement of a particular article and understand its agenda.
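Working at the level of individual statements means having one row per sentence. If the sentences have to be derived from full article text, a rough sketch looks like the one below. The splitting rule is a naive heuristic chosen for illustration, and the function name and row layout are hypothetical, not taken from the original solution.

```python
import re


def split_into_sentences(article_id, text):
    """Naively split an article into (article_id, sentence_index, sentence) rows."""
    # Split after '.', '!' or '?' when followed by whitespace. This is a rough
    # heuristic and will mis-split abbreviations such as "U.S. election".
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(article_id, i, s) for i, s in enumerate(parts, start=1) if s]


rows = split_into_sentences(
    101, "The vote is tomorrow. Will they betray us again? Stay alert!"
)
```

Each resulting row can then be labelled and classified on its own, in the same way as a whole article in Task 1.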

We applied different machine learning algorithms to classify our data and to predict the news type with the model that gave the best accuracy.


Exploratory Data Analysis


This is a word cloud, which gives a better understanding of the data and the words used.
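A word cloud sizes each word by how often it occurs, so the numbers behind it are simple word counts. The sketch below computes those counts with the standard library only; the stop-word list and the sample sentences are illustrative assumptions, not the data or settings used in the original analysis.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the original one).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "are", "on"}


def word_frequencies(texts, top_n=5):
    """Count lower-cased words across all texts, ignoring stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Toy articles standing in for the real data set.
articles = [
    "The enemy is lying to the people.",
    "People trust the report on the economy.",
    "The enemy of the people is the media.",
]
top_words = word_frequencies(articles)
```

The resulting (word, count) pairs are the kind of input a word cloud plot is drawn from.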




Problem statement (Task 1) –

Given a news article, you are required to build an intelligent system that is able to detect whether the article is propagandist or not.

Data set used –







Steps involved:-

1. Data preprocessing

We removed null values from the data set, as they amounted to a very low percentage of the complete data.

2. Data extraction

3. Splitting the data into train and test sets

4. Modelling (implementing the best model)

5. Classification report

The accuracy score for the logistic regression model is approximately 95%.
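The steps above can be sketched end to end with pandas and scikit-learn. This is a minimal sketch under stated assumptions: the toy sentences, the column names `text` and `label`, and the label strings are made up, and TF-IDF is assumed as the feature extraction method because the write-up does not say which one was used. Only the choice of logistic regression comes from the text, and the sketch is not expected to reproduce the reported accuracy.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy data standing in for the real articles (assumption).
propaganda = [
    "They will destroy our nation unless we act now!",
    "Only traitors question the glorious leader.",
    "The enemy wants to steal your future, wake up!",
    "Everyone knows the corrupt elite hates you.",
    "Act now or lose your freedom forever!",
    "The glorious movement will crush every traitor.",
    "Wake up, the corrupt media is lying to you!",
    "Our enemies will destroy everything you love.",
]
neutral = [
    "The council approved the annual budget on Tuesday.",
    "Rainfall is expected across the region this weekend.",
    "The company reported quarterly earnings on Monday.",
    "Researchers published the results of a new study.",
    "The museum will open a new exhibit next month.",
    "Local schools reopen after the summer break.",
    "The central bank left interest rates unchanged.",
    "The city plans to repair the bridge next year.",
]
data = pd.DataFrame({
    "text": propaganda + neutral + [None, "An article with no label."],
    "label": ["propaganda"] * 8 + ["non-propaganda"] * 8 + ["propaganda", None],
})

# Data preprocessing: drop rows with a missing text or label.
data = data.dropna(subset=["text", "label"])

# Splitting: hold out a stratified test set so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    data["text"],
    data["label"],
    test_size=0.25,
    stratify=data["label"],
    random_state=42,
)

# Data extraction and modelling: TF-IDF features (assumed) feeding
# a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Classification report: accuracy plus per-class precision, recall and F1.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
report = classification_report(y_test, predictions, zero_division=0)
print(report)
```

Wrapping the vectorizer and the classifier in one pipeline ensures the TF-IDF vocabulary is learned from the training split only, so the test score is not inflated by information from the held-out articles.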




Problem statement (Task 2) –

Given a news article, you are required to build an intelligent system that is able to detect whether each of its sentences is propagandist or not. A sentence is considered propagandist if it contains at least one of eighteen propagandist techniques.

Data set used –


Steps involved:-

1. Data preprocessing

2. Splitting the data into train and test sets

3. Modelling (implementing the best model)

4. Report of the model


Computational linguistics can aid in identifying fake news automatically, at a level well above chance. The proposed linguistics-driven approach suggests that, to differentiate between fake and genuine content, it is worthwhile to look at the lexical, syntactic, and semantic levels of the news item in question.

With the increasing popularity of social media, more and more people consume news from social media instead of traditional news media. However, social media has also been used to spread fake news, which has strong negative impacts on individual users and broader society. In this article, we explored the fake news problem by reviewing existing literature in two phases: characterization and detection. In the characterization phase, we introduced the basic concepts and principles of fake news in both traditional media and social media. In the detection phase, we reviewed existing fake news detection approaches from a data mining perspective, including feature extraction and model construction. We also discussed the data sets, evaluation metrics, and promising future directions for fake news detection research and for expanding the field to other applications.

Fake news can be accurately identified using machine learning methods. There is enough evidence that fake news is not too difficult to detect, at least in some selected domains. However, it is difficult to say with confidence how far the results of this experiment apply to real-world news. We hope to broaden the scope of our data evaluation and to apply our method in a more general way in the future.






