
Hack News Datathon Mentors’ Guidelines – Propaganda Detection


In this article, the mentors give some preliminary guidelines, advice, and suggestions to the participants for the Hack the News Datathon case. Every mentor should write their name and chat name at the beginning of their texts so that there are no mix-ups with the other mentors.

Introduction to NLP

Natural Language Processing (NLP) is the field of computer science concerned with developing algorithms for the analysis of human languages. Artificial Intelligence approaches (e.g., Machine Learning) have been used to solve many NLP tasks, such as parsing, POS tagging, Named Entity Recognition, word sense disambiguation, document classification, machine translation, textual entailment, question answering, and summarization. Natural languages are notoriously difficult for machines to understand and model, mostly because of ambiguity (e.g., humor, sarcasm, puns), lack of clear structure, and diversity (e.g., models for English are not directly applicable to Chinese). Even so, in recent years we have witnessed rapid progress in NLP thanks to deep learning models, which are becoming increasingly complex and able to capture the subtleties of human languages.

MENTORS’ GUIDELINES | Propaganda Detection

Before going into details, let us revisit the three subtasks:

  1. Given document d, identify whether d is propagandistic or not.
  2. Given document d, identify which exact sentences in d are propagandistic.
  3. Given document d, identify which specific phrases are propagandistic and which technique each of them uses to convey its message.

As one can observe, tasks 1 and 2 are both binary classification tasks (at the document and the sentence level, respectively), whereas task 3 is a multi-class tagging task.

Representation

For all three tasks there are multiple alternatives for computing representations, ranging from manually engineered to automatically inferred features. Perhaps the most straightforward representation is the so-called bag-of-words model (BoW). In BoW, word order is neglected and each word is weighted on the basis of statistics from the single document, from a collection, or from both (as in tf-idf weighting). Other valuable representations include the occurrence of certain words (e.g., particularly negative or positive ones) or the writing style. Consider, for instance, the MPQA lexicon or Bing Liu's lexicon. Be creative! Try novel representations!
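To make the idea concrete, here is a minimal from-scratch sketch of a bag-of-words representation with tf-idf weighting, plus one lexicon-based feature. The whitespace tokenizer, the three toy documents, and the tiny `NEGATIVE_WORDS` list are assumptions for illustration only; in practice you would use a library vectorizer and a real lexicon such as MPQA or Bing Liu's.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def bag_of_words(doc):
    # Raw term counts: word order is discarded, only frequencies remain.
    return Counter(tokenize(doc))


def tfidf(docs):
    # Weight each term by its count in the document (tf) times its
    # rarity across the collection (idf = log(N / df)).
    bows = [bag_of_words(d) for d in docs]
    df = Counter()
    for bow in bows:
        df.update(bow.keys())
    idf = {term: math.log(len(docs) / count) for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in bow.items()} for bow in bows]


# Hypothetical toy lexicon; a real system would load e.g. the MPQA or Bing Liu lexicon.
NEGATIVE_WORDS = {"enemy", "lies", "cheats"}


def negative_ratio(doc):
    # Lexicon feature: share of tokens that appear in the negative-word list.
    tokens = tokenize(doc)
    return sum(t in NEGATIVE_WORDS for t in tokens) / len(tokens) if tokens else 0.0


docs = [
    "the enemy is at the gates",
    "the weather is mild today",
    "the enemy lies and the enemy cheats",
]
weights = tfidf(docs)
print(bag_of_words(docs[2])["enemy"])  # 2
print(weights[0]["the"])  # 0.0, since "the" occurs in every document
```

Note how a word that appears in every document (here "the") gets zero weight, while rarer words dominate the representation; features such as `negative_ratio` can simply be appended to the tf-idf vector.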

Another option is to use distributional representations: embeddings. These models map words, sentences, or full documents into a vector space. One useful property of such vectors is that semantically similar words appear close to each other in that space. Multiple pre-computed embedding models are available online, so you do not need to train your own model on large volumes of data; consider, for instance, GloVe, word2vec, or fastText. See Mikolov et al., 2013 for further details.
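The sketch below illustrates two common operations on embeddings: measuring closeness with cosine similarity, and averaging word vectors to obtain a vector for a whole sentence or document (one simple option among several). The 3-dimensional vectors in `EMBEDDINGS` are invented toy values, not real GloVe or word2vec vectors, which typically have 50 to 300 dimensions and are loaded from a downloaded file.

```python
import math

# Hypothetical toy vectors; real pre-trained embeddings are loaded from file.
EMBEDDINGS = {
    "war": [0.9, 0.1, 0.0],
    "battle": [0.8, 0.2, 0.1],
    "peace": [-0.7, 0.6, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    # Cosine similarity: close to 1 for vectors pointing in similar directions.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def document_vector(tokens):
    # Average the vectors of known words; out-of-vocabulary words are skipped.
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


print(cosine(EMBEDDINGS["war"], EMBEDDINGS["battle"])
      > cosine(EMBEDDINGS["war"], EMBEDDINGS["banana"]))  # True
```

Averaging discards word order, much like BoW, but it yields dense, fixed-size vectors that any of the classifiers discussed below can consume.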

Computing such representations usually requires a number of preprocessing steps, which may include stopword removal, stemming and/or lemmatization, part-of-speech tagging, case folding, punctuation removal, etc. Multiple libraries exist to perform these tasks (cf. Tools and Frameworks).
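A few of these steps (case folding, punctuation removal, and stopword removal) can be chained in plain Python, as in the minimal sketch below. The short `STOPWORDS` set is a hypothetical stand-in for the full lists that NLP libraries ship, and stemming, lemmatization, and POS tagging are deliberately left out, since they are better delegated to a library.

```python
import string

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "is", "are", "to", "in"}


def preprocess(text):
    text = text.casefold()  # case folding
    # Punctuation removal (covers ASCII punctuation only).
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()  # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]  # stopword removal


print(preprocess("The Enemy is at the gates, and they are LYING!"))
# ['enemy', 'at', 'gates', 'they', 'lying']
```

Whether to apply each step is a design choice worth testing: for propaganda detection, punctuation such as exclamation marks or capitalization may itself be a useful stylistic signal that this pipeline throws away.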

Classification

One of the simplest classification models is the k-nearest-neighbours (kNN) algorithm. It has no training stage: a new item is simply assigned to the majority class among its k closest elements in the representation space. More sophisticated models include naïve Bayes, support vector machines, and multilayer perceptrons, among many other alternatives.
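The kNN procedure described above fits in a few lines. This is a minimal sketch assuming documents have already been turned into numeric vectors (e.g., with one of the representations above) and using cosine similarity as the closeness measure; the 2-dimensional training vectors and their labels are made-up toy data.

```python
import math
from collections import Counter


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(train_vectors, train_labels, query, k=3):
    # No training stage: rank all stored items by similarity to the query...
    ranked = sorted(range(len(train_vectors)),
                    key=lambda i: cosine(train_vectors[i], query),
                    reverse=True)
    # ...and return the majority label among the k most similar ones.
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-d vectors.
train_vectors = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2],
                 [0.1, 0.9], [0.2, 0.8], [0.3, 0.9]]
train_labels = ["propaganda"] * 3 + ["non-propaganda"] * 3

print(knn_predict(train_vectors, train_labels, [0.85, 0.2]))  # propaganda
print(knn_predict(train_vectors, train_labels, [0.2, 0.9]))  # non-propaganda
```

With two classes, an odd k avoids tied votes. Because every prediction scans the whole training set, kNN gets slow on large collections, which is one reason to move to the trained models mentioned above.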

Tagging

Task 3 is a sequence tagging task in which each fragment of the text (e.g., a token) has to be labeled with one of the propaganda techniques or with none of them. Perhaps the “standard” task it most resembles is named entity recognition (NER). There is plenty of material online about NER, including an introduction to the topic and a tutorial using sklearn.
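As in NER, a common first step is to convert annotated spans into token-level labels using the BIO scheme (B = beginning of a span, I = inside, O = outside). The sketch below assumes annotations are given as character-offset triples `(start, end, technique)`; that format, the example sentence, and the label names `Name_Calling` and `Appeal_to_Fear` are hypothetical and should be adapted to the actual datathon data.

```python
import re


def tokenize_with_offsets(text):
    # Tokens as (start, end, string); punctuation marks become separate tokens.
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def bio_tags(text, spans):
    # spans: list of (start, end, technique) character-offset annotations.
    # Tokens fully inside a span get B- (first) or I- (rest); all others get O.
    # Simplifications: partially covered tokens stay O, and a later
    # overlapping span overwrites an earlier one.
    tokens = tokenize_with_offsets(text)
    tags = ["O"] * len(tokens)
    for span_start, span_end, technique in spans:
        first = True
        for i, (tok_start, tok_end, _) in enumerate(tokens):
            if tok_start >= span_start and tok_end <= span_end:
                tags[i] = ("B-" if first else "I-") + technique
                first = False
    return [tok for _, _, tok in tokens], tags


text = "They are traitors who want to destroy our nation."
# Hypothetical annotations; the label names are illustrative only.
spans = [
    (text.index("traitors"), text.index("traitors") + len("traitors"), "Name_Calling"),
    (text.index("destroy"), len(text) - 1, "Appeal_to_Fear"),
]
tokens, tags = bio_tags(text, spans)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

The resulting token/tag sequences are exactly the input a standard sequence tagger (for example, a CRF or a recurrent network) expects, so NER tutorials can be reused almost directly from this point.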

 

Tools and Frameworks

These are non-exhaustive lists of resources; there are many more out there.

Preprocessing

Frameworks

General machine learning

Deep learning

Embeddings

Multi-purpose

Don’t forget to apply your knowledge and skills in the challenge. The community works with advisors from top institutes around the world, who are invited as experts, and with a jury that selects the best solutions, which will receive awards funded by a crowdfunding campaign.

Registration is free but mandatory: join before 21 January!
