Datathon – HackNews – Solution – schlaflos

Using Bidirectional LSTMs and trainable embeddings initialized with GloVe for propaganda detection at the article level


Business Understanding

Detecting propaganda in news.


Data Understanding

  • exploring the distribution of words in articles (Task 1)
  • exploring the distribution of words in sentences (Task 2)


Data Preparation

  • convert words to lower case
  • tokenize the words in each article with nltk's word_tokenize
  • load GloVe word embeddings (50D, vocabulary of 400,000 words, trained on 6B tokens)
  • replace each word with the id of its corresponding GloVe vector
  • create a new zero-initialized vector for unknown words

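The steps above can be sketched as follows. This is a minimal sketch: `load_glove_vocab`, `words_to_ids`, and the id-0 convention for unknown words are illustrative assumptions, and a plain whitespace split stands in for nltk's word_tokenize so the sketch runs without NLTK data files.

```python
# Sketch of the preprocessing pipeline described above (hypothetical names).
# In the actual pipeline nltk.word_tokenize does the tokenization.

UNK_ID = 0  # id for the zero-initialized vector created for unknown words

def load_glove_vocab(path):
    """Map each word in a GloVe text file to an id; id 0 is reserved for unknowns."""
    vocab = {}
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            word = line.split(" ", 1)[0]  # first field of each GloVe line is the word
            vocab[word] = i
    return vocab

def words_to_ids(text, vocab):
    """Lower-case, tokenize, and replace each word with its GloVe vector id."""
    tokens = text.lower().split()  # stand-in for nltk.word_tokenize
    return [vocab.get(tok, UNK_ID) for tok in tokens]
```

The resulting id sequences can then be padded to a fixed length and fed to the embedding layer.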

Modeling

  • Tried combinations of 1 to 3 bidirectional LSTM layers, with trainable and non-trainable word embeddings
  • Included BatchNormalization and Dropout between layers


Task 1

Model performance improved significantly with Bidirectional Layers and Trainable Embeddings.

The performance of stacked BiLSTMs improved gradually with training.

Models with trainable embeddings tend to overfit when trained for too many epochs (a significant gap between training and validation scores).

Combining trainable embeddings with too many LSTM layers was avoided for performance reasons.


Current model architecture with the best validation F1 score:

  • 64-unit BiLSTM layer
  • 0.2 dropout
  • 2-unit BiLSTM layer returning sequences
  • flatten layer
  • 1-unit dense layer with sigmoid activation
  • positive threshold = 0.5
  • trainable embeddings
  • 5 epochs
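A minimal Keras sketch of the best-scoring architecture above. `build_model`, the vocabulary size, and the default sequence length are assumptions; in the actual solution the embedding layer is initialized with the 50D GloVe vectors and left trainable. Note the first BiLSTM must also return sequences so the second BiLSTM can stack on top of it.

```python
# Sketch only: 64u BiLSTM -> 0.2 dropout -> 2u BiLSTM (sequences) -> flatten -> 1u sigmoid.
from tensorflow.keras import layers, models

GLOVE_DIM = 50  # GloVe 50D embeddings

def build_model(vocab_size=400_001, max_len=500):
    """vocab_size = 400k GloVe words + 1 unknown slot; max_len is an assumed padding length."""
    model = models.Sequential([
        layers.Input(shape=(max_len,)),
        # Trainable embeddings; in practice initialized from the GloVe matrix,
        # e.g. via model.layers[0].set_weights([glove_matrix]).
        layers.Embedding(vocab_size, GLOVE_DIM, trainable=True),
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        layers.Dropout(0.2),
        layers.Bidirectional(layers.LSTM(2, return_sequences=True)),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),  # binarized at the 0.5 positive threshold
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Keeping the second BiLSTM at only 2 units and flattening its full output sequence keeps the final dense layer small while still exposing per-timestep features to the classifier.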

Scores after 3 epochs:

Training
acc: 0.9935042378768931
precision: 0.9659160073037127
recall: 0.976915974145891
f1: 0.9713848508033666

Validation (20% split from the given training data)
acc: 0.9585996110030564
precision: 0.800761421319797
recall: 0.8173575129533679
f1: 0.808974358974359
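The reported metrics can be reproduced from the model's sigmoid outputs as sketched below; `report_scores` is a hypothetical helper, with scikit-learn standing in for whatever scoring code the notebook actually uses.

```python
# Binarize sigmoid outputs at the positive threshold, then score them.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def report_scores(y_true, y_prob, threshold=0.5):
    """Return acc/precision/recall/f1 for probabilities cut at `threshold`."""
    y_pred = [int(p >= threshold) for p in y_prob]
    return {
        "acc": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```

The gap between the training and validation F1 here (0.97 vs. 0.81) is the overfitting signal mentioned above for trainable embeddings.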


Tried models and their scores

IPython notebook on Google Colab
