Business Understanding
Detecting propaganda in news, both at the article level (Task 1) and at the sentence level (Task 2).
Data Understanding
- exploring the distribution of words in articles (Task 1)
- exploring the distribution of words in sentences (Task 2); see the sketch after this list
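A minimal sketch of this exploration, assuming the raw texts are available as lists of Python strings (articles and sentences are hypothetical names):

```python
from collections import Counter
from nltk.tokenize import word_tokenize  # requires nltk.download('punkt')

def word_distribution(texts):
    """Count word frequencies across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(word_tokenize(text.lower()))
    return counts

# Task 1: word_distribution(articles).most_common(20)
# Task 2: word_distribution(sentences).most_common(20)
```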
Data Preparation
- convert all text to lower case
- tokenize each article with nltk's word_tokenize
- load GloVe word embeddings (50D, vocabulary = 400,000 words, trained on 6B tokens)
- replace each word with the id of its corresponding GloVe vector
- create a new zero-initialised vector for unknown (out-of-vocabulary) words; see the sketch after this list
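A minimal sketch of these preparation steps, assuming the standard glove.6B.50d.txt file from the GloVe distribution (the path and helper names are illustrative):

```python
import numpy as np
from nltk.tokenize import word_tokenize  # requires nltk.download('punkt')

GLOVE_PATH = "glove.6B.50d.txt"  # 50D vectors, 400,000-word vocab, 6B tokens

# Build the word -> id mapping and the embedding matrix from the GloVe file.
word2id, vectors = {}, []
with open(GLOVE_PATH, encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        word2id[parts[0]] = len(vectors)
        vectors.append(np.asarray(parts[1:], dtype="float32"))

# One extra zero-initialised vector shared by all unknown words.
unk_id = len(vectors)
vectors.append(np.zeros(50, dtype="float32"))
embedding_matrix = np.vstack(vectors)  # shape: (400001, 50)

def encode(text):
    """Lower-case, tokenize, and map each token to its GloVe vector id."""
    return [word2id.get(tok, unk_id) for tok in word_tokenize(text.lower())]
```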
Modeling
- Tried combinations of 1 to 3 bidirectional LSTM (BiLSTM) layers, with both trainable and non-trainable word embeddings (a builder sketch follows this list)
- Included BatchNormalization and Dropout between layers
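A hedged sketch of the kind of model builder behind these experiments (the sequence length, optimizer, and default sizes are assumptions; embedding_matrix comes from the data preparation step above):

```python
from tensorflow.keras import Input, Model, initializers
from tensorflow.keras.layers import (Embedding, Bidirectional, LSTM,
                                     BatchNormalization, Dropout,
                                     Flatten, Dense)

def build_model(n_layers, trainable_embeddings, embedding_matrix,
                max_len=200, units=64, dropout=0.2):
    """Stack 1-3 BiLSTM layers with BatchNormalization and Dropout between them."""
    inputs = Input(shape=(max_len,))
    x = Embedding(embedding_matrix.shape[0], embedding_matrix.shape[1],
                  embeddings_initializer=initializers.Constant(embedding_matrix),
                  trainable=trainable_embeddings)(inputs)
    for _ in range(n_layers):
        x = Bidirectional(LSTM(units, return_sequences=True))(x)
        x = BatchNormalization()(x)
        x = Dropout(dropout)(x)
    outputs = Dense(1, activation="sigmoid")(Flatten()(x))
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# e.g. build_model(2, trainable_embeddings=True, embedding_matrix=embedding_matrix)
```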
Evaluation
Task 1
Model performance improved significantly with bidirectional layers and trainable embeddings.
The performance of stacked BiLSTMs improved gradually with training.
The model with trainable embeddings tends to overfit when trained for too many epochs (a significant gap between training and validation scores).
Combining trainable embeddings with too many LSTM layers was avoided for performance reasons.
Current model architecture with the best validation F1 score (sketched in code below):
- 64-unit BiLSTM layer,
- batch normalization,
- 0.2 dropout,
- 2-unit BiLSTM layer returning sequences,
- batch normalization,
- flatten layer,
- 1-unit dense layer with sigmoid activation.
Positive threshold = 0.5. Trainable embeddings, trained for 5 epochs.
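Written out directly, that configuration could look like the following sketch (the first BiLSTM is assumed to return sequences so the 2-unit layer can consume them; the optimizer and sequence length are assumptions):

```python
from tensorflow.keras import Input, Model, initializers
from tensorflow.keras.layers import (Embedding, Bidirectional, LSTM,
                                     BatchNormalization, Dropout,
                                     Flatten, Dense)

def best_model(embedding_matrix, max_len=200):
    inputs = Input(shape=(max_len,))
    x = Embedding(embedding_matrix.shape[0], embedding_matrix.shape[1],
                  embeddings_initializer=initializers.Constant(embedding_matrix),
                  trainable=True)(inputs)                  # trainable embeddings
    x = Bidirectional(LSTM(64, return_sequences=True))(x)  # 64-unit BiLSTM
    x = BatchNormalization()(x)
    x = Dropout(0.2)(x)
    x = Bidirectional(LSTM(2, return_sequences=True))(x)   # 2-unit BiLSTM
    x = BatchNormalization()(x)
    x = Flatten()(x)
    outputs = Dense(1, activation="sigmoid")(x)            # 1-unit dense, sigmoid
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# model = best_model(embedding_matrix)
# model.fit(X_train, y_train, epochs=5, validation_split=0.2)
```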
——
Scores for no. of epochs = 3:
Training:
acc : 0.9935042378768931
precision : 0.9659160073037127
recall : 0.976915974145891
f1 : 0.9713848508033666
Validation (20% split from the given training data):
acc : 0.9585996110030564
precision : 0.800761421319797
recall : 0.8173575129533679
f1 : 0.808974358974359
——
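For reference, a sketch of how such scores can be computed from the model's sigmoid outputs at the 0.5 threshold (y_true and y_prob are hypothetical names; scikit-learn is assumed):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def report(y_true, y_prob, threshold=0.5):
    """Binarise sigmoid outputs at the positive threshold and print the scores."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    print("acc       :", accuracy_score(y_true, y_pred))
    print("precision :", precision_score(y_true, y_pred))
    print("recall    :", recall_score(y_true, y_pred))
    print("f1        :", f1_score(y_true, y_pred))
```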