This is a solution for the Task 1 case. I used a Bidirectional LSTM. The data is not plentiful, but I experimented with a Keras LSTM implementation with GloVe embeddings and am sharing my results.
This notebook contains:
- Visualisations such as a word cloud of the most common words;
- Unigrams and bigrams of frequent propaganda vs. non-propaganda words;
- Boxplots comparing the number of words, characters, and punctuation marks in each class. You will see some interesting dynamics there;
- Loading GloVe embeddings that were preloaded on my PC;
- Training a Keras Bidirectional LSTM with batch_size=512 and epochs=5, using dropout at two points to avoid overfitting.

Minimal sketches of each of these steps follow the import cell below.
```python
import os
import string

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

pd.options.mode.chained_assignment = None  # default='warn'
pd.options.display.max_colwidth = 300

from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
import plotly.offline as py
from plotly import tools
import cufflinks as cf
cf.go_offline()
py.init_notebook_mode(connected=True)

from sklearn.feature_extraction.text import CountVectorizer
from sklearn import metrics, model_selection
from sklearn.model_selection import train_test_split, cross_val_score
from scipy.sparse import hstack

from wordcloud import WordCloud, STOPWORDS
from nltk.stem import SnowballStemmer

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import (Dense, Input, LSTM, Embedding, Dropout, Activation,
                          CuDNNGRU, Conv1D, Conv2D, MaxPool2D, Reshape, Flatten,
                          Bidirectional, GlobalMaxPool1D)
from keras.models import Sequential, Model
from keras import initializers, regularizers, constraints, optimizers, layers
```
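For the word-cloud step, a minimal sketch might look like the following. The `train` dataframe and its `text`/`label` columns are hypothetical stand-ins for the actual task data; substitute whatever names your loading cell produces.

```python
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Join all sentences of one class into a single string and draw a cloud of
# its most common words. "propaganda" as a label value is an assumption.
text = " ".join(train.loc[train["label"] == "propaganda", "text"])
wc = WordCloud(stopwords=STOPWORDS, max_words=200, background_color="white",
               width=800, height=400).generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```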
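The unigram/bigram comparison could be done with `CountVectorizer`, roughly as below. This is a sketch, not the exact notebook code: `top_ngrams` is a helper I made up, and `get_feature_names_out` assumes scikit-learn >= 1.0 (older versions use `get_feature_names`).

```python
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd

def top_ngrams(texts, n=1, top=20):
    """Return the `top` most frequent n-grams across a collection of texts."""
    vec = CountVectorizer(ngram_range=(n, n), stop_words="english")
    counts = vec.fit_transform(texts)
    freqs = counts.sum(axis=0).A1  # total count of each n-gram over the corpus
    return (pd.Series(freqs, index=vec.get_feature_names_out())
              .sort_values(ascending=False).head(top))

# Compare frequent bigrams per class; dataframe/column names are hypothetical.
print(top_ngrams(train.loc[train["label"] == "propaganda", "text"], n=2))
print(top_ngrams(train.loc[train["label"] == "non-propaganda", "text"], n=2))
```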
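The boxplots compare simple surface features per class. A sketch under the same hypothetical `train` dataframe assumption:

```python
import string
import seaborn as sns
import matplotlib.pyplot as plt

# Simple per-sentence counts: words, characters, and punctuation marks.
train["num_words"] = train["text"].str.split().str.len()
train["num_chars"] = train["text"].str.len()
train["num_punct"] = train["text"].apply(
    lambda s: sum(ch in string.punctuation for ch in s))

# One boxplot per feature, split by class label.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, col in zip(axes, ["num_words", "num_chars", "num_punct"]):
    sns.boxplot(x="label", y=col, data=train, ax=ax)
plt.tight_layout()
plt.show()
```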
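Loading the preloaded GloVe vectors and turning them into an embedding matrix might look like this. The file path, embedding size, vocabulary cap, and sequence length are all assumptions, and `texts` is a hypothetical stand-in for the raw sentence column.

```python
import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

EMBEDDING_FILE = "glove.6B.100d.txt"  # assumed local path to the GloVe file
EMBED_SIZE = 100       # must match the GloVe file used
MAX_FEATURES = 20000   # vocabulary cap (assumed)
MAX_LEN = 100          # pad/truncate sequences to this length (assumed)

# texts = train["text"].values  -- hypothetical source of the raw sentences
tokenizer = Tokenizer(num_words=MAX_FEATURES)
tokenizer.fit_on_texts(list(texts))
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=MAX_LEN)

# Build a word -> vector index from the GloVe file.
embeddings_index = {}
with open(EMBEDDING_FILE, encoding="utf8") as f:
    for line in f:
        values = line.rstrip().split(" ")
        embeddings_index[values[0]] = np.asarray(values[1:], dtype="float32")

# Map the tokenizer's vocabulary onto an embedding matrix; words missing
# from GloVe keep a zero vector.
word_index = tokenizer.word_index
num_words = min(MAX_FEATURES, len(word_index) + 1)
embedding_matrix = np.zeros((num_words, EMBED_SIZE))
for word, i in word_index.items():
    if i >= num_words:
        continue
    vector = embeddings_index.get(word)
    if vector is not None:
        embedding_matrix[i] = vector
```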
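Finally, a minimal sketch of a Bidirectional LSTM matching the description above (batch_size=512, epochs=5, dropout at two points). Layer widths and dropout rates are my assumptions, not the notebook's exact values; it reuses `num_words`, `embedding_matrix`, and `X` from the GloVe sketch, plus a hypothetical binary label array `y`.

```python
from keras.models import Model
from keras.layers import (Input, Embedding, Bidirectional, LSTM, Dense,
                          Dropout, GlobalMaxPool1D)

# Input: padded sequences of MAX_LEN token ids.
inp = Input(shape=(MAX_LEN,))
x = Embedding(num_words, EMBED_SIZE,
              weights=[embedding_matrix], trainable=False)(inp)
x = Bidirectional(LSTM(64, return_sequences=True))(x)
x = GlobalMaxPool1D()(x)
x = Dropout(0.2)(x)            # first dropout instance
x = Dense(32, activation="relu")(x)
x = Dropout(0.2)(x)            # second dropout instance
out = Dense(1, activation="sigmoid")(x)  # propaganda vs. non-propaganda

model = Model(inputs=inp, outputs=out)
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# Batch size and epoch count taken from the description above.
model.fit(X, y, batch_size=512, epochs=5, validation_split=0.1)
```

Freezing the embedding layer (`trainable=False`) is one reasonable choice when, as noted above, the data is not plentiful: the pretrained GloVe vectors then act as fixed features and the two dropout layers guard the remaining trainable weights against overfitting.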