Popular comments by fire

Datathon 2020 – Article recommendation

@preslav – I was late and then struggled with uploading the notebook (I was also not using a notebook during development).
@liad –
1) I think that with so much data, a feasible approach would be to use the tags (because they represent the topic better than the title) and to learn embeddings from scratch on the current dataset. The transfer learning idea is also possible.
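To illustrate the "embeddings from scratch from the tags" idea, here is a minimal count-based sketch (the articles, tags, and helper names are all hypothetical, not from the actual dataset); a real solution would train word2vec-style embeddings instead, but the principle of representing articles via their tags is the same:

```python
import math

# Hypothetical articles with tag lists (illustrative data only).
articles = [
    {"id": 1, "tags": ["politics", "election", "economy"]},
    {"id": 2, "tags": ["politics", "economy"]},
    {"id": 3, "tags": ["sports", "football"]},
    {"id": 4, "tags": ["sports", "election"]},
]

# Count-based tag embeddings: each tag is represented by its co-occurrence
# counts with every other tag -- a cheap stand-in for embeddings trained
# from scratch on the dataset itself.
tags = sorted({t for a in articles for t in a["tags"]})
index = {t: i for i, t in enumerate(tags)}
cooc = [[0.0] * len(tags) for _ in tags]
for a in articles:
    for t1 in a["tags"]:
        for t2 in a["tags"]:
            if t1 != t2:
                cooc[index[t1]][index[t2]] += 1.0

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

# An article embedding is the mean of its tag vectors.
def article_vec(a):
    vecs = [cooc[index[t]] for t in a["tags"]]
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```

With this setup, two politics/economy articles end up closer to each other than either is to a sports article, which is the behaviour the tag-based representation is after.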
2) The RNN in the code is an LSTM, with contextualized article embeddings as inputs (that is, the embedding of the same article differs over time, because it depends on the article's current popularity). The sequence is the last N articles read by the user. It can also be viewed as user-preference “sessions”, as they evolve continually over time.
3) The added section tries to explain it. Basically, we combine popularity with content-based information to form contextualized article embeddings. The model then tries to predict the embeddings, maximising the cosine similarity with the actually read news and minimising it with other popular news from the same day. So essentially we are learning a user-specific embedding space.
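The two ingredients above can be sketched in a few lines. This is a simplified illustration, not the notebook's actual training objective: the helper names, the popularity-as-extra-dimension choice, and the hinge margin are all assumptions made for the example.

```python
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Contextualized article embedding: a static content vector extended with
# the article's popularity at time t, so the same article gets a different
# embedding depending on when it is seen.
def contextualize(content_vec, popularity_t):
    return list(content_vec) + [popularity_t]

# Triplet-style objective: reward cosine similarity between the user
# embedding predicted by the LSTM and the article actually read, and
# penalise similarity with other popular articles from the same day.
def ranking_loss(pred, read_article, same_day_negatives, margin=1.0):
    pos = cosine(pred, read_article)
    neg = max(cosine(pred, n) for n in same_day_negatives)
    return max(0.0, margin - pos + neg)
```

A prediction pointing at the article the user actually read incurs a small loss, while one pointing at a same-day popular negative incurs a large one, which is what pushes the model toward a user-specific embedding space.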

Datathon 2020 – Article recommendation

Hi, I’ve updated the article with the code and an explanation of the idea, although a bit late. Plots of the model architecture, its training, and how article popularity behaves over time can be uploaded too if needed.

Detecting propaganda on sentence level

Hi Alberto, zenpanik!
BERT achieved an F1 of 0.59 on the DEV set.
These are the performances of the base models when used alone. The logistic regression classifier hyper-parameters were not tuned. The split was 11,380 train / 2,807 validation sentences.

Feature set                  F1 (pos)   Precision (pos)   Recall (pos)   Accuracy
Subjectivity and polarity    0.3606     0.2977            0.4574         0.5996
Sentiment features           0.3631     0.2908            0.4834         0.5814
TFIDF                        0.4586     0.3811            0.5758         0.6644
Proper nouns                 0.3908     0.2633            0.7576         0.4168
Loaded language              0.3987     0.3695            0.4329         0.6776
Lexical features             0.4377     0.3218            0.6840         0.5661
Emotion                      0.4292     0.3318            0.6075         0.6010
Confusing words              0.3544     0.2675            0.5253         0.5276
Readability features         0.4140     0.3370            0.5368         0.6249
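For reference, the positive-class numbers reported above can be computed from predictions as follows. This is a minimal sketch of the standard definitions (the actual evaluation script may differ); the function name and the toy labels are made up for illustration:

```python
# Positive-class precision, recall, F1, and overall accuracy for a
# binary task (1 = propaganda, 0 = non-propaganda).
def pos_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"f1_pos": f1, "precision_pos": precision,
            "recall_pos": recall, "accuracy": accuracy}
```

This also explains patterns in the table such as "Proper nouns": very high recall with low precision (the model flags many sentences, catching most propaganda but with many false positives) drags accuracy below 0.5 while F1 stays moderate.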