
Popular comments by vicky

Ontotext case – Team _A

Hello Yasen!
Regarding your question on the train and test data differences we noted:
The discrepancy with the RNN-Attention model arose because, when I initially trained the model, the best validation score was 93% accuracy. But when I used it on the test set, the predictions were so poor that even a layman would dismiss them. Later I realized the problem was that in the training set each pair had an average of 30-40 text snippets, whereas in the test set it was 1-2, so the validation accuracy did not translate. Instead, I reduced the training set to fewer than 10 snippets per pair and retrained the model; it scored a validation accuracy of 83.2%, the top 20 pairs you see in the article are from this model, and the results look plausible. What I am essentially trying to say is that, unlike other datasets, this particular one makes it tricky to report meaningful validation scores. Yes, by concatenating the text snippets we introduced this discrepancy, but we were trying to be creative and thought we would see if it works. BTW I just realized that you are Laura 😛 Thanks a lot for your mentorship!
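To make the fix concrete, here is a minimal sketch of that kind of capping (the `pairs_to_snippets` mapping and the random subsampling are just illustrative placeholders, not our exact pipeline): each pair keeps at most 10 snippets before concatenation, so the training inputs look more like the 1-2 snippets per pair seen at test time.

```python
import random

# Hypothetical structure: each training pair maps to its list of text snippets
# (on average 30-40 per pair in the original training set).
pairs_to_snippets = {
    ("company_a", "company_b"): ["snippet 1 ...", "snippet 2 ...", "snippet 3 ..."],
    # ...
}

MAX_SNIPPETS_PER_PAIR = 10  # cap chosen so training inputs resemble the test set

def build_training_examples(pairs_to_snippets, max_snippets=MAX_SNIPPETS_PER_PAIR, seed=42):
    """Subsample snippets per pair before concatenating them into one input,
    reducing the train/test discrepancy in input length."""
    rng = random.Random(seed)
    examples = {}
    for pair, snippets in pairs_to_snippets.items():
        if len(snippets) > max_snippets:
            snippets = rng.sample(snippets, max_snippets)
        # Concatenated text that would be fed to the RNN-Attention model.
        examples[pair] = " ".join(snippets)
    return examples
```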

Ontotext case – Team _A

Preslav, every single one of your questions makes sense, and I think we have to address them properly! For my part, I will rerun the algorithm, try to understand the science behind the combination of hyperparameters myself, and write to you with the results I find. Thank you

Ontotext case – Team _A

Yes Andrey! We meant anti-symmetric; 40 hours of sleeplessness 😛 True, I also regret that we didn't spend more time critically comparing the results of the classical machine learning models with the NNs. Somehow these neural networks are fancy enough to quickly grab attention, much like our attention model 😉 It was your and Laura’s continued support that helped us reach the finals! Thanks to you 🙂

Ontotext case – Team _A

Hello Toney! The main reason I didn’t update the validation scores of the RNN-Attention model is that, when I initially trained it, the best validation score was 93% accuracy. But when I used it on the test set, the predictions were so poor that even a layman would dismiss them. Later I realized the problem was that in the training set each pair had an average of 30-40 text snippets, whereas in the test set it was 1-2, so the validation accuracy did not translate. Instead, I reduced the training set to fewer than 10 snippets per pair and retrained the model; it scored a validation accuracy of 83.2%, the top 20 pairs you see in the article are from this model, and the results look plausible. What I am essentially trying to say is that, unlike other datasets, this particular one makes it tricky to report the actual validation scores.