vasild

Popular comments by vasild

Datathon – HackNews – Solution – Stark Team

Hello Giovanni!
The choice of parameters was a mixture of experience, checking research papers, performance analysis of similar cases, and testing on the specific data set. I decided that lemmatisation and stemming were not a good idea in this case, as we would lose some context, while removing stopwords was a must. I am sorry I did not try the nltk stopword corpus; the default sklearn corpus is known to have some issues.
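
To make the preprocessing choice above concrete, here is a minimal sketch of stopword removal without stemming or lemmatisation. It is an illustration only, not the code used in the solution: the `STOPWORDS` set and the `tokenize` helper are hypothetical names, and the word list is deliberately tiny. In practice the set would presumably be swapped for a fuller list such as the nltk or sklearn one.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# a real pipeline would likely load a fuller list (e.g. from nltk or sklearn).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "that", "we", "will"}


def tokenize(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens and drop stopwords.

    Tokens are kept in their surface form: no stemming or lemmatisation
    is applied, so "voting" and "hearings" stay exactly as written.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stopwords]


print(tokenize("The senators are voting in the hearings"))
# → ['senators', 'voting', 'hearings']
```

Because the surface forms survive, downstream features can still distinguish, for example, "voting" from "vote", which is the context that stemming would remove.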

Datathon – HackNews – Solution – Stark Team

There is a chance that a neural network would do a better job as an extractor, but given the time constraint I preferred to make a safe bet: using simple and fast methods. I intended to experiment with a neural network as well, but … I will do this in the coming days and share the results as a follow-up to the article.