Datathons Solutions

Datathon – HackNews – Solution – Stark Team

7 thoughts on “Datathon – HackNews – Solution – Stark Team”

  1. Thanks for explaining the approach in your article. I am wondering how the parameters of the representation (min-df, max-df, stopwording) were chosen. Was it purely by looking at performance, or was there some intuition or analysis of the data that guided the choice? In the latter case, it would be nice to read about it in the article.

    1. Hello Giovanni!
       The choice of parameters was a mixture of experience, checking research papers and performance analyses of similar cases, and testing on the specific data set. I decided that lemmatisation and stemming were not a good idea in this case, as we would lose some context, while removing stopwords was a must. I am sorry I did not try the nltk stopword corpus; the default sklearn stopword list is known to have some issues.
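
For readers unfamiliar with the parameters discussed above, here is a minimal sketch of what min-df, max-df and stopword removal do to a vocabulary. It is not the team's code: the function `build_vocabulary`, the toy documents and the threshold values are all hypothetical, and the semantics only loosely follow the scikit-learn convention (integer min-df as a document count, float max-df as a proportion of documents).

```python
from collections import Counter


def build_vocabulary(docs, min_df=2, max_df=0.8, stop_words=frozenset()):
    """Return the sorted terms kept after stopword and document-frequency pruning.

    Hypothetical helper for illustration only.
    min_df: minimum number of documents a term must appear in.
    max_df: maximum proportion of documents a term may appear in.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document (document frequency, not term frequency).
        doc_freq.update(set(doc.lower().split()))
    return sorted(
        term
        for term, df in doc_freq.items()
        if term not in stop_words      # drop listed stopwords
        and df >= min_df               # drop terms that are too rare
        and df / n_docs <= max_df      # drop terms that are too common
    )


# Toy corpus, invented for this example.
docs = [
    "the senator said the bill is a disaster",
    "the bill passed the senate",
    "critics said the vote is a shameful disaster",
    "the vote passed",
]
vocab = build_vocabulary(docs, min_df=2, max_df=0.8, stop_words={"the", "is", "a"})
print(vocab)
```

In this toy corpus, terms that occur in a single document fall below min-df, while "the" occurs in every document and would be removed by max-df even without a stopword list. That overlap is one reason the three settings are usually tuned together.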

  2. You’re on to something, as you have the best score! I’m also a proponent of SVM “on top.” Do you think you would have done even better with a neural network as a feature extractor? If yes, what was the limiting factor, and why didn’t you try it?

    1. There is a chance that a neural network would do a better job as an extractor, but given the time constraint I preferred to make a safe bet: using simple and fast methods. I intended to experiment with a neural network as well, but … I will do this in the coming days and share the results as a follow-up to the article.

    1. Thank you Alberto!
       This was also one of my points: to show that complicated algorithms do not always perform best.
       But still, I did not expect to score the best F1!
