Popular comments by vpekar

Datathon – HackNews – Solution – Leopards

Hi, thank you for your comment. Our answers:
1. Yes, we took word2vec vectors of non-stopwords in each sentence and concatenated them.
2. There is a small table in the evaluation section. The features there are sorted by their Gini importance, as reported by the Gradient Boosting implementation in scikit-learn. We agree it would be useful to understand the feature importances better, for example by measuring them in other ways, such as feature elimination with different learning methods. That is a task for future work.
3. There are now links to two zipped Jupyter notebooks, one for Task 1 and one for Task 2.
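
To illustrate point 1, here is a minimal Python sketch of concatenating the vectors of non-stopwords in a sentence. It is not the code from the notebooks: the toy 3-dimensional vectors stand in for real word2vec embeddings, and the stopword list, the `sentence_features` name, the `max_tokens` cut-off, and the zero-padding of short sentences and unknown words are all assumptions made so that every sentence yields a vector of the same length.

```python
from typing import Dict, List

# Toy 3-dimensional vectors standing in for real word2vec embeddings (assumption).
TOY_VECTORS: Dict[str, List[float]] = {
    "government": [0.1, 0.2, 0.3],
    "denies": [0.4, 0.5, 0.6],
    "claims": [0.7, 0.8, 0.9],
}
# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "and", "is"}
DIM = 3  # dimensionality of the toy vectors


def sentence_features(sentence: str, max_tokens: int = 4) -> List[float]:
    """Concatenate the vectors of non-stopword tokens into one fixed-length vector."""
    tokens = [t for t in sentence.lower().split() if t not in STOPWORDS]
    features: List[float] = []
    for token in tokens[:max_tokens]:
        # Out-of-vocabulary words get a zero vector (assumption).
        features.extend(TOY_VECTORS.get(token, [0.0] * DIM))
    # Zero-pad so every sentence yields max_tokens * DIM values (assumption).
    features.extend([0.0] * (max_tokens * DIM - len(features)))
    return features


example = sentence_features("The government denies the claims")
print(len(example))
```

With these settings each sentence maps to `max_tokens * DIM` values, so the resulting vectors can be stacked into a feature matrix for a classifier.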