Popular articles by vpekar
Popular comments by vpekar
Hi, thank you for your comment. Our answers:
1. Yes, we took word2vec vectors of non-stopwords in each sentence and concatenated them.
2. There is a small table in the evaluation section. The features there are sorted by their Gini importance, which is output by the Gradient Boosting implementation in scikit-learn. We agree it would be useful to understand the feature importances better, e.g., by measuring them in other ways, such as feature elimination with different learning methods. That is a task for future work.
3. There are now two links to two zipped Jupyter notebooks – for Tasks 1 and 2.
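As a rough illustration of point 1, the sketch below concatenates the vectors of the non-stopword tokens in a sentence. It is not the code from the notebooks: the toy vector table, the stopword list, the whitespace tokenizer, and the zero-padding to a fixed length (`MAX_TOKENS`) are all assumptions made so the example is self-contained.

```python
# Hypothetical stand-in for a trained word2vec model: token -> 3-dimensional vector.
# Real word2vec vectors usually have far more dimensions.
TOY_VECTORS = {
    "stock": [0.1, 0.2, 0.3],
    "prices": [0.4, 0.5, 0.6],
    "fell": [0.7, 0.8, 0.9],
}
STOPWORDS = {"the", "a", "of", "and"}  # assumed stopword list
DIM = 3
MAX_TOKENS = 4  # assumed cap so that every sentence yields a vector of equal length


def sentence_features(sentence):
    """Concatenate the vectors of non-stopword tokens, zero-padded to a fixed length."""
    tokens = [t for t in sentence.lower().split() if t not in STOPWORDS]
    # Skip out-of-vocabulary tokens and truncate long sentences.
    vectors = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS][:MAX_TOKENS]
    flat = [x for vec in vectors for x in vec]
    flat.extend([0.0] * (DIM * MAX_TOKENS - len(flat)))
    return flat


features = sentence_features("The stock prices fell")
print(len(features))
```

Padding is only one way to get fixed-length input from concatenated vectors; the notebooks may handle sentence length differently.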
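For point 2, a minimal sketch of how such a ranking can be read from scikit-learn, with recursive feature elimination on a different learner as one possible cross-check. The synthetic dataset, the hyperparameters, and the choice of logistic regression are assumptions for illustration, not our actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for the real feature matrix (assumption).
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)

# Impurity-based ("Gini") importances reported by Gradient Boosting.
model = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
gini = model.feature_importances_
gini_order = sorted(range(len(gini)), key=lambda i: gini[i], reverse=True)
print("Features by Gini importance:", gini_order)

# Cross-check: recursive feature elimination with another learning method.
# ranking_ assigns 1 to the feature that is eliminated last.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=1).fit(X, y)
print("RFE ranking per feature:", list(rfe.ranking_))
```

If the two rankings disagree strongly, that suggests the importance of a feature depends on the learner and should not be read off a single model.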
Thank you for the comment. These values are the actual chi^2 values.