Popular comments by preslav
Great article! Tons of detail!
Nice model: it uses BERT and combines it with a number of features that make a lot of sense.
Also, nice article and analysis.
Nice article, nice approach, and great results on Task 3!
Just one thing: it is unclear which resources need to be downloaded to make the attached code work. The code has many hardcoded paths to files that do not exist. For example, where do we get the Urban Dictionary from?
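One way to avoid hardcoded paths is to resolve every external resource (such as the Urban Dictionary data) relative to a single configurable directory, and to fail with a clear message when a file is missing. This is a minimal sketch, assuming a hypothetical environment variable `DATATHON_DATA_DIR` and a hypothetical file name; the layout of the attached code is not known.

```python
import os
from pathlib import Path

# Hypothetical variable name: the submission's real configuration is not known.
DATA_DIR_ENV = "DATATHON_DATA_DIR"


def resource_path(name: str, env_var: str = DATA_DIR_ENV) -> Path:
    """Return the path of an external resource, or fail with an actionable message."""
    base = os.environ.get(env_var)
    if base is None:
        raise RuntimeError(f"Set {env_var} to the directory containing the downloaded resources")
    path = Path(base) / name
    if not path.is_file():
        raise FileNotFoundError(f"Missing resource: {path} (download it before running the code)")
    return path


# Usage (hypothetical file name):
# slang_file = resource_path("urban_dictionary.csv")
```

Listing the expected file names next to their download sources would then answer the question above for anyone trying to run the code.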
– The source code is publicly available on GitHub.
– The article is somewhat short but gives sufficient detail.
– The approach is standard but efficient (for tasks 1 and 2).
– This is the best-ranked team overall:
– DEV: 3rd-4th, 1st, and 5th for tasks 1, 2, and 3, respectively
– TEST: 2nd, 1st, and 5th for tasks 1, 2, and 3, respectively
– Remarkably, on task 2, the team wins by a large margin.
* Detailed comments:
This is an exercise in using BERT (for tasks 1 and 2):
– paper: https://arxiv.org/abs/1810.04805
– code: https://github.com/google-research/bert
– other code: https://github.com/hanxiao/bert-as-service
BERT is a state-of-the-art model for Natural Language Processing (NLP) that outperforms earlier approaches such as ELMo. See more here:
The authors fine-tuned BERT using parameters they had found in earlier experiments on other tasks. Fine-tuning BERT takes a lot of time…
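For a sense of scale: the BERT paper (arXiv:1810.04805) suggests trying batch sizes 16 and 32, learning rates 5e-5, 3e-5 and 2e-5, and 2, 3 or 4 epochs. The sketch below simply enumerates that grid and counts the training epochs a full search would cost; whether the authors ever searched this exact grid is an assumption, not something the article states.

```python
from itertools import product

# Fine-tuning ranges suggested in the BERT paper; using them here is illustrative.
BATCH_SIZES = [16, 32]
LEARNING_RATES = [5e-5, 3e-5, 2e-5]
EPOCHS = [2, 3, 4]


def fine_tuning_grid():
    """All (batch_size, learning_rate, epochs) combinations, one fine-tuning run each."""
    return [
        {"batch_size": b, "learning_rate": lr, "epochs": e}
        for b, lr, e in product(BATCH_SIZES, LEARNING_RATES, EPOCHS)
    ]


def total_epochs(grid):
    """Passes over the training data needed to try every configuration once."""
    return sum(cfg["epochs"] for cfg in grid)
```

The grid has 18 configurations and 54 epochs in total, which is why reusing settings found on earlier tasks is a pragmatic shortcut.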
1. Which model did you use for tasks 1 and 2? Is it model (b) from Figure 3? https://arxiv.org/pdf/1810.04805.pdf
2. Why did you use the uncased version of BERT?
3. Do you think that the large BERT model would help?
4. Did you try BERT without fine-tuning? If so, how much did you gain from fine-tuning?
5. Do you think you could be losing something by truncating the input to 256 tokens for task 1?
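Regarding question 5: with head-only truncation, everything after the first 256 positions is discarded. A simple alternative worth comparing is to keep both the beginning and the end of each article. The sketch works on an already-tokenized list and assumes two positions are reserved for [CLS] and [SEP]; the even head/tail split is an arbitrary default, not a setting from the article.

```python
def truncate_head_tail(tokens, max_len=256, head_len=None, n_special=2):
    """Shorten a token sequence to fit max_len, keeping its beginning and its end.

    n_special reserves positions for BERT's [CLS] and [SEP] markers.
    head_len defaults to half of the remaining budget.
    """
    budget = max_len - n_special
    if len(tokens) <= budget:
        return list(tokens)
    head_len = budget // 2 if head_len is None else min(head_len, budget)
    tail_len = budget - head_len
    # Slicing from len - tail_len avoids the tokens[-0:] pitfall when tail_len == 0.
    tail = list(tokens[len(tokens) - tail_len:])
    return list(tokens[:head_len]) + tail


def truncate_head(tokens, max_len=256, n_special=2):
    """Plain head-only truncation, for comparison."""
    return list(tokens[: max_len - n_special])
```

Comparing dev scores for both functions (and a few head_len values) would show whether the end of an article carries useful signal.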
You have done a terrific job of analyzing the data in various ways and of designing a reasonable, targeted neural network model for the task. The model uses deep learning and state-of-the-art tools and techniques (TF.IDF-based SVM solutions were also tried for comparison).
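For reference, the TF.IDF weighting behind such a comparison baseline can be computed without any library. This sketch uses raw term counts, idf = log(N / df) and L2 normalization, which is only one of several common variants (library implementations often smooth the idf); the whitespace tokenizer is a simplification. The resulting sparse vectors would then be fed to a linear SVM.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with one L2-normalized TF.IDF dict per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenization
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        weights = {t: c * idf[t] for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return sorted(idf), vectors
```

With this variant, a term that occurs in every document gets an idf of 0 and therefore contributes nothing to any vector.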
What is the baseline F1? Also, what is the accuracy?
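Both numbers matter because, on imbalanced data, a majority-class baseline can reach high accuracy while its F1 for the minority class is zero. A minimal sketch for a binary task, assuming the positive class is labeled 1:

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1(gold, pred, positive=1):
    """F1 score of the positive class (binary task)."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test example."""
    label = Counter(train_labels).most_common(1)[0][0]
    return [label] * n_test
```

For example, with gold labels [0, 0, 0, 1] the all-zeros baseline has accuracy 0.75 but F1 0.0.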
Do you have any cross-validation results on the training dataset for different hyperparameter choices in the network architecture?
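A sketch of k-fold cross-validation for comparing hyperparameter settings. To keep it self-contained, the network is replaced by a toy 1-D k-nearest-neighbours classifier (a hypothetical stand-in); in practice `fit` and `score` would wrap the real training and evaluation code, and the data would be shuffled or stratified before splitting.

```python
import statistics
from collections import Counter


def k_fold_splits(n, k):
    """Yield (train_indices, dev_indices) for k contiguous folds whose sizes differ by at most one."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        dev = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, dev
        start += size


def cross_validate(fit, score, X, y, k, **hyperparams):
    """Mean and standard deviation of the dev-fold score for one hyperparameter setting."""
    scores = []
    for train, dev in k_fold_splits(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train], **hyperparams)
        scores.append(score(model, [X[i] for i in dev], [y[i] for i in dev]))
    return statistics.mean(scores), statistics.stdev(scores)


# Toy stand-in for the real network: 1-D k-NN with a single hyperparameter.
def fit_knn(X, y, n_neighbors):
    return X, y, n_neighbors  # lazy learner: just store the training data


def knn_accuracy(model, X_dev, y_dev):
    X, y, n_neighbors = model
    correct = 0
    for x, gold in zip(X_dev, y_dev):
        nearest = sorted(range(len(X)), key=lambda i: abs(X[i] - x))[:n_neighbors]
        predicted = Counter(y[i] for i in nearest).most_common(1)[0][0]
        correct += predicted == gold
    return correct / len(y_dev)


X = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
results = {n: cross_validate(fit_knn, knn_accuracy, X, y, k=5, n_neighbors=n) for n in (1, 3, 5)}
```

Reporting the standard deviation next to the mean helps tell whether a difference between two settings exceeds the fold-to-fold noise.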
Any thoughts on what could be done next to further improve the model? Maybe combine TF.IDF with deep learning? Or perform system combination? Did the different systems perform similarly on the training set (e.g., under cross-validation)?
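Two simple forms of system combination are averaging the positive-class probabilities of several systems (e.g., a TF.IDF-based SVM and the neural model) and majority voting over their predicted labels. A minimal sketch; the equal default weights and the 0.5 threshold are illustrative assumptions that would normally be tuned on held-out data.

```python
from collections import Counter


def average_probabilities(system_probs, weights=None):
    """Weighted average of each example's positive-class probability across systems."""
    if weights is None:
        weights = [1.0] * len(system_probs)  # equal weights
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, system_probs)) / total
        for i in range(len(system_probs[0]))
    ]


def to_labels(probs, threshold=0.5):
    """Binarize averaged probabilities (the threshold is a tunable assumption)."""
    return [int(p >= threshold) for p in probs]


def majority_vote(system_labels):
    """Most frequent label per example; ties go to the tied label seen first (earliest system)."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*system_labels)]
```

Averaging uses each system's confidence, whereas voting only needs hard labels; combination tends to help most when the systems make different errors, which cross-validation on the training set can reveal.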