Popular comments by yasen
Great work, guys! I’m really happy to see so many graphics and experiments! It’s really important to visualize the data and experiment (I would say more important than achieving top scores), and you did great work!
Here are some notes I made while reading the article:
– good analysis and visualisation of the data
– data augmentation is a good idea when not enough data is provided, or when training complex NN models, but 80k snippets seems like a big enough corpus already. I wouldn’t give that a high priority.
– you claim there are differences in the text in the train and test sets? It would be nice to see some graphics about accuracy comparisons on the dev set and test set, or some other form of proof.
– coreference resolution was covered in the Identrics case, along with some dependency parsing; perhaps you could have used their notebooks 🙂
– I don’t understand this: “The first one was using function from R*R -> R that holds h(a,b) != h(b,a) and add this as feature.”
– normalizing the company names is a very good idea, especially if you only have 400 companies in all examples
– “Now lets preprocess the unlabeled test set in order to use it as corpus for more words and prepare it for input in the models”. You should be very careful not to transfer some knowledge from the test set in the training phase, even through w2v embeddings.
– Your observation that some training examples contain no information about the relation between the two mentioned companies, yet are still in the training set, is a serious problem (if the task is to detect relations at the sentence level). Also, kudos for finding this! Concatenating the examples to solve the business problem is one option, yes. You could also try to handle the problem directly: I would suggest using different training sets (from the web), clustering the training examples, or any other analysis that would actually clean up the training data. If this also holds for the test set, it would be very hard to evaluate any model without knowing which of the test examples actually hold information about the parent-subsidiary relation.
– “It is to be noted that the number of text snippets corresponding to each pair in the training data varied largely from some companies like Google and YouTube having approximately 4000 snippets to smaller companies having 2 or 3 snippets. Such a huge variance created big troubles in the test data which will be explained later.”
– It looks like you introduced this problem yourself by concatenating all training examples for company pairs in single documents 🙂
– great set of useful experiments and results in the linked notebooks
– Also, I agree with Tony about the abstract, keep it simple and let Ontotext sell their case to the audience 🙂
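On the note above about not transferring knowledge from the test set: a minimal sketch of one way to keep the vocabulary tied to the training data only. The function names, the `<unk>` token and the toy sentences are assumptions for illustration and are not taken from the article or its notebooks.

```python
from collections import Counter


def build_vocab(train_texts, min_count=1):
    """Build a token -> id map from the training texts only."""
    counts = Counter(tok for text in train_texts for tok in text.lower().split())
    vocab = {"<unk>": 0}  # id 0 is reserved for tokens never seen in training
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab):
    """Map tokens to ids; anything unseen in training becomes <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]


# Toy data (hypothetical): the test sentence is never used to build the vocabulary.
train_texts = ["Google acquired YouTube", "YouTube is owned by Google"]
test_texts = ["Facebook acquired Instagram"]

vocab = build_vocab(train_texts)
encoded_test = [encode(t, vocab) for t in test_texts]
```

The same idea would apply to embeddings: train them on the training corpus (or on external text), and only look up test tokens at prediction time.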
Great work! I agree with the mentors above 🙂 You built a great model and achieved a very high score! Great productivity for a group of 3 people. I would also like to see more graphics, scores, etc.
Here are some notes I made reading your article:
– the dataset is not that biased; negative examples are not orders of magnitude more numerous than positive ones
– normalizing the company names is a very good idea
– stopwords may bring value in some cases, always test and verify if removing them actually helps
– A_team noted that in many examples the text doesn’t hold enough information about the relation between the two companies (producing erroneous examples). Did you observe that and did you try to handle it?
– If the results are on the test set, great results! Very good application of neural networks.
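For the company-name normalization mentioned in the notes above, here is a rough sketch of one possible reading: replacing the two company mentions with fixed placeholder tokens. The function name `mask_companies` and the placeholders `COMPANY_A` / `COMPANY_B` are hypothetical; the article may have normalized the names differently.

```python
import re


def mask_companies(text, company_a, company_b):
    """Replace mentions of the two companies with fixed placeholder tokens."""
    for name, placeholder in ((company_a, "COMPANY_A"), (company_b, "COMPANY_B")):
        # Whole-word, case-insensitive match; names ending in punctuation
        # (e.g. a trailing "!") would need a different boundary rule.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", flags=re.IGNORECASE)
        text = pattern.sub(placeholder, text)
    return text


snippet = "Google bought YouTube in 2006; youtube kept its brand."
masked = mask_companies(snippet, "Google", "YouTube")
```

With placeholders, the model sees the same tokens for every company pair and has to rely on the surrounding context rather than on memorized names.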
Very good article! Very well-written, good job 🙂
It would be nice to compare your results with a baseline (predicting the previous point, or average of the previous points or something similar). Also, your idea to use accumulated measures for the past 1 hour is interesting. Did you compare it with just giving the previous points as features to the regression?
It would be nice to share your code as well.
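A minimal sketch of the two baselines suggested above (predicting the previous point, and the average of the previous points), scored with mean absolute error. The series values and the window size are made up for illustration; they are not the article's data.

```python
def persistence_forecast(series):
    """Predict each point as the previous observed point."""
    return series[:-1]


def moving_average_forecast(series, window):
    """Predict each point as the mean of the previous `window` points."""
    return [sum(series[i - window:i]) / window for i in range(window, len(series))]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between targets and predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical measurements, in time order.
series = [10.0, 12.0, 11.0, 13.0, 14.0, 13.0]
window = 3

# Score both baselines on the same targets (points from index `window` on).
targets = series[window:]
persistence_preds = persistence_forecast(series)[window - 1:]
average_preds = moving_average_forecast(series, window)

persistence_mae = mean_absolute_error(targets, persistence_preds)
average_mae = mean_absolute_error(targets, average_preds)
```

A regression model is only worth its complexity if its error on the same targets is clearly below both of these numbers.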
Interesting solutions, but I didn’t see any baseline for comparison. Also, why did you decide to train a bidirectional LSTM? It doesn’t seem very natural to read the timepoints backwards.