Identrics Results_Dependency_Trees
Syntactic parsing, or dependency parsing, is the task of analyzing a sentence and assigning a syntactic structure to it. The most widely used syntactic structure is the parse tree, which can be generated by a parsing algorithm. Parse trees are useful in applications such as grammar checking and, more importantly, they play a critical role in the semantic analysis stage. For example, to answer the question “Who is the point guard for the LA Lakers in the next game?” we need to identify its subject, objects, and attributes in order to work out that the user wants the point guard of the LA Lakers specifically for the next game.
The first problem we faced was splitting the text into sentences correctly. The text contains many different symbols that confuse the algorithm, so it fails to identify where a sentence ends. To avoid that, one could use Python string splitting and define the sentence boundaries manually. An alternative is to add extra splitting symbols to the existing ones. Another solution could be to replace some of the problematic symbols with ones that act as sentence splitters.
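The last two ideas (adding extra splitting symbols and replacing problematic symbols) can be sketched with the standard library alone. This is a minimal illustration, not the splitter we used: the names REPLACEMENTS, EXTRA_TERMINATORS, normalize and split_sentences are hypothetical, and the choice of which symbols to replace or add is an assumption that would depend on the actual text.

```python
import re

# Hypothetical choices for illustration; the right symbols depend on the corpus.
REPLACEMENTS = {"…": "."}   # problematic symbol -> ordinary sentence terminator
BASE_TERMINATORS = ".!?"    # the usual sentence-ending characters
EXTRA_TERMINATORS = ";"     # additional symbol we also want to split on


def normalize(text, replacements=REPLACEMENTS):
    """Replace symbols that confuse the splitter with plain terminators."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


def split_sentences(text, terminators=BASE_TERMINATORS + EXTRA_TERMINATORS):
    """Split on whitespace that directly follows a terminator character."""
    text = normalize(text)
    # The lookbehind keeps the terminator attached to its sentence.
    pattern = r"(?<=[" + re.escape(terminators) + r"])\s+"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


if __name__ == "__main__":
    for sentence in split_sentences("One; two… Three."):
        print(sentence)
```

A rule-based splitter like this is easy to tune, but it will still break on abbreviations such as "e.g. this", which is one reason to consider a model-based approach instead.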
Another solution could be to load an English language model and run a syntactic analysis of the text. The beginning and the end of each sentence can then be defined from that analysis. For that purpose one could use the doc.sents property of a spaCy Doc object (a property, not a separate package), which by default derives sentence boundaries from the dependency parse, as in the example below:
EXAMPLE
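import spacy

# Assumes the small English model is installed (python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence. Here's another...")
sents = list(doc.sents)
assert len(sents) == 2
assert [s.root.text for s in sents] == ["is", "'s"]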
Once the text was successfully split into sentences, we moved on to building dependency trees. For that purpose we used the spaCy package. spaCy's dependency parser provides token properties for navigating the generated dependency parse tree. The dep attribute (dep_ for the human-readable label) gives the syntactic dependency relation between a token and its head. The dependency labels follow the ClearNLP scheme. The generated parse has all the properties of a tree: each token has exactly one head, although a head token can have multiple children.
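The tree properties described above can be made concrete with a small standard-library sketch. It does not call spaCy: the parse of "She reads books" is hand-annotated for illustration, and the Token class and the helper functions are hypothetical names that only mirror the idea behind spaCy's token.head, token.children and token.dep_. As in spaCy, the root token is represented as its own head.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    text: str
    head: int  # index of the head token; the root points to itself
    dep: str   # label of the relation between this token and its head


# Hand-annotated toy parse (an assumption for illustration, not parser output).
SENTENCE = [
    Token("She", 1, "nsubj"),
    Token("reads", 1, "ROOT"),
    Token("books", 1, "dobj"),
]


def root(tokens: List[Token]) -> int:
    """Return the index of the single token that is its own head."""
    roots = [i for i, tok in enumerate(tokens) if tok.head == i]
    if len(roots) != 1:
        raise ValueError("a dependency tree must have exactly one root")
    return roots[0]


def children(tokens: List[Token], i: int) -> List[int]:
    """A head may have many children: every other token whose head is i."""
    return [j for j, tok in enumerate(tokens) if tok.head == i and j != i]


def path_to_root(tokens: List[Token], i: int) -> List[int]:
    """Each token has one head, so following heads gives a unique path."""
    path = [i]
    while tokens[i].head != i:
        i = tokens[i].head
        if i in path:
            raise ValueError("cycle detected: not a tree")
        path.append(i)
    return path


if __name__ == "__main__":
    r = root(SENTENCE)
    for j in children(SENTENCE, r):
        print(SENTENCE[j].text, SENTENCE[j].dep, "->", SENTENCE[r].text)
```

Storing only the head index per token is enough to recover the whole tree, because the single-head property means children can always be derived by scanning for tokens that point at a given head.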