Monthly Challenge – Ontotext case – Solution – Team epistemi

5 Comments. Posted in Classification systems

Week 1. Data Understanding + Feature Extraction. We were given a dataset of 277,419 records extracted from DBpedia. It has 6 features and 1 label with 32 classes, as summarised in the table below (columns: ID, Item Name, Item Type, Observations). Row 1: org, Feature, URL; may be useful for extraction of additional […]

Monthly Challenge – Ontotext case – Solution – Door

4 Comments. Posted in Classification systems

1. Business Understanding. The goal is to develop an automated and standardized classification model that can be used on any source to enrich the originally available data with industry sector information. Ultimately the task can be framed as an error/anomaly detection task. At its core, it is still a classification problem, and the output should not be the ultimate […]

Monthly Challenge – Ontotext – Case

Posted in Cases, Learn, MC-04-2019

Monthly Challenge: https://www.datasciencesociety.net/events/text-mining-data-science-monthly-challenge/ Mentors’ Weekly Instructions: https://www.datasciencesociety.net/text-mining-data-science-monthly-challenge/ Real Business Problem: Classification of companies into industry sectors is a fundamental task for unlocking advanced business intelligence capabilities. However, different data sources rarely use the same classification system, if they use one at all. This is a huge obstacle to taking advantage of the available details in Open Data and very niche commercial […]

Text Mining Data Science Monthly Challenge

Posted in Learn

Why should you join the Data Science Monthly Challenge, and what can you expect? The Data Science Monthly Challenge provides an exceptional opportunity for participants, regardless of their background and previous experience, to be involved in finding a solution to a real data science problem step by step. The proposed gradual approach towards advanced business […]

CASE Ontotext, Team CENTROIDA

8 Comments. Posted in Team solutions

This paper presents a DNN-based approach to learning entity relations from distantly labeled free text. The approach includes task-specific data cleaning which, while effective at removing textual noise, preserves the semantics needed for training. The cleaned-up dataset is then used to build a number of attention-based bLSTM DNN models, with hyperparameters tuned using recall as the optimization objective. The resulting models are then joined into an ensemble that delivers our best result.
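As a rough illustration of the ensembling idea, here is a minimal sketch. The excerpt does not say how the models are joined, so plain averaging of per-model probabilities is an assumption, as are the 0.5 threshold, the toy numbers, and all function names (`ensemble_average`, `predict_labels`, `recall`).

```python
from typing import List, Sequence


def ensemble_average(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average positive-class probabilities across models, per example.

    Assumption: the ensemble is a plain mean of model outputs.
    """
    if not prob_lists:
        raise ValueError("need at least one model's predictions")
    n_examples = len(prob_lists[0])
    if any(len(p) != n_examples for p in prob_lists):
        raise ValueError("all models must score the same examples")
    n_models = len(prob_lists)
    return [sum(p[i] for p in prob_lists) / n_models for i in range(n_examples)]


def predict_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn probabilities into 0/1 labels (threshold is an assumed default)."""
    return [1 if p >= threshold else 0 for p in probs]


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Toy scores from two hypothetical models on four sentences.
    model_a = [0.9, 0.2, 0.6, 0.4]
    model_b = [0.7, 0.4, 0.3, 0.8]
    y_true = [1, 0, 1, 1]

    averaged = ensemble_average([model_a, model_b])
    labels = predict_labels(averaged)
    print("ensemble labels:", labels)
    print("ensemble recall:", recall(y_true, labels))
```

Using recall as the objective rewards finding as many true relations as possible, so in practice the threshold would be one of the things to tune rather than fixed at 0.5.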

Ontotext case – Team _A

13 Comments. Posted in Learn, NLP, Team solutions

The objective of our task is to extract parent-subsidiary relationships from text. For example, a news item from TechCrunch says: ‘Remember those rumors a few weeks ago that Google was looking to acquire the plug-and-play security camera company, Dropcam? Yep. It just happened.’ From this we can infer that Dropcam is a subsidiary of Google. But there are millions of companies and several million articles talking about them. A human being would get tired of doing even 10! Trust me 😉 We have developed some cool machine learning models, spanning from classical algorithms to deep neural networks, to do this for you. There is a bonus! We do not just give you probabilities. We also give you the sentences that triggered the algorithm to make the inference! For instance, when it says Oracle Corp is the parent of Microsys, it can also return the sentence in its corpus that triggered the prediction: ‘Oracle Corp’s Microsys customer support portal was seen communicating with a server’.
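To make the "prediction plus triggering sentence" idea concrete, here is a minimal sketch. It is not the team's model: the scoring is a simple keyword heuristic standing in for a trained classifier, and the cue list, the score values, and the function names (`split_sentences`, `score_sentence`, `predict_with_evidence`) are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical cue phrases hinting at a parent-subsidiary relation.
CUES = ("acquir", "subsidiary", "parent of", "'s ")


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence: str, parent: str, child: str) -> float:
    """Stand-in for a model score in [0, 1].

    0.0 unless both company names occur in the sentence; otherwise 0.5
    plus 0.25 per matching cue, capped at 1.0.
    """
    lower = sentence.lower()
    if parent.lower() not in lower or child.lower() not in lower:
        return 0.0
    hits = sum(1 for cue in CUES if cue in lower)
    return min(1.0, 0.5 + 0.25 * hits)


def predict_with_evidence(
    text: str, parent: str, child: str
) -> Tuple[float, Optional[str]]:
    """Return the best score and the sentence that produced it."""
    best_score, best_sentence = 0.0, None
    for sentence in split_sentences(text):
        score = score_sentence(sentence, parent, child)
        if score > best_score:
            best_score, best_sentence = score, sentence
    return best_score, best_sentence


if __name__ == "__main__":
    corpus = (
        "Oracle Corp's Microsys customer support portal was seen "
        "communicating with a server. The weather was fine."
    )
    score, evidence = predict_with_evidence(corpus, "Oracle Corp", "Microsys")
    print(score, "|", evidence)
```

Returning the evidence sentence alongside the score lets a reader check each inferred relation against the text it came from.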