Understanding and Exploring the Data
Week 1. Data Understanding + Feature Extraction We have been provided with a dataset of 277419 records extracted from DBpedia. It has 6 features and 1 label with 32 classes, as summarised in the table below: ID Item Name Item Type Observations 1 org Feature URL; may be useful for extraction of additional […]
1. Business Understanding Developing an automated and standardized classification model that can be used on any source to enrich the originally available data with industry sector information. Ultimately the task can be framed as an error/anomaly detection task. At the core, it is still a classification problem and the output should be not the ultimate […]
Monthly Challenge: https://www.datasciencesociety.net/events/text-mining-data-science-monthly-challenge/ Mentors’ Weekly Instructions: https://www.datasciencesociety.net/text-mining-data-science-monthly-challenge/ Real Business Problem Classification of companies into industry sectors is a fundamental task for unlocking advanced business intelligence capabilities. However, different data sources rarely use the same classification system, if they use one at all. This is a huge obstacle for taking advantage of the available details in Open Data and very niche commercial […]
Why should you join the Data Science Monthly Challenge, and what can you expect? The Data Science Monthly Challenge provides an exceptional opportunity for participants, regardless of their background and previous experience, to be involved in finding a solution to a real data science problem step by step. The proposed gradual approach towards advanced business […]
This paper presents a DNN-based approach to learning entity relations from distant-labeled free text. The proposed approach applies task-specific data cleaning which, while effective at removing textual noise, preserves the semantics necessary for the training process. The cleaned-up dataset is then used to build a number of attention-based bLSTM DNN models, with hyperparameters tuned using recall as the optimization objective. The resulting models are then joined into an ensemble that delivers our best result.
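The excerpt does not say how the individual models are combined, so the following is only a minimal sketch under one common assumption: soft voting, where each model's positive-class probability is averaged and a decision threshold is then applied. The function names, the scores, and the thresholds are all hypothetical and are not taken from the paper; the point is simply to show how lowering the threshold on the averaged score trades precision for the recall that the models were tuned on.

```python
from typing import List, Sequence


def ensemble_average(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Soft voting (an assumption): average each example's score across models."""
    if not model_probs:
        raise ValueError("need at least one model")
    n_examples = len(model_probs[0])
    if any(len(p) != n_examples for p in model_probs):
        raise ValueError("all models must score the same examples")
    n_models = len(model_probs)
    return [sum(p[i] for p in model_probs) / n_models for i in range(n_examples)]


def predict(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn averaged probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probs]


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up scores from three hypothetical models on three examples.
model_probs = [
    [0.9, 0.4, 0.2],
    [0.7, 0.6, 0.1],
    [0.8, 0.2, 0.3],
]
y_true = [1, 1, 0]

avg = ensemble_average(model_probs)
for threshold in (0.5, 0.3):
    y_pred = predict(avg, threshold)
    print(threshold, y_pred, recall(y_true, y_pred))
```

With these toy numbers, the second example is a true relation that only one of the three models scores above 0.5, so it is missed at the default threshold and recovered at the lower one.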
The objective of our task is to extract parent-subsidiary relationships from text. For example, a news item from TechCrunch says: ‘Remember those rumors a few weeks ago that Google was looking to acquire the plug-and-play security camera company, Dropcam? Yep. It just happened.’ From this sentence we can infer that Dropcam is a subsidiary of Google. But there are millions of companies and several million articles talking about them. A human being can get tired of doing even 10! Trust me 😉 We have developed some cool machine learning models, ranging from classical algorithms to deep neural networks, to do this for you. There is a bonus! We do not just give you probabilities. We also return the sentences that triggered the algorithm to make the inference! For instance, when it says Oracle Corp is the parent of Microsys, it can also report that the sentence in its corpus ‘Oracle Corp’s Microsys customer support portal was seen communicating with a server’ triggered its prediction.
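Returning the triggering sentence alongside the probability can be pictured roughly as follows. This is a hedged sketch, not the authors' implementation: it assumes that every sentence mentioning both companies is scored separately and that the highest-scoring sentence is reported as evidence. The names Inference, infer_with_evidence and toy_scorer are hypothetical, the second example sentence is invented for illustration, and toy_scorer is a keyword stand-in for a trained model.

```python
from typing import Callable, NamedTuple, Sequence


class Inference(NamedTuple):
    parent: str
    subsidiary: str
    probability: float
    evidence: str  # the sentence that produced the highest score


def infer_with_evidence(
    parent: str,
    subsidiary: str,
    sentences: Sequence[str],
    score_sentence: Callable[[str], float],
) -> Inference:
    """Score each sentence and keep the best one as the evidence (an assumption)."""
    if not sentences:
        raise ValueError("need at least one sentence mentioning both companies")
    scored = [(score_sentence(s), s) for s in sentences]
    best_prob, best_sentence = max(scored, key=lambda pair: pair[0])
    return Inference(parent, subsidiary, best_prob, best_sentence)


def toy_scorer(sentence: str) -> float:
    """Hypothetical stand-in for a trained model: counts a few cue phrases."""
    cues = ("acquire", "subsidiary of", "parent of")
    text = sentence.lower()
    hits = sum(cue in text for cue in cues)
    return min(1.0, 0.2 + 0.4 * hits)


sentences = [
    "Google was looking to acquire the plug-and-play security camera company, Dropcam.",
    "Dropcam makes plug-and-play security cameras.",  # invented for illustration
]

result = infer_with_evidence("Google", "Dropcam", sentences, toy_scorer)
print(result.probability, "|", result.evidence)
```

Taking the maximum over sentences is only one possible aggregation; it is used here because it makes the link between the score and a single evidence sentence direct.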