The Data Monthly Challenge
The Data Monthly Challenge is a monthly online challenge in which both beginners and advanced practitioners in data science learn by solving one real-world business case step by step.
The goal is for participants to build up their knowledge and expertise with the help of mentors, experts, and other like-minded data enthusiasts from all over the world. During the challenge, participants work at their own pace and receive weekly guidelines on how to proceed. The Society’s mentors give guidance and feedback on what can be improved in the previous phase, helping participants get the most out of the challenge.
The most creative and accurate solutions built on the challenge dataset will be presented!
As its name would suggest, the Ontotext case is in the field of text analytics, and it is a real case provided by Data Science Society’s partners from Ontotext. The main problem is how to build an efficient model that classifies companies by industry category. This can be done using some simple features, for example a short description of the company. However, participants will also have the opportunity to enrich the data themselves using the FactForge platform, which will enable them to create a unique dataset of more complex features. This can potentially boost model performance and lead to a better understanding of the nature of the problem.
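To make the idea of classifying a company from its short description concrete, here is a minimal sketch of a baseline text classifier. It is not the mentors' approach or the expected solution: it assumes a multinomial Naive Bayes model with add-one smoothing, written with the Python standard library only, and the descriptions and industry labels are invented toy data rather than rows from the Ontotext file.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Number of training descriptions per industry label.
        self.class_counts = Counter(labels)
        # Word frequencies per industry label.
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data; the real challenge dataset is not reproduced here.
train_texts = [
    "Develops cloud software and mobile applications",
    "Provides software consulting and data analytics platforms",
    "Retail bank offering loans and savings accounts",
    "Investment bank managing funds and loans",
]
train_labels = ["Technology", "Technology", "Finance", "Finance"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
prediction = model.predict("A bank that offers consumer loans")
print(prediction)
```

A baseline like this gives a reference score against which richer features, such as those extracted from FactForge, can be compared.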
You can familiarize yourself with the case details here.
The Ontotext team provided a CSV file containing the “basic” features for all the companies in the sample. These include a company description, location, type of organization, etc. In the link provided above, you can find instructions and a list of example queries that can be used to extract more complex features.
The main goal is to build a classification model of the industry category that is as accurate as possible, then carefully investigate the main causes of misclassification and suggest how the model can be improved. Ideally, the final product should be a completely automated classification model that can work with information from various data sources and still provide accurate and robust results.
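One simple way to start investigating misclassification is to count which industry categories get confused with which, and how often each category is recovered correctly. The sketch below is only an illustration under stated assumptions: the label lists are invented, and the helper names `confusion_pairs` and `per_class_recall` are hypothetical, not part of any challenge material.

```python
from collections import Counter


def confusion_pairs(true_labels, predicted_labels):
    """Count (true, predicted) pairs for the misclassified examples only."""
    return Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )


def per_class_recall(true_labels, predicted_labels):
    """Fraction of examples of each true class that were predicted correctly."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {label: correct[label] / totals[label] for label in totals}


# Invented labels standing in for a model's output on a validation set.
y_true = ["Finance", "Finance", "Technology", "Technology", "Retail", "Retail"]
y_pred = ["Finance", "Technology", "Technology", "Technology", "Technology", "Retail"]

errors = confusion_pairs(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)

# Most frequent confusions first, then recall per industry category.
print(errors.most_common())
print(recall)
```

Reading the descriptions behind the most frequent confusion pairs is often a quick way to spot ambiguous categories or missing features.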
How does it work?
The challenge has several phases. To start the next phase, participants need to complete the previous one. Industry experts and mentors from Sofia University will advise them on how to proceed.
The Challenge begins on Tuesday, 23rd of April at 19:00 (UTC +3:00) with a short explanation of the case and some introductory guidelines from the mentors on our YouTube channel. At the beginning of the challenge, the participants will get an initial boost for the first step they should complete.
Monday – Submission day! Each Monday the participants should submit their solution for the current stage of the challenge.
Tuesday – Feedback day! There will be a peer-to-peer review where every participant is encouraged to comment on the others’ articles.
Wednesday – a day for the next step of the challenge! The mentor will provide some helpful resources and tips on how to approach the next phase of the challenge.
Thursday – Mentor’s Guidance day! The mentor will upload his/her own approach to the previous week’s task, so that participants who got stuck on that stage can move forward.
Friday – Q&A day! You will have a channel in our Data.Chat for communication with the rest of the participants, the industry experts, and the mentors. Ask your questions there!
The Monthly Challenge will end on the 28th of May with a presentation of the machine learning solutions to the classification problem.
The Data Science Challenge is open to the global community via our platform, and everyone is welcome to participate either in a team or as an individual.
What is expected?
The monthly challenge is an interactive educational tool designed for (MSc/PhD) students in the field of Data Science and Business Analytics, as well as data science enthusiasts eager to learn. It provides a complete walkthrough of the process adopted by professional analysts when delivering a data-driven solution. You can find the mentors’ instructions here.
Even though it is based on a real-world case study, the challenge is well suited to beginners in the field as well. In particular, the work on the case is broken down into simple and clear steps supported by the mentors’ instructions. Furthermore, every week we organize a discussion on the issues you have encountered while working. And if you feel lost at a certain stage of your analysis, that’s OK: you can always ask the mentors for further hints.
It is a month of hard work, so we advise the following:
- Work in teams or individually.
- Use the Data.Chat for communication and questions.
- Make a schedule of tasks with proper deadlines on a weekly basis.
- Invest some time in getting deeper knowledge of the recommended techniques and tools. Put differently, avoid applying code based on a theoretical framework that is unfamiliar to you.
- Hold regular discussions and catch-up meetings with the other members of your team (if you work in a team).
- Publish an article promptly for each step of progress in your work.
The virtual places are limited, so be fast – sign up before April 17th and start the challenge!