24-26 March 2017

Datathon Bulgaria: The first practical data challenge in Central & Eastern Europe


What is a datathon?

A datathon is a weekend-long event where you work on a real-world case from areas such as consumer finance, technology, retail or state administration.

You and your team have 48 hours to come up with a solution to a real business challenge based on the provided datasets. The jury will reward solutions that are precise but also creative.


Real-world problems

the data and questions are drawn from the practice of companies and organizations

Multi-area expertise

to come up with a solution you need to work in teams with experts of different specialties - coding, data analysis, business intelligence

Learn in practice

collaborate and exchange knowledge with mentors, experts and Data Science Society volunteers

Diverse challenges

cases will involve different types of data processing and analysis, such as image recognition, credit score modelling, pattern recognition and many more


The schedule

16:00 - 17:00   Press conference
16:30 - 17:00   Registration
17:10 - 17:20   opening and presentation: GemSeek
17:30 - 17:40   opening and presentation: Experian
17:45 - 17:55   opening and presentation: SAP
18:00 - 18:10   opening and presentation: HyperScience
18:10 - 18:20   opening and presentation: Kaufland
18:20 - 18:30   opening and presentation: PISA
18:30 - 19:00   BREAK
19:00 - 19:10   opening and presentation: Telenor
19:15 - 19:25   opening and presentation: VMware
19:30 - 19:40   opening and presentation: Ontotext
19:45 - 19:55   opening and presentation: Receipt Bank
20:00 - 20:10   opening and presentation: A4E
20:15 - 20:25   opening and presentation: ShopUp
20:30 - 20:40   opening and presentation: Council of Ministers - open data
20:40 - 20:45   Helecloud
20:50 - 21:30   opening and presentation: team forming
21:30   work starts
11:59:00 AM   team registration
00:00 - 23:59   work + lunch and dinner + 2 coffee breaks
00:00 - 15:00   work + 2 coffee breaks
15:00 - 16:00   BREAK
16:00 - 17:30   presentations and Q&A
18:00 - 20:00   voting, judges and awards

The cases

These are short descriptions of the cases you will be able to choose from. You can make your final decision on the day of the challenge, when you form your team.


Credit scores are used by lenders, such as banks and credit card companies, to evaluate the potential risk posed by lending money to consumers and to mitigate losses due to bad debt. The Datathon teams will be tasked with enhancing an existing credit score model by taking economic dynamics into account and extending the creditworthiness prediction to a monthly basis over a 36-month horizon.


Datathon participants will have the opportunity to apply their favourite ML and AI algorithms to a couple of gigabytes of real bank documents, kindly provided by HyperScience. The purpose will be to create an AI system that performs certain tasks, such as recognizing the document type or extracting useful information from documents. There will be a training set and a test set with ground-truth information: different types of bank forms filled in with synthetic data. The proposed AI system should work well on test documents of both good and bad quality, for example blurry or wrinkled documents. The organizers promise there will be unexpected challenges, such as documents with traces of spilled coffee :)

Receipt Bank

Datathon participants will be provided with a large number of receipts and invoices (at least 9000), and their task will be to choose and apply different ML and AI algorithms to extract useful information from them, such as supplier name, client purchases and so on. To build ML models and evaluate results, participants will be provided with a large (4M+) supplier database with the corresponding ground-truth information (the existing supplier names) and the positions of the different texts in each document. With the mentors' help, the ML algorithms you develop should overcome the hard challenges of real data: the receipts and invoices originate mostly from the UK, but also from other countries around the world.


Datathon teams will be provided with business data such as store type, location, total space, visitors and parking availability, as well as external data such as demographics for each town/city, traffic data and weather data. They will have to find the relationships and the deciding factors in the surrounding environment that affect visitor traffic in stores.


One of GemSeek’s core activities is developing and implementing marketing survey research. In addition to quantitative data, surveys also include questions where the respondent gives detailed opinions or statements as free-form text. The main task would be to develop a machine learning algorithm that reads through the text, takes into consideration the logic and the sense of both labels and sentiment, and suggests a possible allocation of the unstructured text within category levels. The participants will be provided with a dataset of unstructured text. The set of predefined categories, containing labels and sentiment, will also be available.


In the age of data, growing demand for computational resources makes data centers one of the fastest-growing consumers of electricity in developed countries. The Datathon teams will have to understand power usage and trends in the compute nodes (ESX servers) and come up with power optimization suggestions based on the model they create. To solve the task they may need to analyse the data sets, extract features, detect outliers, clean the data, and propose and verify a model. They would also need to demonstrate power savings based on the created model.


The SAP case deals with demand forecasting. A production plant produces fresh goods (e.g. milk) and delivers them to retail shops on a daily basis. The fresh goods need to be sold quickly; otherwise, within 2-4 days they are returned to the production plant and recorded as a loss. The plant needs to forecast the optimal quantity to deliver to each shop with each transport in order to minimize the losses. The data provided represents the historical deliveries per shop.
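One possible way to think about the trade-off between unsold goods and missed sales is the classic newsvendor rule, sketched below. This is only an illustration under stated assumptions, not the method expected by SAP: the function name `delivery_quantity`, the cost parameters and the toy history are hypothetical, and the sketch assumes that past daily demand per shop is known, which the case description does not confirm (it mentions historical deliveries).

```python
import math


def delivery_quantity(past_demand, unit_margin, unit_loss):
    """Newsvendor rule: deliver the critical-ratio quantile of past demand.

    unit_margin: profit missed for each unit of unmet demand (assumption).
    unit_loss:   cost of each unit returned unsold (assumption).
    """
    if not past_demand:
        raise ValueError("need at least one observation")
    critical_ratio = unit_margin / (unit_margin + unit_loss)
    ordered = sorted(past_demand)
    # Smallest observed demand whose empirical CDF reaches the critical ratio
    index = max(math.ceil(critical_ratio * len(ordered)) - 1, 0)
    return ordered[index]


# Hypothetical daily demand for one shop, not data from the case
history = [40, 55, 38, 60, 47, 52, 45, 58, 50, 43]
print(delivery_quantity(history, unit_margin=1.0, unit_loss=3.0))
```

When a returned unit costs more than a missed sale, the rule delivers less than the median demand; when missed sales are more expensive, it delivers more.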


ShopUp provides a SaaS analytics platform for malls, shopping centres and other retailers, which supplies business owners with information on shoppers’ behaviour, venue performance and staff optimization. Datathon teams will be provided with a list of 100 000 Wi-Fi network names, and their task will be to identify different parameters of the network owner, such as type of entity, physical address, category or type of business, company details, online feedback or ratings, etc.


Ontotext is a world-renowned leader in semantic technologies and big data tools and services, including text mining and the triplestore graph database “Graph DB”. The Datathon teams will be provided with a subset of the Bulgarian Trade Register covering 2008 to 2017 and will convert this data into Linked Open Data (LOD) format, according to a supplied simple RDF model. Next, they will be shown how this result can be linked to other open data sets. The results will be used to demonstrate the power of LOD and its use for revealing hidden facts from unstructured data.


A4E will provide real-life historical sales data from a retail company. The goal is to design a recommender system which suggests appropriate combinations of products (the so-called "market basket analysis"). This information is useful for marketers who create combined offers (a combination of products frequently bought together, offered at a lower price). The retailer’s data is enriched with weather data, bank holidays and days of the week for the observed period, which could be used to improve the system’s recommendations for the next 2 weeks. The weather forecast and bank holidays within this period are also available. Original ideas and implementation approaches will be rewarded by A4E with attractive partnership opportunities.
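As a rough illustration of what "products frequently bought together" can mean in practice, the sketch below counts product pairs across baskets and ranks them by lift. It is a minimal example under stated assumptions, not A4E's approach: the function name `pair_rules`, the `min_support` threshold and the toy baskets are hypothetical, and it ignores the weather and holiday features mentioned in the case.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3):
    """Return (pair, support, lift) for product pairs that co-occur often."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both products
        if support < min_support:
            continue
        # lift > 1: the pair co-occurs more often than independence predicts
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        rules.append(((a, b), support, lift))
    return sorted(rules, key=lambda r: r[2], reverse=True)


# Hypothetical toy baskets, not data from the case
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
    ["milk"],
]
for pair, support, lift in pair_rules(baskets):
    print(pair, round(support, 2), round(lift, 2))
```

Pairs with high support and a lift above 1 are natural candidates for a combined offer; a fuller solution would extend this to larger product sets and to the contextual data.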


“The biggest risk is not to take a risk”
If you feel comfortable working with numbers, you can explore predictive models for risk assessment and for maximizing portfolio yield. Don’t limit yourself to risk assessment and corporate bad-debt optimization: the research covers potential corporate clients on the Bulgarian market and contract decision-making based on historical data. The subject of the research is data on companies registered on the Bulgarian market, combined with their profitability index for a specific timeframe. A risk can be accepted, and the relationship can still have a positive P&L effect, if the profitability score is positive.

Questions and answers

Your team’s solutions will be evaluated according to several criteria - statistical indicators, resourcefulness of the solution and creativity of visualization.

You may use the data and other information provided to you for solving the chosen case only for the duration of the event. You must not use them for personal or commercial purposes outside the event format and activities.

By registering for the Datathon, you also agree to provide your work (analysis, methodology, source code, etc.) to the organizers so that they can publish it on the event page.

Organizers: Data Science Society

Data Science Society is a Bulgarian volunteer community of experts in data processing, modelling and analysis. Its main mission is to promote and encourage collaboration, knowledge sharing, innovation and entrepreneurship in all areas that make extensive use of data. Data Science Society volunteers organize thematic meetups and talks, facilitate educational programs and connect experts and enthusiasts to enable data-driven projects.


Our partners

General Terms of Participation

With your registration, you accept the following conditions: