Do you want to become a better Data Scientist, challenge yourself to build a working algorithm for an unfamiliar real-world data problem, or simply find new data science career opportunities? Then you should give a data Hackathon a try and break into competitive data science!
Even though you may not get much sleep over the weekend, it’s worth it! At data Hackathons, people improve their data modeling skills in various domains, while the competition and the tight deadlines bring out their creativity. It’s hard to believe how much even young data science professionals can achieve! Boosted by team spirit and time pressure, participants find alternative approaches that yield lower prediction errors and better accuracy, sometimes surprising even the domain experts, who may later implement the models in the real world.
Reasons for taking part in a Hackathon
While there are plenty of data-related courses, papers and workshops available online and offline, the difficult part is bringing structure to the lifelong learning process. Hackathons are much less expensive than conferences, and they give participants the opportunity not only to meet and listen to renowned experts, but also to gain first-hand experience alongside senior data scientists, PhDs and professors who have years of experience and are willing to work through a real problem with anyone curious enough to learn and improve their skills. This is only possible at such coding festivals, where people celebrate their knowledge by exploring tasks different from their daily data work and learn which of their skills are applicable in the real world.
When it comes to top-performing data models, what matters is combining different machine learning algorithms to reach the best solutions. But to get there, you first have to analyze and interpret the data, preprocess it, generate new features, and overcome data-related issues such as inconsistencies and high noise levels. It’s hard to believe that anyone can do that overnight … or in 2 days, but it’s possible, even with challenging business cases, because at hackathons participants work within a clear structure and pursue their goal of building the most accurate data models, supported by like-minded data science enthusiasts and experts. Bear with us and explore some of those challenges.
Everyone should try
Why? Because anyone can, and it’s a great learning opportunity, especially for students and newcomers to the field. Most Hackathons offer workshops, in-person talks and sessions, and mentors are available 24/7. Basically, if you don’t know much about what you will do or could do at a hackathon, that’s totally fine, because domain experts and mentors will be on hand to help and guide you.
Hackathons are a great opportunity for newbies, but not only for them. Nowadays everyone talks about agile development and agile teams, but what does that really mean? At data Hackathons everything is agile, and it’s impossible not to feel the thrill of the work going on. Hackathons are a perfect training opportunity not only for individuals, such as students and professionals looking to have fun building data solutions, but also for data teams. In the real world, delayed results are not always caused by a lack of skills; often the cause is poor communication, with too much talking and too little doing. During the hackathons, these habits are challenged and worked on in order to reach the best solutions.
The sooner you begin, the better you will become!
The next data Hackathon, Global Datathon 2018, is just around the corner: next weekend, from September 28th to 30th. It’s the third international edition, challenging data science rookies and experts from all around the world to team up and tackle real data cases.
Real-World Data Science Cases
Text Data Mining
High-quality, commercially available company data may be unaffordable for many data analytics and business analytics teams. Many niche but highly valuable data sources fall short on details about industry sector. At the same time, the amount of Open Data (official or crowdsourced) is growing, but it often lacks a standardized yet practical approach to industry classification. To overcome this challenge, the first case at the upcoming Datathon, provided by Ontotext, is about classifying companies into industry sectors, a fundamental task for unlocking advanced business intelligence capabilities.
Nikola Tulechki, Data Scientist and Semantic Consultant at Ontotext AD, and Atanas Kiryakov, founder and CEO of Ontotext, a semantic web pioneer and vendor of semantic technology, will guide the participants in finding the best solutions to the text mining classification problem. Watch the Ontotext case video description and find out more about the case at http://bit.ly/2LbYBSB . The Datathon mentors have also written case guidelines to help participants work through the challenges and pick their favourite easily.
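To give a feel for what sector classification from text can look like, here is a minimal sketch of a bag-of-words Naive Bayes classifier written with the Python standard library only. It is not the Ontotext approach or data: the company descriptions, the sector labels and the function names (tokenize, train, predict) are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a description and keep only alphabetic tokens."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def train(examples):
    """Count words and documents per sector from (description, sector) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, sector in examples:
        doc_counts[sector] += 1
        word_counts[sector].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the sector with the highest Naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_sector, best_score = None, -math.inf
    for sector in doc_counts:
        # Log prior: share of training documents in this sector.
        score = math.log(doc_counts[sector] / total_docs)
        total_words = sum(word_counts[sector].values())
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids zero probabilities.
                count = word_counts[sector][token] + 1
                score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_sector, best_score = sector, score
    return best_sector


# Hypothetical toy training data, invented for this example.
TOY_EXAMPLES = [
    ("retail bank offering loans and mortgages", "Finance"),
    ("insurance and investment fund management", "Finance"),
    ("software development and cloud hosting", "Technology"),
    ("mobile apps and data analytics software", "Technology"),
]

model = train(TOY_EXAMPLES)
print(predict(model, "online software for data hosting"))
```

With real company data, a baseline like this is mainly useful as a reference point against which richer features and stronger models can be compared.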
In Sofia, air pollution norms were exceeded 70 times in the heating period from October 2017 to March 2018, according to the citizens’ initiative AirBG.info. The day with the worst air pollution in Sofia was January 27, when the norm was exceeded six times over. Things got so out of control that even the European Court of Justice ruled against Bulgaria in a case brought by the European Commission over the country’s failure to implement measures to reduce air pollution. The challenge is to forecast PM10 fine-particle concentration levels and their high peaks within a 24-hour period, with predictions that are as accurate and as useful as possible. The data set for the Telelink case is available with the kind support of AirInfo and the domain knowledge of the experts from Denkstatt, and will include not only meteorological features such as weather, humidity and wind, but also traffic and behavioural data.
Ekaterina Marinova, Data Science Strategist at Telelink, and Ivan Paspaldzhiev, Consultant at Denkstatt Bulgaria, will support participants working on the Telelink case throughout the Datathon, and they can be contacted via the Data.Chat with further questions. Here is the video description of the case and the case itself.
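A 24-hour forecast is easier to judge against a simple baseline. The sketch below, standard-library Python only, shows one common baseline, the seasonal naive forecast, which repeats the last 24 hourly readings. The hourly values are invented, the function names are hypothetical, and the 50 µg/m³ threshold is the EU daily limit value for PM10, assumed here to be the norm in question; the case itself may define peaks differently.

```python
def seasonal_naive_forecast(hourly_pm10, horizon=24, season=24):
    """Forecast the next `horizon` hours by repeating the last `season` hours."""
    if len(hourly_pm10) < season:
        raise ValueError("need at least one full season of history")
    last_season = hourly_pm10[-season:]
    return [last_season[h % season] for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def exceeds_daily_limit(hourly_values, limit=50.0):
    """True if the 24-hour mean is above `limit` (assumed: 50 µg/m³)."""
    return sum(hourly_values) / len(hourly_values) > limit


# Hypothetical hourly PM10 readings (µg/m³) for one day, invented here:
# calm night, morning peak, quieter afternoon, evening heating peak.
yesterday = [30] * 6 + [80] * 4 + [40] * 8 + [90] * 6
today = [value + 5 for value in yesterday]  # what was "actually" observed

forecast = seasonal_naive_forecast(yesterday)
print(mean_absolute_error(today, forecast))
print(exceeds_daily_limit(forecast))
```

Any model that uses weather, traffic or behavioural features should at least beat this baseline’s error to justify its extra complexity.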
Economic Time Series Prediction
The National Statistical Institute of Bulgaria (NSI) annually conducts a Household Budget Survey (HBS). The main objective of the Household Budget Survey is to obtain reliable and scientifically founded data on the income, expenditure, consumption and other elements of the living standard of the population, as well as on the changes that have occurred over the years. The Household Budget Survey is a sample survey based on a two-stage cluster sample. The survey’s data on household expenditure according to COICOP (quarterly and annual) are used to produce macroeconomic statistics: National Accounts and the Consumer Price Index. In order to optimize the cost of carrying out the survey, the NSI is considering changing the periodicity of the Household Budget Survey from yearly to once every five years. Using the data provided by the NSI, the challenge is to propose a model for estimating household expenditure by groups for the years in which the survey is not conducted.
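The simplest possible way to fill in the years without a survey is to interpolate linearly between two survey years. The sketch below does exactly that in plain Python. The figures are invented, the function name is hypothetical, and this is only a naive baseline under the assumption that expenditure changes evenly between surveys, not the model the NSI is asking for.

```python
def interpolate_expenditure(survey, year):
    """Linearly interpolate a value for `year` from {survey_year: value}."""
    if year in survey:
        return survey[year]
    years = sorted(survey)
    if not years[0] < year < years[-1]:
        raise ValueError("year is outside the surveyed range")
    # Find the two neighbouring survey years that bracket `year`.
    for left, right in zip(years, years[1:]):
        if left < year < right:
            change = survey[right] - survey[left]
            return survey[left] + (year - left) * change / (right - left)


# Hypothetical annual food expenditure per household, invented numbers,
# observed only in survey years that are five years apart.
food_expenditure = {2010: 1000.0, 2015: 1200.0, 2020: 1500.0}

for missing_year in (2012, 2017):
    print(missing_year, interpolate_expenditure(food_expenditure, missing_year))
```

A competitive model would go further, for example by using yearly indicators that remain available between surveys, but a baseline like this shows how much accuracy such a model has to add.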
The participants will be guided by industry experts: Svetoslav Ilkov, Junior Expert at the National Statistical Institute, and Rositza Balakova. More information about the case can be found in the video or in the case.
Do you know what Game of Thrones and telecoms have in common? Sending ravens is one of the most fundamental parameters in mobile communications engineering. For land-based mobile communications, the received raven variation is primarily the result of multipath fading caused by obstacles such as buildings (or clutter) or terrain irregularities, the distance between link endpoints, predatory animals, and interference among multiple transmissions, for example wars. This inevitable raven variation is the cause of dropped communications, one of the most significant quality-of-service measures in operative communication.
Telenor challenges curious Game of Thrones fans to perform a time-series analysis and predict the future number of fails based on one month of data on flight fails. Valentin Tonev, Customer Finance Director at Telenor Bulgaria, will help the participants with his domain wisdom. Check the case and read the guidelines by Tyrion Lannister’s team of mentors.
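As a starting point for this kind of time-series prediction, here is a minimal sketch of simple exponential smoothing in plain Python. The daily fail counts are invented, the function name and the smoothing factor alpha=0.5 are assumptions chosen for illustration, and the real Telenor data may call for a model that also handles trend and weekly patterns.

```python
def exponential_smoothing_forecast(counts, alpha=0.5):
    """One-step-ahead forecast: a weighted average that favours recent days."""
    if not counts:
        raise ValueError("need at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = counts[0]
    for observed in counts[1:]:
        # Move the running level part of the way towards the new observation.
        level = alpha * observed + (1 - alpha) * level
    return level


# Hypothetical daily counts of failed raven flights, invented for this example.
daily_fails = [10, 12, 8, 14]

print(exponential_smoothing_forecast(daily_fails))  # prints 11.75
```

A higher alpha reacts faster to recent changes, while a lower one smooths out noisy days; choosing it by comparing forecast errors on held-out days is a reasonable first experiment.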
When we think how machines communicate with us, Apple’s Siri, Microsoft’s Cortana and Amazon’s Alexa inevitably come to mind. This is all about to change with the wider adoption of the Internet of Things. The IoT allows even big and dull machines like forklifts to give you feedback from their sensors. This feedback is stored in a database and is used not only to predict when the machine is about to fail, but also why. Failures of small parts in a machine can lead to a costly breakdown of the whole machine. Therefore, it is important to perform maintenance. Kaufland offers the unique opportunity to listen to the “voice” of automated storage and retrieval systems they use in distribution centers. The participants at the Global Datathon will explore their sensor data and investigate how it relates to machine failures.
Michael Opitsch, Data Scientist at Kaufland Information Systems, will guide the participants through the challenge.
More information about the case can be found here. Check the detailed guidelines for curious data enthusiasts and pick the best case to compete in and learn from!
Data Hackathons are a great opportunity to learn more about how to implement machine learning algorithms in the real world, to dive deep into the ocean of unknown data challenges, to meet new people and to explore data science career opportunities.
Let’s face it: written instructions and online videos are helpful only up to a point. To develop something really meaningful that can last even after the Datathon, the valuable experience is to be in the same place, even a virtual one, with other people, to talk to them and generate ideas. Come to the Data.Chat and ask the community and the experts about the cases. This time you can participate not only online, but also from the data hubs in Bulgaria, Austria, India and Croatia!
Don’t forget that registration closes today, the 24th of September, so hurry up and book your place!