
Datathon2020 – Optimize Retail Supply Chain case – provided by Kaufland

Artificial Intelligence (AI) is making a huge impact on the logistics industry. Recent research shows that AI will enable companies to “…derive between $1.3trn and $2trn a year in economic value from using AI in supply chain…”.
In this spirit, Kaufland has prepared a hot case straight out of its Supply Chain Management systems.



McKinsey Research link


Kaufland is a German hypermarket chain, part of the Schwarz Gruppe. It is among the biggest hypermarket chains in Central and Eastern Europe. The chain operates over 1,200 stores in Germany, the Czech Republic, Slovakia, Poland, Romania, Bulgaria, Croatia and Moldova. The Kaufland team sets trends in innovation devoted to enhancing customers’ satisfaction with the products and services offered in its stores.

Case Summary

With over 1000 stores and 20 000 items in stock, just over 20 million forecasts have to be calculated daily. The results then have to be translated into thousands of orders, to which a number of business rules apply. Usually some packaging and transportation optimizations take place at this point.

Traditional Forecasting and Replenishment

Even though traditional demand forecasting and replenishment systems include machine learning algorithms for forecasting and statistics to solve the optimization problems, we believe they might rely on too many hard business rules. With the growing demand to offer better service at a lower price, the area needs a novel approach: one able to orchestrate multiple items and stores at once and to evolve continuously as the business demands change.

Once again Kaufland challenges you to think outside the box and provides a case where you can apply Artificial Intelligence at its best.

Research Problem

Among the most important tasks of a forecasting and replenishment system is to ensure high availability of the items that matter most to the customers. Besides meeting in-store demand, good forecasts allow more flexibility in downstream processing and in optimizing storage and delivery expenses. You can imagine that some of the stock in the store is piled up in the back storage, where the customers can’t really access it. The more inaccurate our system, the more space is used inefficiently, and so we come to the first cost we need to minimize – the storage cost.

Solving this would be very simple if the cost of packaging and transporting were not our problem. Every time we place an order, our hidden friends at the distribution centers start a lengthy process to deliver what we asked for. All the items, ready for transport, are combined and placed on pallets. Each pallet has to be piled up to a maximum height, and only 33 of them fit in a transportation truck. Quick thinkers have already figured out that the packaging process is most efficient when all the items are ordered by the pallet and all the transportation trucks are full.

Well, that is not always the case, and part of your task is to set up your algorithm to handle this in the best manner. We can help you though – by relaxing some of the rules and introducing the following assumptions:

  1. All items are delivered daily with no delay
  2. All items can be combined and transported together, so just for this challenge fish and candy go together
  3. All item packages fit perfectly together and stacking on the pallet can be calculated as the sum of the fractions each item would normally occupy
  4. All items arrive at the stores with the stated period in days until reaching expiration date
  5. Items with priority less than 0.25 points can be out of stock for the sake of transportation costs. The rest of the items are to be available at all times, ideally with at least their minimum amounts
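To make assumption 3 concrete, here is a minimal sketch of how an order could be turned into a pallet load and a truck count. The function names and the dictionary layout are our own illustrative choices, not part of the case materials; only transport_qty and the limit of 33 pallets per truck come from the description above.

```python
import math

TRUCK_CAPACITY = 33  # pallets per truck, as stated in the case


def pallets_needed(order, transport_qty):
    """Fractional pallet load of one order.

    order:         item_id -> ordered quantity in base units
    transport_qty: item_id -> base units that fill one pallet
    Under assumption 3 the load is simply the sum of the pallet
    fractions each item would occupy on its own.
    """
    return sum(qty / transport_qty[item] for item, qty in order.items())


def trucks_needed(pallet_load):
    """Round the load up to whole pallets, then up to whole trucks."""
    # Small tolerance so float noise does not open an extra pallet.
    pallets = math.ceil(pallet_load - 1e-9)
    return math.ceil(pallets / TRUCK_CAPACITY)
```

For example, half a pallet of one item plus half a pallet of another counts as exactly one pallet, and a load of 33.5 pallets already needs a second truck.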


Kaufland does not take risks that would compromise customer satisfaction, so items are tracked very carefully for their quality and expiration dates. Each item that has reached its expiration date has to be disposed of at the expense of the company, which is yet another cost. We can summarize the costs so far:

  1. Storage cost – the quantities above “max_qty” (the max_stock field in the data description) cannot be on the shelves, available to the customers. A storage cost applies to each piece that has to be kept overnight in the storage
  2. Handling cost – for each item, only quantities that are multiples of the order unit quantity (order_qty) can be ordered. The handling cost for the ordered quantity of an item is highest when the quantity is furthest from the pallet quantity (transport_qty) and can be considered 0 when the ordered quantity is a multiple of the transportation unit quantity. This cost can be approximated by the expression:

Handling cost expression

  3. Transport cost – similarly to the handling cost, the transport cost is highest when the number of transport units (pallets, each holding transport_qty) is lowest, and lowest when the number of transport units is a multiple of 33, i.e. when every truck is full
  4. Waste cost – the quantities reaching their best before date must be disposed of at the expense of the company
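The cost terms above can be sketched in code. Please treat this as one possible reading only: the storage cost follows the description directly, but the handling and transport penalties below are hypothetical stand-ins that merely reproduce the stated behaviour (zero at full pallets and full trucks, growing in between). They are not the organizers' expression, and all function names are our own.

```python
TRUCK_CAPACITY = 33  # pallets per truck, as stated in the case


def storage_cost(stock_qty, max_stock, unit_storage_cost):
    """Cost of the pieces that do not fit on display overnight."""
    overflow = max(0.0, stock_qty - max_stock)
    return overflow * unit_storage_cost


def handling_penalty(ordered_qty, transport_qty):
    """Hypothetical stand-in, scaled to the range 0..1.

    0 when ordered_qty is a multiple of transport_qty,
    1 when it is exactly half a pallet away from the nearest multiple.
    """
    remainder = ordered_qty % transport_qty
    distance = min(remainder, transport_qty - remainder)
    return distance / (transport_qty / 2)


def transport_penalty(pallets):
    """Hypothetical stand-in: share of empty slots in the last truck."""
    used_in_last_truck = pallets % TRUCK_CAPACITY
    if used_in_last_truck == 0:
        return 0.0  # no pallets at all, or only full trucks
    return (TRUCK_CAPACITY - used_in_last_truck) / TRUCK_CAPACITY
```

How the individual terms are weighted against each other is left open here; a real solution would need to express them in one currency before summing them into a single objective.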

There are multiple paths to tackle every problem. One might choose to apply simple statistics with some robust coding to solve this. Can we think beyond and build something more intelligent? Here is what we suggest – Artificially Intelligent:

Reinforcement Learning scheme
Typical framing of Reinforcement Learning scenario

Reinforcement Learning (RL) is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning. It is concerned with how software Agents ought to take Actions in an Environment in order to maximize some notion of cumulative Reward. The Agent can be a Deep Neural Network whose Actions are the orders to be delivered to the Environment. The provided sales data can help you build a simulation of the demand for all items by sampling from the historical data or, even better, from a generated sample with the same statistics as the historical one. The Environment has to respond to each Action with a new State (the current stock of the items) and a Reward that reflects the costs we described.
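A toy version of such an Environment might look like the sketch below. It is deliberately incomplete: demand is sampled from the historical sales, the State is the current stock, and the Reward covers only the storage cost plus a lost-sales penalty. The class name, its arguments and the stockout_penalty weight are assumptions of ours; handling, transport and waste costs are left out.

```python
import random


class ReplenishmentEnv:
    """Toy single-store environment (all names are hypothetical)."""

    def __init__(self, history, max_stock, storage_cost,
                 stockout_penalty=1.0, seed=0):
        self.history = history            # item_id -> past daily sold_qty
        self.max_stock = max_stock        # item_id -> display capacity
        self.storage_cost = storage_cost  # item_id -> cost per stored unit
        self.stockout_penalty = stockout_penalty  # assumed weight
        self.rng = random.Random(seed)
        self.stock = {item: 0.0 for item in history}

    def step(self, orders):
        """Apply one day of orders and return (state, reward)."""
        reward = 0.0
        for item in self.stock:
            # Assumption 1 of the case: orders arrive with no delay.
            available = self.stock[item] + orders.get(item, 0.0)
            # Demand is sampled from the item's historical sales.
            demand = self.rng.choice(self.history[item])
            sold = min(available, demand)
            self.stock[item] = available - sold
            overflow = max(0.0, self.stock[item] - self.max_stock[item])
            reward -= overflow * self.storage_cost[item]
            reward -= (demand - sold) * self.stockout_penalty
        return dict(self.stock), reward
```

An Agent would call step once per simulated day with a dictionary of orders and learn from the returned Reward. Sampling days independently ignores seasonality and weekday effects, so a generated demand model may serve better, as suggested above.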

The Data

Kaufland provides two data sets. One consists of sales data for one year for around 100 items with various selling intensity. The other one (master_data.csv) holds the storage costs, expiration periods, and transportation parameters for the items.

Description of the fields:

  • item_id – unique id of each item
  • item_name – short name of each item
  • unit – the base units used for the item (PCS, KG)
  • order_qty – number of base units that fit into a single order unit
  • transport_qty – number of base units that fit onto a transportation unit (pallet)
  • min_stock – minimum quantity the store should have on display in base units
  • max_stock – maximum quantity the store can have on display in base units. Quantities above it are subject to storage cost
  • item_prio – an index to represent the priority of having one item in stock over another
  • storage_cost – the cost of keeping the item in the storage
  • mhd – the period until the item reaches its expiration date (best before date) in days
  • sold_qty – the number of sold base units for the day
  • stock_qty – the number of base units in stock in the store at the end of the day
  • the_date – the date of the event in “%Y-%m-%d” format
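The fields above could be parsed along these lines. This is a sketch under several assumptions: that both files are comma-separated with a header row, that the numeric fields listed in MASTER_NUMERIC live in master_data.csv, and that the sales file carries item_id, the_date, sold_qty and stock_qty. The helper names are ours.

```python
import csv
from datetime import datetime

# Fields assumed to be numeric columns of master_data.csv.
MASTER_NUMERIC = ("order_qty", "transport_qty", "min_stock", "max_stock",
                  "item_prio", "storage_cost", "mhd")


def parse_master_row(row):
    """Convert the numeric fields of one master data row to floats."""
    parsed = dict(row)
    for field in MASTER_NUMERIC:
        parsed[field] = float(row[field])
    return parsed


def parse_sales_row(row):
    """Convert one sales row; the_date uses the "%Y-%m-%d" format."""
    return {
        "item_id": row["item_id"],
        "the_date": datetime.strptime(row["the_date"], "%Y-%m-%d").date(),
        "sold_qty": float(row["sold_qty"]),
        "stock_qty": float(row["stock_qty"]),
    }


def load_csv(path, parse_row):
    """Read a CSV file with a header row and parse every record."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [parse_row(row) for row in csv.DictReader(handle)]
```

With this in place, load_csv("master_data.csv", parse_master_row) would return one dictionary per item, provided the file matches the assumed layout.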






