Business problem

Offering your customers more than sixty types of vegetables is nice. Forcing them to search for each product in a scale menu with more than sixty items is not. Your task is therefore to find an algorithm that does this automatically. Making the procedure more comfortable will improve customer satisfaction with Kaufland. Concretely, you will build an image recognition system that reliably recognizes which type of fruit or vegetable the customer has selected. A typical user scenario: the customer weighs one type of fruit or vegetable at a time, which may (cherries, plums) or may not (watermelon) be wrapped in a plastic bag. The algorithm embedded in the scale automatically recognizes the type of fruit or vegetable from an image taken by a camera located above the scale. The scale’s monitor shows the fruits or vegetables that are most likely on the scale and asks the customer to confirm. Ideally, the final result shown on the screen is a single fruit or vegetable, but because many of them share similar properties, showing several items to the customer is also acceptable. The lighting, the angle from which the pictures are taken, and the size of the pictures will be the same for every image.

Research problem specification

The goal of this research task is to design an image recognition system that recognizes the type of fruit or vegetable and ranks the candidate types by probability. The input of the system is an image taken by a camera located above the weighing scale in a real store environment.
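The ranking step can be illustrated with a minimal sketch. It assumes the model produces one raw score per class. The function name rank_classes, the labels, and the scores are hypothetical and chosen only for illustration. A softmax converts the scores into probabilities, and the classes are then sorted so that the most likely ones come first.

```python
import math


def rank_classes(scores, labels, top_k=3):
    """Convert raw class scores to probabilities and return the top_k
    (label, probability) pairs, most probable first."""
    # Softmax; subtracting the maximum keeps exp() numerically stable.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical labels and scores for a single image.
labels = ["apple", "plum", "cherry", "watermelon"]
scores = [2.0, 1.0, 0.5, -1.0]

top = rank_classes(scores, labels)
for label, prob in top:
    print(f"{label}: {prob:.2f}")
```

The returned list corresponds to the options the scale’s monitor would offer to the customer for confirmation.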
The challenge for you is to recognize 3D objects that are naturally grown. For example, apples of different varieties can look quite different even though they are the same type of fruit. Furthermore, vegetables look different when you rotate them, so your model has to deal with this too. Last but not least, vegetables are often already wrapped in bags when they are weighed. Accordingly, they have to be reliably recognized in spite of strong reflections on the bags, which may occur depending on the lighting of the store. Ideally, your model works without being retrained even if the store gets a new lighting system, the bags around the vegetables are badly crumpled, and the vegetables have an unusual shape. Perhaps your algorithm will even be better at recognizing fruit than you!
Pre-processing of the image, such as filtering, background removal, and edge detection, may increase the accuracy.
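Two of these pre-processing steps, filtering and edge detection, can be sketched on a tiny grayscale image stored as a list of rows. This is only an illustration under stated assumptions: the helper names box_blur and sobel_magnitude and the toy image are hypothetical, and a real pipeline would more likely use an image processing library on full camera frames.

```python
def box_blur(image):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            out[y][x] = sum(window) / len(window)
    return out


def sobel_magnitude(image):
    """Gradient magnitude from 3x3 Sobel kernels; border pixels stay at 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


# Toy 6x6 image: dark left half, bright right half (one vertical edge).
toy = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]

smoothed = box_blur(toy)        # noise reduction
edges = sobel_magnitude(toy)    # responds only along the vertical edge
```

In this toy image the edge response is non-zero only in the two columns next to the dark-to-bright transition, which is the behaviour one would expect from an edge detector.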
You are invited to try many different approaches to improve the accuracy of your model. For example, you can start with a Convolutional Neural Network. One approach that might improve performance is a Capsule Network (Hinton 2017) for recognizing complex three-dimensional objects, like wrinkly potatoes, regardless of their rotation. Your model should achieve a high accuracy even if the shape of the vegetables and the lighting conditions differ strongly from the ones seen during model training.
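One common way to expose a model to varied orientations and lighting during training is data augmentation. The case description does not prescribe this technique, so the sketch below is only an assumption about how it could look. The function names, the brightness range of 0.6 to 1.4, and the toy image are hypothetical.

```python
import random


def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def adjust_brightness(image, factor):
    """Scale pixel intensities by factor and clip them to [0, 1]."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in image]


def augment(image, rng):
    """Apply a random number of 90-degree rotations and a random
    brightness change (assumed range: 0.6 to 1.4)."""
    out = [list(row) for row in image]
    for _ in range(rng.randrange(4)):
        out = rotate90(out)
    return adjust_brightness(out, rng.uniform(0.6, 1.4))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = [[0.1, 0.2], [0.3, 0.4]]
augmented = augment(sample, rng)
```

Each training image would be passed through augment several times, so the model sees the same object under different rotations and simulated lighting levels.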
The model will be assessed by its final accuracy on the test dataset. The accuracy will be calculated as the number of correctly predicted objects divided by the number of all objects in the test dataset. An object is considered correctly predicted if its true type is among the three most probable types in the model output.
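The assessment metric described above (top-three accuracy) can be sketched as follows. This is not the organizers’ scoring script. The function name top_k_accuracy, the data layout (one dictionary of label to probability per image), and the example values are assumptions made for illustration.

```python
def top_k_accuracy(true_labels, probabilities, k=3):
    """Fraction of objects whose true label is among the k most probable
    classes. `probabilities` holds one {label: probability} dict per object."""
    if len(true_labels) != len(probabilities):
        raise ValueError("need exactly one prediction per true label")
    hits = 0
    for truth, probs in zip(true_labels, probabilities):
        # Labels sorted by descending probability, cut to the first k.
        top = sorted(probs, key=probs.get, reverse=True)[:k]
        if truth in top:
            hits += 1
    return hits / len(true_labels)


# Hypothetical test set with four objects.
true_labels = ["apple", "plum", "cherry", "watermelon"]
probabilities = [
    {"apple": 0.6, "plum": 0.2, "cherry": 0.1, "watermelon": 0.1},    # apple ranked 1st: hit
    {"apple": 0.4, "plum": 0.1, "cherry": 0.3, "watermelon": 0.2},    # plum ranked 4th: miss
    {"apple": 0.3, "plum": 0.3, "cherry": 0.25, "watermelon": 0.15},  # cherry ranked 3rd: hit
    {"apple": 0.5, "plum": 0.3, "cherry": 0.15, "watermelon": 0.05},  # watermelon ranked 4th: miss
]

accuracy = top_k_accuracy(true_labels, probabilities)
print(f"top-3 accuracy: {accuracy:.2f}")
```

Two of the four objects have their true type among the three most probable predictions, so the metric for this toy set is 2 / 4.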

Case video