Using Convolutional Neural Networks for Real-Time Product Recognition in Smart Scales – Imagga’s Solution to the Kaufland Case

Many big retailers offer a rapidly growing variety of fresh produce from local and global sources in their stores, such as fruit and vegetables, which need to be weighed quickly and effortlessly to determine their quantity and price. Smart scales that use image recognition to optimise the user experience and enable additional features, such as cooking recipes, can provide a new solution to this problem. Our solution to the Kaufland case involves training a Convolutional Neural Network (CNN) with the GoogLeNet architecture on the original Kaufland data set and fine-tuning it with a Custom Training Set we created. It achieves the following results (Kaufland Case Model #13): training accuracy: Top-1: 91%, Top-5: 100%; validation accuracy: Top-1: 86.1%, Top-5: 99%; TEST dataset accuracy: Top-1: 86.1%, Top-5: 99.2%. We also created another model (Kaufland Case Model #14) by combining similar categories, achieving: training accuracy: Top-1: 96%, Top-5: 100%; validation accuracy: Top-1: 92.5%, Top-5: 100%; TEST dataset accuracy: Top-1: 91.3%, Top-5: 100%. All training runs were done on our NVIDIA DGX Station using BVLC Caffe and the NVIDIA DIGITS framework. In our article we show visualisations of our numerous training runs and provide an online demo with the best classifiers, which can be tested further. During the final DSS Datathon event we plan to show a live food recognition demo with one of our best models running on a mobile phone. Demo URL: http://norris.imagga.com/demos/kaufland-case/
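
For readers unfamiliar with the Top-1 and Top-5 figures quoted above, here is a minimal Python sketch of how such accuracies are typically computed from per-class scores. The toy scores, the labels and the `groups` mapping are invented for illustration and are not Kaufland data. The `merge_scores` helper only illustrates why treating confusable categories as one class tends to raise Top-1; it sums scores after the fact and is not a description of how Model #14 was actually built.

```python
from typing import List, Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def merge_scores(row: Sequence[float], groups: Sequence[int]) -> List[float]:
    """Sum the scores of all classes that map to the same merged group."""
    merged = [0.0] * (max(groups) + 1)
    for cls, score in enumerate(row):
        merged[groups[cls]] += score
    return merged


# Toy data (assumption): three samples, three classes.
scores = [
    [0.50, 0.30, 0.20],
    [0.40, 0.45, 0.15],  # true class 0 is narrowly beaten by class 1
    [0.10, 0.20, 0.70],
]
labels = [0, 0, 2]

# Hypothetical grouping: classes 0 and 1 are "similar" and become group 0.
groups = [0, 0, 1]
merged_scores = [merge_scores(row, groups) for row in scores]
merged_labels = [groups[label] for label in labels]

top1 = top_k_accuracy(scores, labels, 1)
top2 = top_k_accuracy(scores, labels, 2)
merged_top1 = top_k_accuracy(merged_scores, merged_labels, 1)
```

In this toy case the second sample counts as a Top-1 miss with separate classes but as a hit once the two similar classes are merged, which is the general effect one would expect from combining easily confused categories.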

2
votes

18 thoughts on “Using Convolutional Neural Networks for Real-Time Product Recognition in Smart Scales – Imagga’s Solution to the Kaufland Case”

  1. -1
    votes

    Sometimes you want to eat a sandwich, but you prepared an excessive gourmet meal. I do not believe the point of the datathon is to show off what your company is capable of, we know what money can get.

    1. 0
      votes

      Thanks for your feedback. We are not showing off but just trying to show the many experiments we conducted and the ones we started, trying to find the best solution we could achieve for this specific challenge.

    2. -1
      votes

      Bro, don't embarrass yourself. They crushed the case and went all out for the fun of it. No client cares about hackathons.

    1. 0
      votes

      Thanks for your comments. "Foodnet" in our paper refers to a food data set we put together from parts of two public research data sets (Food101+ImageNet), not to the paper and method you mentioned. This was not clear, so thanks for pointing it out. We will add some references to the article in due course. We have not tested the model's performance on a Raspberry Pi, but based on similar models we have tested on such hardware in the past, we expect about 2-3 sec per image.

  2. 0
    votes

    + Nicely presented results
    + Good theoretical description of the models
    – One of the Datathon requirements was to present the code of the solution, but none is provided.

  3. 0
    votes

    Nice work! I can clearly see that you have a lot of professional experience in this area; the results and the separation into different cases are excellent. However, I will not give you the maximum score in the voting. The reason is that I think sharing knowledge is one of the most important things when you participate in such an event. If you can’t share your code (for privacy reasons), you could describe some of the more complicated steps and give similar examples in open-source software, so that people who see your work in the future can learn how to do similar things.

    1. 0
      votes

      Thanks for the feedback! Yes, sharing is caring! We were in such a rush at the end that we completely forgot about the requirement. We use only open-source projects, such as Caffe and DIGITS, so anyone can re-evaluate our methodology. Here is the source code: http://norris.imagga.com/demos/kaufland-case/source.zip

      It contains two of our best models (#13 and #14) with their mean files, the protos defining the architecture and the caffemodel files, which anyone can use for fine-tuning.
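
For anyone who wants to try such files, here is a minimal pycaffe sketch of classifying a single image with a mean file, a deploy proto and a caffemodel. The file names (`deploy.prototxt`, `snapshot.caffemodel`, `mean.binaryproto`, `labels.txt`) are hypothetical and may not match the contents of the archive. The preprocessing (BGR channel order, 0-255 pixel range, mean-pixel subtraction) assumes common DIGITS defaults and has not been checked against models #13 and #14.

```python
"""Sketch: classify one image with a Caffe model (file names are hypothetical)."""
from typing import List, Sequence, Tuple


def top_k(probs: Sequence[float], labels: Sequence[str],
          k: int = 5) -> List[Tuple[str, float]]:
    """Return the k (label, probability) pairs with the highest probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], float(probs[i])) for i in order[:k]]


def classify(image_path: str,
             deploy: str = "deploy.prototxt",        # hypothetical file name
             weights: str = "snapshot.caffemodel",   # hypothetical file name
             mean_file: str = "mean.binaryproto",    # hypothetical file name
             labels_file: str = "labels.txt",        # hypothetical file name
             k: int = 5) -> List[Tuple[str, float]]:
    import caffe  # imported lazily so top_k() works without Caffe installed

    # Read the mean image and reduce it to one mean value per channel.
    blob = caffe.proto.caffe_pb2.BlobProto()
    with open(mean_file, "rb") as f:
        blob.ParseFromString(f.read())
    mean_pixel = caffe.io.blobproto_to_array(blob)[0].mean(1).mean(1)

    # Load the network in test mode and force a batch size of one.
    net = caffe.Net(deploy, weights, caffe.TEST)
    data = net.inputs[0]
    shape = net.blobs[data].data.shape
    net.blobs[data].reshape(1, *shape[1:])

    # Assumed preprocessing: HWC -> CHW, [0, 1] -> [0, 255], RGB -> BGR, minus mean.
    t = caffe.io.Transformer({data: net.blobs[data].data.shape})
    t.set_transpose(data, (2, 0, 1))
    t.set_raw_scale(data, 255)
    t.set_channel_swap(data, (2, 1, 0))
    t.set_mean(data, mean_pixel)

    image = caffe.io.load_image(image_path)  # float RGB image in [0, 1]
    net.blobs[data].data[...] = t.preprocess(data, image)
    net.forward()

    # Assumes the first network output is the softmax over categories.
    probs = net.blobs[net.outputs[0]].data[0].flatten()
    with open(labels_file) as f:
        labels = [line.strip() for line in f]
    return top_k(probs, labels, k)
```

Calling `classify("apple.jpg")` (with a hypothetical image path) would return the five most likely categories with their probabilities, provided the assumed file names and preprocessing match the actual model.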

  4. 0
    votes

    (+) Well formed article, shows understanding of the problem and task at hand
    (+) You’ve handled the insufficient data problem well and you’ve presented a good description of the model
    (-) Providing raw results is not very helpful; you should add some analysis of why you think you got the results you did.
    (~) It would also be nice to know more about the actual performance of the models. The most accurate model is not always the best choice, especially if it requires expensive hardware to run and you don’t have the budget for it.

    1. 0
      votes

      Thanks for your comments.
      We will add an analysis of the results; we simply didn’t have the time to complete it before the deadline.
      As the hardware that such a solution would run on was not specified in the challenge and we didn’t have time to test on different hardware, we decided not to include any performance figures at this stage. Our estimates for running the models are 2-3 sec per image on a Raspberry Pi and about 30 FPS on an Nvidia Jetson TX1/2; on an iOS mobile phone we achieved 60 FPS @ 4K using Core ML.
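
To compare these figures on one scale, the short Python sketch below converts frames per second into per-image latency and checks each platform against a latency budget. The 500 ms budget is a hypothetical value chosen for illustration, not a requirement from the challenge, and the numbers are the estimates quoted above, not new measurements.

```python
from typing import Dict, List


def fps_to_ms(fps: float) -> float:
    """Per-image latency in milliseconds for a given frame rate."""
    return 1000.0 / fps


def seconds_to_fps(seconds: float) -> float:
    """Frame rate corresponding to a per-image latency in seconds."""
    return 1.0 / seconds


# Estimates quoted in the comment above, expressed as latency per image.
ESTIMATES_MS: Dict[str, float] = {
    "Raspberry Pi (slow end)": 3000.0,
    "Raspberry Pi (fast end)": 2000.0,
    "Nvidia Jetson TX1/2": fps_to_ms(30),
    "iOS phone (Core ML)": fps_to_ms(60),
}


def within_budget(budget_ms: float) -> List[str]:
    """Platforms whose estimated latency fits the given budget."""
    return [name for name, ms in ESTIMATES_MS.items() if ms <= budget_ms]


# Hypothetical budget: half a second per weighing.
fast_enough = within_budget(500.0)
```

Under these assumptions, 30 FPS corresponds to roughly 33 ms per image and 60 FPS to roughly 17 ms, while 2-3 sec per image corresponds to about 0.33-0.5 FPS.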
