
Datathon – Kaufland Airmap – Solution – Phoenix


6 thoughts on “Datathon – Kaufland Airmap – Solution – Phoenix”

  1.

    That is a job well done in such a short time frame. Wish you had had time to handle the misplaced items as well. Thank you for choosing this case once again, and really good “Other comments” section you got there 😉 One quick question to keep the ideas floating around:
    Do you think this approach can be used in reality, where the number of items is ~20,000 and continuously changing?

    1.

      Hi,
      A bit of a late reply, but still – here are a few thoughts on the approach in general. To be clear, I am referring to the underlying problem in the stores rather than the Datathon case. My feeling is that this is a bit of an XY problem (https://en.wikipedia.org/wiki/XY_problem), and you have it a bit backwards 😊
      If I were to formulate the business issue, I’d go like this:

      1. Make a “map” of the store, with all shelves and expected product placements. Not really sure how much this would cost but considering many autonomous driving approaches involve building centimeter-level-accurate maps of whole cities, it should be feasible. See below for a feasible simplification.

      2. Then the problem would be to match: [This is a place where somebody would mention Bayesian priors, I guess?]
      2a. Whether the product seen in the image matches the expected product (which I think is an easier task than identifying the product from scratch). As extra info you also get whether the product is placed properly (front side facing outward and fully visible, etc.); this is lower priority, but you might still want it fixed.
      2b. Whether the label is what it is expected to be.

      3. Currently, a lot of the sample photos seem to be adjacent to each other, with substantial overlap in some cases. Go further and combine the images into a ‘panoramic’ image that would ideally capture a whole shelf’s length (a stitching sketch follows at the end of this reply). That would help greatly with (1) above: you do not need full spatial coordinates of all shelves/products; a “map” could simply say “On shelf A, products are expected to appear in the following order: X, Y, Z…”

      4. Lastly, labels:
      4a. Yes, labels can be made “machine-readable” with minimal effort. Lots of options to choose from: suitable fonts (there are plenty of questions on this on Stack Overflow/Quora/etc.), barcodes, QR codes…
      4b. Another option, so that a bigger barcode or QR code, or an uglier font, does not stick out in a bad way: make the QR codes “open”; there are many ways this would provide value to the customer. E.g. my phone can remember what I typically buy (and maybe next time tell me on which shelf to find it), show extra information on the product (nutrition values for food, etc.), or notify me of promotional prices (a QR sketch follows at the end of this reply).
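
      On point 3, stitching adjacent shelf photos into one panorama is the kind of thing OpenCV’s built-in stitcher can do out of the box. A minimal sketch, assuming OpenCV 4.x and placeholder file names (not part of the Datathon data):

      ```python
      # Stitch adjacent shelf photos into a single panoramic image (sketch only).
      import cv2

      paths = ["shelf_a_01.jpg", "shelf_a_02.jpg", "shelf_a_03.jpg"]  # hypothetical adjacent photos
      images = [cv2.imread(p) for p in paths]

      # SCANS mode suits flat, roughly planar scenes such as shelf fronts.
      stitcher = cv2.Stitcher_create(cv2.Stitcher_SCANS)
      status, panorama = stitcher.stitch(images)

      if status == cv2.Stitcher_OK:
          cv2.imwrite("shelf_a_panorama.jpg", panorama)
      else:
          print("Stitching failed with status", status, "- probably not enough overlap")
      ```

      With one panorama per shelf, the “map” really can be as simple as an ordered list of expected products per shelf.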
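
      On point 4b, generating an “open”, content-carrying QR code for a label is also cheap. A minimal sketch, assuming the third-party qrcode package (pip install "qrcode[pil]"); the payload fields are made up for illustration, not Kaufland’s actual label format:

      ```python
      # Encode a small, openly readable product payload into a QR code (sketch only).
      import json
      import qrcode

      payload = json.dumps({
          "ean": "4006381333931",       # hypothetical product number
          "name": "Example product",
          "shelf": "A-03",
          "price_eur": 1.99,
      })

      img = qrcode.make(payload)        # returns a PIL image
      img.save("label_qr.png")
      ```

      The same code is then trivially machine-readable both for the shelf-checking system and for a customer’s phone.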

  2.

    Lol! It seems you’ve done a lot with this data. What are your evaluation criteria (how do you count a correctly detected and localized object)? I don’t see any stats for the performance of your approach (ROC curves, maybe).

  3.

    @penchodobrev
    Thanks for the feedback! Yes, I believe that the approach can be applied in reality, provided that similar images and xml files are supplied for all products. The data augmentation code seems to work well enough, so even with a smaller set of images the neural network can still be trained to differentiate between objects and background. However, an important thing to remember is that all images are taken from a similar perspective and the background of the shelves is relatively consistent; this would also have to be the case in any future implementation.

    I don’t think that the perspective is a huge issue, as there are freely available tools that can help generate images simulating such behavior (see the sketch below), and the products are not always placed in the same way even in the current sample. However, I think that any change in the background should be paired with fine-tuning of the model before implementation.
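
    To illustrate the kind of perspective-simulating augmentation mentioned above (a sketch with plain OpenCV/NumPy, not the augmentation code from our notebooks):

    ```python
    # Randomly warp a shelf photo as if it were taken from a slightly different angle.
    import cv2
    import numpy as np

    def random_perspective(image, max_shift=0.08, rng=None):
        """Apply a small random perspective warp to simulate a change of viewpoint."""
        if rng is None:
            rng = np.random.default_rng()
        h, w = image.shape[:2]
        src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
        # Jitter each corner by up to max_shift of the image size.
        jitter = rng.uniform(-max_shift, max_shift, size=(4, 2)) * [w, h]
        dst = np.float32(src + jitter)
        M = cv2.getPerspectiveTransform(src, dst)
        return cv2.warpPerspective(image, M, (w, h), borderMode=cv2.BORDER_REPLICATE)

    img = cv2.imread("shelf_photo.jpg")  # placeholder path
    augmented = [random_perspective(img) for _ in range(5)]
    ```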

  4.

    @pepe
    Thanks for the feedback!
    1) You can refer to the notebook 05. Calculate FInal Prediction.ipynb (https://github.com/datasciencesociety/Phoenix/blob/master/!Clean%20Project%20Files/05.%20Calculate%20FInal%20Prediction.ipynb) to see the final assignment of the flags supplied to the leaderboard. We are just searching the space outlined in the xml files for a product (a simplified sketch of this kind of region check follows after this reply). I don’t think that the evaluation approach used in the leaderboard is very good, as the coordinates in the xml files are not standardized: for missing products they cover the full available space, while for available products they cover only the space of the specified product. 🙂 However, as seen in BoxPlot_Data.zip, we are correctly identifying almost all of the objects and 100% of the labels: https://github.com/datasciencesociety/Phoenix/blob/master/!Clean%20Project%20Files/BoxPlot_Data.zip

    2) For performance we only used the leaderboard, as in computer vision the appropriate metric really depends on what you are interested in.
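
    To make the region search in 1) concrete, here is a simplified illustration of checking whether a detected product falls inside the space outlined in an xml file. It assumes a Pascal VOC-like layout (<object>/<bndbox> with xmin/ymin/xmax/ymax) and a hypothetical IoU threshold; the actual flag assignment is in the notebook 05. Calculate FInal Prediction.ipynb linked above:

    ```python
    # Check whether any detected box overlaps an annotated region enough to flag the product.
    import xml.etree.ElementTree as ET

    def read_regions(xml_path):
        """Return a list of (name, (xmin, ymin, xmax, ymax)) from the annotation file."""
        root = ET.parse(xml_path).getroot()
        regions = []
        for obj in root.findall("object"):
            bb = obj.find("bndbox")
            box = tuple(int(float(bb.findtext(t))) for t in ("xmin", "ymin", "xmax", "ymax"))
            regions.append((obj.findtext("name"), box))
        return regions

    def iou(a, b):
        """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        if inter == 0:
            return 0.0
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / float(area_a + area_b - inter)

    detections = [(120, 40, 260, 300)]  # hypothetical boxes from the object detector
    for name, region in read_regions("shelf_annotation.xml"):
        present = any(iou(det, region) > 0.3 for det in detections)
        print(name, "present" if present else "missing")
    ```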
