Popular articles by pepe
We developed a workflow utilizing the BLAST and Centrifuge toolkits that provides precise metagenomic information about food composition by comparing DNA reads with the reference genomes of various species. Our workflow is optimized to run on a Google Cloud (Compute Engine) instance with 24 CPUs and 200 GB of RAM.
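A toy illustration of the read-classification step (not the team's actual pipeline): given BLAST hits in the standard tabular format (outfmt 6), keep the best-scoring hit per read and count reads per species as a crude estimate of relative composition. The sample hits and species names below are hypothetical.

```python
import io
import pandas as pd

# Standard column names of BLAST tabular output (outfmt 6)
COLS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
        "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Hypothetical sample of BLAST hits (species encoded in the subject id)
sample = """\
read1\tBos_taurus\t99.2\t150\t1\t0\t1\t150\t200\t349\t1e-70\t270
read2\tSus_scrofa\t98.5\t140\t2\t0\t1\t140\t50\t189\t1e-65\t255
read3\tBos_taurus\t97.0\t150\t4\t0\t1\t150\t10\t159\t1e-60\t240
"""

hits = pd.read_csv(io.StringIO(sample), sep="\t", names=COLS)

# Keep the best hit (highest bitscore) per read, then count reads per
# species as a rough estimate of relative composition of the sample.
best = hits.sort_values("bitscore").groupby("qseqid").tail(1)
composition = best["sseqid"].value_counts(normalize=True)
print(composition)
```

In a real run the subject ids would map to reference genomes via a lookup table, and Centrifuge's own report would be cross-checked against the BLAST counts.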
This paper presents a machine learning based approach to the business problem of identifying, from pictures, the products chosen by Kaufland customers. These pictures are all taken from the same angle and typically show one or more products from the same category in a bag, which makes the background and the bag recurrent elements.
Here we explored transfer learning – using pre-trained, very deep neural networks such as InceptionV3, InceptionResNetV2, VGG19 and ResNet50, both with and without retraining of the existing layers. We solved this multi-class classification problem of predicting the probability of each product class by adding different final custom layers, and we obtained the best result of 85% accuracy on a 20% validation set (which was never seen by the model during training).
This result was achieved with VGG19, which distinguished itself not only by providing the best categorical accuracy but also by its training speed, its execution speed once deployed, and its reduced resource consumption.
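A minimal sketch of that setup (assumed input shape and a hypothetical 25-class head; not the team's exact code): freeze VGG19's convolutional base and add custom final layers ending in a softmax. `weights=None` is used here only to keep the sketch self-contained; in practice `weights="imagenet"` would load the pre-trained filters that transfer learning relies on.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

NUM_CLASSES = 25  # hypothetical number of product categories

# Pre-trained convolutional base (weights=None only to avoid a download here)
base = VGG19(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # the "no retraining of existing layers" variant

# Custom final layers producing per-class probabilities
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The "with retraining" variant simply sets `base.trainable = True` (usually with a much smaller learning rate) so the pre-trained filters are fine-tuned on the product images.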
Team members: Ana Popova (@anie), Izabella Taskova (@izabellataskova), Kamelia Kosekova (@kameliak), Kameliya Lokmadzhieva (@kameliyalokmadzhieva), Nikolay Bojurin (@nikolay). Mentors: @boryana, @alex-efremov, @pepe. Team name: DAB PANDA. Team logo: NB!!!! OUR NOTEBOOKS ARE AVAILABLE HERE: DAB PANDA Rmds. Data Understanding and Preparation: You may see our code with results and brief comments if you […]
Introduction The data provided consist of 3 years of weekly sales volume, the price of the product in question, the prices of main competitors, and the promotion calendar for an FMCG product. The data are provided by SAP. The task is to identify the volume uplift drivers, measure promotional effectiveness, and measure the cannibalization effect from main competitors. […]
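A toy sketch of the uplift-driver idea (synthetic data, hypothetical coefficients, plain least squares rather than any team's actual method): regress weekly volume on own price, a promotion flag, and a competitor price; the fitted promo coefficient is then an estimate of the promotional uplift, and the competitor-price coefficient hints at cannibalization.

```python
import numpy as np

rng = np.random.default_rng(0)
weeks = 156  # 3 years of weekly data, as in the case description

# Hypothetical synthetic data standing in for the SAP dataset
price = rng.uniform(1.8, 2.2, weeks)
promo = rng.integers(0, 2, weeks)              # promotion calendar flag
competitor_price = rng.uniform(1.7, 2.3, weeks)
volume = (1000 - 200 * price + 300 * promo
          + 150 * competitor_price + rng.normal(0, 20, weeks))

# Ordinary least squares: volume ~ 1 + price + promo + competitor_price
X = np.column_stack([np.ones(weeks), price, promo, competitor_price])
coef, *_ = np.linalg.lstsq(X, volume, rcond=None)
print(dict(zip(["intercept", "price", "promo", "competitor_price"],
               coef.round(1))))
```

On this synthetic data the recovered promo coefficient lands near the true uplift of 300 units per promoted week; on the real data, seasonality and trend terms would need to be added before the coefficients are interpretable.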
Our best model (derived from VGG) achieved 99.46% top-3 accuracy (90.18% top-1), with a processing time during training of 0.006 s per image on a single Titan X GPU (200 s/epoch with 37,000 images).
The team's vision is for its members to see where they stand relative to others in terms of ideas and approaches to computer vision, and to learn new ideas and approaches from their teammates and the mentors.
Therefore, the team is pursuing a pure computer-vision approach to solving the Kaufland and/or ReceiptBank cases.
Team name: Cheetahs Case: Telelink (iGEM) Provider: IBM Business Understanding The task for the Telelink case is to obtain the complete set of genome traces found in a single food sample, together with ALL organisms that should not be found in the food sample. The business needs a solution to this DNA sequence identification case for improved […]
About¶ Entry: Data Science Society Datathon 2018 Case: SAP Case Dataset: available here Authors: Hristo Piyankov (email@example.com) Notes: not all calculations and graphs are carried out in Python, due to time constraints Business understanding¶ The goal of the study is to understand the drivers behind sales uplift in relation to the company’s own pricing strategies, promotions and […]
In this article we present our solution for helping customers and making their shopping experience easier by identifying products from images. We bring forward our idea and discuss the results of our computer vision experiment.
Many big retailers offer in store a rapidly growing variety of fresh produce from local and global sources, such as fruit and vegetables, which need to be weighed quickly and effortlessly to mark their quantity and respective price. Smart scales that use image recognition to optimise the user experience and allow additional features, such as cooking recipes, can provide a new solution to this problem. The solution we provide for the Kaufland case involves training a Convolutional Neural Network (CNN) with the GoogLeNet architecture on the original Kaufland data set and fine-tuning it with a Custom Training Set we have created, achieving the following results (Kaufland Case Model #13): training accuracy: Top-1: 91%, Top-5: 100%; validation accuracy: Top-1: 86.1%, Top-5: 99%; and TEST dataset accuracy: Top-1: 86.1%, Top-5: 99.2%. We have also created another model (Kaufland Case Model #14) by combining similar categories, achieving: training accuracy: Top-1: 96%, Top-5: 100%; validation accuracy: Top-1: 92.5%, Top-5: 100%; and TEST dataset accuracy: Top-1: 91.3%, Top-5: 100%. All training was done on our NVIDIA DGX Station machine using BVLC Caffe and the NVIDIA DIGITS framework. In our article we show visualisations of our numerous training runs and provide an online demo of the best classifiers, which can be tested further. During the final DSS Datathon event we plan to show a live food-recognition demo with one of our best models running on a mobile phone. Demo URL: http://norris.imagga.com/demos/kaufland-case/
The folder where you can locate the data & the code (using Jupyter).

Academic Datathon¶
In : import pandas as pd
     import numpy as np
     import matplotlib.pyplot as plt
     import seaborn as sns
In : WRITE_TO_FILE = False
In : plt.style.use('bmh')
     pd.options.display.precision = 3

Loading the data from price_data.csv¶
In : %%time
     raw_df = pd.read_csv('./raw_data/price_data.csv')
     raw_df.info()  # CPU […]
Popular comments by pepe
If you had kept the colors, you would get better results. It is excellent that you shared all your code in the notebook. You should also try to import additional datasets and enrich the provided data.
I don’t see any data preparation and cleaning. How did you deal with the missing values?
Also, your code is unreadable, you don’t have any plots, and this does not really read like an article. You could upload it as an .ipynb file or as HTML directly to the site.
The plots are not really readable. Good job with the missing data, but upload the code as a notebook file or as HTML so it can be read here; otherwise it is not possible for anyone to give you feedback and recommendations.
Great job. The article looks very good. One thing to fix: make it all in English.
You did a good job with the data cleaning and preparation. I would recommend some of the following checks: outliers, periodicity, trend.
It is great that you implemented an ARIMA model. Optionally, you could try SWR, PCA or GARCH.
Another important part of the task is the metrics.
Keep up the good work, and I am sure other mentors will give you more comments and recommendations.
Great! Thank you for sharing.