Team members: Ana Popova, @anie; Izabella Taskova, @izabellataskova; Kamelia Kosekova, @kameliak; Kameliya Lokmadzhieva, @kameliyalokmadzhieva; Nikolay Bojurin, @nikolay. Mentors: @boryana, @alex-efremov, @pepe. Team name: DAB PANDA. NB: Our notebooks are available here: DAB PANDA Rmds. Data Understanding and Preparation: You may see our code with results and brief comments if you […]
This paper presents a machine-learning approach to the business problem of identifying, from pictures, the products chosen by Kaufland customers. The pictures are all taken from the same angle and typically show one or more products of the same category in a bag, so the background and the bag are recurring elements.
We explored transfer learning: using very deep, already-trained neural networks such as InceptionV3, InceptionResNetV2, VGG19, and ResNet50, both with and without retraining of the existing layers. We treated the task as a multiclass classification problem, predicting the probability of each product class by adding different custom final layers. The best result was 85% accuracy on a validation set of 20% of the data, which the model never saw during training.
This result was achieved with VGG19, which stood out not only for the best categorical accuracy but also for its training speed, its execution speed once deployed, and its lower resource consumption.
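The evaluation setup described above (a 20% hold-out set and categorical accuracy over predicted class probabilities) can be sketched in plain Python. This is a minimal illustration, not the team's code: the function names `stratified_split` and `categorical_accuracy` are hypothetical, and splitting per class (stratification) is an assumption, since the text only states the 20% share.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx), holding out val_fraction of each class.

    Stratifying by class is an assumption made for this sketch.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Keep at least one validation sample per class.
        n_val = max(1, round(len(idxs) * val_fraction))
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)


def categorical_accuracy(probabilities, true_classes):
    """Share of samples whose highest-probability class is the true class."""
    correct = 0
    for probs, truth in zip(probabilities, true_classes):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += predicted == truth
    return correct / len(true_classes)
```

The validation indices are never passed to training, which is what makes the reported accuracy an estimate of performance on unseen pictures.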
Technology and methods used: R (plyr, dplyr, tidyverse, stringr, data.table, geohash, ggmap, maps, robustbase, geosphere, pracma, Hmisc, ggplot2, tidyquant, reshape2, pastecs); Python (s3fs, pandas, numpy, matplotlib, plotly, geohash2, folium, geopy); OLS Regression, Ridge Regression, Decision Trees. Introduction: Air pollution beyond the norms is a common problem in many locations. Examining the causes behind and being able to predict […]
We developed a workflow using the BLAST and Centrifuge toolkits that provides precise metagenomic information about food composition by comparing DNA reads with the reference genomes of various species. The workflow is optimized to run on a Google Cloud Compute Engine instance with 24 CPUs and 200 GB of RAM.
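To illustrate how per-read matches against reference genomes turn into a food-composition estimate, here is a minimal sketch. It assumes the reads have already been classified and reduced to `(read_id, species)` pairs; that input shape, the `composition` function, and the `min_reads` threshold are hypothetical simplifications, not the actual output format or options of BLAST or Centrifuge.

```python
from collections import Counter


def composition(assignments, min_reads=1):
    """Map each species to its fraction of the classified reads.

    assignments: iterable of (read_id, species) pairs, where species is
    None for reads that matched no reference genome (hypothetical format).
    min_reads: species with fewer supporting reads are dropped as noise.
    """
    counts = Counter(
        species for _, species in assignments if species is not None
    )
    kept = {s: n for s, n in counts.items() if n >= min_reads}
    total = sum(kept.values())
    if total == 0:
        return {}
    # Most abundant species first.
    ranked = sorted(kept.items(), key=lambda item: -item[1])
    return {species: n / total for species, n in ranked}


# Made-up reads for illustration only.
reads = [
    ("r1", "Bos taurus"),
    ("r2", "Bos taurus"),
    ("r3", "Sus scrofa"),
    ("r4", None),
    ("r5", "Bos taurus"),
]
print(composition(reads))  # {'Bos taurus': 0.75, 'Sus scrofa': 0.25}
```

Read fractions are only a rough proxy for composition, since genome size and reference coverage differ between species.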