The food industry is governed by strict laws and regulations, which provide assurance that each product meets health and safety standards. In addition to existing biochemical food product analysis, we propose a metagenomic approach. The main benefit of this approach is the ability to perform next-generation sequencing as a standard first step and then align the sampled data to the reference genomes of many organisms suspected to be present in the sample. Additionally, if another organism is suspected at a later date, it is easy to reuse the sampled data set to perform another analysis. With biochemical analysis, this would require expensive sample storage and additional laboratory tests. We examined three approaches to metagenomic analysis: BLAST, Centrifuge and BWA-MEM.
The objective of our task is to extract parent-subsidiary relationships from text. For example, a news article from TechCrunch says this: ‘Remember those rumors a few weeks ago that Google was looking to acquire the plug-and-play security camera company, Dropcam? Yep. It just happened.’ From this sentence we can infer that Dropcam is a subsidiary of Google. But there are millions of companies and several million articles talking about them. A human being would get tired of going through even 10! Trust me 😉 We have developed some cool machine learning models, ranging from classical algorithms to deep neural networks, to do this for you. There is a bonus! We do not just give you probabilities. We also give you the sentences that triggered the algorithm to make the inference! For instance, when it says Oracle Corp is the parent of Microsys, it can also return the sentence in its corpus that triggered the prediction: ‘Oracle Corp’s Microsys customer support portal was seen communicating with a server’.
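To make the "prediction plus triggering sentence" idea concrete, here is a minimal sketch of what such an output could look like. It is not one of our models: a toy possessive rule ("X’s Y") stands in for the classifier, and the names `Prediction` and `find_relations`, the fixed score, and the company list are all assumptions made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    parent: str
    subsidiary: str
    score: float   # placeholder value, not a real model probability
    evidence: str  # the sentence that triggered the prediction


def find_relations(sentences: List[str], companies: List[str]) -> List[Prediction]:
    """Toy stand-in for a model: flag "<parent>'s <subsidiary>" mentions."""
    predictions = []
    for sentence in sentences:
        for parent in companies:
            for child in companies:
                if parent == child:
                    continue
                # Possessive pattern; accepts straight or curly apostrophes.
                pattern = re.escape(parent) + r"['’]s\s+" + re.escape(child) + r"\b"
                if re.search(pattern, sentence):
                    predictions.append(Prediction(parent, child, 1.0, sentence))
    return predictions


corpus = [
    "Oracle Corp’s Microsys customer support portal was seen communicating with a server.",
    "Microsys and Oracle Corp were mentioned separately.",
]
results = find_relations(corpus, ["Oracle Corp", "Microsys"])
for r in results:
    print(f"{r.parent} -> {r.subsidiary} (evidence: {r.evidence})")
```

The point of the sketch is the shape of the result: every predicted pair carries the sentence that produced it, so a reader can check the inference instead of trusting a bare score.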
We developed a workflow built on the BLAST and Centrifuge toolkits that provides precise metagenomic information about food composition by comparing DNA reads with reference genomes of various species. Our workflow is optimized to run on a Google Cloud (Compute Engine) instance with 24 CPUs and 200 GB of RAM.
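As a rough illustration of the final step, the sketch below turns per-read species assignments into a composition summary. It assumes the reads have already been classified (for example by Centrifuge or BLAST) and reduced to one species label per read, with `None` for unclassified reads. That input format and the function name `composition` are hypothetical, not part of our workflow. Note also that a share of reads is only a proxy and need not equal the share of an ingredient by mass.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def composition(assignments: Iterable[Optional[str]]) -> Dict[str, float]:
    """Return the fraction of classified reads assigned to each species."""
    # Unclassified reads (None) are excluded from the totals.
    counts = Counter(label for label in assignments if label is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: n / total for species, n in counts.items()}


# Hypothetical labels for five reads; one read could not be classified.
read_labels = ["Bos taurus", "Bos taurus", "Sus scrofa", None, "Bos taurus"]
shares = composition(read_labels)
for species, share in sorted(shares.items()):
    print(f"{species}: {share:.0%}")
```

Because the summary is computed from stored assignments, screening for a newly suspected organism only requires classifying the same reads against one more reference and rerunning this step.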