How to Easily Recognize Text from Challenging Images

Friends of the Data Science Society had a couple of occasions to celebrate this week. First, after a longer than usual pause we are back in the event organizing business. Second, the topic our speaker Ekaterina presented stands at the fascinating intersection of machine learning, mobile application development and healthy lifestyle. As a bonus, our audience had the opportunity to explore a new venue – the cosy “Tell me bar” close to the National Theatre and the “Bulgarian Broadway” – Rakovski street.

Our speaker, Ekaterina, is the founder of Sugarwise, a young social business aimed at improving people’s health. Sugarwise is what led Ekaterina into image processing. She has worked on practical machine translation and is a huge machine learning enthusiast. She is a co-organizer of the Lisbon Open Data Meetup, and in her spare time she loves to pass her practical knowledge forward and to keep learning. She will gladly chat with anyone interested in tech, the future of tech and business, and explore together how to solve current social problems.

The challenge Ekaterina set out to tackle that evening was reading the nutrition information from a yoghurt cup. Our smart audience immediately suggested a clear-cut solution – to read it from the barcode. As Ekaterina pointed out, this would only have worked in the USA, which doesn’t fit the global scale at which she wants to solve the problem. Another common solution is to use an OCR (Optical Character Recognition) engine such as Tesseract. Unfortunately, OCR works well with black characters on a white background, which is not the case with food labels.
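For context, here is a minimal sketch of how Tesseract is typically invoked from Python through the pytesseract wrapper (the talk did not show this code, and the file name is hypothetical). On a raw, low-contrast label photo the output is usually unusable, which is exactly the problem the rest of the talk addresses:

```python
# Minimal Tesseract invocation via the pytesseract wrapper.
from PIL import Image
import pytesseract

image = Image.open("yoghurt_label.png")        # hypothetical file name
text = pytesseract.image_to_string(image)      # raw OCR, no preprocessing
print(text)
```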

Next, Ekaterina led us on a journey through how to read the nutrition information. The first step is to convert the colour image to a grayscale one, because grayscale images have desirable properties: each pixel carries a single intensity value instead of three colour channels (a form of dimensionality reduction). Then the cardinal problem to solve is how to represent the image as black letters on a white background. To do that, Ekaterina clustered the pixel colours into a cluster of dark colours and a cluster of light colours using k-means clustering. The output of this method is an image where the background is converted to white while the text is turned to black. Here Ekaterina shared what she learned the hard way – it is better to save an image as .png, because saving it as .jpg introduces noise: JPEG is a lossy compression format that only keeps pixel values close to their original values, a change undetectable by the human eye but not missed by computers.
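A hedged sketch of this binarisation step, assuming OpenCV and scikit-learn are available (the talk did not name specific libraries, and the file names are hypothetical):

```python
# Grayscale conversion followed by 2-cluster k-means binarisation.
import cv2
import numpy as np
from sklearn.cluster import KMeans

img = cv2.imread("yoghurt_label.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # 3 colour channels -> 1 intensity

# Cluster the pixel intensities into two groups: text vs. background.
pixels = gray.reshape(-1, 1).astype(np.float32)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
labels = kmeans.labels_.reshape(gray.shape)

# Paint the darker cluster black (text) and the lighter one white (background).
dark = np.argmin(kmeans.cluster_centers_.ravel())
binary = np.where(labels == dark, 0, 255).astype(np.uint8)

cv2.imwrite("label_binary.png", binary)        # PNG is lossless; JPEG would add noise
```

On plain grayscale intensities a two-cluster k-means split is close in spirit to classic Otsu thresholding; k-means just generalises more naturally if you later decide to cluster full colour pixels instead.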

Unfortunately, the output of this step still confuses the Tesseract engine because of the lines in the nutrition table (nutrient information is usually laid out in a table with lines separating each column, row and cell). The challenge here lay in the fact that when Tesseract sees a vertical line, it assumes it has reached the end of a line and starts reading the next one, skipping all textual content after the first detected vertical line. Ekaterina then presented several ways to delete the black lines. The first is to remove black regions with more than 400 pixels, relying on the fact that letters usually consist of fewer pixels; the results were not encouraging, though. Another solution she tried, detecting uninterrupted black regions, would cut some letters in half and proved ineffective overall. Only after exhausting these options did the key insight emerge: to enhance Tesseract’s reading capabilities, only the vertical lines need to be removed, not all of them. To do that, she processed the image a little further to separate it into regions of pixel colours.
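The size-based filtering idea from the first attempt can be sketched with OpenCV’s connected-components analysis; the 400-pixel threshold is the one mentioned in the talk, everything else is an assumption:

```python
# Remove black blobs larger than 400 pixels (lines), keep smaller ones (letters).
import cv2

binary = cv2.imread("label_binary.png", cv2.IMREAD_GRAYSCALE)

# connectedComponentsWithStats labels the white foreground, so invert first:
# black text and lines become white blobs whose area we can measure.
inverted = cv2.bitwise_not(binary)
n, labels, stats, _ = cv2.connectedComponentsWithStats(inverted, connectivity=8)

cleaned = binary.copy()
for label in range(1, n):                     # label 0 is the background
    if stats[label, cv2.CC_STAT_AREA] > 400:  # too big to be a letter
        cleaned[labels == label] = 255        # paint the blob white
```

One plausible reason this underperforms in practice: the table lines tend to form a single huge connected component, so any letter touching a line gets swept away along with it.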

Imagine a binary (black and white) image with the letters “A” and “B” on it. Separating its regions – background, letter “A” and letter “B” – means painting all pixels of the background with one region label (0 is used for simplicity, even though in pixel colours 0 normally denotes black), all pixels of “A” with label 1, and all pixels of “B” with label 2. Note that in the original binary image all background pixels are set to 255, or white, and all letter pixels, of “A” and “B”, are set to 0, or black. Back to the problem – Ekaterina wants to identify all vertical lines in the segmented image. The solution is to split the image into smaller parts, called windows, and identify the partial vertical lines in each window. She does this by searching for pixels of the same region (1 or 2 in the example above) that lie on the top and bottom frame of each window. Having found such pixels, she draws an imaginary line between the top and bottom ones and checks whether most of the pixels lying under this imaginary line belong to the same region. If they do, the algorithm has identified a partial vertical line that needs to be deleted from the image. An interesting alternative worth exploring is Fourier analysis, which should be a familiar topic to people working in signal processing.
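A simplified sketch of this windowed search, under two assumptions not stated in the talk: the region labels come from something like cv2.connectedComponents, and within a single window a vertical line stays in one column, so the “imaginary line” degenerates to that column. The window height and the 80% agreement threshold are illustrative guesses:

```python
# Windowed detection and removal of partial vertical lines in a labelled image.
import numpy as np

def erase_vertical_lines(labels, window_h=40, agree=0.8):
    """labels: 2-D array where 0 is background and 1..N mark regions.
    Returns a copy with detected vertical line segments painted as background."""
    cleaned = labels.copy()
    h, w = labels.shape
    for top in range(0, h - window_h + 1, window_h):
        bottom = top + window_h - 1
        for col in range(w):
            region = labels[top, col]
            # Same non-background region on the window's top and bottom frame?
            if region != 0 and labels[bottom, col] == region:
                segment = labels[top:bottom + 1, col]
                # Do most pixels under the imaginary line belong to that region?
                if np.mean(segment == region) >= agree:
                    cleaned[top:bottom + 1, col] = 0
    return cleaned
```

Working window by window rather than over the whole image is what lets the method delete the partial vertical strokes of table lines while leaving the (mostly horizontal or compact) letter shapes untouched.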

After the traditional Q&A session, the talk naturally continued in the Tell me bar. Take a look at the video from the presentation if you want to learn more. Don’t miss our great upcoming events and projects – stay tuned by visiting our website, following our Facebook page or LinkedIn page, or following our Twitter account.
