Extracting Cyrillic text using the OCR plugin

Hi,

I'm planning to use the OCR text detection plugin in my project, and I managed to extract text by calling 'cloudinary.v2.api.update' with the { ocr: 'adv_ocr' } parameter.
It extracts English text well:
http://res.cloudinary.com/godovod/image/upload/v1513136310/ocr/eng_bw.png
https://godovod.ru/api/files/ocr?file=ocr/eng_bw

But it doesn't extract Russian text, and 'ru' is not among the detected languages:
http://res.cloudinary.com/godovod/image/upload/v1513138221/ocr/ru_bw.jpg
https://godovod.ru/api/files/ocr?file=ocr/ru_bw
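
For reference, here is a minimal sketch of how the detected languages could be listed from the update response. It assumes the OCR result sits under info.ocr.adv_ocr.data and mirrors a Vision API response (fullTextAnnotation.pages[].property.detectedLanguages). The helper names detectedLanguages and runOcr are hypothetical, and the Cloudinary client is passed in rather than configured here.

```javascript
// Collect the language codes reported in a Cloudinary OCR result.
// Assumption: the result mirrors a Vision API response under
// info.ocr.adv_ocr.data; missing fields are treated as "no languages".
function detectedLanguages(result) {
  const data = (((result.info || {}).ocr || {}).adv_ocr || {}).data || [];
  const codes = new Set();
  for (const entry of data) {
    const pages = (entry.fullTextAnnotation || {}).pages || [];
    for (const page of pages) {
      const langs = (page.property || {}).detectedLanguages || [];
      for (const lang of langs) {
        codes.add(lang.languageCode);
      }
    }
  }
  return [...codes];
}

// Hypothetical wrapper: `cloudinary` is the already configured object
// returned by require('cloudinary'). Relies on api.update returning a
// promise when no callback is supplied.
async function runOcr(cloudinary, publicId) {
  const result = await cloudinary.v2.api.update(publicId, { ocr: 'adv_ocr' });
  return detectedLanguages(result);
}
```

With a configured client, runOcr(cloudinary, 'ocr/ru_bw') should resolve to the list of language codes for the Russian sample. An empty list, or one without 'ru', matches the problem described here.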

The Google Vision API itself extracts Russian text perfectly:
https://gyazo.com/c652dba9a166e9f9058a60552b2d3a24

Are there any restrictions, or are some OCR training data or settings missing?

Dmitry Bezrukov

December 14, 2017 15:36

I ran some tests. To extract Russian text, 'languageHints' must be set explicitly in Google Vision API requests when using the TEXT_DETECTION feature. With the more complex DOCUMENT_TEXT_DETECTION feature, the API parses Russian text without any hints. So in my project I started using the Vision API directly, passing the URLs of Cloudinary-hosted images.
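
A minimal sketch of such a direct request, for anyone who hits the same issue. It assumes the Vision REST endpoint images:annotate with an API key, and a runtime with a global fetch (for example Node 18+). The names buildVisionRequest and extractText, and the documentMode option, are made up for illustration.

```javascript
const VISION_ENDPOINT = 'https://vision.googleapis.com/v1/images:annotate';

// Build the JSON body for one image. With TEXT_DETECTION the language
// hints go into imageContext; with documentMode the feature type is
// DOCUMENT_TEXT_DETECTION and the hints can be left out.
function buildVisionRequest(imageUrl, { languageHints = [], documentMode = false } = {}) {
  const request = {
    image: { source: { imageUri: imageUrl } },
    features: [{ type: documentMode ? 'DOCUMENT_TEXT_DETECTION' : 'TEXT_DETECTION' }],
  };
  if (languageHints.length > 0) {
    request.imageContext = { languageHints };
  }
  return { requests: [request] };
}

// Send the request and return the recognized text, or '' if none was found.
// `apiKey` is a placeholder for your own Vision API key.
async function extractText(imageUrl, apiKey, options) {
  const response = await fetch(`${VISION_ENDPOINT}?key=${encodeURIComponent(apiKey)}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildVisionRequest(imageUrl, options)),
  });
  if (!response.ok) {
    throw new Error(`Vision API request failed: ${response.status}`);
  }
  const body = await response.json();
  const annotation = body.responses[0].fullTextAnnotation;
  return annotation ? annotation.text : '';
}
```

For the Russian sample above, the call would look like extractText('http://res.cloudinary.com/godovod/image/upload/v1513138221/ocr/ru_bw.jpg', apiKey, { languageHints: ['ru'] }), or the same call with { documentMode: true } and no hints.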

Ido

December 18, 2017 14:04

Thanks for sharing, Dmitry!
