Extracting Cyrillic text using the OCR plugin
Hi,
I'm planning to use the OCR text detection plugin in my project, and I managed to extract text by running 'cloudinary.v2.api.update' with the { ocr: 'adv_ocr' } parameter.
It extracts English text well:
http://res.cloudinary.com/godovod/image/upload/v1513136310/ocr/eng_bw.png
https://godovod.ru/api/files/ocr?file=ocr/eng_bw
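A minimal Node.js sketch of the update call described above, for readers who want to reproduce it. It assumes a recent 'cloudinary' npm package where 'api.update' returns a Promise when no callback is passed, and it assumes the OCR result is stored under 'info.ocr.adv_ocr.data' in Google Vision's annotation format. 'runAdvOcr' is a hypothetical helper name, and the SDK object is passed in so the sketch has no hard dependency.

```javascript
// Hypothetical helper: re-processes an already uploaded image with the
// adv_ocr add-on and returns the full recognized text plus its locale.
// `cloudinary` is the object exported by the `cloudinary` npm package.
async function runAdvOcr(cloudinary, publicId) {
  // Same call as in the post: update an existing resource with { ocr: 'adv_ocr' }.
  const resource = await cloudinary.v2.api.update(publicId, { ocr: 'adv_ocr' });

  // Assumed response shape: info.ocr.adv_ocr = { status, data: [annotation, ...] },
  // where each annotation follows Google Vision's format.
  const ocr = resource && resource.info && resource.info.ocr && resource.info.ocr.adv_ocr;
  if (!ocr || ocr.status !== 'complete' || !Array.isArray(ocr.data) || ocr.data.length === 0) {
    return null;
  }

  // In Vision's format, textAnnotations[0] holds the whole detected text block.
  const first = (ocr.data[0].textAnnotations || [])[0];
  return first ? { text: first.description, locale: first.locale } : null;
}
```

In a real project you would pass the configured SDK object (for example require('cloudinary')) and a public ID such as 'ocr/eng_bw'.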
But it doesn't extract Russian; there is no 'ru' among the detected languages:
http://res.cloudinary.com/godovod/image/upload/v1513138221/ocr/ru_bw.jpg
https://godovod.ru/api/files/ocr?file=ocr/ru_bw
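To check which languages were detected, one option is to collect them from a single annotation object. This sketch assumes the annotation follows Google Vision's response format ('textAnnotations[0].locale' and 'fullTextAnnotation.pages[].property.detectedLanguages[]'); 'detectedLanguages' is a hypothetical helper name.

```javascript
// Collects the language codes Google Vision reported for one annotation
// (assumed to be one element of info.ocr.adv_ocr.data).
function detectedLanguages(annotation) {
  const codes = new Set();

  // textAnnotations[0].locale: locale of the whole detected text block.
  const first = (annotation.textAnnotations || [])[0];
  if (first && first.locale) codes.add(first.locale);

  // fullTextAnnotation.pages[].property.detectedLanguages[]: per-page languages.
  const pages = (annotation.fullTextAnnotation && annotation.fullTextAnnotation.pages) || [];
  for (const page of pages) {
    const langs = (page.property && page.property.detectedLanguages) || [];
    for (const lang of langs) {
      if (lang.languageCode) codes.add(lang.languageCode);
    }
  }

  // Set preserves insertion order, so the result is stable and de-duplicated.
  return [...codes];
}
```

For the Russian image, detectedLanguages(...).includes('ru') would be false, which is the symptom described above.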
The Google Vision API itself extracts Russian text perfectly:
https://gyazo.com/c652dba9a166e9f9058a60552b2d3a24
Are there any restrictions, or are some OCR training data or settings missing?
I've run some tests: to extract Russian text, you need to explicitly set 'languageHints' in Google Vision API requests when using the TEXT_DETECTION feature. With the more complex DOCUMENT_TEXT_DETECTION feature, the API parses Russian text without any hints. So in my project I started using the Vision API directly, passing the URLs of Cloudinary-hosted images.
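A minimal sketch of calling the Vision API directly with a Cloudinary image URL, as described above. It assumes the REST endpoint 'images:annotate' with API-key authentication and Node 18+ (global fetch); 'buildVisionRequest' and 'annotate' are hypothetical helper names. With TEXT_DETECTION it adds 'imageContext.languageHints'; with DOCUMENT_TEXT_DETECTION the hint can be omitted, per the tests in this thread.

```javascript
// Builds an images:annotate request body for an image URL
// (for example one hosted on res.cloudinary.com).
function buildVisionRequest(imageUri, { document = false, languageHints = [] } = {}) {
  const request = {
    image: { source: { imageUri } },
    features: [{ type: document ? 'DOCUMENT_TEXT_DETECTION' : 'TEXT_DETECTION' }],
  };
  // Only send imageContext when hints are given (e.g. ['ru'] for TEXT_DETECTION).
  if (languageHints.length > 0) {
    request.imageContext = { languageHints };
  }
  return { requests: [request] };
}

// Sends the body and returns the recognized text (assumes API-key auth).
async function annotate(body, apiKey) {
  const url = `https://vision.googleapis.com/v1/images:annotate?key=${encodeURIComponent(apiKey)}`;
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Vision API HTTP error: ${res.status}`);
  const json = await res.json();
  const r = json.responses[0];
  if (r.error) throw new Error(`Vision API error: ${r.error.message}`);
  // Prefer the full text; fall back to the first text annotation.
  if (r.fullTextAnnotation && r.fullTextAnnotation.text) return r.fullTextAnnotation.text;
  return r.textAnnotations && r.textAnnotations[0] ? r.textAnnotations[0].description : '';
}
```

For example, buildVisionRequest('http://res.cloudinary.com/godovod/image/upload/v1513138221/ocr/ru_bw.jpg', { languageHints: ['ru'] }) produces a TEXT_DETECTION request with a Russian hint, while { document: true } switches to DOCUMENT_TEXT_DETECTION without one.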
Thanks for sharing, Dmitry!