In this article, we will learn how to read or extract text from an image, whether it is handwritten or printed.
Reading text involves two things: Computer Vision and Natural Language Processing (NLP). Computer Vision helps us read the text, and NLP is then used to make sense of the identified text. In this article, I'll focus specifically on the text extraction part.
How Computer Vision Performs Text Extraction
To execute this text extraction task, Computer Vision provides us with two APIs:
- OCR API
- Read API
The OCR API works with many languages and is well suited to images with a relatively small amount of text. If an image contains a lot of text, that is, a text-dominated image, the Read API is the better option.
The OCR API returns information in the form of Regions, Lines, and Words. A region is an area of the given image that contains text. So the output hierarchy is: Regions, then the Lines of text in each region, and then the Words in each line.
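To make the Regions, Lines, and Words hierarchy concrete, here is a minimal Python sketch that walks a nested result and rebuilds each line of text. The sample data is hand-written for illustration, and the key names (`regions`, `lines`, `words`, `text`) as well as the helper `extract_lines` are assumptions, not something taken from the demo code; check them against the actual response you get back.

```python
# Hand-written sample shaped like the Regions > Lines > Words hierarchy.
# The key names ("regions", "lines", "words", "text") are assumptions.
sample_ocr_result = {
    "regions": [
        {
            "lines": [
                {"words": [{"text": "Hello"}, {"text": "world"}]},
                {"words": [{"text": "Second"}, {"text": "line"}]},
            ]
        },
        {
            "lines": [
                {"words": [{"text": "Another"}, {"text": "region"}]},
            ]
        },
    ]
}


def extract_lines(ocr_result):
    """Return one string per line by walking regions, then lines, then words."""
    lines_of_text = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines_of_text.append(" ".join(words))
    return lines_of_text


if __name__ == "__main__":
    for text in extract_lines(sample_ocr_result):
        print(text)
```

Joining the words of each line with a space is a simplification; for languages written without spaces you would probably join them differently.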
The Read API works very well with images that are heavily loaded with text. The best example of a text-dominated image is a scanned or printed document. Here the output hierarchy is in the form of Pages, Lines, and Words. As this API deals with a large number of lines and words, it works asynchronously, so it does not block our application until the whole document is read. The OCR API, in contrast, works synchronously.
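The asynchronous flow can be pictured as "submit the image, then check back until the result is ready". The sketch below only simulates that pattern in plain Python; it does not call the real service. The function names (`wait_for_result`, `make_fake_operation`, `extract_lines`), the status strings, and the shape of the result are all hypothetical stand-ins chosen for illustration.

```python
import time


def wait_for_result(get_operation, poll_interval=1.0, max_attempts=30):
    """Poll get_operation() until it reports a finished status.

    get_operation is a hypothetical callable returning a dict with a
    "status" key; in a real application it would query the service.
    """
    for _ in range(max_attempts):
        operation = get_operation()
        if operation["status"] in ("succeeded", "failed"):
            return operation
        time.sleep(poll_interval)  # the application could do other work here
    raise TimeoutError("the read operation did not finish in time")


def make_fake_operation(pages, polls_before_done=2):
    """Simulate a service that needs a few polls before the result is ready."""
    state = {"calls": 0}

    def get_operation():
        state["calls"] += 1
        if state["calls"] <= polls_before_done:
            return {"status": "running"}
        return {"status": "succeeded", "pages": pages}

    return get_operation


def extract_lines(operation):
    """Walk pages, then lines, then words and return one string per line."""
    return [
        " ".join(word["text"] for word in line["words"])
        for page in operation.get("pages", [])
        for line in page["lines"]
    ]


# Hand-written sample shaped like the Pages > Lines > Words hierarchy.
sample_pages = [
    {"lines": [{"words": [{"text": "Scanned"}, {"text": "document"}]}]},
    {"lines": [{"words": [{"text": "Page"}, {"text": "two"}]}]},
]

result = wait_for_result(make_fake_operation(sample_pages), poll_interval=0.01)
print(result["status"], extract_lines(result))
```

Here `time.sleep` keeps the example short; in a real application you would more likely check the status from a background task or timer, so that the rest of the application stays responsive while the document is being read.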
Here is a table showing when to use which:
OCR API | Read API |
Good for relatively small text | Good for text-dominated images, i.e., scanned documents |
Output hierarchy: Regions >> Lines >> Words | Output hierarchy: Pages >> Lines >> Words |
Works in a synchronous manner | Works in an asynchronous manner |
Do check out the recorded video on my YouTube channel, Shweta Lodha, for the demo and code walkthrough.