OCR: Optical Character Recognition
OCR means extracting text content from images, i.e. getting the text out of an image file or a scanned document such as a PDF or Word file.
Steps to implement OCR using Microsoft Office Document Imaging (MODI)
- To use the Microsoft Document Imaging API, install either Microsoft Office 2007 or SharePoint Designer 2007. Of the two, SharePoint Designer 2007 is free software.
- Create a console application in Visual Studio.
- Add a reference to the Microsoft Office Document Imaging 12.0 Type Library, which is a COM object.
- Write code that prompts for an image file path, runs the MODI OCR on that file, and prints the recognized text.
- Run the application. It prompts for the image file path.
- [Image: the actual input image]
- Type the image file path and press Enter.
- The text extracted from the image is displayed as the output.
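The console application described in the steps above could be sketched as follows. This is only a minimal sketch, not the original listing: it assumes the COM reference was added under the default interop namespace `MODI`, that the text is in English, and that every page of the file should be printed. The prompt wording and the error handling are illustrative choices.

```csharp
using System;
using System.Runtime.InteropServices;

class Program
{
    static void Main()
    {
        // Prompt wording is an assumption; any prompt will do.
        Console.Write("Enter the image file path: ");
        string path = Console.ReadLine();
        if (string.IsNullOrEmpty(path))
        {
            Console.WriteLine("No file path given.");
            return;
        }

        MODI.Document document = new MODI.Document();
        try
        {
            // Load the image file into a MODI document.
            document.Create(path.Trim().Trim('"'));

            // Run recognition. English is assumed here; the two flags ask
            // MODI to auto-orient and straighten the page before reading it.
            document.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

            // Each page is an Image whose Layout holds the recognized text.
            for (int i = 0; i < document.Images.Count; i++)
            {
                MODI.Image page = (MODI.Image)document.Images[i];
                Console.WriteLine(page.Layout.Text);
            }

            // Close without saving the OCR results back into the file.
            document.Close(false);
        }
        catch (COMException ex)
        {
            // MODI reports failures (unreadable file, no text found) as COM errors.
            Console.WriteLine("OCR failed: " + ex.Message);
        }
        finally
        {
            Marshal.ReleaseComObject(document);
        }
    }
}
```

To recognize another language, a different `MODI.MiLANGUAGES` value can be passed to `OCR`, provided the matching language pack is installed.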
Note: The text extracted from an image might not be accurate. If a character in the image is blurred, MODI cannot recognize it exactly and substitutes some other character. The accuracy therefore depends entirely on the quality of the image.