OCR Using Microsoft Document Imaging APIs in C# .NET


OCR: Optical Character Recognition

OCR means extracting text from images, for example from an image file, a scanned PDF, a scanned Word document, or a similar type of document.

Steps to implement OCR using Microsoft Document Imaging 

  1. To use the Microsoft Document Imaging (MODI) API, install either Microsoft Office 2007 or SharePoint Designer 2007. Of the two, SharePoint Designer 2007 is free.
  2. Create a console application using Visual Studio.
  3. Add a reference to the Microsoft Office Document Imaging 12.0 Type Library, which is a COM object.
  4. Write the code that reads an image file path, runs OCR on the image, and prints the recognized text. (Screenshot: OCR1)
  5. Execute the application. It will prompt for the image file path. (Screenshot: OCR2)
  6. The image used for this example. (Screenshot: OCR)
  7. Type the image file path and press Enter. (Screenshot: OCR3)
  8. The text extracted from the image is shown as output. (Screenshot: OCR4)
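
The steps above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the screenshot: it assumes the MODI COM reference from step 3 has been added (which exposes the `MODI` namespace), that the image contains English text, and that only the first page of the document is of interest. The class name `Program` and the prompt text are illustrative. Because MODI is a 32-bit COM component, the project may also need to target x86.

```csharp
using System;
using System.Runtime.InteropServices;

class Program
{
    static void Main()
    {
        // Step 5: prompt for the image file path.
        Console.Write("Enter the image file path: ");
        string path = Console.ReadLine();

        // MODI.Document comes from the COM reference added in step 3.
        MODI.Document doc = new MODI.Document();
        try
        {
            // Load the image file into the MODI document.
            doc.Create(path);

            // Run OCR; English is an assumption made for this sketch.
            // The two flags enable auto-rotation and straightening of the page.
            doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

            // Read the recognized text of the first page only.
            MODI.Image image = (MODI.Image)doc.Images[0];
            Console.WriteLine(image.Layout.Text);
        }
        finally
        {
            // Close without saving changes and release the COM object.
            doc.Close(false);
            Marshal.ReleaseComObject(doc);
        }
    }
}
```

For a multi-page file such as a scanned TIFF, the same `Layout.Text` property could be read for each entry in `doc.Images` instead of only the first one.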

Note: The text extracted from an image might not be accurate. If a character in the image is blurred, MODI cannot recognize it exactly and substitutes some other character. Accuracy therefore depends heavily on the quality of the image.

