
OCR functionality through MODI for extracting text information from Image file in VB.NET

This article shows how to extract text and layout information from image files in formats such as MDI and TIFF.

Level : Beginner


OCR stands for Optical Character Recognition. It lets us extract text and layout information from image files in formats such as MDI and TIFF. When you scan a paper page into a computer, the result is just an image file, a picture of the page. The computer cannot interpret the letters in that picture, so you use OCR to convert it into a text or word-processor file whose text you can work with.

In VB.NET this can be done through the Microsoft Office Document Imaging (MODI) object model. To use it, add the MODI library to your development project. The MODI object model consists of the following objects:
 

               Document object:     Represents an ordered collection of pages (images).

               Image object:           Represents a single page of a document.

               Layout object:          Represents the results of optical character recognition (OCR) on a page.

               MiDocSearch object:  Exposes document search functionality.

               Viewer control:          An ActiveX control that displays the pages of a document.

  Example of extracting text from a TIFF file:
 

        Dim strWordInfo As String = String.Empty
        Dim docs As New MODI.Document

        ' Open the image file (MDI or TIFF).
        docs.Create("C:\test.tif")

        Dim success As Boolean = Analyse(docs)

        If success Then
            ' Append the recognized text of every page, not only the first one.
            For j As Integer = 0 To docs.Images.Count - 1
                strWordInfo = strWordInfo & " " & docs.Images(j).Layout.Text
            Next

            ' Double any single quotes (useful if the text later goes into a SQL string literal).
            strWordInfo = strWordInfo.Replace("'", "''")
        End If

        ' Close the document without saving changes.
        docs.Close(False)

    ' Runs OCR on the document. Returns True on success and False on failure.
    Function Analyse(ByVal Doc As MODI.Document) As Boolean
        If Doc Is Nothing Then
            Return False
        End If

        Try
            ' MODI call for OCR. Arguments: language, auto-rotation, straighten image.
            Doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, True, True)
            Return True
        Catch ex As Exception
            ' OCR failed, or it ran but no text was recognized.
            Return False
        End Try
    End Function
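
Recognized text can contain stray symbols where the OCR engine misreads smudges, lines, or graphics. The following is a minimal sketch of a post-processing step, not part of MODI. It assumes that "junk" means anything other than letters, digits, whitespace, and a small set of punctuation marks; the name CleanOcrText and the allowed character set are hypothetical choices that you would adjust to your own documents.

```vbnet
Imports System.Text

Module OcrCleanup

    ' Hypothetical helper: replaces every character outside the allowed set
    ' with a space and collapses runs of spaces into a single space.
    Function CleanOcrText(ByVal raw As String) As String
        If String.IsNullOrEmpty(raw) Then
            Return String.Empty
        End If

        ' Assumption: these punctuation marks are legitimate in the scanned text.
        Const AllowedPunctuation As String = ".,;:!?'""()-/"
        Dim sb As New StringBuilder(raw.Length)
        Dim lastWasSpace As Boolean = True   ' starting at True drops leading spaces

        For Each c As Char In raw
            Dim keep As Boolean = Char.IsLetterOrDigit(c) OrElse AllowedPunctuation.IndexOf(c) >= 0
            If keep Then
                sb.Append(c)
                lastWasSpace = False
            ElseIf Not lastWasSpace Then
                ' Whitespace and junk characters both become one separating space.
                sb.Append(" "c)
                lastWasSpace = True
            End If
        Next

        Return sb.ToString().TrimEnd()
    End Function

End Module
```

With the example above, this would be called as strWordInfo = CleanOcrText(strWordInfo) once the loop over the pages has finished. A filter like this only removes characters that cannot be valid; it cannot repair letters that were recognized incorrectly.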

Note: The most important prerequisite for all of these tasks is a project reference to the "Microsoft Office Document Imaging Type Library":

 For Microsoft Office 2003, add "Microsoft Office Document Imaging 11.0 Type Library".
 For Microsoft Office 2007, add "Microsoft Office Document Imaging 12.0 Type Library".

 About the author
 
Hirendra Sisodiya
My main area of experience has been application development. I have worked primarily in the domain of banking and financial services etc. My technological forte is Microsoft Technologies especially VB 6.0, Dot Net (Visual Studio 2005, 2008 and 2010) and Microsoft SQL Server 2005 and 2008.
 Comments
IO Error by Imran On March 4, 2010
I am getting an IO error at the line below. Is there any solution?
docs.Create("C:\test.tif")
Re: IO Error by Hirendra On March 4, 2010

Please check these things:

1. The file should exist at the given path.

2. The file should be in a valid format.

Re: Re: IO Error by asheesh On August 4, 2010
Hi, when I extract text from a TIF file, junk characters come out. Please suggest what to do.

I will be very thankful to you.

Regards
Asheesh Panwar
Re: Re: Re: IO Error by Hirendra On August 4, 2010
Hello Asheesh

I don't think we can do much about that, but you can write your own function that replaces these kinds of junk characters with blank spaces, as far as possible.

Thanks
Re: Re: Re: Re: IO Error by asheesh On August 5, 2010
Hi Hirendra,

Thanks for the reply. I also did a lot of R&D but did not succeed in removing all the junk characters. I think I am not going about it the right way. Could you please suggest the correct approach?

I will be very thankful to you.

Regards
Asheesh Panwar
Re: Re: Re: Re: Re: IO Error by Hirendra On August 5, 2010
Can you send me your code and that TIFF file?

Thanks
Re: Re: Re: Re: Re: Re: IO Error by asheesh On August 6, 2010
Hi Hirendra,

Thanks for the reply, but how can I attach my source code and sample TIFF file here?

Regards
Asheesh Panwar
Re: Re: Re: Re: Re: Re: Re: IO Error by Hirendra On August 6, 2010
I have sent you my email ID.
Please check your message box on this site.

Thanks
 © 2012  contents copyright of their authors. Rest everything copyright Mindcracker. All rights reserved.