OCR of books by word recognition

  • US 20090263019A1
  • Filed: 04/16/2008
  • Published: 10/22/2009
  • Est. Priority Date: 04/16/2008
  • Status: Active Grant
First Claim

1. A computer-implemented method of image-to-text processing, comprising the steps of:

    acquiring an image of a document having words written thereon;

    segmenting said image into areas, each area containing one of said words;

    using said areas, defining a dictionary containing reference images of said words, which comprise respective sequences of characters in respective fonts, along with respective codes corresponding to said words;

    comparing said areas to said reference images and classifying said words in said document that match said reference images as identified words and classifying said words that do not match any of said reference images as unidentified words;

    generating respective new codes for one or more of said unidentified words, and adding said one or more of said unidentified words and said respective new codes to said dictionary for use in comparing other said areas of said document; and

    outputting a coded version of said document.
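
The steps of claim 1 can be illustrated with a small sketch. This is not the patented implementation, only a toy model under stated assumptions: the page is a single text line given as rows of '#' (ink) and '.' (background), words are separated by at least `min_gap` blank columns, matching is a pixel-wise mismatch ratio with a hypothetical `threshold`, the dictionary starts empty, and codes are arbitrary integers. The names `segment_words`, `distance`, and `encode` are invented for this illustration.

```python
from typing import List, Optional, Tuple

Bitmap = Tuple[str, ...]          # rows of '#' (ink) and '.' (background)
Entry = Tuple[Bitmap, int]        # (reference image, code)


def segment_words(image: Bitmap, min_gap: int = 2) -> List[Bitmap]:
    """Split a one-line binary image into word areas at runs of blank columns."""
    width = len(image[0])
    blank = [all(row[x] == "." for row in image) for x in range(width)]
    areas: List[Bitmap] = []
    start: Optional[int] = None   # first ink column of the current word
    gap = 0                       # blank columns seen since the last ink column
    for x in range(width):
        if not blank[x]:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:    # gap wide enough to end the word
                end = x - gap + 1
                areas.append(tuple(row[start:end] for row in image))
                start, gap = None, 0
    if start is not None:         # word running to the right edge
        areas.append(tuple(row[start:width - gap] for row in image))
    return areas


def distance(a: Bitmap, b: Bitmap) -> float:
    """Fraction of differing pixels; images of different size never match."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return 1.0
    total = len(a) * len(a[0])
    diff = sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))
    return diff / total


def encode(areas: List[Bitmap], threshold: float = 0.1):
    """Classify each area against a growing dictionary and emit its code."""
    dictionary: List[Entry] = []
    codes: List[int] = []
    identified: List[bool] = []
    for area in areas:
        best = min(dictionary, key=lambda e: distance(area, e[0]), default=None)
        if best is not None and distance(area, best[0]) <= threshold:
            codes.append(best[1])             # identified word: reuse its code
            identified.append(True)
        else:
            new_code = len(dictionary)        # unidentified word: new code
            dictionary.append((area, new_code))
            codes.append(new_code)
            identified.append(False)
    return codes, dictionary, identified


if __name__ == "__main__":
    line = (
        "#.#..##..#.#",
        "###..#...###",
        "#.#..##..#.#",
    )
    codes, dictionary, identified = encode(segment_words(line))
    print(codes, identified, len(dictionary))
```

In this toy line the first and third word images are identical, so the third word is classified as identified and reuses the code assigned when the first was added to the dictionary. A real system would need size normalization and a more tolerant comparison than exact-shape pixel matching, and would map codes to character sequences.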