Method and system for character recognition

US 8,953,886 B2
Filed: 08/08/2013
Issued: 02/10/2015
Est. Priority Date: 12/03/2004
Status: Active Grant

4 Assignments

0 Petitions

Abstract

Character recognition is described. In one embodiment, the recognition may use matched sequences, rather than character shape, to determine a computer-legible result.
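
The idea of relying on matched sequences rather than character shape can be illustrated with a small sketch. It assumes that each glyph in a word image has already been given an arbitrary identity (for example a cluster number), and that a plain word list is available for lookup. The function names `pattern_signature`, `build_index`, and `candidates`, and the word list itself, are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def pattern_signature(units):
    """Reduce a sequence of text-unit identities to its repetition pattern.

    Each distinct unit is replaced by the order in which it first appears,
    so the result records only where units repeat, not what they look like.
    """
    first_seen = {}
    return tuple(first_seen.setdefault(u, len(first_seen)) for u in units)


def build_index(words):
    """Group dictionary words by their repetition pattern (assumed lookup aid)."""
    index = defaultdict(list)
    for word in words:
        index[pattern_signature(word)].append(word)
    return index


def candidates(unit_ids, index):
    """Return the words whose letter pattern matches the glyph identity pattern."""
    return index.get(pattern_signature(unit_ids), [])


# Illustrative data: glyph identities 17, 4, 9, 9, 23 carry no shape information.
index = build_index(["hello", "jelly", "world", "teeth", "apple"])
print(candidates([17, 4, 9, 9, 23], index))  # → ['hello', 'jelly']
```

A single word usually leaves several candidates, as here. Longer runs of text constrain the mapping from glyph identity to character much more tightly, which is the property this kind of approach depends on.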

1177 Citations

21 Claims

1. An article of manufacture comprising a non-transitory computer-readable medium with instructions encoded thereon, the instructions configured to cause one or more processors to perform a method comprising:
- obtaining an image based on a document capture process performed on a rendered document;
  
  identifying a portion of the image, the portion comprising a sequence of text units;
  
  segmenting the portion of the image into a sequence of segmented sub-images, each segmented sub-image comprising a single text unit of the sequence of text units;
  
  for each segmented sub-image of the sequence of segmented sub-images:
  
  determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of a stored sub-image; and
  
  based on determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of the stored sub-image, assigning to the segmented sub-image a text unit identity that is associated with the stored sub-image;
  
  generating a representation of the portion of the image, based on the assigned text unit identities; and
  
  identifying the sequence of segmented sub-images, based on the generated representation.
- Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The article of manufacture of claim 1, wherein determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of the stored sub-image comprises:
    - identifying a likelihood that the segmented sub-image corresponds to the stored sub-image; and
      
      determining that the likelihood meets a predetermined threshold.
  - 3. The article of manufacture of claim 1, wherein the stored sub-image is stored by adding the stored sub-image to a template of sub-images.
  - 4. The article of manufacture of claim 1, wherein determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of the stored sub-image comprises:
    - identifying a difference between the segmented sub-image and the stored sub-image;
      
      identifying a pattern, based on the difference; and
      
      determining that a size of the pattern meets a predetermined size threshold.
  - 5. The article of manufacture of claim 1, wherein determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of the stored sub-image comprises:
    - decomposing the segmented sub-image into a first set of vectors;
      
      decomposing the stored sub-image into a second set of vectors; and
      
      determining that the first set of vectors and the second set of vectors meet a predetermined similarity threshold.
  - 6. The article of manufacture of claim 1, wherein the image comprises an image of one or more words from the rendered document, each of the words comprising one or more text units.
  - 7. The article of manufacture of claim 1, wherein segmenting a portion of the image into multiple segmented sub-images comprises identifying space between the sub-images.
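
The segmenting and identity-assignment steps of claims 1, 2, 3, and 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the image is a tiny binary text line written as strings of `#` (ink) and `.` (background), "space between the sub-images" is taken to mean fully blank pixel columns, and the "likelihood" is approximated by the fraction of agreeing pixels. All function names and the 0.9 threshold are hypothetical.

```python
def segment_by_gaps(rows):
    """Split a binary text-line image into sub-images at fully blank columns."""
    width = len(rows[0])
    inked = [any(row[x] == "#" for row in rows) for x in range(width)]
    spans, start = [], None
    for x, has_ink in enumerate(inked + [False]):  # sentinel closes the last span
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            spans.append((start, x))
            start = None
    return [tuple(row[a:b] for row in rows) for a, b in spans]


def likelihood(sub, stored):
    """Fraction of agreeing pixels; 0.0 when the two shapes differ in size."""
    if len(sub) != len(stored) or len(sub[0]) != len(stored[0]):
        return 0.0
    same = sum(a == b for r1, r2 in zip(sub, stored) for a, b in zip(r1, r2))
    return same / (len(sub) * len(sub[0]))


def assign_identities(sub_images, threshold=0.9):
    """Assign each sub-image the identity of a sufficiently similar stored one.

    Unmatched sub-images are added to the template list, so a list index
    doubles as the text unit identity (an assumption made for this sketch).
    """
    templates, identities = [], []
    for sub in sub_images:
        scores = [likelihood(sub, t) for t in templates]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            identities.append(best)
        else:
            templates.append(sub)
            identities.append(len(templates) - 1)
    return identities


# Three glyphs separated by blank columns; the first and third are identical.
line_image = [
    "##..#..##",
    "#..###.#.",
    "##..#..##",
]
print(assign_identities(segment_by_gaps(line_image)))  # → [0, 1, 0]
```

The resulting identity sequence is the kind of representation from which the final identifying step could proceed, since it records which text units repeat without naming them.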

8. A system, comprising:
- one or more data processing apparatus; and
  
  a computer-readable storage device including instructions executable by the data processing apparatus and upon such execution cause the data processing apparatus to perform operations comprising:
  
  obtaining an image based on a document capture process performed on a rendered document;
  
  identifying a portion of the image, the portion comprising a sequence of text units;
  
  segmenting the portion of the image into a sequence of segmented sub-images, each segmented sub-image comprising a single text unit of the sequence of text units;
  
  for each segmented sub-image of the sequence of segmented sub-images:
  
  determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of a stored sub-image; and
  
  based on determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of the stored sub-image, assigning to the segmented sub-image a text unit identity that is associated with the stored sub-image;
  
  generating a representation of the portion of the image, based on the assigned text unit identities; and
  
  identifying the sequence of segmented sub-images, based on the generated representation.
- Dependent Claims (9, 10, 11, 12, 13, 14)
- - 9. The system of claim 8, wherein determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of the stored sub-image comprises:
    - identifying a likelihood that the segmented sub-image corresponds to the stored sub-image; and
      
      determining that the likelihood meets a predetermined threshold.
  - 10. The system of claim 8, wherein the stored sub-image is stored by adding the stored sub-image to a template of sub-images.
  - 11. The system of claim 8, wherein determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of the stored sub-image comprises:
    - identifying a difference between the segmented sub-image and the stored sub-image;
      
      identifying a pattern, based on the difference; and
      
      determining that a size of the pattern meets a predetermined size threshold.
  - 12. The system of claim 8, wherein determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of the stored sub-image comprises:
    - decomposing the segmented sub-image into a first set of vectors;
      
      decomposing the stored sub-image into a second set of vectors; and
      
      determining that the first set of vectors and the second set of vectors meet a predetermined similarity threshold.
  - 13. The system of claim 8, wherein the image comprises an image of one or more words from the rendered document, each of the words comprising one or more text units.
  - 14. The system of claim 8, wherein segmenting a portion of the image into multiple segmented sub-images comprises identifying space between the sub-images.
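
The difference-based comparison of claim 11 (identify a difference, identify a pattern in it, test the pattern's size) can be illustrated with the sketch below. The claim does not say how the pattern is found or how the threshold is applied, so this is one possible reading, labeled as an assumption: the difference is a pixel-wise XOR, the pattern is the largest 4-connected blob of differing pixels, and two sub-images count as similar when that blob is no larger than a hypothetical `max_blob` value. Isolated differing pixels then read as noise, while a connected stroke reads as a real difference.

```python
def difference_mask(a, b):
    """Pixel-wise XOR of two equal-sized images made of '#'/'.' strings."""
    return [[pa != pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def largest_blob(mask):
    """Size of the largest 4-connected group of True cells (iterative flood fill)."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    best = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            stack, size = [(y, x)], 0
            while stack:
                cy, cx = stack.pop()
                size += 1
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            best = max(best, size)
    return best


def similar_by_difference(a, b, max_blob=2):
    """Assumed rule: similar when no connected difference exceeds max_blob pixels."""
    return largest_blob(difference_mask(a, b)) <= max_blob


glyph_c = ("####", "#...", "#...", "#...", "####")
noisy_c = ("####", "#...", "#..#", "#...", "###.")  # two isolated flipped pixels
glyph_e = ("####", "#...", "####", "#...", "####")  # extra three-pixel middle bar

print(similar_by_difference(glyph_c, noisy_c))  # → True
print(similar_by_difference(glyph_c, glyph_e))  # → False
```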

15. A computer-implemented method, comprising:
- obtaining an image based on a document capture process performed on a rendered document;
  
  identifying a portion of the image, the portion comprising a sequence of text units;
  
  segmenting the portion of the image into a sequence of segmented sub-images, each segmented sub-image comprising a single text unit of the sequence of text units;
  
  for each segmented sub-image of the sequence of segmented sub-images:
  
  determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of a stored sub-image; and
  
  based on determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of the stored sub-image, assigning to the segmented sub-image a text unit identity that is associated with the stored sub-image;
  
  generating a representation of the portion of the image, based on the assigned text unit identities; and
  
  identifying the sequence of segmented sub-images, based on the generated representation.
- Dependent Claims (16, 17, 18, 19, 20, 21)
- - 16. The computer-implemented method of claim 15, wherein determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of the stored sub-image comprises:
    - identifying a likelihood that the segmented sub-image corresponds to the stored sub-image; and
      
      determining that the likelihood meets a predetermined threshold.
  - 17. The computer-implemented method of claim 15, wherein the stored sub-image is stored by adding the stored sub-image to a template of sub-images.
  - 18. The computer-implemented method of claim 15, wherein determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of the stored sub-image comprises:
    - identifying a difference between the segmented sub-image and the stored sub-image;
      
      identifying a pattern, based on the difference; and
      
      determining that a size of the pattern meets a predetermined size threshold.
  - 19. The computer-implemented method of claim 15, wherein determining that one or more features of the segmented sub-image are classified as being similar to one or more corresponding features of the stored sub-image comprises:
    - decomposing the segmented sub-image into a first set of vectors;
      
      decomposing the stored sub-image into a second set of vectors; and
      
      determining that the first set of vectors and the second set of vectors meet a predetermined similarity threshold.
  - 20. The computer-implemented method of claim 15, wherein the image comprises an image of one or more words from the rendered document, each of the words comprising one or more text units.
  - 21. The computer-implemented method of claim 15, wherein segmenting a portion of the image into multiple segmented sub-images comprises identifying space between the sub-images.
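
The vector-based comparison of claim 19 can be sketched in a similar way. The claim does not specify how a sub-image is decomposed into vectors, so the sketch below swaps in a simple stand-in: row and column ink-count profiles, compared by cosine similarity against a hypothetical threshold of 0.95. The function names and sample glyphs are illustrative assumptions only.

```python
import math


def decompose(image):
    """Return (row_profile, column_profile) ink counts for a '#'/'.' image."""
    row_profile = [sum(ch == "#" for ch in row) for row in image]
    col_profile = [sum(row[x] == "#" for row in image) for x in range(len(image[0]))]
    return row_profile, col_profile


def cosine(u, v):
    """Cosine similarity; 0.0 for mismatched lengths or an all-zero vector."""
    if len(u) != len(v):
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_by_vectors(a, b, threshold=0.95):
    """Assumed rule: every pair of corresponding vectors must meet the threshold."""
    return all(cosine(u, v) >= threshold for u, v in zip(decompose(a), decompose(b)))


glyph_c = ("####", "#...", "#...", "#...", "####")
noisy_c = ("####", "#...", "#..#", "#...", "###.")
glyph_bar = ("#...", "#...", "#...", "#...", "#...")

print(similar_by_vectors(glyph_c, noisy_c))    # → True
print(similar_by_vectors(glyph_c, glyph_bar))  # → False
```

Projection profiles discard most spatial detail, so a practical system would likely need a richer decomposition. They are used here only because they keep the example short.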

Current Assignee: Kyocera Corporation
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin
Primary Examiner(s): Seth, Manav
Application Number: US13/961,934
Publication Number: US 20140169675A1
Time in Patent Office: 551 Days
Field of Search: 382/135, 382/137-140, 382/181-231, 382/321, 345/634-636, 345/468-471
US Class Current: 382/177
CPC Class Codes:
- G06F 16/5846   using extracted text
- G06F 16/93   Document management systems
- G06V 20/62   Text, e.g. of license plate...
- G06V 30/10   Character recognition
- G06V 30/153   using recognition of charac...
- G06V 30/158   using character size, text ...
- G06V 30/224   of printed characters havin...
- G06V 30/413   Classification of content, ...
- G06V 30/414   Extracting the geometrical ...
