Low resolution OCR for camera acquired documents

US 20050259866A1
Filed: 05/20/2004
Published: 11/24/2005
Est. Priority Date: 05/20/2004
Status: Active Grant

Abstract

A global optimization framework for optical character recognition (OCR) of low-resolution photographed documents that combines a binarization-type process, segmentation, and recognition into a single process. The framework includes a machine learning approach trained on a large amount of data. A convolutional neural network can be employed to compute a classification function at multiple positions and take grey-level input which eliminates binarization. The framework utilizes preprocessing, layout analysis, character recognition, and word recognition to output high recognition rates. The framework also employs dynamic programming and language models to arrive at the desired output.
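The idea of computing a classification function at multiple positions directly on grey-level input can be sketched as a window sliding across a line image. The sketch below is illustrative only: `classify_at_positions`, the `templates` weight grids, and the input layout are assumptions, and a plain linear scorer stands in for the trained convolutional neural network the abstract describes.

```python
import math

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def classify_at_positions(line_image, templates, stride=1):
    """Score every class at each horizontal window position.

    line_image: list of rows, each a list of grey levels in [0, 1].
    templates:  dict mapping a class label to a weight grid with the
                same height as line_image (hypothetical stand-in for a
                trained convolutional network).
    Returns a list of (x, {label: probability}) pairs.
    """
    labels = sorted(templates)
    height = len(line_image)
    width = len(line_image[0])
    win = len(templates[labels[0]][0])
    results = []
    for x in range(0, width - win + 1, stride):
        raw = []
        for label in labels:
            weights = templates[label]
            # Dot product of the grey-level window with the class weights;
            # no binarization step is applied to the pixels.
            raw.append(sum(
                weights[r][c] * line_image[r][x + c]
                for r in range(height) for c in range(win)
            ))
        results.append((x, dict(zip(labels, softmax(raw)))))
    return results
```

Each returned entry pairs a horizontal offset with a probability per class, which is the kind of per-position output a later word-level search could consume.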

40 Claims

1. A system that facilitates optical character recognition (OCR) of low resolution symbols, comprising:
- a segmentation component that facilitates segmentation of a symbol in an image; and
  
  a recognition component that recognizes the symbol substantially simultaneously with segmentation thereof.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
- - 2. The system of claim 1, further comprising a machine learning classification algorithm that processes a grey-level input and facilitates recognizing the symbol by computing a classification function at a symbol location.
  - 3. The system of claim 1, further comprising a machine learning classification algorithm that is a convolutional neural network that processes the grey-level input and computes the classification function at multiple symbol locations.
  - 4. The system of claim 1, further comprising a line detection component that facilitates the detection of lines of text from a grey-level image of the symbols.
  - 5. The system of claim 1, further comprising at least one of language model and a programming model that facilitates interpretation of the symbol as a word or a part thereof.
  - 6. The system of claim 1, the recognition component recognizes both a symbol and a string of symbols, wherein the symbol or string of symbols is representative of a word.
  - 7. The system of claim 1, the recognition component uses at least one of a convolutional neural network, language model, and a dynamic programming algorithm.
  - 8. The system of claim 1, the recognition component builds a classifier invariant to at least one of different lighting conditions, fonts, symbol sizes, type of camera, angle, and focus.
  - 9. The system of claim 1, the recognition component predicts what character is represented by the symbol at a given location on the image.
  - 10. The system of claim 1, further comprising a filter that detects at least one of a gap between adjacent symbols and a gap between adjacent lines of symbols.
  - 11. The system of claim 10, the filter includes a predetermined threshold that is used during detection, which threshold is computed at least one of experimentally and automatically.
  - 12. The system of claim 1, the recognizer extracts a simple feature at a higher resolution, and converts the simple feature to a more complex feature at a coarser resolution.
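Claims 10 and 11 describe a filter that detects gaps between adjacent symbols using a predetermined threshold. A minimal sketch of one way such a filter could work on a grey-level line image follows. The function name `find_gaps`, the column-darkness measure, and the default threshold are assumptions made for illustration; the claims do not specify the filter's form.

```python
def find_gaps(line_image, threshold=0.1, min_width=1):
    """Return (start, end) column ranges, end exclusive, that look like gaps.

    line_image: rows of grey levels in [0, 1], where 1.0 is white paper.
    threshold:  hypothetical cut-off on mean column darkness; the claims
                only say it is computed experimentally or automatically.
    min_width:  shortest run of light columns reported as a gap.
    """
    height = len(line_image)
    width = len(line_image[0])
    # Mean darkness per column, computed directly on grey levels.
    darkness = [
        sum(1.0 - line_image[r][c] for r in range(height)) / height
        for c in range(width)
    ]
    gaps, start = [], None
    for c, ink in enumerate(darkness):
        if ink < threshold:
            if start is None:
                start = c  # a run of light columns begins here
        elif start is not None:
            if c - start >= min_width:
                gaps.append((start, c))
            start = None
    # Close a run that reaches the right edge of the image.
    if start is not None and width - start >= min_width:
        gaps.append((start, width))
    return gaps
```

Applying the same idea to row darkness instead of column darkness would give a comparable sketch for gaps between adjacent lines.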

13. A system that facilitates OCR of low resolution camera-acquired documents, comprising:
- a segmentation component that facilitates segmentation of a symbol in an image;
  
  a language model that facilitates processing a character in a string of characters; and
  
  a dynamic programming component that facilitates the recognition of the string of characters as a word.
- Dependent claims (14, 15, 16, 17, 18, 19, 20)
- - 14. The system of claim 13, the dynamic programming component determines which word is located at a given word bounding rectangle.
  - 15. The system of claim 13, the language model is a dictionary model that scans through an entire lexicon, evaluates a probability for each word of the lexicon, and outputs a most likely word.
  - 16. The system of claim 13, the language model is language neutral that produces the most likely interpretation of a sequence of character recognizer observations.
  - 17. The system of claim 13, the language model interleaves dynamic programming optimization with traversal of a lexicon to compute a most likely word.
  - 18. A computer readable medium having stored thereon computer executable instructions for carrying out the system of claim 13.
  - 19. A computer that employs the system of claim 13.
  - 20. The system of claim 13, further comprising a classifier that automatically makes an inference based on one or more observations associated with recognizing at least one of the character and the word.
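The dictionary model of claim 15 scans an entire lexicon, evaluates a probability for each word, and outputs the most likely one. The sketch below illustrates that scan under simplifying assumptions that are not in the claims: exactly one recognizer observation per character, a word prior stored alongside each lexicon entry, and hypothetical names (`most_likely_word`, `floor`).

```python
import math

def most_likely_word(observations, lexicon, floor=1e-6):
    """Scan the whole lexicon and return (best_word, log_probability).

    observations: one dict per character slot mapping a character to the
                  recognizer's probability for it (hypothetical format).
    lexicon:      dict mapping each word to a prior probability.
    floor:        small probability used for characters the recognizer
                  did not score, so the logarithm stays defined.
    """
    best_word, best_score = None, -math.inf
    for word, prior in lexicon.items():
        if len(word) != len(observations):
            continue  # this sketch assumes one observation per character
        score = math.log(prior)
        for ch, obs in zip(word, observations):
            score += math.log(max(obs.get(ch, 0.0), floor))
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score
```

Working in log space keeps long words from underflowing. The cost is linear in the lexicon size, which is why a scan like this is usually paired with a structure that shares work between words.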

21. A computer-readable medium having computer-executable instructions for a method of performing low resolution OCR of a camera-acquired document, the method comprising:
- receiving the photographed document having a plurality of imaged symbols;
  
  performing layout analysis to detect an associated arrangement of the imaged symbols on the document;
  
  deconstructing the associated arrangement into one or more sets of the imaged symbols by detecting spaces between the imaged symbols;
  
  segmenting the sets of imaged symbols into separate imaged symbols;
  
  computing a score for each imaged symbol at a horizontal position, at a higher horizontal resolution;
  
  combining the scores of each of the imaged symbols at the horizontal positions into a total score, which total score is used to determine a word; and
  
  outputting the word that is representative of one of the sets of imaged symbols.
- Dependent claims (22, 23, 24, 25, 26, 27, 28)
- - 22. The method of claim 21, further comprising the acts of:
    - extracting simple features of the separate symbols at a higher resolution; and
      
      converting the simple features into more complex features at a coarser resolution;
      
      wherein at least one of the acts of extracting and converting is performed by a convolutional neural network.
  - 23. The method of claim 21, further comprising locating the set of symbols in the associated arrangement, which associated arrangement is a line, using dynamic programming.
  - 24. The method of claim 21, the acts of segmenting, extracting, and converting are performed substantially simultaneously.
  - 25. The method of claim 21, further comprising training a machine learning algorithm to generate recognized symbols from the imaged symbols at given horizontal positions.
  - 26. The method of claim 25, further comprising generating a training set for the machine learning algorithm, which act of generating further comprises at least one of the acts of:
    - printing a collection of documents both on paper media and on electronic media; and
      
      matching a position of each character of the paper media with a position of a corresponding imaged symbol to generate a database of labeled character images.
  - 27. The method of claim 21, further comprising:
    - detecting a gap between the symbols with a gap filter;
      
      detecting the lines of symbols using a line filter; and
      
      defining a text region based on outcomes of both the gap filter and the line filter.
  - 28. The method of claim 21, further comprising performing connected components analysis on pixels associated with a gap between the symbols.
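Claim 21 combines per-symbol scores at horizontal positions into a total score that determines a word. One way to picture that combination is a dynamic programming alignment of a candidate word to the positions, sketched below. The input format, the function names `word_score` and `best_word`, and the rule that each character covers one or more consecutive positions are assumptions for illustration, not details taken from the claims.

```python
import math

def word_score(position_scores, word):
    """Best total log-score for aligning `word` to the horizontal positions.

    position_scores: one dict per horizontal position mapping a character
                     to its log-score there (hypothetical input format).
    Every position is assigned to exactly one character, characters stay
    in order, and each character covers at least one position.
    """
    n, m = len(position_scores), len(word)
    neg = -math.inf
    # table[i][j]: best score placing the first i characters on the first j positions.
    table = [[neg] * (n + 1) for _ in range(m + 1)]
    table[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(i, n + 1):
            here = position_scores[j - 1].get(word[i - 1], neg)
            # Either the same character continues, or a new character starts.
            best_prev = max(table[i][j - 1], table[i - 1][j - 1])
            table[i][j] = best_prev + here
    return table[m][n]

def best_word(position_scores, candidates):
    """Pick the candidate word with the highest combined score."""
    return max(candidates, key=lambda w: word_score(position_scores, w))
```

A word with more characters than there are positions cannot be aligned, so its score stays at negative infinity.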

29. A method of performing low resolution OCR of a photographed document, comprising:
- preprocessing the photographed document to adjust for imperfections introduced into the photographed document;
  
  analyzing a layout of the document to determine lines of text;
  
  breaking the lines of text into individual words;
  
  indicating bounds for each of the individual words;
  
  recognizing characters in each of the individual words using a machine learning classification algorithm; and
  
  recognizing the individual words with a dynamic programming algorithm to determine which individual word is at a given location.
- Dependent claims (30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40)
- - 30. The method of claim 29, further comprising recognizing punctuation.
  - 31. The method of claim 29, further comprising preprocessing the photographed document with a whitebalancing algorithm to maximize contrast over regions of the document.
  - 32. The method of claim 29, the machine learning classification algorithm is one of a convolutional neural network and a support vector machine.
  - 33. The method of claim 29, further comprising at least one of the acts of:
    - arranging a dictionary into a structure that maximizes the amount of reused computation; and
      
      generating a dynamic programming table as the structure is traversed to determine an optimal assignment of an observation to a character of the individual word.
  - 34. The method of claim 33, the structure is a trie structure.
  - 35. The method of claim 29, further comprising analyzing a first word and a second word as a pair in order to recognize the first word.
  - 36. The method of claim 29, further comprising employing first and second language models, such that if use of the first language model fails to generate an output word, the second language model is automatically employed.
  - 37. The method of claim 29, further comprising detecting a gap between characters in a word according to a threshold, which threshold is computed automatically using boosting.
  - 38. The method of claim 29, further comprising detecting the lines of text by testing relative geometric relationships of connected components.
  - 39. The method of claim 38, further comprising generating statistics on an increasingly larger set of the connected components.
  - 40. The method of claim 29, further comprising training the machine learning classification algorithm with training images, which training includes at least one of:
    - randomly jittering the training images in an input window; and
      
      randomly altering brightness and contrast of the training images in the input window.
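Claims 33 and 34 arrange the dictionary into a trie so that computation is reused, and build a dynamic programming table as the trie is traversed. The sketch below shows one possible reading: each trie node inherits the table row of its prefix, so words that share a prefix share those rows. The data layout, the function names, and the scoring rule (each character covers one or more consecutive observations) are assumptions for illustration.

```python
import math

def build_trie(words):
    """Nested dicts keyed by character; the key None marks a word end."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word
    return root

def extend_row(prev_row, ch, position_scores):
    """Add one character to the prefix and return its dynamic programming row."""
    row = [-math.inf] * len(prev_row)
    for j in range(1, len(prev_row)):
        here = position_scores[j - 1].get(ch, -math.inf)
        # Observation j-1 either continues this character or starts it.
        row[j] = here + max(row[j - 1], prev_row[j - 1])
    return row

def best_word_in_trie(position_scores, trie):
    """Depth-first traversal; each row is computed once per shared prefix."""
    n = len(position_scores)
    best = (None, -math.inf)
    # The empty prefix has score 0 when no observations are consumed.
    stack = [(trie, [0.0] + [-math.inf] * n)]
    while stack:
        node, row = stack.pop()
        for key, child in node.items():
            if key is None:
                # A complete word must account for all n observations.
                if row[n] > best[1]:
                    best = (child, row[n])
            else:
                stack.append((child, extend_row(row, key, position_scores)))
    return best
```

With a flat word list, the row for a common prefix such as "th" would be recomputed for every word that starts with it; in the trie it is computed once and passed down to all descendants.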

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Simard, Patrice Y.; Viola, Paul A.; Rinker, James R.; Jacobs, Charles E.
Granted Patent: US 7,499,588 B2
US Class Current: 382/157

CPC Class Codes:
G06V 30/10   Character recognition
G06V 30/153   using recognition of charac...
G06V 30/18029   filtering with Haar-like su...
G06V 30/18057   Integrating the filters int...
G06V 30/268   Lexical context
G06V 30/414   Extracting the geometrical ...
