On-screen guideline-based selective text recognition

US 8,515,185 B2
Filed: 11/25/2009
Issued: 08/20/2013
Est. Priority Date: 11/25/2009
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A live video stream captured by an on-device camera is displayed on a screen with an overlaid guideline. Video frames of the live video stream are analyzed for a video frame with acceptable quality. A text region is identified in the video frame approximate to the on-screen guideline and cropped from the video frame. The cropped image is transmitted to an optical character recognition (OCR) engine, which processes the cropped image and generates text in an editable symbolic form (the OCR'ed text). A confidence score is determined for the OCR'ed text and compared with a threshold value. If the confidence score exceeds the threshold value, the OCR'ed text is outputted.
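The region selection and cropping step in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Box` type, the `pick_region_near_guideline` and `crop` helpers, and the `max_distance` tolerance are all hypothetical names, and the sketch assumes that text-line bounding boxes have already been detected and that the guideline is a horizontal line at a known pixel row.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a detected text line, in pixels (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def center_y(self) -> float:
        return (self.top + self.bottom) / 2


def pick_region_near_guideline(
    boxes: List[Box], guideline_y: int, max_distance: int = 20
) -> Optional[Box]:
    """Return the box whose vertical center is closest to the guideline.

    Returns None when no box lies within max_distance pixels (an assumed tolerance).
    """
    best: Optional[Box] = None
    best_dist: Optional[float] = None
    for box in boxes:
        dist = abs(box.center_y - guideline_y)
        if dist <= max_distance and (best_dist is None or dist < best_dist):
            best, best_dist = box, dist
    return best


def crop(frame: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the box out of a frame stored as a list of pixel rows."""
    return [row[box.left:box.right] for row in frame[box.top:box.bottom]]
```

In this sketch a caller would pass the cropped rows to whatever OCR engine is in use; returning `None` corresponds to the case where no text is found near the guideline.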

69 Citations

25 Claims

1. A computer-implemented method for selectively recognizing text in a live video stream, comprising:
- receiving a video frame from a camera in real time;
- displaying a guideline overlaid on the video frame on a display device;
- identifying a text region in the video frame associated with the guideline, the text region comprising text; and
- converting the text in the text region into an editable symbolic form, the converting comprising:
  - identifying a candidate language for a line of text in the text region based at least in part on an orientation of the line of text;
  - using OCR functions associated with the candidate language to determine a plurality of candidate texts in the editable symbolic form;
  - displaying the plurality of candidate texts;
  - receiving a user selection of one of the plurality of candidate texts; and
  - identifying the selected candidate text as the converted text for the text region.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 25):
  - 2. The computer-implemented method of claim 1, wherein identifying the text region comprises identifying the text region in the video frame approximate to the guideline.
  - 3. The computer-implemented method of claim 1, further comprising:
    - cropping the video frame to produce a cropped image including the text region;
    - wherein converting the text comprises:
      - transmitting the cropped image to an OCR engine through a computer network; and
      - receiving the text in the editable symbolic form from the OCR engine through the computer network.
  - 4. The computer-implemented method of claim 1, wherein converting the text comprises:
    - transmitting the video frame and location information about the text region in the video frame to an OCR engine through a computer network; and
    - receiving the text in the editable symbolic form from the OCR engine through the computer network.
  - 5. The computer-implemented method of claim 1, wherein identifying the text region in the video frame comprises:
    - determining a skew angle of the text;
    - correcting the skew angle by rotating at least a portion of the video frame including the text; and
    - identifying the text region in the at least a portion of the video frame.
  - 6. The computer-implemented method of claim 5, wherein determining the skew angle of the text comprises:
    - calculating a plurality of projection profiles of a plurality of angles for the at least a portion of the video frame;
    - identifying a horizontal projection profile in the plurality of projection profiles based on variances of the plurality of projection profiles; and
    - determining the skew angle based on an angle of the horizontal projection profile.
  - 7. The computer-implemented method of claim 1, further comprising:
    - analyzing the video frame to determine a quality score that measures an image quality of the video frame;
    - wherein identifying the text region comprises, responsive to the quality score exceeding a predetermined threshold value, identifying the text region in the video frame associated with the guideline.
  - 8. The computer-implemented method of claim 7, further comprising:
    - controlling the camera to improve image qualities of subsequent video frames based on the image quality.
  - 9. The computer-implemented method of claim 8, wherein controlling the camera comprises modifying, responsive to a poor sharpness of the video frame, at least one of the following:
    - a shutter speed, an aperture, and a focus of the camera.
  - 10. The computer-implemented method of claim 1, further comprising:
    - determining a motion of the camera based on an on-board accelerometer; and
    - adjusting the camera based at least in part on the determined camera motion.
  - 11. The computer-implemented method of claim 10, wherein adjusting the camera comprises at least one of the following:
    - adjusting a focus of the camera, applying an image stabilization mechanism.
  - 12. The computer-implemented method of claim 1, wherein identifying the text region in the video frame further comprises:
    - detecting the text approximate to the guideline; and
    - responsive to successfully detecting the text approximate to the guideline, identifying the text region in the video frame associated with the guideline.
  - 13. The computer-implemented method of claim 1, further comprising:
    - responsive to successfully identifying the text region in the video frame, displaying the guideline in a first color; and
    - responsive to a failure to identify the text region in the video frame, displaying the guideline in a second color visually distinctive from the first color.
  - 14. The computer-implemented method of claim 1, further comprising:
    - displaying the text in the editable symbolic form along with texts converted from other video frames received from the camera for a user selection.
  - 25. The computer-implemented method of claim 1, wherein each of the candidate texts is associated with a confidence score quantifying a confidence of the candidate text matching the text in the text region.
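Claims 5 and 6 describe estimating skew from projection profiles computed at several angles and compared by variance. The following Python sketch shows one common way to do this, under stated assumptions: foreground (ink) pixels are given as `(x, y)` coordinates, the profile with the largest variance is treated as the horizontal one (the claims only say the choice is "based on variances"), and all function names and the `bin_height` parameter are hypothetical. It is an illustration, not the patented implementation.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def profile_variance(points: Sequence[Point], angle_deg: float,
                     bin_height: float = 1.0) -> float:
    """Variance of the projection profile taken along lines tilted by angle_deg."""
    if not points:
        return 0.0
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    counts: Dict[int, int] = {}
    for x, y in points:
        # Distance from the origin measured perpendicular to the tilted line direction.
        offset = -x * sin_t + y * cos_t
        row = math.floor(offset / bin_height)
        counts[row] = counts.get(row, 0) + 1
    first, last = min(counts), max(counts)
    profile = [counts.get(r, 0) for r in range(first, last + 1)]
    mean = sum(profile) / len(profile)
    return sum((c - mean) ** 2 for c in profile) / len(profile)


def estimate_skew(points: Sequence[Point], candidate_angles: Iterable[float]) -> float:
    """Pick the candidate angle whose profile has the largest variance (assumed criterion)."""
    return max(candidate_angles, key=lambda a: profile_variance(points, a))


def synthetic_lines(angle_deg: float, offsets: Sequence[float], width: int) -> List[Point]:
    """Generate points on straight 'text lines' tilted by angle_deg, for demonstration."""
    slope = math.tan(math.radians(angle_deg))
    return [(float(x), slope * x + c) for c in offsets for x in range(width)]
```

When the candidate angle matches the true tilt, every point on a text line falls into the same profile bin, so the profile is sharply peaked and its variance is high. The estimated angle can then be used to rotate the image region before identifying the text region, as in claim 5.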

15. A non-transitory computer-readable storage medium encoded with executable computer program code for selectively recognizing text in a live video stream, the computer program code comprising program code for:
- receiving a video frame from a camera in real time;
- displaying a guideline overlaid on the video frame on a display device;
- identifying a text region in the video frame associated with the guideline, the text region comprising text; and
- converting the text in the text region into an editable symbolic form, the converting comprising:
  - identifying a candidate language for a line of text in the text region based at least in part on an orientation of the line of text;
  - using OCR functions associated with the candidate language to determine a plurality of candidate texts in the editable symbolic form;
  - displaying the plurality of candidate texts;
  - receiving a user selection of one of the plurality of candidate texts; and
  - identifying the selected candidate text as the converted text for the text region.

16. A computer system for selectively recognizing text in a live video stream, comprising:
- a computer-readable storage medium comprising executable computer program code for:
  - a video User Interface (UI) module for receiving a video frame from a camera in real time and displaying a guideline overlaid on the video frame on a display device;
  - a text region identification module for identifying a text region in the video frame associated with the guideline, the text region comprising text; and
  - an OCR module for:
    - converting the text in the text region into an editable symbolic form, the converting comprising:
      - identifying a candidate language for a line of text in the text region based at least in part on an orientation of the line of text;
      - using OCR functions associated with the candidate language to determine a plurality of candidate texts in the editable symbolic form;
      - displaying the plurality of candidate texts;
      - receiving a user selection of one of the plurality of candidate texts; and
      - identifying the selected candidate text as the converted text for the text region.

17. A computer-implemented method for converting text in a series of received images into text in an editable symbolic form, comprising:
- receiving a series of images from a client, the series of images comprising a first image;
- processing the first image using OCR functions to generate text in the editable symbolic form;
- determining whether the generated text includes a spelling error;
- determining a confidence score for the generated text based on text generated for other images in the series of images received from the client, the confidence score being higher if the generated text does not include a spelling error than if the generated text includes a spelling error; and
- responsive to the confidence score exceeding a threshold value, transmitting the generated text to the client.
- Dependent claims (18, 19, 20, 21, 22):
  - 18. The computer-implemented method of claim 17, wherein the confidence score for a generated text matching at least one of the previously generated text for the other images in the series of images received from the client is higher than the confidence score for a generated text mismatching all of the previously generated text for the other images in the series of images received from the client.
  - 19. The computer-implemented method of claim 17, further comprising:
    - generating a predicted word using a portion of the generated text, wherein the confidence score for a generated text matching the associated predicted word is higher than the confidence score for a generated text mismatching the associated predicted word.
  - 20. The computer-implemented method of claim 17, further comprising:
    - identifying a candidate language for a line of text in the first image based at least in part on an orientation of the line of text, wherein processing the first image comprises processing the first image using OCR functions associated with the candidate language to generate text in the editable symbolic form.
  - 21. The computer-implemented method of claim 20, wherein identifying the candidate language comprises identifying, responsive to the line of text being vertical, at least one of the following languages as the candidate language:
    - Chinese, Japanese, Korean.
  - 22. The computer-implemented method of claim 17, wherein processing the first image using OCR functions to generate text in the editable symbolic form comprises generating a plurality of candidate texts in the editable symbolic form, wherein determining the confidence score for the generated text comprises determining a confidence score for each of the plurality of candidate texts to quantify a confidence of the candidate text matching text in the first image, and wherein transmitting the generated text to the client comprises responsive to candidate scores of one or more of the plurality of candidate texts exceeding the threshold value, transmitting the one or more of the plurality of candidate texts to the client in response to the first image.
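The confidence logic in claims 17, 18, and 22 can be sketched in Python as follows. The claims only fix the ordering of scores (no spelling error scores higher than a spelling error; matching earlier text scores higher than mismatching all of it) and the threshold comparison. The integer weights, the threshold of 70, the whitespace tokenization, the set-based dictionary lookup, and the function names below are all assumptions made for illustration.

```python
from typing import Collection, List


def confidence_score(
    text: str,
    previous_texts: Collection[str],
    dictionary: Collection[str],
    base: int = 50,            # assumed starting score
    spelling_bonus: int = 30,  # assumed bonus when every word is in the dictionary
    match_bonus: int = 20,     # assumed bonus when an earlier frame produced the same text
) -> int:
    """Score OCR output so that correctly spelled, repeated text ranks higher."""
    words = text.lower().split()
    if not words:
        return 0
    score = base
    if all(word in dictionary for word in words):
        score += spelling_bonus
    if text in previous_texts:
        score += match_bonus
    return score


def texts_to_send(
    candidates: List[str],
    previous_texts: Collection[str],
    dictionary: Collection[str],
    threshold: int = 70,       # assumed threshold value
) -> List[str]:
    """Keep only the candidates whose score exceeds the threshold."""
    return [
        candidate
        for candidate in candidates
        if confidence_score(candidate, previous_texts, dictionary) > threshold
    ]
```

Integer scores are used here so that the comparison against the threshold is exact; a real system could use any scale that preserves the orderings the claims require.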

23. A non-transitory computer-readable storage medium encoded with executable computer program code for converting text in a series of received images into text in an editable symbolic form, the computer program code comprising program code for:
- receiving a series of images from a client, the series of images comprising a first image;
- identifying a candidate language for a line of text in the first image based at least in part on an orientation of the line of text;
- processing the first image using OCR functions associated with the candidate language to generate text in the editable symbolic form;
- determining a confidence score for the generated text based on text generated for other images in the series of images received from the client; and
- responsive to the confidence score exceeding a threshold value, transmitting the generated text to the client in response to the series of images.
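The orientation-based language step in claims 20, 21, and 23 might look like the Python sketch below. Only the vertical case comes from the claims (claim 21 names Chinese, Japanese, and Korean). Deciding orientation from the bounding box's aspect ratio, the function names, and the caller-supplied `horizontal_candidates` list are assumptions, since the claims do not say how orientation is measured or which languages apply to horizontal lines.

```python
from typing import List, Sequence

# Languages named in claim 21 for vertical lines of text.
VERTICAL_CANDIDATES = ["Chinese", "Japanese", "Korean"]


def is_vertical(box_width: int, box_height: int) -> bool:
    """Treat a text line as vertical when its bounding box is taller than wide (assumed rule)."""
    return box_height > box_width


def candidate_languages(
    box_width: int, box_height: int, horizontal_candidates: Sequence[str]
) -> List[str]:
    """Return candidate OCR languages for a line of text based on its orientation."""
    if is_vertical(box_width, box_height):
        return list(VERTICAL_CANDIDATES)
    # The claims do not list languages for horizontal text, so the caller supplies them.
    return list(horizontal_candidates)
```

The returned list would then select which language-specific OCR functions to run on the line.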

24. A computer system for converting text in a series of received images into text in an editable symbolic form, comprising:
- a computer-readable storage medium comprising executable computer program code for:
  - an OCR engine for:
    - receiving a series of images from a client, the series of images comprising a first image, and
    - processing the first image using OCR functions to generate a plurality of candidate texts in the editable symbolic form; and
  - a confidence evaluation module for:
    - determining a confidence score for each of the plurality of generated candidate texts based on text generated for other images in the series of images received from the client to quantify a confidence of the candidate text matching text in the first image, and
    - transmitting ones of the generated candidate texts to the client in response to the confidence scores of the ones of the candidate texts exceeding a threshold value.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Lee, Dar-Shyang; Chien, Lee-Feng; Hsieh, Aries; Ting, Pin; Wong, Kin
Primary Examiner(s): STREGE, JOHN B
Application Number: US12/626,520
Publication Number: US 20110123115A1
Time in Patent Office: 1,364 Days
Field of Search: 382/185, 382/190, 382/229, 382/231, 382/254, 382/255, 382/282, 382/296, 382/301
US Class Current: 382/229
CPC Class Codes

G06V 20/62   Text, e.g. of license plate...

G06V 30/10   Character recognition

G06V 30/133   Evaluation of quality of th...

G06V 30/142   using hand-held instruments...

G06V 30/1456   based on user interactions

H04N 23/68   for stable pick-up of the s...

H04N 23/6812   based on additional sensors...

H04N 23/682   Vibration or motion blur co...
