Detecting long documents in a live camera feed

US 10,257,375 B2
Filed: 06/14/2017
Issued: 04/09/2019
Est. Priority Date: 06/14/2017
Status: Active Grant

Abstract

Aspects of the present disclosure provide methods and apparatuses for processing a digital image of a document, for example, to determine whether the document is a long document. An exemplary method generally includes obtaining a plurality of digital images of the document, determining a type of the document, loading one or more pre-defined metrics associated with the document based on the determined type of the document, determining one or more characteristics of the document based on one or more analyses performed on the plurality of digital images of the document, comparing the one or more characteristics of the document with the one or more pre-defined metrics, and determining the document to be a long document based, at least in part, on the comparison.
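
As an illustration of the comparison the abstract describes, the following is a minimal Python sketch, not the patented implementation. The document type names, the metric values, and the names `DocumentMetrics`, `METRICS_BY_TYPE`, and `is_long_document` are all hypothetical. Combining the two signals with a logical AND is an assumption drawn from claims 1 and 3, which require a font size below the lower bound of the range together with an out-of-bounds determination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentMetrics:
    """Pre-defined metrics for one document type (all values hypothetical)."""
    min_font_px: float
    max_font_px: float


# Hypothetical lookup table; the source does not list concrete types or ranges.
METRICS_BY_TYPE = {
    "receipt": DocumentMetrics(min_font_px=18.0, max_font_px=40.0),
    "tax_form": DocumentMetrics(min_font_px=12.0, max_font_px=28.0),
}


def is_long_document(doc_type: str, font_px: float, out_of_bounds: bool) -> bool:
    """Flag a long document when the text is smaller than the lower bound of
    the type's font size range AND part of the document is out of frame."""
    metrics = METRICS_BY_TYPE[doc_type]
    font_too_small = font_px < metrics.min_font_px
    return font_too_small and out_of_bounds
```

Under these assumed values, a receipt photographed with 10 px text and a cut-off edge would be flagged, while the same receipt fully in frame would not.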

22 Citations

20 Claims

1. A computer-implemented method for processing digital images of a document, comprising:
- obtaining a first digital image of a document from a user;
- determining a document type of the document in the first digital image based on a textual content of the document;
- determining a font size of text in the document in the first digital image;
- determining that at least one part of the document in the first digital image is out of bounds of the first digital image based on:
  - a bounding rectangle with a largest area corresponding to an open contour; and
  - the bounding rectangle with the largest area touching one or more edges of the first digital image;
- comparing the font size of text in the document with a font size range of the determined document type;
- determining, based on the comparison and the determination that the at least one part of the document is out of bounds of the first digital image, that the document in the first digital image is a long document;
- generating, based on the determination that the document in the first digital image is a long document, an alert for the user to capture a set of digital images of the document, wherein each digital image of the set of digital images of the document is of a different portion of the document;
- obtaining the set of digital images of the document from the user;
- generating a second digital image of the document based on the obtained set of digital images; and
- performing OCR on the second digital image of the document.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein obtaining the set of digital images comprises obtaining the set of digital images via a live stream captured by a camera.
  - 3. The method of claim 1, wherein the comparison of the font size of the document with the font size range of the determined document type further comprises determining the font size of the text in the document is less than a lower bound of the font size range.
  - 4. The method of claim 1, wherein determining the at least one part of the document in the first digital image is out of bounds of the first digital image further comprises:
    - segmenting the first digital image into:
      - a first set of pixels associated with a foreground of the first digital image; and
      - a second set of pixels associated with a background of the first digital image;
    - detecting a set of contours in the segmented first digital image;
    - determining a set of bounding rectangles based on the set of contours;
    - determining an area of each bounding rectangle in the set of bounding rectangles;
    - determining the bounding rectangle of the set of bounding rectangles with the largest area; and
    - determining, for each contour of the set of contours, whether the contour is an open contour or a closed contour, wherein:
      - an open contour is a bounding rectangle with one or more sides outside of the segmented first digital image; and
      - a closed contour is a bounding rectangle with all four sides inside of the segmented first digital image.
  - 5. The method of claim 1, wherein the determining the document type of the document in the first digital image based on a textual content of the document further comprises:
    - performing OCR on the first digital image to determine the textual content; and
    - comparing the textual content to a set of text indicative of certain types of documents.
  - 6. The method of claim 1, where the determining a font size of text in the document in the first digital image further comprises:
    - drawing a bounding rectangle around each line of text in the document;
    - determining a number of text blocks based on a number of bounding rectangles;
    - determining an average height of the number of text blocks; and
    - determining an estimated text size of the text in the document based on the average height of the number of text blocks.
  - 7. The method of claim 4, wherein the method further comprises determining that the document in the first digital image is not a long document based on:
    - a determination the document is not out of bounds, wherein the determination the document is not out of bounds further comprises:
      - the bounding rectangle with the largest area corresponding to a closed contour; and
      - the bounding rectangle with the largest area does not touch an edge of the first digital image; and
    - a determination the font size of the text in the document is at a top or above the font size range.
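
The out-of-bounds test recited in claims 4 and 7 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the image is assumed to be already segmented into a binary mask (1 = foreground, 0 = background), a 4-connected flood fill stands in for contour detection, and a contour is treated as "open" when its bounding rectangle reaches the image border. All function names are hypothetical.

```python
from collections import deque


def bounding_rectangles(mask):
    """Return (top, left, bottom, right) for each 4-connected foreground region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    rects = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            rects.append((top, left, bottom, right))
    return rects


def rect_area(rect):
    """Area in pixels of an inclusive (top, left, bottom, right) rectangle."""
    top, left, bottom, right = rect
    return (bottom - top + 1) * (right - left + 1)


def touches_edge(rect, rows, cols):
    """True when the rectangle reaches any border of a rows x cols image."""
    top, left, bottom, right = rect
    return top == 0 or left == 0 or bottom == rows - 1 or right == cols - 1


def document_out_of_bounds(mask):
    """Apply the largest-rectangle test: out of bounds if it touches an edge."""
    rects = bounding_rectangles(mask)
    if not rects:
        return False
    largest = max(rects, key=rect_area)
    return touches_edge(largest, len(mask), len(mask[0]))
```

Only the largest rectangle is tested, so small foreground noise at the border does not by itself trigger an out-of-bounds result.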

8. An apparatus for processing digital images of a document, comprising:
- a processor; and
- a memory having instructions which, when executed by the processor, performs an operation for processing a digital image, the operation comprising:
  - obtaining a first digital image of a document from a user;
  - determining a document type of the document in the first digital image, based on a textual content of the document;
  - determining a font size of text in the document in the first digital image;
  - determining that at least one part of the document in the first digital image is out of bounds of the first digital image based on:
    - a bounding rectangle with a largest area corresponding to an open contour; and
    - the bounding rectangle with the largest area touching one or more edges of the first digital image;
  - comparing the font size of text in the document with a font size range of the determined document type;
  - determining, based on the comparison and the determination that the at least one part of the document is out of bounds of the first digital image, that the document in the first digital image is a long document;
  - generating, based on the determination that the document in the first digital image is a long document, an alert for the user to capture a set of digital images of the document, wherein each digital image of the set of digital images of the document is of a different portion of the document;
  - obtaining the set of digital images of the document from the user;
  - generating a second digital image of the document based on the obtained set of digital images; and
  - performing OCR on the second digital image of the document.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The apparatus of claim 8, wherein the comparison of the font size of text in the document with the font size range of the determined document type further comprises determining the font size of the text in the document is less than a lower bound of the font size range.
  - 10. The apparatus of claim 8, wherein the operation for determining the at least one part of the document in the first digital image is out of bounds of the first digital image further comprises:
    - segmenting the first digital image into:
      - a first set of pixels associated with a foreground of the first digital image; and
      - a second set of pixels associated with a background of the first digital image;
    - detecting a set of contours in the segmented first digital image;
    - determining a set of bounding rectangles based on the set of contours;
    - determining an area of each bounding rectangle in the set of bounding rectangles;
    - determining the bounding rectangle of the set of bounding rectangles with the largest area; and
    - determining, for each contour of the set of contours, whether the contour is an open contour or a closed contour, wherein:
      - an open contour is a bounding rectangle with one or more sides outside of the segmented first digital image; and
      - a closed contour is a bounding rectangle with all four sides inside of the segmented first digital image.
  - 11. The apparatus of claim 8, wherein obtaining the set of digital images comprises obtaining the set of digital images via a live stream captured by a camera.
  - 12. The apparatus of claim 8, wherein the determining the document type of the document in the first digital image based on a textual content of the document further comprises:
    - performing OCR on the first digital image to determine the textual content; and
    - comparing the textual content to a set of text indicative of certain types of documents.
  - 13. The apparatus of claim 8, wherein the determining a font size of text in the document in the first digital image further comprises:
    - drawing a bounding rectangle around each line of text in the document;
    - determining a number of text blocks based on a number of bounding rectangles;
    - determining an average height of the number of text blocks; and
    - determining an estimated text size of the text in the document based on the average height of the number of text blocks.
  - 14. The apparatus of claim 10, wherein the operation further comprises determining that the document in the first digital image is not a long document based on:
    - a determination the document is not out of bounds, wherein the determination the document is not out of bounds further comprises:
      - the bounding rectangle with the largest area corresponding to a closed contour; and
      - the bounding rectangle with the largest area does not touch an edge of the first digital image; and
    - a determination the font size of the text in the document is at a top or above the font size range.
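
The font size steps recited in claims 9 and 13 lend themselves to a short worked example. The sketch below is illustrative only and makes several assumptions: line bounding rectangles are supplied as hypothetical `(x, y, width, height)` tuples in pixels, the estimated text size is taken to be the average line height itself (the claims only say the estimate is "based on" that average), and the function names are invented for this example.

```python
def estimate_font_size(line_boxes):
    """Estimate text size as the average height of the per-line bounding boxes.

    line_boxes: list of (x, y, width, height) tuples, one per detected text line.
    """
    if not line_boxes:
        raise ValueError("no text lines detected")
    block_count = len(line_boxes)  # number of text blocks = number of rectangles
    average_height = sum(h for _, _, _, h in line_boxes) / block_count
    return average_height


def below_font_range(font_size, font_range):
    """True when the estimate is under the lower bound of (lower, upper)."""
    lower, _upper = font_range
    return font_size < lower


def at_or_above_font_range(font_size, font_range):
    """True when the estimate is at the top of, or above, the (lower, upper) range."""
    _lower, upper = font_range
    return font_size >= upper
```

For instance, two lines with heights of 10 and 14 pixels give an estimate of 12 pixels, which falls below a hypothetical range of 16 to 32 pixels and would therefore count toward a long-document determination.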

15. A non-transitory computer-readable medium comprising instructions which, when executed on one or more processors, performs an operation for processing a digital image of a document, comprising:
- obtaining a first digital image of a document from a user;
- determining a document type of the document in the first digital image based on a textual content of the document;
- determining a font size of text in the document in the first digital image;
- determining that at least one part of the document in the first digital image is out of bounds of the first digital image based on:
  - a bounding rectangle with a largest area corresponding to an open contour; and
  - the bounding rectangle with the largest area touching one or more edges of the first digital image;
- comparing the font size of text in the document with a font size range of the determined document type;
- determining, based on the comparison and the determination that the at least one part of the document is out of bounds of the first digital image, that the document in the first digital image is a long document;
- generating, based on the determination that the document in the first digital image is a long document, an alert for the user to capture a set of digital images of the document, wherein each digital image of the set of digital images of the document is of a different portion of the document;
- obtaining the set of digital images of the document from the user;
- generating a second digital image of the document based on the obtained set of digital images; and
- performing OCR on the second digital image of the document.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The non-transitory computer-readable medium of claim 15, wherein obtaining the set of digital images comprises obtaining the set of digital images via a live stream captured by a camera.
  - 17. The non-transitory computer-readable medium of claim 15, wherein the comparison of the font size of text in the document with the font size range of the determined document type further comprises determining the font size of the text in the document is less than a lower bound of the font size range.
  - 18. The non-transitory computer-readable medium of claim 15, wherein the determining the document type of the document in the first digital image based on a textual content of the document further comprises:
    - performing OCR on the first digital image to determine the textual content; and
    - comparing the textual content to a set of text indicative of certain types of documents.
  - 19. The non-transitory computer-readable medium of claim 15, wherein determining the at least one part of the document in the first digital image is out of bounds of the first digital image further comprises:
    - segmenting the first digital image into:
      - a first set of pixels associated with a foreground of the first digital image; and
      - a second set of pixels associated with a background of the first digital image;
    - detecting a set of contours in the segmented first digital image;
    - determining a set of bounding rectangles based on the set of contours;
    - determining an area of each bounding rectangle in the set of bounding rectangles;
    - determining the bounding rectangle of the set of bounding rectangles with the largest area; and
    - determining, for each contour of the set of contours, whether the contour is an open contour or a closed contour, wherein:
      - an open contour is a bounding rectangle with one or more sides outside of the segmented first digital image; and
      - a closed contour is a bounding rectangle with all four sides inside of the segmented first digital image.
  - 20. The non-transitory computer-readable medium of claim 19, wherein the operation further comprises determining that the document in the first digital image is not a long document based on:
    - a determination the document is not out of bounds, wherein the determination the document is not out of bounds further comprises:
      - the bounding rectangle with the largest area corresponding to a closed contour; and
      - the bounding rectangle with the largest area does not touch an edge of the first digital image; and
    - a determination the font size of the text in the document is at a top or above the font size range.
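
The document type determination recited in claim 18 (comparing OCR output to text indicative of certain document types) might look something like the sketch below. It assumes the OCR text is already available as a string. The type names, the keyword sets, and the scoring rule (most keyword hits wins) are hypothetical, since the claims do not say how the comparison is scored.

```python
import re

# Hypothetical keyword sets; the source does not enumerate document types.
TYPE_KEYWORDS = {
    "receipt": {"subtotal", "total", "cashier"},
    "wage_statement": {"wages", "employer", "withheld"},
}


def classify_document(ocr_text):
    """Return the type whose keyword set overlaps the OCR text most, or None."""
    words = set(re.findall(r"[a-z0-9]+", ocr_text.lower()))
    best_type, best_hits = None, 0
    for doc_type, keywords in TYPE_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_type, best_hits = doc_type, hits
    return best_type
```

Returning `None` when nothing matches leaves it to the caller to decide how to handle an unrecognized document, for example by skipping the font size comparison.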

Current Assignee: Intuit, Inc.
Original Assignee: Intuit, Inc.
Inventors: Yellapragada, Vijay; Chiang, Peijun; Lee, Daniel; Hall, Jason; Soliwal, Shailesh
Primary Examiner(s): Reinier, Barbara D
Application Number: US15/623,008
Publication Number: US 20180367688A1
Time in Patent Office: 664 Days
CPC Class Codes

G06V 30/10   Character recognition

G06V 30/141   using multiple overlapping ...

G06V 30/1452   based on positionally close...

G06V 30/153   using recognition of charac...

H04N 1/00713   Length

H04N 1/00737   using the scanning elements...

H04N 1/00748   Detecting edges, e.g. of a ...

H04N 1/00769   Comparing, e.g. with threshold

H04N 1/00771   Indicating or reporting, e....

H04N 1/19594   using a television camera o...

H04N 2101/00   Still video cameras

H04N 2201/0081   Image reader H04N2201/0091 ...
