Detecting long documents in a live camera feed
First Claim
1. A computer-implemented method for processing digital images of a document, comprising:
obtaining a first digital image of a document from a user;
determining a document type of the document in the first digital image based on a textual content of the document;
determining a font size of text in the document in the first digital image;
determining that at least one part of the document in the first digital image is out of bounds of the first digital image based on:
a bounding rectangle with a largest area corresponding to an open contour; and
the bounding rectangle with the largest area touching one or more edges of the first digital image;
comparing the font size of text in the document with a font size range of the determined document type;
determining, based on the comparison and the determination that the at least one part of the document is out of bounds of the first digital image, that the document in the first digital image is a long document;
generating, based on the determination that the document in the first digital image is a long document, an alert for the user to capture a set of digital images of the document, wherein each digital image of the set of digital images of the document is of a different portion of the document;
obtaining the set of digital images of the document from the user;
generating a second digital image of the document based on the obtained set of digital images; and
performing OCR on the second digital image of the document.
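The out-of-bounds test recited above can be illustrated with a minimal sketch. It assumes contour detection has already run and produced one bounding rectangle per contour; the `ContourBox` structure, its `is_open` flag, and the function names are hypothetical and do not come from the patent text. The sketch only shows the two recited conditions: the largest-area rectangle belongs to an open contour, and that rectangle touches at least one image edge.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ContourBox:
    """Bounding rectangle of one detected contour (hypothetical structure)."""
    x: int          # left edge, in pixels
    y: int          # top edge, in pixels
    width: int
    height: int
    is_open: bool   # True if the contour does not close on itself

    @property
    def area(self) -> int:
        return self.width * self.height


def largest_box(boxes: List[ContourBox]) -> Optional[ContourBox]:
    """Return the bounding rectangle with the largest area, or None if empty."""
    return max(boxes, key=lambda b: b.area, default=None)


def touches_image_edge(box: ContourBox, image_width: int, image_height: int) -> bool:
    """True if the rectangle reaches at least one edge of the image."""
    return (
        box.x <= 0
        or box.y <= 0
        or box.x + box.width >= image_width
        or box.y + box.height >= image_height
    )


def is_out_of_bounds(boxes: List[ContourBox], image_width: int, image_height: int) -> bool:
    """Both recited conditions: largest box is an open contour AND touches an edge."""
    box = largest_box(boxes)
    if box is None:
        return False
    return box.is_open and touches_image_edge(box, image_width, image_height)
```

For example, in a 480 x 640 frame, an open contour whose rectangle spans from the left edge to the bottom edge would be flagged, while a closed contour lying fully inside the frame would not.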
Abstract
Aspects of the present disclosure provide methods and apparatuses for processing a digital image of a document, for example, to determine whether the document is a long document. An exemplary method generally includes obtaining a plurality of digital images of the document, determining a type of the document, loading one or more pre-defined metrics associated with the document based on the determined type of the document, determining one or more characteristics of the document based on one or more analyses performed on the plurality of digital images of the document, comparing the one or more characteristics of the document with the one or more pre-defined metrics, and determining the document to be a long document based, at least in part, on the comparison.
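As a rough illustration of the "load metrics, then compare" flow in the abstract, the sketch below keeps one font size range per document type and combines the comparison with an out-of-bounds flag. The table values, the type names, and all identifiers are hypothetical. The text does not state which comparison outcome indicates a long document; this sketch assumes that a font size inside the expected range, together with an out-of-bounds document, suggests a long document rather than a camera held too close.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DocumentMetrics:
    """Pre-defined metrics for one document type (hypothetical fields)."""
    min_font_px: float
    max_font_px: float


# Hypothetical lookup table; real values would be tuned per document type.
PREDEFINED_METRICS: Dict[str, DocumentMetrics] = {
    "receipt": DocumentMetrics(min_font_px=12.0, max_font_px=40.0),
    "invoice": DocumentMetrics(min_font_px=10.0, max_font_px=32.0),
}


def load_metrics(document_type: str) -> DocumentMetrics:
    """Load the pre-defined metrics for the determined document type."""
    return PREDEFINED_METRICS[document_type]


def font_size_in_range(font_px: float, metrics: DocumentMetrics) -> bool:
    """Compare the measured font size with the range for the document type."""
    return metrics.min_font_px <= font_px <= metrics.max_font_px


def is_long_document(document_type: str, font_px: float, out_of_bounds: bool) -> bool:
    """Assumed rule: in-range font size AND out-of-bounds document means long."""
    metrics = load_metrics(document_type)
    return out_of_bounds and font_size_in_range(font_px, metrics)
```

Under these assumed values, `is_long_document("receipt", 20.0, True)` evaluates to `True`, while the same receipt measured at 80 px would not be treated as long.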
22 Citations
20 Claims
1. A computer-implemented method for processing digital images of a document, comprising:
obtaining a first digital image of a document from a user;
determining a document type of the document in the first digital image based on a textual content of the document;
determining a font size of text in the document in the first digital image;
determining that at least one part of the document in the first digital image is out of bounds of the first digital image based on:
a bounding rectangle with a largest area corresponding to an open contour; and
the bounding rectangle with the largest area touching one or more edges of the first digital image;
comparing the font size of text in the document with a font size range of the determined document type;
determining, based on the comparison and the determination that the at least one part of the document is out of bounds of the first digital image, that the document in the first digital image is a long document;
generating, based on the determination that the document in the first digital image is a long document, an alert for the user to capture a set of digital images of the document, wherein each digital image of the set of digital images of the document is of a different portion of the document;
obtaining the set of digital images of the document from the user;
generating a second digital image of the document based on the obtained set of digital images; and
performing OCR on the second digital image of the document.
Dependent claims: 2, 3, 4, 5, 6, 7
8. An apparatus for processing digital images of a document, comprising:
a processor; and
a memory having instructions which, when executed by the processor, performs an operation for processing a digital image, the operation comprising:
obtaining a first digital image of a document from a user;
determining a document type of the document in the first digital image, based on a textual content of the document;
determining a font size of text in the document in the first digital image;
determining that at least one part of the document in the first digital image is out of bounds of the first digital image based on:
a bounding rectangle with a largest area corresponding to an open contour; and
the bounding rectangle with the largest area touching one or more edges of the first digital image;
comparing the font size of text in the document with a font size range of the determined document type;
determining, based on the comparison and the determination that the at least one part of the document is out of bounds of the first digital image, that the document in the first digital image is a long document;
generating, based on the determination that the document in the first digital image is a long document, an alert for the user to capture a set of digital images of the document, wherein each digital image of the set of digital images of the document is of a different portion of the document;
obtaining the set of digital images of the document from the user;
generating a second digital image of the document based on the obtained set of digital images; and
performing OCR on the second digital image of the document.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable medium comprising instructions which, when executed on one or more processors, performs an operation for processing a digital image of a document, comprising:
obtaining a first digital image of a document from a user;
determining a document type of the document in the first digital image based on a textual content of the document;
determining a font size of text in the document in the first digital image;
determining that at least one part of the document in the first digital image is out of bounds of the first digital image based on:
a bounding rectangle with a largest area corresponding to an open contour; and
the bounding rectangle with the largest area touching one or more edges of the first digital image;
comparing the font size of text in the document with a font size range of the determined document type;
determining, based on the comparison and the determination that the at least one part of the document is out of bounds of the first digital image, that the document in the first digital image is a long document;
generating, based on the determination that the document in the first digital image is a long document, an alert for the user to capture a set of digital images of the document, wherein each digital image of the set of digital images of the document is of a different portion of the document;
obtaining the set of digital images of the document from the user;
generating a second digital image of the document based on the obtained set of digital images; and
performing OCR on the second digital image of the document.
Dependent claims: 16, 17, 18, 19, 20
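The claims recite generating a second digital image from the set of captured portions without saying how the portions are combined. The toy sketch below is one possible reading, not the claimed method: it assumes the portions are captured top to bottom, are equally wide, and overlap by whole rows of identical pixel values. Images are modeled as lists of pixel rows, and `overlap_rows` and `stitch_vertically` are hypothetical names. A real pipeline would need feature-based alignment, and the stitched result would then be passed to an OCR engine, which is omitted here.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values (toy representation)


def overlap_rows(upper: Image, lower: Image) -> int:
    """Largest k where the last k rows of `upper` equal the first k rows of `lower`."""
    max_k = min(len(upper), len(lower))
    for k in range(max_k, 0, -1):
        if upper[-k:] == lower[:k]:
            return k
    return 0


def stitch_vertically(portions: List[Image]) -> Image:
    """Combine top-to-bottom portions into one image, dropping duplicated overlap rows."""
    if not portions:
        return []
    combined = [row[:] for row in portions[0]]
    for portion in portions[1:]:
        k = overlap_rows(combined, portion)
        # Append only the rows that are not already present at the bottom.
        combined.extend(row[:] for row in portion[k:])
    return combined


# Three captures of a tall document, each sharing one row with its neighbor.
top = [[1, 1], [2, 2], [3, 3]]
middle = [[3, 3], [4, 4]]
bottom = [[4, 4], [5, 5], [6, 6]]
second_image = stitch_vertically([top, middle, bottom])
```

Exact row matching keeps the example short and checkable; it would fail on real camera frames, where lighting and perspective differ between captures.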
Specification