AUTOMATED DOCUMENT RECOGNITION, IDENTIFICATION, AND DATA EXTRACTION

US 20150078671A1
Filed: 08/25/2014
Published: 03/19/2015
Est. Priority Date: 09/19/2013
Status: Active Grant

Abstract

A method for automated document recognition, identification, and data extraction is described herein. The method comprises receiving, by the processor, an image of a document associated with a user. The image is analyzed using optical character recognition to obtain image data, wherein the image data includes text zones. Based on the image data, the image is compared to one or more document templates. Based on the comparison, a document template having the highest degree of coincidence with the image is determined. The text zones of the image are associated with text zones of the document template to determine a type of data in each text zone. The data is structured into a standard format to obtain structured data.
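
The abstract does not say how the "degree of coincidence" is computed. As a minimal sketch only, assume each template is a list of expected text-zone centers in coordinates normalized to the document's width and height, and assume the score is the share of template zones that have an extracted zone within a tolerance. The names `coincidence` and `best_template`, the sample templates, and the tolerance of 0.05 are all illustrative assumptions, not taken from the patent.

```python
from math import hypot

Point = tuple[float, float]  # (x, y), normalized to the document's width and height


def coincidence(zones: list[Point], template: list[Point], tol: float = 0.05) -> float:
    """Assumed score: share of template zones with an extracted zone within tol."""
    if not template:
        return 0.0
    hits = sum(
        1
        for tx, ty in template
        if any(hypot(tx - zx, ty - zy) <= tol for zx, zy in zones)
    )
    return hits / len(template)


def best_template(zones: list[Point], templates: dict[str, list[Point]]) -> tuple[str, float]:
    """Return the name and score of the template with the highest assumed score."""
    scores = {name: coincidence(zones, t) for name, t in templates.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical templates: expected text-zone centers for two document types.
TEMPLATES = {
    "passport": [(0.30, 0.20), (0.30, 0.40), (0.50, 0.90)],
    "drivers_license": [(0.60, 0.15), (0.60, 0.35), (0.60, 0.55)],
}

# Hypothetical zone centers produced by OCR on the extracted image.
extracted = [(0.61, 0.16), (0.59, 0.34), (0.20, 0.80)]
name, score = best_template(extracted, TEMPLATES)
print(name, round(score, 3))
```

In this example two of the three license zones have a nearby extracted zone and none of the passport zones do, so the license template is selected. A real system would likely also weigh zone sizes and the zone-to-zone and zone-to-border distances.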

25 Claims

1. A processor-implemented method for automated document recognition, identification and data extraction, the method comprising:
- receiving a video stream associated with the document, the document being associated with a user;
  
  detecting an image of the document in the video stream, the detecting including recognizing a shape corresponding to the document overall;
  
  improving the detected image of the document in the video stream by adjusting colors, adjusting brightness, and removing blurring;
  
  extracting the detected image of the document from the video stream, the image being a still image;
  
  analyzing the extracted image using optical character recognition to produce image data, the image data including text zones, each of the text zones being associated with one or more distances to other text zones and one or more borders of the document, the one or more distances being determined using coordinates;
  
  comparing the extracted image to one or more document templates using the image data;
  
  determining a document template having a highest degree of coincidence with the extracted image using the comparison;
  
  matching the text zones of the extracted image with text zones of the document template to determine a type of data in each text zone; and
  
  structuring the data into a standard format to obtain structured data.
- Dependent claims (2, 3, 4, 6, 7, 8, 10):
  - 2. The method of claim 1, wherein the document includes an identification document.
  - 3. The method of claim 1, further comprising presenting the structured data to the user.
  - 4. The method of claim 1, wherein each of the one or more document templates is associated with a type of the document, the type of the document including a driver's license, a passport, a government issued identification document, and a student identification document.
  - 6. The method of claim 1, wherein the matching is based on the coordinates of the text zones.
  - 7. The method of claim 1, further comprising storing the structured data to a database.
  - 8. The method of claim 1, further comprising filling in one or more web forms using the structured data.
  - 10. The method of claim 1, wherein the shape includes four angles, each of the four angles being approximately 90 degrees.
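
The analyzing step of claim 1 associates each text zone with distances to other text zones and to the document borders, determined from coordinates. The claim does not fix a metric, so the following is only a sketch under stated assumptions: zones are axis-aligned boxes in pixel coordinates with the origin at the top-left corner, zone-to-zone distance is measured between box centers, and border distance is measured from each side of the box. The `Zone` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Zone:
    """Hypothetical text zone: an axis-aligned box in pixel coordinates."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def border_distances(zone: Zone, width: float, height: float) -> dict[str, float]:
    """Distance from each side of the zone to the matching document border."""
    return {
        "left": zone.left,
        "top": zone.top,
        "right": width - zone.right,
        "bottom": height - zone.bottom,
    }


def zone_distances(zone: Zone, others: list[Zone]) -> dict[str, float]:
    """Center-to-center Euclidean distance to every other zone (assumed metric)."""
    cx, cy = zone.center
    return {
        o.name: hypot(o.center[0] - cx, o.center[1] - cy)
        for o in others
        if o is not zone
    }


# Example: two zones on an 860 x 540 pixel document image.
a = Zone("a", 100, 50, 300, 90)
b = Zone("b", 100, 350, 300, 390)
print(border_distances(a, 860, 540))
print(zone_distances(a, [a, b]))
```

Dividing these values by the document width or height would make them independent of image resolution, which is one way such data could be compared against stored templates.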

5. (canceled)

9. (canceled)

11. (canceled)

12. A system for automated document recognition, identification and data extraction, the system comprising:
- a processor;
  
  a memory coupled to the processor, the memory storing instructions, the instructions being executable by the processor to perform a method, the method comprising:
  
  receiving a video stream associated with a document associated with a user,
  
  detecting an image of the document in the video stream, the detecting including recognizing a shape corresponding to the identification document overall,
  
  improving the detected image of the document in the video stream by adjusting colors, adjusting brightness, and removing blurring,
  
  extracting the detected image of the document from the video stream, the image being a still image,
  
  analyzing the extracted image using optical character recognition to produce image data, the image data including text zones, each of the text zones being associated with one or more distances to other text zones and one or more borders of the document, the one or more distances being determined using coordinates,
  
  comparing the extracted image to one or more document templates using the image data,
  
  determining a document template having a highest degree of coincidence with the extracted image using the comparison,
  
  matching the text zones of the image with text zones of the document template to determine a type of data in each text zone; and
  
  structuring the data into a standard format to obtain structured data; and
  
  a database communicatively coupled to the processor, the database storing the one or more document templates.
- Dependent claims (13, 15, 16, 17, 19):
  - 13. The system of claim 12, wherein each of the one or more document templates is associated with a type of the document, the type of the document including a driver's license, a passport, a government issued identification document, and a student identification document.
  - 15. The system of claim 12, wherein the matching is based on the coordinates of the text zones.
  - 16. The system of claim 12, wherein the database further stores the structured data.
  - 17. The system of claim 12, wherein the method further comprises filling in one or more web forms using the structured data.
  - 19. The system of claim 12, wherein the shape includes four angles, each of the four angles being approximately 90 degrees.
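
Claim 19 (like claim 10) narrows the detected shape to one with four angles of approximately 90 degrees. The claims do not say how close counts as "approximately", so this sketch assumes a hypothetical tolerance of 10 degrees and assumes the four corners have already been found and are listed in order around the shape. The function names are illustrative.

```python
from math import atan2, degrees

Point = tuple[float, float]


def corner_angles(corners: list[Point]) -> list[float]:
    """Angle in degrees (0 to 180) between the two edges meeting at each corner."""
    n = len(corners)
    angles = []
    for i in range(n):
        px, py = corners[i - 1]          # previous corner (wraps around)
        cx, cy = corners[i]              # current corner
        nx, ny = corners[(i + 1) % n]    # next corner (wraps around)
        a1 = atan2(py - cy, px - cx)
        a2 = atan2(ny - cy, nx - cx)
        angle = abs(degrees(a1 - a2)) % 360
        angles.append(360 - angle if angle > 180 else angle)
    return angles


def looks_like_document(corners: list[Point], tol_deg: float = 10.0) -> bool:
    """True if there are four corners and each angle is within tol_deg of 90."""
    return len(corners) == 4 and all(
        abs(a - 90) <= tol_deg for a in corner_angles(corners)
    )


# A card seen with slight perspective distortion, and a clearly skewed shape.
card = [(0, 0), (100, 2), (101, 62), (-1, 60)]
skewed = [(0, 0), (4, 0), (6, 3), (0, 3)]
print(looks_like_document(card), looks_like_document(skewed))
```

A tolerance is needed because a document held in front of a camera is rarely parallel to the sensor, so its corners project to angles that deviate slightly from 90 degrees.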

14. (canceled)

18. (canceled)

20. A non-transitory computer-readable storage medium having embodied thereon a program, the program being executable by one or more processors to perform a method, the method comprising:
- receiving a video stream associated with a document, the document being associated with a user;
  
  detecting an image of the document, the detecting including recognizing a shape corresponding to the document overall;
  
  improving the detected image of the document in the video stream by adjusting colors, adjusting brightness, and removing blurring;
  
  extracting the detected image of the document from the video stream, the image being a still image;
  
  analyzing the extracted image using optical character recognition to produce image data, the image data including text zones, each of the text zones being associated with one or more distances to other text zones and one or more borders of the document, the one or more distances being determined using coordinates;
  
  comparing the extracted image to one or more document templates using the image data;
  
  determining a document template having a highest degree of coincidence with the extracted image using the comparison;
  
  matching the text zones of the image with text zones of the document template to determine a type of data in each text zone; and
  
  structuring the data into a standard format to obtain structured data.
- Dependent claims (21, 22, 23, 24, 25):
  - 21. The non-transitory computer-readable storage medium of claim 20, wherein the document includes an identification document.
  - 22. The non-transitory computer-readable storage medium of claim 20, wherein the method further comprises presenting the structured data to the user.
  - 23. The non-transitory computer-readable storage medium of claim 20, wherein each of the one or more document templates is associated with a type of the document, the type of the document including a driver's license, a passport, a government issued identification document, and a student identification document.
  - 24. The non-transitory computer-readable storage medium of claim 20, wherein the matching is based on the coordinates of the text zones.
  - 25. The non-transitory computer-readable storage medium of claim 20, the method further comprising storing the structured data to a database.
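
The last two steps of claim 20, together with claim 24, describe matching extracted text zones to template zones by their coordinates to learn the type of data in each zone, then structuring the data into a standard format. The claims name neither the matching rule nor the format, so this sketch assumes nearest-zone matching within a tolerance and uses JSON as the output format. The template, field names, and sample OCR values are hypothetical.

```python
import json
from math import hypot

# Hypothetical template: field type -> expected zone center (normalized x, y).
TEMPLATE_FIELDS = {
    "surname": (0.60, 0.15),
    "given_names": (0.60, 0.35),
    "date_of_birth": (0.60, 0.55),
}


def label_zones(zones, template_fields, tol=0.05):
    """Give each OCR zone the type of the nearest template zone within tol.

    zones is a list of (x, y, text) tuples; zones with no template zone
    within tol are skipped.
    """
    record = {}
    for x, y, text in zones:
        field, (fx, fy) = min(
            template_fields.items(),
            key=lambda item: hypot(item[1][0] - x, item[1][1] - y),
        )
        if hypot(fx - x, fy - y) <= tol:
            record[field] = text
    return record


# Hypothetical OCR output: zone center coordinates plus recognized text.
ocr_zones = [
    (0.61, 0.16, "DOE"),
    (0.59, 0.34, "JANE"),
    (0.60, 0.56, "1990-01-31"),
    (0.10, 0.95, "specimen"),
]
structured = label_zones(ocr_zones, TEMPLATE_FIELDS)
print(json.dumps(structured, indent=2))
```

The resulting dictionary could then be stored in a database or used to fill in a web form, as the dependent claims describe.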

Current Assignee: Idchecker, Inc. (Mitek Systems Incorporated)
Original Assignee: Idchecker, Inc. (Mitek Systems Incorporated)
Inventors: van Deventer, Jorgen; Hagen, Michael; Mandak, Istvan
Granted Patent: US 8,995,774 B1
US Class Current: 382/217
CPC Class Codes:
G06V 30/412 Layout analysis of document...
G06V 30/414 Extracting the geometrical ...
