System and methods for extracting document images from images featuring multiple documents

US 10,621,676 B2
Filed: 02/02/2016
Issued: 04/14/2020
Est. Priority Date: 02/04/2015
Status: Active Grant

4 Assignments

0 Petitions

Abstract

A system and method for extracting document images from images featuring multiple documents are presented. The method includes receiving a multiple-document image including a plurality of document images, wherein each document image is associated with a document; extracting a plurality of visual identifiers from the multiple-document image, wherein each visual identifier is associated with one of the plurality of document images; analyzing the plurality of visual identifiers to identify each document image; determining, based on the analysis, an image area of each document image; and extracting each document image based on its image area.
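The flow in the abstract can be pictured with a small sketch. This is not the patentee's implementation: it assumes the visual identifiers have already been recognized as text with pixel boxes, that the documents are stacked vertically on the page, and that a text-free vertical gap separates them. The names `Identifier`, `group_by_textless_gaps`, `image_area`, `min_gap`, and `margin`, and all sample values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    """Recognized text plus its pixel box (hypothetical structure)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def group_by_textless_gaps(identifiers, min_gap=40):
    """Split identifiers into groups wherever at least `min_gap` text-free
    pixels separate one identifier from everything above it.
    Assumption: documents are stacked vertically in the image."""
    groups, current, current_bottom = [], [], None
    for ident in sorted(identifiers, key=lambda i: i.top):
        if current and ident.top - current_bottom >= min_gap:
            groups.append(current)
            current, current_bottom = [], None
        current.append(ident)
        current_bottom = (
            ident.bottom if current_bottom is None
            else max(current_bottom, ident.bottom)
        )
    if current:
        groups.append(current)
    return groups


def image_area(group, margin=10):
    """Box (left, top, right, bottom) enclosing every identifier in the
    group, padded by `margin` pixels into the surrounding text-free border."""
    return (
        max(0, min(i.left for i in group) - margin),
        max(0, min(i.top for i in group) - margin),
        max(i.right for i in group) + margin,
        max(i.bottom for i in group) + margin,
    )


# Made-up identifiers from two receipts photographed together.
identifiers = [
    Identifier("ACME Ltd", 20, 20, 180, 40),
    Identifier("Total 12.00", 20, 60, 160, 80),
    Identifier("Cafe Uno", 30, 300, 170, 320),
    Identifier("VAT 2.10", 30, 340, 150, 360),
]
groups = group_by_textless_gaps(identifiers)
areas = [image_area(g) for g in groups]
```

Each resulting area encloses all identifiers attributed to one document, which is the property the claims place on the boundary.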

17 Claims

1. A method for extracting document images from images featuring multiple documents, comprising:
- receiving a multiple-document image including a plurality of document images, wherein each document image is associated with a document;
- extracting a plurality of visual identifiers from the multiple-document image, wherein each visual identifier is text indicating information related to one of the plurality of document images;
- analyzing the plurality of visual identifiers to identify each document image, wherein each document image is identified based on at least one threshold visual identifier requirement representing a portion of the plurality of visual identifiers that need to be included in each of the identified document image;
- identifying, for each identified document image that meets the at least one threshold visual identifier requirement, a boundary based on the analysis, the boundary occupying a textless border around the respective identified document image and enclosing all of the plurality of visual identifiers that need to be included within the document image as represented by the at least one threshold visual identifier requirement;
- determining, based on the analysis, an image area of each document image, wherein the image area of the document image is defined by the boundary; and
- extracting each document image based on its image area, wherein extracting each document image further comprises generating a file including the document image.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1, wherein analyzing the plurality of visual identifiers further comprises:
    - executing at least one machine imaging process to identify metadata associated with each visual identifier.
  - 3. The method of claim 1, wherein each boundary is identified based on portions of the multiple-document image in which no text appears.
  - 4. The method of claim 1, further comprising:
    - generating a plurality of files, each file including one of the extracted document images.
  - 5. The method of claim 1, wherein extracting each document image further comprises at least one of:
    - cutting the document image, copying the document image, and cropping the document image.
  - 6. The method of claim 1, wherein the visual identifier threshold is any of:
    - a number of visual identifiers, a particular visual identifier, and a combination of visual identifiers.
  - 7. The method of claim 6, further comprising:
    - determining, for each document image, whether any required visual identifiers have not been extracted; and
    - upon determining that at least one required visual identifier has not been extracted, retrieving the at least one required visual identifier.
  - 8. The method of claim 7, further comprising:
    - determining, for each document image, an eligibility for a potential value-added tax (VAT) refund based on the visual identifiers.
  - 9. A non-transitory computer readable medium having stored thereon instructions for causing one or more processing units to execute the method according to claim 1.
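The threshold language in claims 1, 6, and 7 can be illustrated with a short sketch. It assumes each candidate document's identifiers have been labeled by field name, and it models a requirement as a minimum count combined with a set of required labels. The class `ThresholdRequirement`, its fields, and the sample values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ThresholdRequirement:
    """Hypothetical model of a threshold visual identifier requirement:
    a minimum number of identifiers, particular identifiers, or both."""
    min_count: int = 0
    required: frozenset = field(default_factory=frozenset)

    def missing(self, found):
        """Required labels that were not extracted (the check in claim 7)."""
        return self.required - set(found)

    def is_met(self, found):
        """True when enough identifiers were found and none are missing."""
        return len(found) >= self.min_count and not self.missing(found)


# Made-up requirement: at least three identifiers, including vendor and total.
requirement = ThresholdRequirement(
    min_count=3, required=frozenset({"vendor", "total"})
)

complete = {"vendor": "ACME Ltd", "date": "2015-01-10", "total": "12.00"}
partial = {"vendor": "Cafe Uno", "date": "2015-01-11"}

complete_ok = requirement.is_met(complete)
partial_ok = requirement.is_met(partial)
partial_missing = requirement.missing(partial)
```

Under this reading, a non-empty `missing` result is the point at which claim 7's retrieval step would be triggered; how retrieval is performed is not specified here.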

10. A system for extracting document images from images featuring multiple documents, comprising:
- a processing circuitry; and
- a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to:
  - receive a multiple-document image including a plurality of document images, wherein each document image is associated with a document;
  - extract a plurality of visual identifiers from the multiple-document image, wherein each visual identifier is text indicating information related to one of the plurality of document images;
  - analyze the plurality of visual identifiers to identify each document image, wherein each document image is identified based on at least one threshold visual identifier requirement representing a portion of the plurality of visual identifiers that need to be included in each of the identified document image;
  - identify, for each identified document image that meets the at least one threshold visual identifier requirement, a boundary based on the analysis, the boundary occupying a textless border around the respective identified document image and enclosing all visual identifiers that need to be included within the document image as represented by the at least one threshold visual identifier requirement;
  - determine, based on the analysis, an image area of each document image, wherein the image area of the document image is defined by the boundary; and
  - extract each document image based on its image area, wherein extracting each document image further comprises generating a file including the document image.
- Dependent claims (11, 12, 13, 14, 15, 16, 17):
  - 11. The system of claim 10, wherein the system is further configured to:
    - execute at least one machine imaging process to identify metadata associated with each visual identifier.
  - 12. The system of claim 10, wherein each boundary is identified based on portions of the multiple-document image in which no text appears.
  - 13. The system of claim 10, wherein the system is further configured to:
    - generate a plurality of files, each file including one of the extracted document images.
  - 14. The system of claim 10, wherein the system is further configured to perform at least one of:
    - cut the document image, copy the document image, and crop the document image.
  - 15. The system of claim 10, wherein the visual identifier threshold is any of:
    - a number of visual identifiers, a particular visual identifier, and a combination of visual identifiers.
  - 16. The system of claim 15, wherein the system is further configured to:
    - determine, for each document image, whether any required visual identifiers have not been extracted; and
    - retrieve the at least one required visual identifier, upon determining that at least one required visual identifier has not been extracted.
  - 17. The system of claim 16, wherein the system is further configured to:
    - determine, for each document image, an eligibility for a potential value-added tax (VAT) refund based on the visual identifiers.
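The final extraction step (cropping each image area and generating one file per document, as in claims 13 and 14) might look like the sketch below. It assumes a grayscale image held as a list of pixel rows and image areas given as `(left, top, right, bottom)` boxes, and it writes plain-text PGM files only to stay dependency-free. The function names, file naming scheme, and sample data are hypothetical.

```python
import tempfile
from pathlib import Path


def crop(pixels, area):
    """Return the sub-image inside area = (left, top, right, bottom)."""
    left, top, right, bottom = area
    return [row[left:right] for row in pixels[top:bottom]]


def write_pgm(pixels, path, max_value=255):
    """Write grayscale rows as a plain-text PGM ("P2") file."""
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    lines = ["P2", f"{width} {height}", str(max_value)]
    lines += [" ".join(str(value) for value in row) for row in pixels]
    Path(path).write_text("\n".join(lines) + "\n", encoding="ascii")


def extract_documents(pixels, areas, out_dir):
    """Crop each image area and generate one file per document image."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, area in enumerate(areas, start=1):
        path = out_dir / f"document_{index}.pgm"
        write_pgm(crop(pixels, area), path)
        paths.append(path)
    return paths


# Made-up 8x6 grayscale page holding two document regions.
page = [[(x + y) % 256 for x in range(8)] for y in range(6)]
page_areas = [(0, 0, 4, 3), (4, 3, 8, 6)]

with tempfile.TemporaryDirectory() as tmp:
    written = extract_documents(page, page_areas, tmp)
    headers = [p.read_text(encoding="ascii").splitlines()[:3] for p in written]
```

Cropping copies the pixels and leaves the source image untouched, which corresponds to the "copy" and "crop" options of claim 14 rather than "cut".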

Current Assignee: Vatbox, Ltd.
Original Assignee: Vatbox, Ltd.
Inventors: Saft, Isaac; Guzman, Noam
Primary Examiner(s): Youssef, Menatoallah
Application Number: US15/013,284
Publication Number: US 20160225101A1
Time in Patent Office: 1,533 Days
CPC Class Codes

G06Q 40/123   Tax preparation or submission

G06V 10/25   Determination of region of ...

G06V 20/62   Text, e.g. of license plate...

G06V 30/40   Document-oriented image-bas...
