Method and system for identifying anchors for fields using optical character recognition data

US 9,396,540 B1
Filed: 04/03/2013
Issued: 07/19/2016
Est. Priority Date: 03/28/2012
Status: Active Grant

Abstract

Identifying anchors for fields using optical character recognition data is described. A collection of characters is identified. The collection of characters includes a first set of characters at a first position relative to a first field in a first document and a second set of characters at a second position relative to the first field in the first document. The first set of characters is associated with a first word, and the second set of characters is associated with a second word. An anchor is created based on the collection of characters, wherein the anchor is at a third relative position to the first field in the first document. A second field is identified in a second document by identifying the anchor in the second document.
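As a rough illustration of the idea in the abstract, the sketch below builds an anchor from two OCR words near a field in a first document, then uses the same two words in a second document to estimate where the corresponding field lies. The `Word` and `Anchor` types, the use of word centers, the midpoint as the anchor position, and exact text matching are all assumptions made for illustration; the text here does not prescribe a data model or a formula.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Word:
    """An OCR word with the (x, y) center of its bounding box (hypothetical model)."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Anchor:
    """Two words plus the offset from their midpoint to a field (hypothetical model)."""
    first_text: str
    second_text: str
    dx: float
    dy: float


def create_anchor(first: Word, second: Word, field_xy: Tuple[float, float]) -> Anchor:
    # Assumed choice: place the anchor at the midpoint of the two words.
    ax, ay = (first.x + second.x) / 2, (first.y + second.y) / 2
    # Store the field position relative to the anchor.
    return Anchor(first.text, second.text, field_xy[0] - ax, field_xy[1] - ay)


def locate_field(anchor: Anchor, words: List[Word]) -> Optional[Tuple[float, float]]:
    # Look up the anchor's two words in the second document by exact text.
    by_text = {w.text: w for w in words}
    first = by_text.get(anchor.first_text)
    second = by_text.get(anchor.second_text)
    if first is None or second is None:
        return None  # the anchor cannot be found in this document
    ax, ay = (first.x + second.x) / 2, (first.y + second.y) / 2
    # Apply the stored offset to the anchor position found in the second document.
    return (ax + anchor.dx, ay + anchor.dy)
```

In this sketch, a second document whose words are shifted uniformly relative to the first yields a field estimate shifted by the same amount, and a document missing either word yields `None`.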

20 Claims

1. A system for identifying anchors for fields using optical character recognition data, the system comprising:
- one or more processors; and
  
  a non-transitory computer readable medium storing a plurality of instructions, which when executed, cause the one or more processors to:
  
  identify a first collection of characters comprising a first set of characters at a first position relative to a first field in a first document and a second set of characters at a second position relative to the first field in the first document, wherein the first set of characters is associated with a first word and the second set of characters is associated with a second word;
  
  create a first anchor in the first document based on the first collection of characters, wherein the first anchor is at a third position relative to the first field in the first document, and wherein the first anchor is associated with a second field in the first document;
  
  identify a second collection of characters comprising a third set of characters at a fourth position relative to a third field in a second document and a fourth set of characters at a fifth position relative to the third field in the second document, wherein the third set of characters is associated with a third word and the fourth set of characters is associated with a fourth word;
  
  determine a location of a second anchor in the second document by calculating a vector based on the first, second, third and fourth sets of characters; and
  
  identify a fourth field in the second document that corresponds to the second field in the first document based on the location of the second anchor in the second document.
- Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The system of claim 1, wherein at least one of the first set of characters comprise a first word and the second set of characters comprise a second word.
  - 3. The system of claim 1, wherein at least one of the first document and the second document comprise digitized optical character recognition data.
  - 4. The system of claim 1, wherein the second document is associated with a class in response to a comparison to classify documents similar to the first document.
  - 5. The system of claim 1, wherein the second document is associated with a template in response to a comparison to classify documents similar to the first document.
  - 6. The system of claim 1, wherein the instructions to determine the location of the second anchor in the second document based on the calculated vector comprises instructions to generate a score based on a degree of similarity between the second anchor in the second document and the first anchor in the first document, and comparing the score to a threshold.
  - 7. The system of claim 1, wherein the instructions to create the first anchor further comprises instructions to combine a graphic with the first set of characters and the second set of characters to create the first anchor.
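Claim 6 recites generating a score from the degree of similarity between the two anchors and comparing it to a threshold, without saying how the score is computed. The sketch below is one possible reading, assuming each anchor is represented by its list of word strings and that similarity is measured with the standard-library `difflib.SequenceMatcher`. The function names and the 0.8 default threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Sequence


def anchor_similarity(first_anchor: Sequence[str], second_anchor: Sequence[str]) -> float:
    """Mean per-word text similarity in [0, 1] (an assumed scoring rule)."""
    if not first_anchor or len(first_anchor) != len(second_anchor):
        return 0.0
    ratios = [
        # ratio() returns 1.0 for identical strings and 0.0 for no overlap.
        SequenceMatcher(None, a.lower(), b.lower()).ratio()
        for a, b in zip(first_anchor, second_anchor)
    ]
    return sum(ratios) / len(ratios)


def accept_anchor(first_anchor: Sequence[str], second_anchor: Sequence[str],
                  threshold: float = 0.8) -> bool:
    # Accept the candidate anchor only if the score meets the threshold.
    return anchor_similarity(first_anchor, second_anchor) >= threshold
```

A tolerant string measure is used here because OCR output often contains single-character substitutions (for example `lnvoice` for `Invoice`); an exact-match rule would reject such candidates.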

8. A computer-implemented method for identifying anchors for fields using optical character recognition data, the method comprising:
- identifying a first collection of characters comprising a first set of characters at a first position relative to a first field in a first document and a second set of characters at a second position relative to the first field in the first document, wherein the first set of characters is associated with a first word and the second set of characters is associated with a second word;
  
  combining the first set of characters with the second set of characters to create a first anchor in the first document based on the first collection of characters, wherein the first anchor is at a third position relative to the first field in the first document, and wherein the first anchor is associated with a second field in the first document;
  
  identifying a second collection of characters comprising a third set of characters at a fourth position relative to a third field in a second document and a fourth set of characters at a fifth position relative to the third field in the second document, wherein the third set of characters is associated with a third word and the fourth set of characters is associated with a fourth word;
  
  determining a location of a second anchor in the second document by calculating a vector based on the first, second, third and fourth sets of characters; and
  
  identifying a fourth field in the second document that corresponds to the second field in the first document based on the location of the second anchor in the second document.
- Dependent Claims (9, 10, 11, 12, 13, 14)
- - 9. The computer-implemented method of claim 8, wherein at least one of the first set of characters comprise a first word and the second set of characters comprise a second word.
  - 10. The computer-implemented method of claim 8, wherein at least one of the first document and the second document comprise digitized optical character recognition data.
  - 11. The computer-implemented method of claim 8, wherein the second document is associated with a class in response to a comparison to classify documents similar to the first document.
  - 12. The computer-implemented method of claim 8, wherein the second document is associated with a template in response to a comparison to classify documents similar to the first document.
  - 13. The computer-implemented method of claim 8, wherein determining the location of the second anchor in the second document based on the calculated vector comprises generating a score based on a degree of similarity between the second anchor in the second document and the first anchor in the first document, and comparing the score to a threshold.
  - 14. The computer-implemented method of claim 8, wherein creating the first anchor further comprises combining a graphic with the first set of characters and the second set of characters to create the first anchor.

15. A computer program product, comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein, the computer-readable program code adapted to be executed by one or more processors to implement a method for identifying anchors for fields using optical character recognition data, the method comprising:
- identifying a first collection of characters comprising a first set of characters at a first position relative to a first field in a first document and a second set of characters at a second position relative to the first field in the first document, wherein the first set of characters is associated with a first word and the second set of characters is associated with a second word;
  
  combining the first set of characters with the second set of characters to create a first anchor in the first document based on the first collection of characters, wherein the first anchor is at a third position relative to the first field in the first document, and wherein the first anchor is associated with a second field in the first document;
  
  identifying a second collection of characters comprising a third set of characters at a fourth position relative to a third field in a second document and a fourth set of characters at a fifth position relative to the third field in the second document, wherein the third set of characters is associated with a third word and the fourth set of characters is associated with a fourth word;
  
  determining a location of a second anchor in the second document by calculating a vector based on the first, second, third and fourth sets of characters; and
  
  identifying a fourth field in the second document that corresponds to the second field in the first document based on the location of the second anchor in the second document.
- Dependent Claims (16, 17, 18, 19, 20)
- - 16. The computer program product of claim 15, wherein at least one of the first set of characters comprise a first word and the second set of characters comprise a second word.
  - 17. The computer program product of claim 15, wherein at least one of the first document and the second document comprise digitized optical character recognition data.
  - 18. The computer program product of claim 15, wherein the second document is associated with a class in response to a comparison to classify documents similar to the first document.
  - 19. The computer program product of claim 15, wherein the second document is associated with a template in response to a comparison to classify documents similar to the first document.
  - 20. The computer program product of claim 15, wherein determining the location of the second anchor in the second document based on the calculated vector comprises generating a score based on a degree of similarity between the second anchor in the second document and the first anchor in the first document, and comparing the score to a threshold.
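The independent claims recite locating the second anchor "by calculating a vector based on the first, second, third and fourth sets of characters" but give no formula. One possible reading, sketched below purely as an assumption, treats the third and fourth sets as the second-document counterparts of the first and second sets, averages the two displacement vectors, and applies the result to the first anchor's position. The `Point` alias and both function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]  # (x, y) center of a word or anchor, in page units


def translation_vector(first: Point, second: Point, third: Point, fourth: Point) -> Point:
    """Average displacement from the first-document words to the second-document words.

    Assumes `third` corresponds to `first` and `fourth` corresponds to `second`.
    """
    dx = ((third[0] - first[0]) + (fourth[0] - second[0])) / 2
    dy = ((third[1] - first[1]) + (fourth[1] - second[1])) / 2
    return (dx, dy)


def locate_second_anchor(first_anchor: Point, vector: Point) -> Point:
    # Shift the first anchor's position by the estimated displacement.
    return (first_anchor[0] + vector[0], first_anchor[1] + vector[1])
```

Averaging the two displacements is only one way to combine them; it smooths small OCR position errors but does not account for rotation or scaling between the two documents.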

Current Assignee: Open Text Corporation
Original Assignee: EMC Corporation (Dell Technologies Inc.)
Inventors: Sampson, Steven
Primary Examiner(s): Nguyen, Phong
Application Number: US13/855,933
Time in Patent Office: 1,203 Days
Field of Search: 707/737, 707/749, 707/758
US Class Current: 1/1
CPC Class Codes:
- G06F 16/355   Class or cluster creation o...
- G06F 16/50   of still image data
- G06T 7/74   involving reference images ...
- G06V 30/196   using sequential comparison...
- G06V 30/414   Extracting the geometrical ...
- G06V 30/418   Document matching, e.g. of ...
