System and method for global identification in a collection of documents

US 10,452,907 B2
Filed: 06/11/2018
Issued: 10/22/2019
Est. Priority Date: 03/19/2015
Status: Active Grant

First Claim

1. A method comprising:

identifying matching pairs of one or more information objects corresponding to a real world object, one information object from a document and at least one information object from a document storage for a combination of global identification patterns that exist in the document and in the document storage;

ascertaining consistency of the matching pairs and determining which of the one or more information objects in the document are suitable for merging into the document storage; and

adding the one or more information objects from the document to the document storage to associate information objects corresponding to the real world object.

4 Assignments

0 Petitions

Abstract

Techniques for machine-based identification of objects extracted from text documents in natural language are disclosed. An example method may comprise: identifying matching pairs of one or more information objects corresponding to a real world object, one information object from the document and at least one information object from the document storage for a combination of global identification patterns that exist in the document and in the document storage; ascertaining consistency of the matching pairs and determining which of the one or more information objects in the document are suitable for merging into the document storage; and adding the one or more information objects from the document to the document storage to associate information objects corresponding to the real world object.
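
The three steps in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the patent text does not state: each information object is a flat dictionary of single-valued features, a global identification pattern is a tuple of feature names, and a document object and a stored object form a matching pair when they share the values of every feature in at least one pattern. `InfoObject`, `PATTERNS`, `find_matching_pairs`, `is_consistent` and `merge_document` are hypothetical names, and the consistency test (features present in both objects must agree) is a simplified stand-in for the ontology-based check described in the claims.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical data model: an information object is a flat map of
# single-valued features (e.g. "name" -> "Ann Lee").
@dataclass
class InfoObject:
    obj_id: str
    features: dict[str, str] = field(default_factory=dict)


# Hypothetical global identification patterns: each is a tuple of feature
# names whose values, taken together, point to one real-world object.
PATTERNS: list[tuple[str, ...]] = [("name", "birth_date"), ("email",)]


def pattern_key(obj: InfoObject, pattern: tuple[str, ...]) -> tuple[str, ...] | None:
    """Return the object's values for the pattern, or None if any are missing."""
    if all(f in obj.features for f in pattern):
        return tuple(obj.features[f] for f in pattern)
    return None


def find_matching_pairs(doc_objs, storage):
    """Step 1: pair objects that share the values of at least one pattern."""
    pairs = []
    for d in doc_objs:
        for s in storage:
            for pattern in PATTERNS:
                key = pattern_key(d, pattern)
                if key is not None and key == pattern_key(s, pattern):
                    pairs.append((d, s))
                    break  # one shared pattern is enough to form a pair
    return pairs


def is_consistent(d: InfoObject, s: InfoObject) -> bool:
    """Step 2 (simplified): features present in both objects must agree."""
    return all(s.features[f] == v for f, v in d.features.items() if f in s.features)


def merge_document(doc_objs, storage):
    """Step 3: enrich consistent matches, store everything else as new objects."""
    merged_ids = set()
    for d, s in find_matching_pairs(doc_objs, storage):
        if d.obj_id in merged_ids or not is_consistent(d, s):
            continue
        for f, v in d.features.items():
            s.features.setdefault(f, v)  # add only features the stored object lacks
        merged_ids.add(d.obj_id)
    for d in doc_objs:
        if d.obj_id not in merged_ids:
            # Assumption: no consistent match means no stored counterpart.
            storage.append(InfoObject(d.obj_id, dict(d.features)))
    return storage
```

In this sketch a consistent match contributes only the features the stored object lacks, in the spirit of claim 3, and a document object without a consistent match is stored as a new object, in the spirit of claim 4.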

4 Citations

20 Claims

1. A method comprising:
- identifying matching pairs of one or more information objects corresponding to a real world object, one information object from a document and at least one information object from a document storage for a combination of global identification patterns that exist in the document and in the document storage;
  
  ascertaining consistency of the matching pairs and determining which of the one or more information objects in the document are suitable for merging into the document storage; and
  
  adding the one or more information objects from the document to the document storage to associate information objects corresponding to the real world object.

2. The method of claim 1, further comprising:
- prior to identifying the matching pairs, searching for the global identification patterns and for the combination of the global identification patterns in the document and searching for the global identification patterns and their combinations in the document storage.

3. The method of claim 1, wherein adding the one or more information objects from the document to the document storage further comprises adding one or more features of the one or more information objects in the document to the document storage if the one or more feature is absent from the document storage and if the one or more information objects in the document and in the document storage correspond to the real world object.

4. The method of claim 1, wherein adding the one or more information objects from the document to the document storage further comprises adding one or more information objects from the document to the document storage as new information objects if the one or more information objects in the document storage do not have one or more information objects in the document storage corresponding to the real world object.

5. The method of claim 1, wherein the global identification patterns correspond to features of the real world object.

6. The method of claim 1, wherein the one or more information objects correspond to one or more of a mention, a name, or a reference to the real world object in a natural language.

7. The method of claim 1, wherein ascertaining consistency of the matching pairs further comprises ascertaining consistency of features of the one or more information objects with ontology.

8. The method of claim 7, wherein consistency of features indicates that merging the one or more information objects does not violate cardinality of relations between the one or more information objects.

9. The method of claim 1, further comprising computing weights of each pattern of the combination and generating a unicity parameter and a specialty parameter for the combination.

10. The method of claim 9, further comprising determining reliability of the combination of global identification patterns based on one or more of the weights, unicity parameter, or the specialty parameter.

11. The method of claim 10, wherein determining the reliability further comprises identifying a set of global identification patterns having a sum of the weights of each of the global identification patterns that exceed a predetermined threshold.
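
Claims 9 through 11 tie the reliability of a pattern combination to per-pattern weights and a predetermined threshold. The claims do not say how the weights, the unicity parameter or the specialty parameter are computed, so the sketch below models only the weight-threshold test of claim 11, using hand-picked integer weights. `PATTERN_WEIGHTS`, `THRESHOLD` and all three functions are hypothetical illustrations, not the patented scoring scheme.

```python
# Hypothetical integer weights (0-100) for individual global identification
# patterns; the claims do not specify how real weights are derived.
PATTERN_WEIGHTS: dict[str, int] = {
    "email": 80,
    "name+birth_date": 60,
    "name": 20,
    "city": 10,
}
THRESHOLD = 70  # hypothetical "predetermined threshold"


def combination_weight(patterns: list[str], weights: dict[str, int] = PATTERN_WEIGHTS) -> int:
    """Sum the weights of every pattern in the combination."""
    return sum(weights[p] for p in patterns)


def is_reliable(patterns: list[str], weights=PATTERN_WEIGHTS, threshold=THRESHOLD) -> bool:
    """Treat a combination as reliable if its total weight exceeds the threshold."""
    return combination_weight(patterns, weights) > threshold


def smallest_reliable_set(patterns: list[str], weights=PATTERN_WEIGHTS, threshold=THRESHOLD):
    """Return as few patterns as possible whose weights sum to more than the
    threshold, or None if even the whole combination falls short."""
    chosen: list[str] = []
    total = 0
    # Taking the heaviest patterns first maximizes the sum for each set size.
    for p in sorted(patterns, key=lambda p: weights[p], reverse=True):
        chosen.append(p)
        total += weights[p]
        if total > threshold:
            return chosen
    return None
```

Sorting by descending weight works because, for any set size k, the k heaviest patterns give the largest achievable sum, so the first prefix that crosses the threshold is as small as any qualifying set can be.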

12. A system comprising:
- a memory; and
  
  a processor, coupled to the memory, the processor to:
  
  identify matching pairs of one or more information objects corresponding to a real world object, one information object from a document and at least one information object from a document storage for a combination of global identification patterns that exist in the document and in the document storage;
  
  ascertain consistency of the matching pairs and determining which of the one or more information objects in the document are suitable for merging into the document storage; and
  
  add the one or more information objects from the document to the document storage to associate information objects corresponding to the real world object.

13. The system of claim 12, wherein the processor is further to:
- prior to identifying the matching pairs, search for the global identification patterns and for the combination of the global identification patterns in the document and search for the global identification patterns and their combinations in the document storage.

14. The system of claim 12, wherein to ascertain consistency of the matching pairs, the processor is further to ascertain consistency of features of the one or more information objects with ontology.

15. The system of claim 14, wherein consistency of features indicates that merging the one or more information objects does not violate cardinality of relations between the one or more information objects.

16. The system of claim 12, wherein the processor is further to:
- compute weights of each pattern of the combination; and

  generate a unicity parameter and a specialty parameter for the combination.

17. The system of claim 16, wherein the processor is further to:
- determine reliability of the combination of global identification patterns based on one or more of the weights, unicity parameter, or the specialty parameter.

18. The system of claim 17, wherein to determine the reliability, the processor is further to:
- identify a set of global identification patterns having a sum of the weights of each of the global identification patterns that exceed a predetermined threshold.
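
Claims 14 and 15 (like claims 7 and 8) treat two matched objects as mergeable only if the merge does not violate the cardinality of relations defined in the ontology. A minimal sketch, assuming each object is a mapping from relation names to sets of values and that the ontology supplies a maximum number of values per relation (`None` meaning unbounded). `ONTOLOGY_CARDINALITY`, `cardinality_violations` and `can_merge` are hypothetical names, and the sample relations are illustrative only.

```python
from __future__ import annotations

# Hypothetical ontology fragment: the maximum number of values each relation
# may hold for one real-world object. None means unbounded.
ONTOLOGY_CARDINALITY: dict[str, int | None] = {
    "birth_date": 1,
    "birth_place": 1,
    "employer": None,
}


def cardinality_violations(
    a: dict[str, set[str]],
    b: dict[str, set[str]],
    ontology: dict[str, int | None] = ONTOLOGY_CARDINALITY,
) -> list[str]:
    """List the relations whose cardinality limit merging a and b would exceed."""
    violations = []
    for rel in a.keys() | b.keys():
        # Relations missing from the ontology are treated as unbounded.
        limit = ontology.get(rel)
        if limit is None:
            continue
        merged = a.get(rel, set()) | b.get(rel, set())
        if len(merged) > limit:
            violations.append(rel)
    return sorted(violations)


def can_merge(a, b, ontology=ONTOLOGY_CARDINALITY) -> bool:
    """A merge is consistent when it violates no cardinality limit."""
    return not cardinality_violations(a, b, ontology)
```

Under this model two mentions with different employers can still be merged, because `employer` is unbounded, while two mentions with different birth dates cannot.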

19. A computer-readable non-transitory storage medium comprising executable instructions that, when executed by a processing device, cause the processing device to:
- identify matching pairs of one or more information objects corresponding to a real world object, one information object from a document and at least one information object from a document storage for a combination of global identification patterns that exist in the document and in the document storage;
  
  ascertain consistency of the matching pairs and determining which of the one or more information objects in the document are suitable for merging into the document storage; and
  
  add the one or more information objects from the document to the document storage to associate information objects corresponding to the real world object.

20. The computer-readable non-transitory storage medium of claim 19, wherein the processing device is further to:
- prior to identifying the matching pairs, search for the global identification patterns and for the combination of the global identification patterns in the document and search for the global identification patterns and their combinations in the document storage.
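
Claim 20 (like claims 2 and 13) searches the document and the document storage for global identification patterns and their combinations before any pairs are matched. One plausible way to do this, sketched here under assumptions the claims do not state, is to index every stored object by the pattern values it instantiates and then look up each document object's keys; the patterns a document object shares with a stored object form their combination. `PATTERNS`, `pattern_keys`, `build_index` and `shared_combinations` are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical global identification patterns: pattern name -> feature names
# whose values must all be present for the pattern to occur in an object.
PATTERNS: dict[str, tuple[str, ...]] = {
    "name+birth_date": ("name", "birth_date"),
    "email": ("email",),
    "name": ("name",),
}

Key = tuple[str, tuple[str, ...]]  # (pattern name, feature values)


def pattern_keys(features: dict[str, str]) -> list[Key]:
    """Search one object for every pattern it instantiates."""
    return [
        (name, tuple(features[f] for f in fields))
        for name, fields in PATTERNS.items()
        if all(f in features for f in fields)
    ]


def build_index(storage: dict[str, dict[str, str]]) -> dict[Key, set[str]]:
    """Map each pattern key found in the document storage to the objects holding it."""
    index: dict[Key, set[str]] = defaultdict(set)
    for sid, features in storage.items():
        for key in pattern_keys(features):
            index[key].add(sid)
    return index


def shared_combinations(
    document: dict[str, dict[str, str]],
    storage: dict[str, dict[str, str]],
) -> dict[tuple[str, str], list[str]]:
    """For each (document object, stored object) pair, list the patterns they share."""
    index = build_index(storage)
    combos: dict[tuple[str, str], list[str]] = defaultdict(list)
    for did, features in document.items():
        for key in pattern_keys(features):
            for sid in sorted(index.get(key, set())):
                combos[(did, sid)].append(key[0])
    return dict(combos)
```

The inverted index avoids comparing every document object with every stored object: only objects that share at least one pattern key are ever paired, and each pair arrives with the list of patterns it has in common.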

Current Assignee
ABBYY Development LLC
Original Assignee
ABBYY Production LLC (ABBYY Software)
Inventors
Sukhodolov, Dmitry; Matskevich, Stepan; Starostin, Anatoly
Primary Examiner(s)
Vo, Truong V

Application Number

US16/005,327
Publication Number

US 20180330157A1
Time in Patent Office

498 Days
Field of Search

707704
US Class Current
CPC Class Codes

G06F 18/22 Matching criteria, e.g. pro...

G06V 30/416 Extracting the logical stru...
