Method For Deducing Entity Relationships Across Corpora Using Cluster Based Dictionary Vocabulary Lexicon
First Claim
1. A method, in an information handling system comprising a processor and a memory, of identifying cluster relationships for searching across a plurality of corpora, the method comprising:
identifying, by the system, a plurality of different cluster classifications for a corresponding plurality of corpora;
classifying, by the system, entity information from documents stored in the plurality of corpora into the plurality of different cluster classifications;
applying semantic analysis, by the system, to identify entity relationships between entity information classified in the plurality of different cluster classifications;
determining, by the system, one or more scores for each identified entity relationship;
identifying, by the system, a cluster relationship between at least two cluster classifications based on the one or more scores for each identified entity relationship; and
searching, by the information handling system, at least first and second corpora corresponding to the at least two cluster classifications having the identified cluster relationship.
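The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Plain document-level co-occurrence counting stands in for the semantic analysis, a fixed entity list stands in for NLP entity extraction, and every name here (`CORPORA`, `KNOWN_ENTITIES`, the function names, the threshold value) is a hypothetical choice made for illustration only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpora; each corpus gets one cluster classification.
CORPORA = {
    "suppliers": ["Acme supplies steel to Globex",
                  "Initech supplies software to Acme"],
    "customers": ["Globex buys steel from Acme",
                  "Hooli buys software from Initech"],
    "employees": ["Alice manages payroll"],
}

# Assumption: a fixed entity list replaces real NLP entity extraction.
KNOWN_ENTITIES = {"Acme", "Globex", "Initech", "Hooli", "Alice"}


def classify_entities(corpora):
    """Classify entity information from each corpus into its cluster."""
    return {
        cluster: {tok for doc in docs for tok in doc.split()
                  if tok in KNOWN_ENTITIES}
        for cluster, docs in corpora.items()
    }


def score_entity_relationships(corpora, clusters):
    """Score each cross-cluster entity pair by document co-occurrence.

    Assumption: co-occurrence counts are a stand-in for semantic analysis.
    """
    all_docs = [set(doc.split()) for docs in corpora.values() for doc in docs]
    scores = {}
    for c1, c2 in combinations(sorted(clusters), 2):
        for e1 in clusters[c1]:
            for e2 in clusters[c2]:
                if e1 == e2:
                    continue
                count = sum(1 for d in all_docs if e1 in d and e2 in d)
                if count:
                    scores[(c1, e1, c2, e2)] = count
    return scores


def related_clusters(scores, threshold=2):
    """Identify cluster relationships from the summed entity scores."""
    totals = defaultdict(int)
    for (c1, _e1, c2, _e2), score in scores.items():
        totals[(c1, c2)] += score
    return [pair for pair, total in totals.items() if total >= threshold]


def search(corpora, cluster_pair, term):
    """Search only the corpora whose clusters were found to be related."""
    return [(cluster, doc)
            for cluster in cluster_pair
            for doc in corpora[cluster]
            if term in doc.split()]


clusters = classify_entities(CORPORA)
scores = score_entity_relationships(CORPORA, clusters)
pairs = related_clusters(scores)
hits = search(CORPORA, pairs[0], "Hooli")
```

In this toy data only the supplier and customer clusters share co-occurring entities, so the search is confined to those two corpora and the employee corpus is skipped.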
Abstract
An approach is provided for identifying entity relationships based on word classifications extracted from business documents stored in a plurality of corpora. In the approach, performed by an information handling system, a plurality of cluster classifications are identified for the business documents so that entity information from the business documents can be classified or assigned to the cluster classifications, such as by performing natural language processing (NLP) analysis of the business documents. The approach applies semantic analysis to identify and score entity relationships between the entity information classified in the cluster classifications, and based on the scored entity relationships, cluster relationships between the cluster classifications are identified.
20 Claims
4-5. (canceled)
9. An information handling system comprising:
one or more processors;
a memory coupled to at least one of the processors;
a set of instructions stored in the memory and executed by at least one of the processors to identify cluster relationships for searching across a plurality of corpora, wherein the set of instructions perform actions of:
identifying, by the system, a plurality of different cluster classifications for a corresponding plurality of corpora;
classifying, by the system, entity information from documents stored in the plurality of corpora into the plurality of different cluster classifications;
applying semantic analysis, by the system, to identify entity relationships between entity information classified in the plurality of different cluster classifications;
determining, by the system, one or more scores for each identified entity relationship;
identifying, by the system, a cluster relationship between at least two cluster classifications based on the one or more scores for each identified entity relationship; and
searching, by the information handling system, at least first and second corpora corresponding to the at least two cluster classifications having the identified cluster relationship.
12-13. (canceled)
17. A computer program product stored in a computer readable storage medium, comprising computer instructions that, when executed by an information handling system, cause the information handling system to identify cluster relationships for searching across a plurality of corpora by performing actions comprising:
identifying, by the system, a plurality of different cluster classifications for a corresponding plurality of corpora;
classifying, by the system, entity information from documents stored in the plurality of corpora into the plurality of different cluster classifications;
applying semantic analysis, by the system, to identify entity relationships between entity information classified in the plurality of different cluster classifications;
determining, by the system, one or more scores for each identified entity relationship;
identifying, by the system, a cluster relationship between at least two cluster classifications based on the one or more scores for each identified entity relationship; and
searching, by the information handling system, at least first and second corpora corresponding to the at least two cluster classifications having the identified cluster relationship.
18-19. (canceled)
Specification