AUTOMATED COLLECTIVE TERM AND PHRASE INDEX
Abstract
Knowledge automation techniques may include selecting a knowledge element from a knowledge corpus of an enterprise for extraction of n-grams, and deriving a term vector comprising terms in the knowledge element. Based at least on a frequency of occurrence of each term in the knowledge element, key terms are identified in the term vector. Thereafter, the identified key terms are used to extract one or more n-grams from the knowledge element. Each of the extracted n-grams is scored as a function of at least a frequency of occurrence of each of the n-grams across the knowledge corpus of the enterprise, and based on the scoring, one or more of the n-grams is added to a collective term and phrase index.
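The steps summarized in the abstract can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: the function names, the regex tokenizer, the bigram limit (`max_n`), and the thresholds `min_count` and `min_score` are all assumptions introduced here, and raw occurrence counts stand in for the unspecified key-term and scoring functions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_vector(element):
    """Derive a term vector: each term in the knowledge element with its frequency."""
    return Counter(tokenize(element))


def key_terms(vector, min_count=2):
    """Identify key terms by frequency of occurrence in the element (assumed cutoff)."""
    return {term for term, count in vector.items() if count >= min_count}


def extract_ngrams(element, keys, max_n=2):
    """Extract every n-gram (n <= max_n) that contains at least one key term."""
    tokens = tokenize(element)
    found = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if any(term in keys for term in gram):
                found.add(" ".join(gram))
    return found


def corpus_frequency(ngram, corpus):
    """Score an n-gram by counting its occurrences across the whole corpus."""
    parts = ngram.split()
    n = len(parts)
    total = 0
    for doc in corpus:
        tokens = tokenize(doc)
        total += sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == parts)
    return total


def build_index(corpus, element, min_count=2, min_score=3):
    """Add to the index only those n-grams whose corpus score meets min_score."""
    keys = key_terms(term_vector(element), min_count)
    index = {}
    for gram in extract_ngrams(element, keys):
        score = corpus_frequency(gram, corpus)
        if score >= min_score:
            index[gram] = score
    return index
```

Calling `build_index(corpus, corpus[0])` on a small list of document strings returns a dictionary mapping each retained n-gram to its corpus-wide count. A production system would likely weight the score by other signals as well, since the abstract says the score is a function of "at least" the corpus frequency.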
20 Claims
1. A method comprising:
selecting, by a computing system, a knowledge element from a knowledge corpus of an enterprise for extraction of n-grams;
deriving, by a computing system, a term vector comprising terms in the knowledge element;
identifying, by the computing system, key terms in the term vector based at least on a frequency of occurrence of each term in the knowledge element;
extracting, by the computing system, n-grams using the identified key terms;
scoring, by the computing system, each of the extracted n-grams as a function of at least a frequency of occurrence of each of the n-grams across the knowledge corpus of the enterprise; and
adding, by the computing system, one or more of the extracted n-grams to an index based on the scoring.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A non-transitory computer-readable storage memory storing a plurality of instructions executable by one or more processors, the plurality of instructions comprising:
instructions that cause the one or more processors to select a knowledge element from a knowledge corpus of an enterprise for extraction of n-grams;
instructions that cause the one or more processors to identify key terms in a term vector associated with the knowledge element based at least on a frequency of occurrence of each term in the knowledge element;
instructions that cause the one or more processors to calculate a probability of one or more terms adjacent to each key term in the knowledge element as preceding or following the key term based on a function of natural language processing;
instructions that cause the one or more processors to extract an n-gram comprising the one or more terms and the key term when the probability of the one or more terms being adjacent to the key term is greater than a minimum threshold probability;
instructions that cause the one or more processors to extract an n-gram comprising only the key term when the probability of the one or more terms being adjacent to the key term is less than the minimum threshold probability;
instructions that cause the one or more processors to score each of the extracted n-grams as a function of at least a frequency of occurrence of each of the n-grams across the knowledge corpus of the enterprise; and
instructions that cause the one or more processors to add one or more of the extracted n-grams to an index based on the scoring.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
18. A system comprising:
one or more processors; and
a memory coupled with and readable by the one or more processors, the memory configured to store a set of instructions which, when executed by the one or more processors, causes the one or more processors to:
select a knowledge element from a knowledge corpus of an enterprise for extraction of n-grams;
derive a term vector comprising terms in the knowledge element;
identify key terms in the term vector based at least on a frequency of occurrence of each term in the knowledge element;
extract n-grams using the identified key terms;
score each of the extracted n-grams as a function of at least a frequency of occurrence of each of the n-grams across the knowledge corpus of the enterprise; and
add one or more of the extracted n-grams to an index based on the scoring.
Dependent claims: 19, 20
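Claim 10 extends a key term with its neighboring terms only when an adjacency probability exceeds a minimum threshold, and otherwise keeps the key term alone. The claim does not specify the "function of natural language processing", so the sketch below substitutes a simple maximum-likelihood bigram estimate computed from the corpus. The names `bigram_model` and `extract_for_key`, the tokenizer, and the default threshold of 0.5 are hypothetical choices made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bigram_model(corpus):
    """Count single terms and adjacent term pairs across the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for doc in corpus:
        tokens = tokenize(doc)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def extract_for_key(tokens, i, unigrams, bigrams, threshold=0.5):
    """Return the n-gram extracted for the key term at position i.

    The probability that a neighbor precedes or follows the key term is
    estimated as count(pair) / count(key term). This estimate is an
    assumption; the claim only requires some NLP-based probability.
    """
    key = tokens[i]
    key_count = unigrams[key]
    gram = [key]
    if key_count:
        # Prepend the preceding term if it precedes the key term often enough.
        if i > 0 and bigrams[(tokens[i - 1], key)] / key_count > threshold:
            gram.insert(0, tokens[i - 1])
        # Append the following term if it follows the key term often enough.
        if i + 1 < len(tokens) and bigrams[(key, tokens[i + 1])] / key_count > threshold:
            gram.append(tokens[i + 1])
    # A probability at or below the threshold yields the key term alone.
    # The claim is silent on exact equality; treating it as "below" is a choice made here.
    return " ".join(gram)
```

For a corpus in which "order" is preceded by "purchase" in two of its three occurrences, the key term "order" would be extracted as "purchase order", while a rarer follower such as "approval" would be left out.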
Specification