Systems and methods for lexicon generation
Abstract
Disclosed herein are embodiments for lexicon generation. More specifically, at least one embodiment of a method includes determining a corpus term from a plurality of documents, generating a candidate term from the corpus term, and selecting a normalized term from the candidate term and the corpus term. Some embodiments include linking the normalized term with the candidate term and providing an electronic search capability for locating a first document, where the electronic search capability receives the candidate term as a search term and utilizes the normalized term to locate the first document.
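The search capability described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical in-memory lexicon (`LEXICON`) that maps each candidate term to its normalized term, and a simple whitespace tokenizer; neither the data nor the function names come from the patent.

```python
from collections import defaultdict

# Hypothetical lexicon: each candidate term (variant) maps to one normalized term.
LEXICON = {
    "color": "color",
    "colour": "color",
    "colors": "color",
}

def build_index(documents):
    """Index documents by the normalized form of each whitespace-separated token."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            # Tokens missing from the lexicon are indexed unchanged.
            index[LEXICON.get(token, token)].add(doc_id)
    return index

def search(index, candidate_term):
    """Receive a candidate term and use its normalized term to locate documents."""
    term = candidate_term.lower()
    normalized = LEXICON.get(term, term)
    return sorted(index.get(normalized, set()))

# Illustrative documents (assumed, not from the source).
documents = {
    "doc1": "the colour of the sky",
    "doc2": "primary colors and pigments",
    "doc3": "an unrelated memo",
}
index = build_index(documents)
```

In this sketch, searching for "colour" also locates the document that only contains "colors", because both terms are linked to the same normalized term.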
15 Claims
1. A method for lexicon generation, comprising the steps of:
determining a corpus term from a plurality of documents;
generating a candidate term from the corpus term, wherein generating the candidate term comprises generating a linguistic variant of the corpus term;
generating a plurality of equivalent terms from the candidate term;
validating the plurality of equivalent terms by comparing the plurality of equivalent terms to frequency of occurrence of the candidate term;
linking each of the plurality of equivalent terms to the candidate term to create respective equivalent term pairs;
determining whether any of the equivalent term pairs are equivalent and, in response to determining that at least two of equivalent term pairs are equivalent, merging the equivalent term pairs to create a group of equivalent terms;
selecting a normalized term from the group of equivalent terms; and
storing the group of equivalent terms.
Dependent claims: 2, 3, 4.
5. A system for lexicon generation, comprising:
a processor; and
a memory component that stores lexicon generation logic that when executed by the processor, causes a computer to perform at least the following:
determine a corpus term from a plurality of documents;
generate a candidate term from the corpus term, wherein generating the candidate term comprises generating a linguistic variant of the corpus term;
generate a plurality of equivalent terms from the candidate term;
validate the plurality of equivalent terms by comparing the plurality of equivalent terms to a frequency of occurrence of the candidate term;
link each of the plurality of equivalent terms to the candidate term to create respective equivalent term pairs;
determine whether any of the equivalent term pairs are equivalent and, in response to determining that at least two of equivalent term pairs are equivalent, merging the equivalent term pairs to create a group of equivalent terms;
select a normalized term from the group of equivalent terms; and
store the group of equivalent terms.
Dependent claims: 6, 7, 8, 9.
10. A non-transitory computer-readable medium for lexicon generation that stores a program that, when executed by a computer, causes the computer to perform at least the following:
determine a corpus term from a plurality of documents;
generate a candidate term from the corpus term, wherein generating the candidate term comprises generating a linguistic variant of the corpus term;
generate a plurality of equivalent terms from the candidate term;
validate the plurality of equivalent terms by comparing the plurality of equivalent terms to a frequency of occurrence of the candidate term;
link each of the plurality of equivalent terms to the candidate term to create respective equivalent term pairs;
determine whether any of the equivalent term pairs are equivalent and, in response to determining that at least two of equivalent term pairs are equivalent, merging the equivalent term pairs to create a group of equivalent terms;
select a normalized term from the group of equivalent terms; and
store the group of equivalent terms.
Dependent claims: 11, 12, 13, 14, 15.
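The steps recited in the independent claims can be sketched end to end as follows. This is an illustrative reading only, not the patented implementation. Every concrete rule is an assumption made for the example: the variant rule (dropping a trailing "s"), the equivalent-term rules (hyphen removal, naive plural), the validation threshold (`min_ratio`), merging by shared terms, and choosing the most frequent term as the normalized term. All function names are hypothetical.

```python
from collections import Counter

def corpus_terms(documents):
    """Determine corpus terms: count lowercase whitespace-separated tokens."""
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return counts

def linguistic_variant(term):
    """Assumed variant rule: naive singular form (drop one trailing 's')."""
    return term[:-1] if term.endswith("s") and len(term) > 3 else term

def equivalent_terms(candidate):
    """Assumed equivalents: hyphen removed, or naive plural."""
    variants = {candidate.replace("-", ""), candidate + "s"}
    variants.discard(candidate)
    return variants

def validate(equivalents, candidate, counts, min_ratio=0.1):
    """Assumed check: keep equivalents that occur in the corpus at least
    min_ratio times as often as the candidate term."""
    base = counts[candidate]
    return {t for t in equivalents if counts[t] > 0 and counts[t] >= min_ratio * base}

def merge_pairs(pairs):
    """Merge term pairs that share a term into groups (simple union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return list(groups.values())

def generate_lexicon(documents):
    """Run the steps in order and return a term -> normalized term mapping."""
    counts = corpus_terms(documents)
    pairs = []
    for corpus_term in counts:
        candidate = linguistic_variant(corpus_term)
        valid = validate(equivalent_terms(candidate), candidate, counts)
        # Link each validated equivalent to the candidate as a term pair.
        pairs.extend((candidate, eq) for eq in valid)
    lexicon = {}
    for group in merge_pairs(pairs):
        # Assumed selection rule: most frequent term, ties broken alphabetically.
        normalized = max(sorted(group), key=lambda t: counts[t])
        for term in group:
            lexicon[term] = normalized  # "storing" here is just an in-memory dict
    return lexicon

# Illustrative documents (assumed, not from the source).
documents = ["e-mail email email emails", "email server", "servers server"]
lexicon = generate_lexicon(documents)
```

With these example documents, the pairs ("e-mail", "email") and ("email", "emails") share the term "email", so they are merged into one group, and "email" is selected as the normalized term because it occurs most often.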
Specification