Methods for generating or revising context vectors for a plurality of word stems
Abstract
A method for generating context vectors for use in a document storage and retrieval system. A context vector is a fixed-length list of component values generated to approximate conceptual relationships. A context vector is generated for each word stem. For a core group of word stems, the component values may be manually determined on the basis of conceptual relationships to word-based features. The core group of context vectors is then used to generate the remaining context vectors based on the proximity of a word stem to words and the context vectors assigned to those words. The core group may also be generated by initially assigning each core word stem a row vector from an identity matrix and then performing the proximity-based algorithm. Context vectors may be revised as new records are added to the system, based on the proximity relationships between word stems in the new records.
17 Claims
1. A method for generating a dictionary of context vectors comprising:
providing a corpus of records, each including a series of words wherein each word corresponds to one of a plurality of word stems;
generating a context vector for each of a core group of word stems;
temporarily assigning a zero context vector to the remaining word stems in said plurality of word stems that are not in said core group;
for each word stem with a zero vector, combining context vectors based on proximity in each of said series of words between the word corresponding to said word stem and the words corresponding to said context vectors to generate a context vector for said word stem.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
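The steps of claim 1 can be illustrated with a minimal sketch. It is not the patent's implementation: the claim does not fix a window size, a proximity weighting, or a normalization, so the `window` parameter, the `1 / distance` weight, and the unit-length normalization below are all assumptions, as are the function names `normalize` and `build_dictionary`. Records are assumed to be lists of already-stemmed words.

```python
import math

def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def build_dictionary(records, core_vectors, window=2):
    """Hypothetical batch version of claim 1.

    records      -- list of records, each a list of word stems
    core_vectors -- dict mapping each core stem to its context vector
    window       -- assumed proximity window (positions on each side)
    """
    dim = len(next(iter(core_vectors.values())))
    stems = {stem for rec in records for stem in rec}
    # Core stems keep their vectors; every other stem starts at zero.
    dictionary = {s: list(core_vectors.get(s, [0.0] * dim)) for s in stems}
    sums = {s: [0.0] * dim for s in stems if s not in core_vectors}

    for rec in records:
        for i, stem in enumerate(rec):
            if stem not in sums:
                continue  # core stems are left as given
            lo, hi = max(0, i - window), min(len(rec), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                weight = 1.0 / abs(i - j)  # assumed: closer words count more
                neighbor = dictionary[rec[j]]
                for k in range(dim):
                    sums[stem][k] += weight * neighbor[k]

    # Replace the temporary zero vectors with the normalized sums.
    for stem, total in sums.items():
        dictionary[stem] = normalize(total)
    return dictionary
```

In this sketch the zero vectors are replaced only after the whole corpus has been scanned, so non-core neighbors contribute nothing to one another; only the core group's vectors shape the result.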
11. A method for generating a core group of context vectors comprising:
providing a corpus of records, each including a series of words wherein each word corresponds to one of a plurality of word stems;
selecting a core group of word stems from said plurality of word stems;
assigning a different vector to each word stem in said core group of word stems;
for each word stem in said core group, combining the different vectors based on proximity in each of said series of words between the word corresponding to said each word stem and the words corresponding to said different vectors to generate a context vector for said each word stem.
Dependent claims: 12, 13
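A rough sketch of claim 11 follows, using the identity-matrix initialization that the abstract mentions as one way to assign "a different vector" to each core stem. The window size, the `1 / distance` weighting, the final normalization, and the choice to sum only the neighbors' initial vectors (not the stem's own) are assumptions made for illustration; `bootstrap_core` is a hypothetical name.

```python
import math

def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def bootstrap_core(records, core_stems, window=2):
    """Hypothetical core-group generation in the spirit of claim 11."""
    core = list(core_stems)
    dim = len(core)
    # Each core stem starts with its own row of a dim x dim identity matrix.
    initial = {
        stem: [1.0 if row == col else 0.0 for col in range(dim)]
        for row, stem in enumerate(core)
    }
    sums = {stem: [0.0] * dim for stem in core}

    for rec in records:
        for i, stem in enumerate(rec):
            if stem not in initial:
                continue
            lo, hi = max(0, i - window), min(len(rec), i + window + 1)
            for j in range(lo, hi):
                if j == i or rec[j] not in initial:
                    continue  # only core neighbors carry a vector here
                weight = 1.0 / abs(i - j)  # assumed proximity weighting
                for k in range(dim):
                    sums[stem][k] += weight * initial[rec[j]][k]

    return {stem: normalize(total) for stem, total in sums.items()}
```

With this initialization, component `k` of a stem's resulting vector reflects how often, and how closely, the stem appears near the `k`-th core stem.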
14. A method for revising a dictionary of context vectors having a context vector for each of a first plurality of word stems comprising:
providing a database with a plurality of records;
providing a counter for each word stem, said counter indicating how many times the word stem appears in said plurality of records;
adding a new record to said database;
performing the following steps for each of a second plurality of word stems found in said new record;
computing a sum vector by combining context vectors based on proximity to said each word stem in said new record of the words corresponding to said context vectors;
multiplying the context vector corresponding to said each word stem in said dictionary of context vectors by the counter corresponding to said each word stem to get a product vector;
combining the sum vector with said product vector to give a normalized vector;
incrementing the counter corresponding to said each word stem;
replacing the context vector corresponding to said each word stem with said normalized vector.
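The revision steps of claim 14 amount to a count-weighted update: the existing vector is scaled by how often its stem has been seen, so a frequently seen stem shifts less when a new record arrives. The sketch below is one possible reading, not the patented procedure. Assumptions: "combining" is taken to mean vector addition followed by unit-length normalization, neighbors are read from a snapshot of the dictionary taken before the revision, the counter is incremented by the number of occurrences in the new record, and stems missing from the dictionary are treated as zero vectors with a count of zero. The name `revise` and the `window` parameter are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def revise(dictionary, counters, new_record, window=2):
    """Update `dictionary` and `counters` in place for one new record."""
    dim = len(next(iter(dictionary.values())))
    zero = [0.0] * dim
    snapshot = {s: list(v) for s, v in dictionary.items()}

    # Group the positions of each stem that appears in the new record.
    positions = {}
    for i, stem in enumerate(new_record):
        positions.setdefault(stem, []).append(i)

    for stem, idxs in positions.items():
        # Sum vector: proximity-weighted neighbors in the new record.
        sum_vec = [0.0] * dim
        for i in idxs:
            lo, hi = max(0, i - window), min(len(new_record), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                weight = 1.0 / abs(i - j)  # assumed proximity weighting
                neighbor = snapshot.get(new_record[j], zero)
                for k in range(dim):
                    sum_vec[k] += weight * neighbor[k]

        # Product vector: existing vector scaled by its occurrence count.
        count = counters.get(stem, 0)
        product = [count * x for x in snapshot.get(stem, zero)]

        # Combine, normalize, replace, and bump the counter.
        dictionary[stem] = normalize([s + p for s, p in zip(sum_vec, product)])
        counters[stem] = count + len(idxs)
```

For example, a stem already seen three times with vector `[1, 0]` that now appears next to a stem with vector `[0, 1]` would move to the normalized form of `[3, 1]` under these assumptions.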
15. A method for generating a dictionary of context vectors comprising:
providing a corpus of records, each including a series of words wherein each word corresponds to one of a plurality of word stems;
generating a context vector for each of a core group of word stems;
temporarily assigning a zero context vector to the remaining word stems in said plurality of word stems that are not in said core group;
serially proceeding through the corpus of records and for each word stem that is not in said core group;
combining context vectors based on proximity in each of said series of words between the word corresponding to said word stem and the words corresponding to said context vectors to generate a sum vector for said word stem;
combining the sum vector with the context vector assigned to said word stem to generate a replacement context vector for said word stem.
Dependent claims: 16, 17
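Claim 15 differs from claim 1 in that the corpus is processed serially and each replacement vector takes effect immediately, so later words can draw on vectors learned earlier in the pass. A minimal sketch under stated assumptions: "combining the sum vector with the context vector" is read as addition followed by unit-length normalization, and the `window` parameter, the `1 / distance` weight, and the name `build_serially` are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def build_serially(records, core_vectors, window=2):
    """Hypothetical serial version of the dictionary build in claim 15."""
    dim = len(next(iter(core_vectors.values())))
    dictionary = {s: list(v) for s, v in core_vectors.items()}
    # Temporarily assign a zero vector to every non-core stem.
    for rec in records:
        for stem in rec:
            dictionary.setdefault(stem, [0.0] * dim)

    for rec in records:
        for i, stem in enumerate(rec):
            if stem in core_vectors:
                continue  # core vectors are not revised in this sketch
            sum_vec = [0.0] * dim
            lo, hi = max(0, i - window), min(len(rec), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                weight = 1.0 / abs(i - j)  # assumed proximity weighting
                neighbor = dictionary[rec[j]]  # reflects earlier updates
                for k in range(dim):
                    sum_vec[k] += weight * neighbor[k]
            current = dictionary[stem]
            # Replacement vector: current vector plus the sum, normalized.
            dictionary[stem] = normalize(
                [c + s for c, s in zip(current, sum_vec)]
            )
    return dictionary
```

One consequence of the serial order in this sketch is that a non-core stem with no core neighbors can still acquire a nonzero vector from a non-core neighbor updated earlier in the same pass.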
Specification