Methods and apparatus for information indexing and retrieval as well as query expansion using morpho-syntactic analysis
Abstract
An index generator and query expander for use in information retrieval in a corpus. A corpus is provided as an input to an inflectional analyzer, which produces a lemmatized corpus having base forms and associated inflections for each word in the original corpus. The lemmatized corpus is provided as an input to a disambiguator, which performs part-of-speech tagging and morpho-syntactic disambiguation to produce a disambiguated corpus. The disambiguated corpus is provided as an input to a derivational generator, which produces an expanded corpus having all possible valid derivatives of each word of the disambiguated corpus. The expanded corpus is provided as an input to a transformational analyzer, which uses a grammar and a metagrammar for analyzing syntactic and morpho-syntactic variations to conflate and generate variants, producing an index to the corpus having a minimum of variants. Alternatively, a query expander is provided utilizing similar techniques.
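The four stages named in the abstract can be sketched as a small pipeline. This is a minimal illustration, not the patented implementation: the lexicon, the derivational families, the single disambiguation rule, and the two-pattern "metagrammar" are all hypothetical toy data, and every function name is an assumption introduced here.

```python
from collections import defaultdict

# Hypothetical toy resources; a real system would use full morphological lexicons.
LEXICON = {  # surface form -> candidate (base form, part of speech) analyses
    "blood": [("blood", "NOUN")],
    "cell": [("cell", "NOUN")],
    "cells": [("cell", "NOUN")],
    "counts": [("count", "VERB"), ("count", "NOUN")],
    "of": [("of", "PREP")],
    "the": [("the", "DET")],
}
FAMILIES = [{"cell", "cellular"}, {"blood", "bloody"}]  # derivationally related base forms


def inflectional_analysis(tokens):
    """Attach every candidate (base form, tag) analysis to each token."""
    return [(t, LEXICON.get(t, [(t, "UNK")])) for t in tokens]


def disambiguate(lemmatized):
    """Toy syntactic rule: right after a determiner, prefer a noun reading."""
    out, prev_tag = [], None
    for _token, analyses in lemmatized:
        choice = analyses[0]
        if prev_tag == "DET":
            choice = next((a for a in analyses if a[1] == "NOUN"), choice)
        out.append(choice)
        prev_tag = choice[1]
    return out


def derive(disambiguated):
    """Expand each base form to its derivational family (itself if unknown)."""
    expanded = []
    for lemma, tag in disambiguated:
        family = next((f for f in FAMILIES if lemma in f), {lemma})
        expanded.append((lemma, tag, frozenset(family)))
    return expanded


def variant_patterns(term):
    """Toy metagrammar: derive variant patterns from a two-word term rule."""
    modifier, head = term
    return [[modifier, head], [head, "PREP", modifier]]


def transformational_analysis(expanded, terms):
    """Conflate each term and its variants under one index entry (token positions)."""
    index = defaultdict(list)
    for term in terms:
        for pattern in variant_patterns(term):
            for i in range(len(expanded) - len(pattern) + 1):
                window = expanded[i:i + len(pattern)]
                # A pattern element matches a tag ("PREP") or any member of the family.
                if all(p == tag or p in family
                       for p, (_, tag, family) in zip(pattern, window)):
                    index[" ".join(term)].append(i)
    return dict(index)


def build_index(text, terms):
    """Run the four stages in the order given in the abstract."""
    tokens = text.lower().split()
    lemmatized = inflectional_analysis(tokens)
    return transformational_analysis(derive(disambiguate(lemmatized)), terms)
```

With this toy data, `build_index("blood cells and the cells of blood", [("blood", "cell")])` returns `{"blood cell": [0, 4]}`: the literal term and its prepositional variant are recorded under a single entry, which is the sense in which the index holds a minimum of variants.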
38 Claims
1. An index generator for generation of an index for information retrieval for a corpus, comprising:
an inflectional analyzer for receiving a corpus as an input, the inflectional analyzer producing a lemmatized corpus having an identified base form and associated inflection for each word of the corpus;
a disambiguator for receiving the lemmatized corpus as an input, the disambiguator applying syntactic knowledge to disambiguate identified multiple inflected base forms in the lemmatized corpus representing the same word in the original corpus to produce a disambiguated corpus;
a derivational generator for receiving the disambiguated corpus as an input and produce an expanded corpus including all possible derivations for each word in the disambiguated corpus; and
a transformational analyzer for receiving the expanded corpus as an input and applying a grammar and a metagrammar to the expanded corpus to conflate term variants in the expanded corpus, the transformational analyzer producing an index to the corpus, the index having a minimum number of variants.
Dependent claims: 2-15.
16. A method for generating an index for information retrieval from a corpus, comprising the steps of:
performing inflectional analysis on the corpus to identify all associated combinations of base form and inflection for each word of the corpus to produce a lemmatized corpus showing relationships between each word of the corpus and associated combinations of base forms and inflections;
performing disambiguation on the lemmatized corpus applying syntactic knowledge to disambiguate identified multiple inflected base forms in the lemmatized corpus representing the same word in the original corpus to produce a disambiguated corpus;
performing derivational generation on the disambiguated corpus to produce an expanded corpus containing all derivatives which can be produced from each combination of base form and inflection in the lemmatized corpus; and
performing transformational analysis on the expanded corpus using a grammar and a metagrammar to extract variants of terms in the expanded corpus, producing an index with a minimum of term variants.
Dependent claims: 17-21.
22. A query expander for expansion of a query for information retrieval from a corpus, comprising:
an inflectional analyzer for receiving a query as an input, the inflectional analyzer producing an inflected query having an identified base form and associated inflection for each word of the query;
a disambiguator for receiving the inflected query as an input, the disambiguator applying syntactic knowledge to disambiguate identified multiple inflected base forms in the inflected query representing the same word in the original query to produce a disambiguated query; and
a derivational generator for receiving the disambiguated query as an input and producing an expanded query including all possible derivations for each word in the disambiguated query.
Dependent claims: 23-33.
34. A method for expanding a query for use in information retrieval from a corpus, comprising the steps of:
performing inflectional analysis on the query to identify all associated combinations of base form and inflection for each word of the query to produce an inflected query showing relationships between each word of the query and associated combinations of base forms and inflections;
performing disambiguation on the inflected query applying syntactic knowledge to disambiguate identified multiple inflected base forms in the inflected query representing the same word in the original query to produce a disambiguated query; and
performing derivational generation on the disambiguated query to produce an expanded query containing all derivatives which can be produced from each combination of base form and inflection in the inflected query.
Dependent claims: 35-38.
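The query-expansion steps of claims 22 and 34 can be illustrated in the same spirit. The sketch below is only an assumption-laden toy: the lexicon, the derivative table, the determiner rule, and the function name `expand_query` are hypothetical and do not come from the patent. It shows why disambiguation precedes derivation: the derivatives attached to a word depend on which reading is kept.

```python
# Hypothetical toy resources for illustration only.
LEXICON = {  # surface form -> candidate (base form, part of speech) analyses
    "cells": [("cell", "NOUN")],
    "records": [("record", "VERB"), ("record", "NOUN")],
    "the": [("the", "DET")],
}
DERIVATIVES = {  # (base form, part of speech) -> derived forms
    ("cell", "NOUN"): ["cellular"],
    ("record", "VERB"): ["recorder", "recording"],
}


def expand_query(query):
    """Return, for each query word, its base form followed by its derivatives."""
    expanded, prev_tag = [], None
    for word in query.lower().split():
        # Inflectional analysis: look up every candidate reading of the word.
        analyses = LEXICON.get(word, [(word, "UNK")])
        # Disambiguation (toy rule): after a determiner, prefer a noun reading.
        lemma, tag = analyses[0]
        if prev_tag == "DET":
            lemma, tag = next((a for a in analyses if a[1] == "NOUN"), (lemma, tag))
        prev_tag = tag
        # Derivational generation: add derivatives of the chosen reading only.
        expanded.append([lemma] + DERIVATIVES.get((lemma, tag), []))
    return expanded
```

Each inner list can be read as a group of interchangeable search terms. For example, `expand_query("records")` keeps the verb reading and yields `[["record", "recorder", "recording"]]`, while `expand_query("the records")` keeps the noun reading and yields `[["the"], ["record"]]`.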
Specification