LINGUISTICALLY-ADAPTED STRUCTURAL QUERY ANNOTATION
First Claim
1. A method for processing queries, comprising:
providing access to a lexicon in which a set of text elements that each start with a lowercase letter are each recognized in the lexicon as being a proper noun when in a capitalized form;
receiving a natural language query to be processed, the query comprising a sequence of text elements, the text elements comprising words;
with a computer processor, processing the query comprising:
assigning part of speech features to the text elements in the query, including:
for a text element in the query which starts with a lowercase letter and which is among the set of text elements in the lexicon that are recognized as being a proper noun when in a capitalized form, assigning recapitalization information to the query text element, the recapitalization information comprising a part of speech feature of the capitalized form;
disambiguating parts of speech for the text elements in the query including applying rules for recapitalizing text elements based on the recapitalization information; and
chunking the disambiguated query; and
outputting the processed query.
Abstract
A system and method for natural language processing of queries are provided. A lexicon includes text elements that are recognized as being a proper noun when capitalized. A natural language query includes a sequence of text elements including words. The query is processed. The processing includes a preprocessing step, in which part of speech features are assigned to the text elements in the query. This includes identifying, from a lexicon, a text element in the query which starts with a lowercase letter and assigning recapitalization information to the text element in the query, based on the lexicon. This information includes a part of speech feature of the capitalized form of the text element. Then parts of speech for the text elements in the query are disambiguated, which includes applying rules for recapitalizing text elements based on the recapitalization information.
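The preprocessing and disambiguation steps described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the two toy lexicons, the tag names (`PROPN`, `NOUN`, `ADP`, `ADJ`, `X`), the `Token` class, and in particular the two recapitalization rules (recapitalize when the word has no common-word reading, or when it follows a preposition). The actual rules are not given in this section.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical lexicon: lowercase form -> part-of-speech feature of the capitalized form.
PROPER_NOUN_LEXICON = {"paris": "PROPN", "apple": "PROPN"}

# Hypothetical common-word lexicon: lowercase form -> candidate parts of speech.
COMMON_LEXICON = {
    "cheap": {"ADJ"},
    "hotels": {"NOUN"},
    "in": {"ADP"},
    "apple": {"NOUN"},
    "pie": {"NOUN"},
}


@dataclass
class Token:
    text: str
    pos: set = field(default_factory=set)  # candidate parts of speech
    recap: Optional[str] = None            # POS feature of the capitalized form, if any


def preprocess(query):
    """Assign candidate POS features and recapitalization information."""
    tokens = []
    for word in query.split():
        token = Token(text=word, pos=set(COMMON_LEXICON.get(word.lower(), set())))
        # Only lowercase-initial words found in the proper-noun lexicon get recap info.
        if word[0].islower() and word in PROPER_NOUN_LEXICON:
            token.recap = PROPER_NOUN_LEXICON[word]
        tokens.append(token)
    return tokens


def disambiguate(tokens):
    """Pick one POS per token, recapitalizing where an (assumed) rule fires."""
    result = []
    prev_pos = None
    for token in tokens:
        no_common_reading = not token.pos
        after_preposition = prev_pos == "ADP"
        if token.recap and (no_common_reading or after_preposition):
            text, pos = token.text.capitalize(), token.recap
        else:
            text = token.text
            pos = sorted(token.pos)[0] if token.pos else "X"  # "X" marks an unknown word
        result.append((text, pos))
        prev_pos = pos
    return result


print(disambiguate(preprocess("cheap hotels in paris")))
print(disambiguate(preprocess("apple pie")))
```

In this sketch "paris" is recapitalized to "Paris" because it has no common-word reading, while "apple" in "apple pie" keeps its common-noun reading because neither illustrative rule applies.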
22 Claims
1. A method for processing queries, comprising:
providing access to a lexicon in which a set of text elements that each start with a lowercase letter are each recognized in the lexicon as being a proper noun when in a capitalized form;
receiving a natural language query to be processed, the query comprising a sequence of text elements, the text elements comprising words;
with a computer processor, processing the query comprising:
assigning part of speech features to the text elements in the query, including:
for a text element in the query which starts with a lowercase letter and which is among the set of text elements in the lexicon that are recognized as being a proper noun when in a capitalized form, assigning recapitalization information to the query text element, the recapitalization information comprising a part of speech feature of the capitalized form;
disambiguating parts of speech for the text elements in the query including applying rules for recapitalizing text elements based on the recapitalization information; and
chunking the disambiguated query; and
outputting the processed query.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
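The chunking step recited in claim 1 operates on a query whose parts of speech have already been disambiguated. A minimal sketch of one possible chunker follows, assuming the disambiguated query is a list of `(text, tag)` pairs. The tag names and the grouping rule (maximal runs of determiners, adjectives, and nouns form one noun-phrase chunk) are illustrative assumptions, not the chunking grammar of the patent.

```python
NOMINAL = {"NOUN", "PROPN"}
NP_TAGS = NOMINAL | {"ADJ", "DET"}  # tags allowed inside a noun-phrase run (assumed)


def chunk(tagged):
    """Group a disambiguated query into (label, [words]) chunks."""
    chunks = []
    run = []  # current run of tokens that may form a noun phrase

    def flush():
        if run:
            # A run counts as a noun phrase only if it contains a nominal head.
            label = "NP" if any(pos in NOMINAL for _, pos in run) else "O"
            chunks.append((label, [text for text, _ in run]))
            run.clear()

    for text, pos in tagged:
        if pos in NP_TAGS:
            run.append((text, pos))
        else:
            flush()
            chunks.append((pos, [text]))  # other tokens form single-word chunks
    flush()
    return chunks


tagged_query = [("cheap", "ADJ"), ("hotels", "NOUN"), ("in", "ADP"), ("Paris", "PROPN")]
print(chunk(tagged_query))
```

Recapitalizing before chunking matters in a scheme like this one because the proper-noun tag decides which chunk a word joins.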
18. A system for processing queries, comprising:
a lexicon which stores a set of text elements that start with a lowercase letter that are indexed as being proper nouns when in a capitalized form in which the first letter is capitalized;
memory which receives a natural language query to be processed, the query comprising a sequence of text elements, the text elements comprising words;
a linguistic analysis component for processing the query, comprising:
a preprocessing component for preprocessing the query, the preprocessing component assigning part of speech features to the text elements in the query, the assigning including assigning recapitalization information to the text elements which are in the set of text elements, the recapitalization information comprising proper noun information of the capitalized form of the text element;
a part of speech disambiguation component for disambiguating parts of speech for the text elements in the query, the part of speech disambiguation component applying rules for recapitalizing text elements based on the recapitalization information; and
a chunking component for chunking the disambiguated query; and
a processing component for implementing the linguistic analysis component.
Dependent claims: 19, 20, 21
22. A method for processing queries, comprising:
generating a lexicon comprising a set of text elements that start with a lowercase letter and which are recognized as being proper nouns when in a capitalized form of the text element;
receiving a natural language query to be processed, the query comprising a sequence of text elements, the text elements comprising words;
with a computer processor, processing the query comprising:
assigning part of speech features to the text elements in the query, the assigning including:
for a text element in the query which is among the set of text elements in the lexicon which are recognized as being proper nouns when in a capitalized form, assigning recapitalization information to the text element, the recapitalization information comprising a part of speech feature of the capitalized form of the text element;
disambiguating any ambiguous parts of speech for the text elements in the query, comprising applying rules for recapitalizing text elements based on the recapitalization information; and
outputting the processed query.
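Claim 22 begins by generating the lexicon rather than only accessing one. The claim does not say how the lexicon is generated; one simple possibility, sketched below under that stated assumption, derives it from an existing tagged word list by lowercasing the first letter of every capitalized proper noun. The function name `build_recap_lexicon`, the `(form, tag)` input format, and the `PROPN` tag are hypothetical.

```python
def build_recap_lexicon(entries):
    """Map lowercase-initial forms to the POS feature of their capitalized form.

    entries: iterable of (surface form, part-of-speech tag) pairs, assumed to
    come from some existing tagged dictionary.
    """
    lexicon = {}
    for form, pos in entries:
        # Keep only proper nouns that are normally written with an initial capital.
        if pos == "PROPN" and form[:1].isupper():
            lowered = form[0].lower() + form[1:]
            lexicon[lowered] = pos
    return lexicon


entries = [("Paris", "PROPN"), ("hotel", "NOUN"), ("Xerox", "PROPN")]
print(build_recap_lexicon(entries))
```

Storing the part-of-speech feature alongside the lowercase form is what lets a later preprocessing step attach recapitalization information to a lowercase query word with a single lookup.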
Specification