Natural language query processing
Abstract
An enhanced natural language information retrieval technique tokenizes an incoming query, comparing the tokenized representation against a collection of query templates. Query templates include a first portion having one or more query patterns representative of a query and in a form suitable for matching the tokenized representation of an incoming query. Query templates also include one or more information retrieval commands that are designed to return information relevant to those query patterns in its first portion. The enhanced natural language information retrieval technique selects those query templates that are determined to be most relevant to the incoming query (via its tokenized representation) and initiates one or more information retrieval commands associated with the selected query templates.
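The template-matching flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `QueryTemplate` class, the in-order subsequence match, the coverage score, and the command strings are all assumptions made for illustration, since the abstract does not say how relevance is computed or what a retrieval command looks like.

```python
import re
from dataclasses import dataclass


@dataclass
class QueryTemplate:
    # First portion: query patterns, each a sequence of tokens (hypothetical format).
    patterns: list[tuple[str, ...]]
    # Second portion: information retrieval commands (opaque strings, hypothetical).
    commands: list[str]


def tokenize(query: str) -> tuple[str, ...]:
    """Lowercase the query and split it into alphanumeric tokens."""
    return tuple(re.findall(r"[a-z0-9]+", query.lower()))


def is_subsequence(pattern: tuple[str, ...], tokens: tuple[str, ...]) -> bool:
    """True if the pattern tokens occur in the query tokens in the same order."""
    remaining = iter(tokens)
    return all(tok in remaining for tok in pattern)


def relevance(template: QueryTemplate, tokens: tuple[str, ...]) -> float:
    """Assumed relevance measure: share of query tokens covered by the best pattern."""
    best = 0.0
    for pattern in template.patterns:
        if tokens and is_subsequence(pattern, tokens):
            best = max(best, len(pattern) / len(tokens))
    return best


def select_commands(query: str, templates: list[QueryTemplate], top_k: int = 1) -> list[str]:
    """Pick the most relevant templates and return their retrieval commands."""
    tokens = tokenize(query)
    scored = [(relevance(t, tokens), t) for t in templates]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [cmd for _, t in ranked[:top_k] for cmd in t.commands]


templates = [
    QueryTemplate([("reset", "password")], ["SEARCH topic:password_reset"]),
    QueryTemplate([("open", "account")], ["SEARCH topic:new_account"]),
]
selected = select_commands("How do I reset my password?", templates)
```

Here only the first template matches the tokenized query, so only its command is selected. A real system would presumably run the selected commands against a search backend, which this sketch leaves out.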
10 Claims
1. A computer-based method of transforming a natural language query into a representation of the natural language query wherein the representation is usable for purposes of an input into a search system that extracts answers based on the input, the computer-based method comprising:
utilizing a computer processor for comparing the natural language query against common terms of a core knowledge pack to identifying semantic units in the natural language query such that each semantic unit is a respective portion of the natural language query;
utilizing the computer processor for associating a token with each uniquely identified semantic unit by recognizing the respective uniquely identified semantic unit in a dictionary having the token associated with the uniquely identified semantic unit;
utilizing the computer processor for identifying a stem for at least a first one of the tokens as part of a token processing operation, the stem being identified by replacing the first token with a stem corresponding to the token in the dictionary wherein the token associated with the stem is also associated with a plurality of semantic units in the dictionary;
utilizing the computer processor for identifying a lexical phrase for at least a second one of the tokens as part of the token processing operation, wherein the lexical phrase is obtained by combining one of the uniquely identified semantic units with the second token; and
utilizing the computer processor for representing the query as an ordered combination of the identified stems and lexical phrases identified in the token processing operation.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
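The steps recited in claim 1 can be sketched roughly as follows. This is one possible reading, not the claimed implementation: the data in `COMMON_TERMS`, `UNIT_TO_TOKEN`, `TOKEN_TO_STEM`, and `PHRASE_RULES` is hypothetical, and the rule for forming a lexical phrase (joining a token's surface form with the next semantic unit) is an assumption, since the claim does not say how the two are combined.

```python
import re

# Hypothetical "core knowledge pack": common terms to look for in a query.
COMMON_TERMS = {"replacing", "replace", "ink", "cartridge", "printers", "printer"}

# Hypothetical dictionary: semantic unit -> token. Several units share one token.
UNIT_TO_TOKEN = {
    "replacing": "REPLACE", "replace": "REPLACE",
    "printers": "PRINTER", "printer": "PRINTER",
    "ink": "INK", "cartridge": "CARTRIDGE",
}

# Hypothetical dictionary: token -> stem, for tokens tied to several semantic units.
TOKEN_TO_STEM = {"REPLACE": "replace", "PRINTER": "printer"}

# Assumed phrase rule: a token followed by this semantic unit forms a lexical phrase.
PHRASE_RULES = {"INK": "cartridge"}


def find_semantic_units(query: str, common_terms: set[str]) -> list[str]:
    """Greedy longest match of common terms against the words of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    max_len = max((len(t.split()) for t in common_terms), default=1)
    units, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in common_terms:
                units.append(candidate)
                i += n
                break
        else:
            i += 1  # word is not a common term: skip it
    return units


def represent_query(query: str) -> list[str]:
    """Return the query as an ordered list of stems and lexical phrases."""
    units = find_semantic_units(query, COMMON_TERMS)
    tokens = [UNIT_TO_TOKEN.get(u, u.upper()) for u in units]
    out, i = [], 0
    while i < len(units):
        token = tokens[i]
        next_unit = units[i + 1] if i + 1 < len(units) else None
        surface = TOKEN_TO_STEM.get(token, units[i])
        if next_unit is not None and PHRASE_RULES.get(token) == next_unit:
            out.append(f"{surface} {next_unit}")  # lexical phrase (assumed join)
            i += 2
        else:
            out.append(surface)  # stem if the dictionary has one, else the unit
            i += 1
    return out


representation = represent_query("Replacing the ink cartridge in printers")
```

In this example "replacing" and "printers" are replaced by the stems of their shared tokens, while the `INK` token is combined with the following unit "cartridge" into one lexical phrase. The resulting ordered list is the kind of representation that would be passed to the search system.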
10. A manufacture comprising a computer-readable medium and a set of instructions on the computer-readable medium, the instructions being executable by a computer processor to execute a computer-based method of:
utilizing a computer processor for comparing the natural language query against common terms of a core knowledge pack to identifying semantic units in the natural language query such that each semantic unit is a respective portion of the natural language query;
utilizing the computer processor for associating a token with each uniquely identified semantic unit by recognizing the respective uniquely identified semantic unit in a dictionary having the token associated with the uniquely identified semantic unit;
utilizing the computer processor for identifying a stem for at least a first one of the tokens as part of a token processing operation, the stem being identified by replacing the first token with a stem corresponding to the token in the dictionary wherein the token associated with the stem is also associated with a plurality of semantic units in the dictionary;
utilizing the computer processor for identifying a lexical phrase for at least a second one of the tokens as part of the token processing operation, wherein the lexical phrase is obtained by combining one of the uniquely identified semantic units with the second token; and
utilizing the computer processor for representing the query as an ordered combination of the identified stems and lexical phrases identified in the token processing operation.
Specification