System, method and program product for answering questions using a search engine
Abstract
The present invention is a system, method, and program product that comprises a computer with a collection of documents to be searched. The documents contain free form (natural language) text. We define a set of labels called QA-Tokens, which function as abstractions of phrases or question-types. We define a pattern file, which consists of a number of pattern records, each of which has a question template, an associated question word pattern, and an associated set of QA-Tokens. We describe a query-analysis process which receives a query as input and matches it to one or more of the question templates, where a priority algorithm determines which match is used if there is more than one. The query-analysis process then replaces the associated question word pattern in the matching query with the associated set of QA-Tokens, and possibly some other words. This results in a processed query having some combination of original query tokens, new tokens from the pattern file, and QA-Tokens, possibly with weights. We describe a pattern-matching process that identifies patterns of text in the document collection and augments the location with corresponding QA-Tokens. We define a text index data structure which is an inverted list of the locations of all of the words in the document collection, together with the locations of all of the augmented QA-Tokens. A search process then matches the processed query against a window of a user-selected number of sentences that is slid across the document texts. A hit-list of top-scoring windows is returned to the user.
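The query-analysis step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the record layout, the regular expressions, the numeric priority field, and token names such as PERSON$ are all assumptions made for illustration, and the weights the abstract mentions are omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class PatternRecord:
    template: re.Pattern      # question template the query must match
    word_pattern: re.Pattern  # question word pattern to be replaced
    qa_tokens: tuple          # QA-Tokens substituted for the word pattern
    priority: int = 0         # hypothetical tie-breaker; higher wins


# Illustrative pattern file; every record here is an assumption.
PATTERNS = [
    PatternRecord(re.compile(r"^who\b.*", re.I),
                  re.compile(r"^who\b", re.I), ("PERSON$", "ORG$"), 1),
    PatternRecord(re.compile(r"^how\b.*", re.I),
                  re.compile(r"^how\b", re.I), ("MANNER$",), 1),
    PatternRecord(re.compile(r"^how (long|far)\b.*", re.I),
                  re.compile(r"^how (long|far)\b", re.I), ("LENGTH$",), 2),
]


def process_query(query):
    """Return (QA-Tokens, remaining query words) for a free-form query."""
    query = query.strip().rstrip("?")
    matches = [p for p in PATTERNS if p.template.match(query)]
    if not matches:
        # No template matched: the query is passed through as plain words.
        return [], query.lower().split()
    # Several templates may match; pick the one with the highest priority.
    best = max(matches, key=lambda p: p.priority)
    # Drop the question word pattern; the QA-Tokens take its place.
    remainder = best.word_pattern.sub("", query, count=1)
    return list(best.qa_tokens), remainder.lower().split()
```

For "How far is the moon?" both "how" templates match, and the hypothetical priority field selects the more specific record, so the processed query carries LENGTH$ plus the words that were not replaced.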
13 Claims
-
1. A system for searching free form text comprising:
-
a computer with one or more memories and one or more central processing units (CPUs), one or more of the memories having one or more documents, the documents containing a plurality of words in free form text, the free form text having a natural language structure;
a pattern data structure having a plurality of pattern records, each pattern record containing a question template, an associated question word pattern, and an associated set of QA-Tokens;
a query process that receives one or more queries as input and matches one or more of the queries to one or more of the question templates to determine one or more template matches, the query process further replacing the associated question word pattern in the matching query with the associated set of QA-Tokens, being processed query QA-Tokens, the query process creating a processed query having the QA-Tokens and one or more processed query words being the words of the queries that were not replaced;
a text index data structure having a plurality of index records, each index record having one or more index words with one or more index word locations in one or more of the documents and further having one or more index records with one or more index QA-Tokens with one or more index QA-Token locations in one or more of the documents, the index QA-Tokens being an abstraction of one or more of the words; and
a searching process that matches one or more of the processed query words with one or more of the index words and one or more of the processed query QA-Tokens with one or more of the index QA-Tokens, the index words and QA-Tokens being features, the searching process further scoring one or more windows by sliding the window over one or more sentences of one or more of the documents, the score of the window being dependent on the number of matching locations in the window.
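The text index data structure recited in claim 1 can be pictured as an inverted list in which QA-Tokens are stored at the same locations as the words they abstract. The following is a sketch only: the annotator patterns, the token names YEAR$ and MONTH$, and the (document, sentence, word) location tuple are assumptions, and only single-word patterns are handled.

```python
import re
from collections import defaultdict

# Hypothetical text patterns and the QA-Token each one produces.
ANNOTATORS = [
    (re.compile(r"^\d{4}$"), "YEAR$"),
    (re.compile(r"^(january|february|march|april|may|june|july|august"
                r"|september|october|november|december)$"), "MONTH$"),
]


def build_index(documents):
    """Build an inverted index of words and QA-Tokens.

    documents maps a document id to a list of sentence strings.
    The result maps each feature (word or QA-Token) to a list of
    (doc_id, sentence_number, word_number) locations.
    """
    index = defaultdict(list)
    for doc_id, sentences in documents.items():
        for s_no, sentence in enumerate(sentences):
            words = re.findall(r"\w+", sentence.lower())
            for w_no, word in enumerate(words):
                location = (doc_id, s_no, w_no)
                index[word].append(location)
                # Augment the same location with any matching QA-Token.
                for pattern, token in ANNOTATORS:
                    if pattern.match(word):
                        index[token].append(location)
    return index
```

Because a QA-Token shares its location with the underlying word, a processed query containing YEAR$ can match any indexed year without naming it.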
-
-
11. A computer executed method for searching a plurality of words in one or more documents in free form text, the free form text having a natural language structure, the method comprising the steps of:
-
receiving one or more queries as input;
matching one or more of the queries to one or more question templates of a pattern record, the pattern record further containing an associated question word pattern and an associated set of QA-Tokens, the matching determining one or more template matches, the query process further replacing the associated question word pattern in the matching query with the associated set of QA-Tokens, being processed query QA-Tokens, the query process creating a processed query having the QA-Tokens and one or more processed query words being the words of the queries that were not replaced; and
a searching process that matches one or more of the processed query words with one or more index words in an index record and one or more of the processed query QA-Tokens with one or more of the index QA-Tokens in the index record, the index words and QA-Tokens being features, the searching process further scoring one or more windows by sliding the window over the one or more sentences of one or more of the documents, the score of the window being dependent on the number of matching locations in the window.
-
-
12. A computer system for searching a plurality of words in one or more documents in free form text, the free form text having a natural language structure, the system comprising:
-
means for receiving one or more queries as input;
means for matching one or more of the queries to one or more question templates of a pattern record, the pattern record further containing an associated question word pattern and an associated set of QA-Tokens, the matching determining one or more template matches, the query process further replacing the associated question word pattern in the matching query with the associated set of QA-Tokens, being processed query QA-Tokens, the query process creating a processed query having the QA-Tokens and one or more processed query words being the words of the queries that were not replaced; and
means for matching one or more of the processed query words with one or more index words in an index record and one or more of the processed query QA-Tokens with one or more of the index QA-Tokens in the index record, the index words and QA-Tokens being features, the searching process further scoring one or more windows by sliding the window over the one or more sentences of one or more of the documents, the score of the window being dependent on the number of matching locations in the window.
-
-
13. A computer program product that performs the steps of:
-
matching one or more queries to one or more question templates of a pattern record, the pattern record further containing an associated question word pattern and an associated set of QA-Tokens, the matching determining one or more template matches, the query process further replacing the associated question word pattern in the matching query with the associated set of QA-Tokens, being processed query QA-Tokens, the query process creating a processed query having the QA-Tokens and one or more processed query words being the words of the queries that were not replaced; and
a searching process that matches one or more of the processed query words with one or more index words in an index record and one or more of the processed query QA-Tokens with one or more of the index QA-Tokens in the index record, the index words and QA-Tokens being features, the searching process further scoring one or more windows by sliding the window over the one or more sentences of one or more of the documents, the score of the window being dependent on the number of matching locations in the window.
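The window scoring recited in the searching step can be sketched as follows, assuming each sentence has already been reduced to its list of features (words and QA-Tokens). The function name, the unweighted count, and the tie-breaking order are assumptions; the claims state only that the score depends on the number of matching locations in the window.

```python
def score_windows(sentences, query_features, window_size=2):
    """Slide a window of window_size sentences over one document.

    sentences is a list of feature lists, one per sentence.
    Returns (score, start_sentence) pairs, best score first, where the
    score is the count of feature locations that match the query.
    """
    if not sentences:
        return []
    wanted = set(query_features)
    # Matching locations per sentence, computed once.
    per_sentence = [sum(1 for f in s if f in wanted) for s in sentences]
    last_start = max(len(sentences) - window_size, 0)
    hits = []
    for start in range(last_start + 1):
        score = sum(per_sentence[start:start + window_size])
        hits.append((score, start))
    # Highest score first; earlier windows win ties (an assumption).
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits
```

Truncating the sorted list to its first few entries would give a hit-list of top-scoring windows, with window_size playing the role of the user-selected number of sentences.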
-
Specification