Natural language information retrieval system
First Claim
1. A method for creating a group of searchable documents comprising the steps of, for each of a plurality of documents:
- receiving text defining the document;
parsing the text into a plurality of text portions;
obtaining one or more logical form relationships corresponding to each text portion;
defining an array having a size corresponding to the number of logical form relationships for the document; and
creating a hash-table fingerprint for the document by, for each logical form relationship, computing a hash value, obtaining an address hash and a signature hash based on the corresponding hash value, and storing the signature hash in the array at a memory location corresponding to the address hash.
2 Assignments
0 Petitions
Accused Products
Abstract
A natural language information retrieval (NLIR) system employing a hash table technique to reduce memory requirements and a proxy process module to improve processing speed on multi-processor platforms. The NLIR system includes a Dynamic Link Library (DLL) search engine annex that implements a number of improvements that allow the preexisting natural language processing (NLP) core code module to operate sufficiently fast in a limited-memory environment. The improvements relate to (1) reducing storage requirements, (2) increasing processing speed, (3) improved operation on multi-processor platforms, and (4) a trouble-shooting mechanism. The NLIR system includes three modes of operation. First, during index processing, the NLIR system prepares documents for NLP searching to create a group of searchable documents. Second, during question processing, the NLIR system receives a natural language question and, for each document in the group of searchable documents, computes a document score connoting the likelihood that the document includes an answer to the natural language question. Third, during debugging, the NLIR system receives trouble-shooting requests and returns diagnostic reports, such as a document trace report and a question trace report.
131 Citations
7 Claims
-
1. A method for creating a group of searchable documents comprising the steps of, for each of a plurality of documents:
-
receiving text defining the document;
parsing the text into a plurality of text portions;
obtaining one or more logical form relationships corresponding to each text portion;
defining an array having a size corresponding to the number of logical form relationships for the document; and
creating a hash-table fingerprint for the document by, for each logical form relationship, computing a hash value, obtaining an address hash and a signature hash based on the corresponding hash value, and storing the signature hash in the array at a memory location corresponding to the address hash. - View Dependent Claims (2, 3, 4, 5)
receiving a natural language question;
obtaining one or more logical form relationships for the question;
computing a hash value corresponding to each logical form relationship for the question; and
for each document in the group of searchable documents, comparing each hash value corresponding to the logical form relationships for the question to the hash-table fingerprint for the document and identifying one or more matching hash values, obtaining a score for each matching hash value, and computing a document score connoting the likelihood that the document contains an answer to the question by summing the score for each matching hash value.
-
-
3. The method of claim 2, further comprising the steps of, for a current hash value for the question:
-
parsing the current hash value into a current address hash and a current signature hash;
identifying an array entry point in the array corresponding to the current address hash; and
if the array entry point is not empty, identifying one or more consecutively-addressed data-containing memory locations beginning with the array entry point, comparing the current signature hash to the data value stored at each of the consecutively-addressed data-containing memory locations, and if the current signature hash matches the data value stored at any of the consecutively-addressed data-containing memory locations, identifying the current hash value as a matching hash value.
-
-
4. The method of claim 3, further comprising the steps of, for a current hash value for the question:
-
if the array entry point is empty, identifying the current hash value as a non-matching hash value; and
if the current signature hash does not match the data value stored at any of the consecutively-addressed data-containing memory locations, identifying the current hash value as a non-matching hash value.
-
-
5. The method of claim 4, further comprising the steps of:
-
ranking the documents in order of their respective document scores; and
displaying a list of highest-ranking documents as a suggestion list of documents containing an answer to the natural language question.
-
-
6. A computer-readable medium having computer-executable instructions comprising:
-
a natural language information retrieval module configured for;
creating a group of searchable documents by, for each document, receiving text defining the document from a search engine and returning a hash-table fingerprint comprising a representation of logical form relationships for the document to the search engine, and for each document, receiving a natural language question and the hash-table fingerprint comprising the representation of logical form relationships for the document from the search engine and returning a document score to the search engine connoting the likelihood that the document contains an answer to the natural language question; and
the search engine configured for;
ranking the documents in order of their respective document scores, and displaying a list of highest ranking documents as a suggestions list of documents containing an answer to the natural language question; and
wherein the natural language information retrieval module is configured for parsing each document into a plurality of sentences, further comprising a proxy process module configure for;
receiving the sentences from one or more active client threads other than the first active client thread, each active client thread associated with the natural language information retrieval module;
creating a process for each client thread other than the first active client thread; and
passing the sentences for each client thread other than the first active client thread to a natural language processing core code module in the context of an associated process, the natural language processing core code module configured to identify one or more logical form relationships corresponding to each sentence and return the logical form relationships to the natural language information retrieval module.
-
-
7. A computer-readable medium having computer-executable instructions comprising:
-
a natural language information retrieval module configured for;
creating a group of searchable documents by, for each document, receiving test defining the document from a search engine and returning a hash-table fingerprint comprising a representation of logical form relationships for the document to the search engine, and for each document, receiving a natural language question and the hash-table fingerprint comprising the representation of logical form relationships for the document from the search engine and returning a document score to the search engine connoting the likelihood that the document contains an answer to the natural language question; and
the search engine configured for;
ranking the documents in order of their respective document scores, and displaying a list of highest ranking documents as a suggestions list of documents containing an answer to the natural language question; and
further comprising a debugging module defining an interface comprising;
a first interface method for activating and deactivating a trace document function that, when active, causes the natural language information retrieval module to identify the logical form relationships identified for document text processed by the natural language information retrieval module; and
a second interface method for activating and deactivating a trace question function that, when active, causes the natural language information retrieval module to identify the logical form relationships identified for questions processed by the natural language information retrieval module.
-
Specification