Natural language information retrieval system
First Claim
1. A method for creating a group of searchable documents comprising the steps of, for each of a plurality of documents:
- receiving text defining the document;
parsing the text into a plurality of text portions;
obtaining one or more logical form relationships corresponding to each text portion;
defining an array having a size corresponding to the number of logical form relationships for the document;
creating a hash-table fingerprint for the document by, for each logical form relationship, computing a hash value, obtaining an address hash and a signature hash based on the corresponding hash value, and storing the signature hash in the array at a memory location corresponding to the address hash;
parsing each hash value to obtain the corresponding address hash and signature hash;
identifying an array index for an array entry point corresponding to the address hash; and
if the array entry point is empty, storing the signature hash at the array entry point.
2 Assignments
0 Petitions
Accused Products
Abstract
A natural language information retrieval (NLIR) system employing a hash table technique to reduce memory requirements and a proxy process module to improve processing speed on multi-processor platforms. The NLIR system includes a Dynamic Link Library (DLL) search engine annex that implements a number of improvements that allow the preexisting natural language processing (NLP) core code module to operate sufficiently fast in a limited-memory environment. The improvements relate to (1) reducing storage requirements, (2) increasing processing speed, (3) improved operation on multi-processor platforms, and (4) a trouble-shooting mechanism. The NLIR system includes three modes of operation. First, during index processing, the NLIR system prepares documents for NLP searching to create a group of searchable documents. Second, during question processing, the NLIR system receives a natural language question and, for each document in the group of searchable documents, computes a document score connoting the likelihood that the document includes an answer to the natural language question. Third, during debugging, the NLIR system receives trouble-shooting requests and returns diagnostic reports, such as a document trace report and a question trace report.
-
Citations
26 Claims
-
1. A method for creating a group of searchable documents comprising the steps of, for each of a plurality of documents:
-
receiving text defining the document;
parsing the text into a plurality of text portions;
obtaining one or more logical form relationships corresponding to each text portion;
defining an array having a size corresponding to the number of logical form relationships for the document;
creating a hash-table fingerprint for the document by, for each logical form relationship, computing a hash value, obtaining an address hash and a signature hash based on the corresponding hash value, and storing the signature hash in the array at a memory location corresponding to the address hash;
parsing each hash value to obtain the corresponding address hash and signature hash;
identifying an array index for an array entry point corresponding to the address hash; and
if the array entry point is empty, storing the signature hash at the array entry point. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
if the array entry point is not empty, incrementing the array index of the array entry point until an empty memory location is defined; and
storing the signature hash at the empty memory location.
-
-
3. The method of claim 2, wherein the step of identifying the array index for the array entry point comprises the step of setting the array index to the remainder of the address hash divided by the size of the array.
-
4. The method of claim 3, wherein the step of defining an array having a size corresponding to the number of logical form relationships for the document comprises the step of defining an array having a size that is a predetermined percentage larger than the number of logical form relationships for the document.
-
5. The method of claim 4, wherein:
-
the predetermined percentage is 110%;
the hash value is a 32-bit value;
the address hash is the upper 16 bits of the hash value; and
the signature hash is the lower 16 bits of the hash value.
-
-
6. The method of claim 1, further comprising the steps of:
-
receiving a natural language question;
obtaining one or more logical form relationships for the question;
computing a hash value corresponding to each logical form relationship for the question; and
for each document in the group of searchable documents, comparing each hash value corresponding to the logical form relationships for the question to the hash-table fingerprint for the document and identifying one or more matching hash values, obtaining a score for each matching hash value, and computing a document score connoting the likelihood that the document contains an answer to the question by summing the score for each matching hash value.
-
-
7. The method of claim 6, further comprising the steps of, for a current hash value for the question:
-
parsing the current hash value into a current address hash and a current signature hash;
identifying an array entry point in the array corresponding to the current address hash; and
if the array entry point is not empty, identifying one or more consecutively-addressed data-containing memory locations beginning with the array entry point, comparing the current signature hash to the data value stored at each of the consecutively-addressed data-containing memory locations, and if the current signature hash matches the data value stored at any of the consecutively-addressed data-containing memory locations, identifying the current hash value as a matching hash value.
-
-
8. The method of claim 7, further comprising the steps of, for a current hash value for the question:
-
if the array entry point is empty, identifying the current hash value as a non-matching hash value; and
if the current signature hash does not match the data value stored at any of the consecutively-addressed data-containing memory locations, identifying the current hash value as a non-matching hash value.
-
-
9. The method of claim 8, further comprising the steps of:
-
ranking the documents in order of their respective document scores; and
displaying a list of highest-ranking documents as a suggestion list of documents containing an answer to the natural language question.
-
-
10. A computer-readable medium having computer-executable instructions comprising:
-
a natural language information retrieval module configured for;
creating a group of searchable documents by, for each document, receiving text defining the document from a search engine and returning a hash-table fingerprint comprising a representation of logical form relationships for the document to the search engine, and for each document, receiving a natural language question and the hash-table fingerprint comprising the representation of logical form relationship for the document from the search engine and returning a document score to the search engine connoting the likelihood that the document contains an answer to the natural language question;
the search engine configured for;
ranking the documents in order of their respective document scores, and displaying a list of highest-ranking documents as a suggestion list of documents containing an answer to the natural language question; and
wherein the natural language information retrieval module defines an interface comprising;
a first interface method for receiving the text documents from the search engine and returning the hash-table fingerprints to the search engine; and
a second interface method for receiving a current natural language question and a hash-table fingerprint for a current document from the search engine and returning a document score to the search engine connoting the likelihood that the current document contains an answer to the natural language question. - View Dependent Claims (11, 12, 13)
a third interface method for initiating processing of the natural language question; and
a forth interface method for terminating processing of the natural language question.
-
-
12. The computer-readable medium of claim 10, wherein the natural language information retrieval module is configured for parsing each document into a plurality of sentences, further comprising a proxy process module configured for:
-
receiving the sentences from one or more active client threads other than the first active client thread, each active client thread associated with the natural language information retrieval module;
creating a process for each client thread other than the first active client thread; and
passing the sentences for each client thread other than the first active client thread to a natural language processing core code module in the context of an associated process, the natural language processing core code module configured to identify one or more logical form relationships corresponding to each sentence and return the logical form relationships to the natural language information retrieval module.
-
-
13. The computer-readable medium of claim 10, further comprising a debugging module defining an interface comprising:
-
a first interface method for activating and deactivating a trace document function that, when active, causes the natural language information retrieval module to identify the logical form relationships identified for document text processed by the natural language information retrieval module; and
a second interface method for activating and deactivating a trace question function that, when active, causes the natural language information retrieval module to identify the logical form relationships identified for questions processed by the natural language information retrieval module.
-
-
14. A computer-readable medium having computer-executable instructions comprising:
-
a search engine;
a natural language information retrieval module configured for, receiving a text document from the search engine, and parsing the text document into a plurality of sentences;
a proxy process module configured for, receiving the sentences from one or more active client threads associated with the natural language information retrieval module, and creating a process for each client thread other than the first active client thread;
a natural language processing core code module configured for, receiving the sentence from the natural language information retrieval module and from the proxy process module, and for each sentence, identifying one or more logical form relationships corresponding to the sentence and returning the logical form relationships to the natural language information retrieval module; and
wherein the natural language information retrieval module is further configured for;
defining an array having a size corresponding to the number of logical form relationships for the text document;
computing a hash value corresponding to each logical form relationship for the document;
obtaining an address hash and a signature hash for each logical form relationship based on the corresponding hash value; and
for each logical form relationship, storing the signature hash in the array at a memory location corresponding to the address hash to construct a hash-table fingerprint for the text document. - View Dependent Claims (15, 16, 17, 18, 19, 20)
parsing each hash value to obtain the corresponding address hash and signature hash;
identifying an array index for an array entry point corresponding to the address hash; and
if the array entry point is empty, storing the signature hash at the array entry point.
-
-
16. The computer-readable medium of claim 15, wherein the natural language information retrieval module is further configured for identifying the array index for the array entry point by setting the array index to the remainder of the address hash divided by the size of the array.
-
17. The computer-readable medium of claim 16, wherein the natural language information retrieval module is further configured for:
-
determining for each hash value whether the array entry point is empty; and
if the array entry point is not empty, incrementing the array index of the array entry point until an empty memory location is defined, and storing the signature hash at the empty memory location.
-
-
18. The computer-readable medium of claim 17, wherein the natural language information retrieval module is further configured for:
-
receiving a natural language question;
obtaining one or more logical form relationships for the question;
computing a hash value corresponding to each logical form relationship for the question; and
for each text document in a group of searchable documents, comparing the hash values corresponding to the logical form relationship for the question to the hash-table fingerprint for the text document and identifying one or more matching hash values, obtaining a score for each matching hash value, and computing a document score connoting the likelihood that the document contains an answer to the natural language question by summing the score for each matching hash value.
-
-
19. The computer-readable medium of claim 18, wherein the natural language information retrieval module is further configured for:
-
parsing a current hash value into a current address hash and a current signature hash;
identifying an array entry point in the array corresponding to the current address hash; and
if the array entry point is not empty, identifying one or more consecutively-addressed data-containing memory locations beginning with the array entry point, comparing the current signature hash to the data value stored at each of the consecutively-addressed data-containing memory locations, and if the current signature hash matches the data value stored at any of the consecutively-addressed data-containing memory locations, identifying the current hash value as a matching hash value.
-
-
20. The computer-readable medium of claim 19, wherein the natural language information retrieval module is further configured for:
-
if the array entry point is empty, identifying the current hash value as a non-matching hash value; and
if the current signature hash does not match the data value stored at any of the consecutively-addressed data-containing memory locations, identifying the current hash value as a non-matching hash value.
-
-
21. A method for creating a group of searchable documents comprising the steps of, for each of a plurality of documents:
-
receiving text defining the document;
parsing the text into a plurality of text portions;
obtaining one or more logical form relationships corresponding to each text portion;
defining an array having a size corresponding to the number of logical form relationships for the documents; and
creating a hash-table fingerprint for the document by, for each logical form relationship,computing a hash value, obtaining an address hash and a signature hash based on the corresponding hash value, and storing the signature hash in the array at a memory location corresponding to the address hash;
receiving a natural language question;
obtaining one or more logical form relationships for the question;
computing a hash value corresponding to each logical form relationship for the question; and
for each document in the group of searchable documents, comparing each hash value corresponding to the logical form relationships for the question to the hash-table fingerprint for the document and identifying one or more matching hash values, obtaining a score for each matching hash value, and computing a document score connoting the likelihood that the document contains an answer to the question by summing the score for each matching hash value. - View Dependent Claims (22, 23, 24)
parsing the current hash value into a current address hash and a current signature hash;
identifying an array entry point in the array corresponding to the current address hash; and
if the array entry point is not empty, it identifying one or more consecutively-addressed data-containing memory locations beginning with the array entry point, comparing the current signature hash to the data value stored at each of the consecutively-addressed data-containing memory locations, and if the current signature hash matches the data value stored at any of the consecutively-addressed data-containing memory locations, identifying the current hash value as a matching hash value.
-
-
23. The method of claim 22, further comprising the steps of, for a current hash value for the question:
-
if the array entry point is empty, identifying the current hash value as a non-matching hash value; and
if the current signature hash does not match the data value stored at any of the consecutively-addressed data-containing memory locations, identifying the current hash value as a non-matching hash value.
-
-
24. The method of claim 23, further comprising the steps of:
-
ranking the documents in order of their respective documents scores; and
displaying a list of highest-ranking documents as a suggestion list of documents containing an answer to the natural language question.
-
-
25. A computer-readable medium having computer-executable instructions comprising:
-
a natural language information retrieval module configured for;
creating a group of searchable documents by, for each document, receiving text defining the document from a search engine and returning a hash-table fingerprint comprising a representation of logical form relationships for the document to the search engine, and for each document, receiving a natural language question and the hash-table fingerprint comprising the representation of logical form relationships for the document from the search engine and returning a document score to the search engine connoting the likelihood that the document contains an answer to the natural language question;
the search engine configured for;
ranking the documents in order of their respective document scores; and
displaying a list of highest-ranking documents as a suggestion list of documents containing an answer to the natural language question; and
wherein the natural language information retrieval module is further configured for parsing each document into a plurality of sentences and a proxy process module is configured for;
receiving the sentences from one or more active client threads other than the first active client thread, each active client thread associated with the natural language information retrieval module;
creating a process for each client thread other than the first active client thread; and
passing the sentences for each client thread other than the first active client thread to a natural language processing core code module in the context of an associated process, the natural language processing core code module configured to identify one or more logical form relationships corresponding to each sentence and return the logical form relationships to the natural language information retrieval module.
-
-
26. A computer-readable medium having computer-executable instructions comprising:
-
a natural language information retrieval module configured for;
creating a group of searchable documents by, for each document, receiving text defining the document form a search engine and returning a hash-table fingerprint comprising a representation of logical form relationships for the document to the search engine, and for each document, receiving a natural language question and the hash-table fingerprint comprising the representation of logical form relationships for the document from the search engine and returning a document score to the search engine connoting the likelihood that the document contains an answer to the natural language question;
the search engine configured for;
ranking the documents in order of their respective document scores, and displaying a list of highest-ranking documents as a suggestion list of documents containing an answer to the natural language question; and
a debugging module defining an interface comprising;
a first interface method for activating and deactivating a trace document function that, when active, causes the natural language information retrieval module to identify the logical form relationships identified for document text processed by the natural language information retrieval module; and
a second interface method for activating and deactivating a trace question function that, when active, causes the natural language information retrieval module to identify the logical form relationships identified for questions processed by the natural language information retrieval module.
-
Specification