Information search system, method and program
First Claim
1. An information search system for searching through a database having a plurality of document data each having a unique document ID added thereto, by use of a computer, the system comprising:
a storage device for storing the plurality of document data;
index storage means for storing in the storage device occurrence information for each word in each of the plurality of document data when each of the document data is parsed and is expressed in a form of a parse tree with a root node for bundling a plurality of sentences, the occurrence information including a document ID of the document data containing the word, a first order that indicates a sequence number of the word originating from a root node in a structural tree, and a second order that indicates a reverse sequence number of the word originating from a terminal node to the root node in the structural tree, wherein the first order sequence number decreases in value as position proximity of the word to the root node increases, and wherein the second order reverse sequence number decreases in value as position proximity of the word to the terminal node increases;
receiving means for receiving information on at least two words to be searched for;
reading means for reading from the index storage means the occurrence information on each of the words received; and
searching means for comparing occurrence information on a first word among the words received with occurrence information on a second word among the words received and for searching out a document ID of one of the two occurrence information which has the same document ID as the other occurrence information, the first order smaller than the other occurrence information, and the second order larger than the other occurrence information.
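The comparison recited in the searching means can be illustrated with a short sketch. It assumes one reading of the claim language: the first order is taken to be a pre-order number (assigned when a node is first reached from the root) and the second order a post-order number (assigned after all of a node's children have been visited). Under that assumption, one word sits above another in the tree exactly when its first order is smaller and its second order is larger. The names `Node`, `number_tree` and `is_above` and the example sentence are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """One word in a parse tree (hypothetical structure for illustration)."""
    word: str
    children: List["Node"] = field(default_factory=list)


def number_tree(root: Node) -> Dict[str, Tuple[int, int]]:
    """Map each word to (first_order, second_order).

    Assumption: first_order is a pre-order number and second_order is a
    post-order number. Words are assumed unique within this small example.
    """
    orders: Dict[str, Tuple[int, int]] = {}
    pre = 0
    post = 0

    def visit(node: Node) -> None:
        nonlocal pre, post
        pre += 1                 # numbered on the way down from the root
        first_order = pre
        for child in node.children:
            visit(child)
        post += 1                # numbered on the way back up from the leaves
        orders[node.word] = (first_order, post)

    visit(root)
    return orders


def is_above(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if a has the smaller first order and the larger second order."""
    return a[0] < b[0] and a[1] > b[1]


# Example tree: ROOT -> bought -> (Alice, car -> red)
tree = Node("ROOT", [Node("bought", [Node("Alice"), Node("car", [Node("red")])])])
orders = number_tree(tree)
print(orders)
print(is_above(orders["bought"], orders["red"]))   # "bought" is above "red"
print(is_above(orders["Alice"], orders["red"]))    # siblings' subtrees: not above
```

Two integer comparisons are enough to decide the relationship, so no tree needs to be traversed at query time.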
Abstract
A system, method and computer program product for searching at high speed for documents matching a dependency pattern in document data containing a large volume of text documents. The system includes a storage device for storing a plurality of document data, index storage means for storing occurrence information for each word in the storage device, receiving means for receiving information on at least two words to be searched for, reading means for reading the occurrence information from the index storage means, and searching means for comparing the occurrence information on the received words. The method and computer program product include the steps of storing the document data and the occurrence information in the storage device, receiving information on the words to be searched for, reading the occurrence information from the storage device, comparing the occurrence information, and searching out a matching document ID. The computer program product includes instructions to execute these steps.
15 Claims
1. An information search system for searching through a database having a plurality of document data each having a unique document ID added thereto, by use of a computer, the system comprising:
a storage device for storing the plurality of document data;
index storage means for storing in the storage device occurrence information for each word in each of the plurality of document data when each of the document data is parsed and is expressed in a form of a parse tree with a root node for bundling a plurality of sentences, the occurrence information including a document ID of the document data containing the word, a first order that indicates a sequence number of the word originating from a root node in a structural tree, and a second order that indicates a reverse sequence number of the word originating from a terminal node to the root node in the structural tree, wherein the first order sequence number decreases in value as position proximity of the word to the root node increases, and wherein the second order reverse sequence number decreases in value as position proximity of the word to the terminal node increases;
receiving means for receiving information on at least two words to be searched for;
reading means for reading from the index storage means the occurrence information on each of the words received; and
searching means for comparing occurrence information on a first word among the words received with occurrence information on a second word among the words received and for searching out a document ID of one of the two occurrence information which has the same document ID as the other occurrence information, the first order smaller than the other occurrence information, and the second order larger than the other occurrence information.
Dependent claims: 2, 3, 4, 5
6. An information search method for searching through a database having a plurality of document data each having a unique document ID added thereto, by use of a computer having a storage device, the method comprising the steps of:
storing each of the plurality of document data in the storage device in a form of a structural tree starting from a root node by parsing;
storing in the storage device occurrence information for each word in each of the plurality of document data when each of the document data is a parse tree with the root node for bundling a plurality of sentences, the occurrence information containing a document ID of the document data including the word, a first order that indicates a sequence number of the word originating from a root node in a structural tree, and a second order that indicates a reverse sequence number of the word originating from a terminal node to the root node in the structural tree, wherein the first order sequence number decreases in value as position proximity of the word to the root node increases, and wherein the second order reverse sequence number decreases in value as position proximity of the word to the terminal node increases;
receiving information on at least two words to be searched for;
reading from the storage device the occurrence information on each of the words received;
comparing occurrence information on a first word among the received words with occurrence information on a second word among the received words; and
searching out a document ID of one of the above two kinds of occurrence information which has the same document ID as the other occurrence information, the first order smaller than the other occurrence information, and the second order larger than the other occurrence information.
Dependent claims: 7, 8, 9, 10
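The method steps (storing occurrence information, receiving two words, reading their occurrence information, comparing, and searching out a document ID) might be sketched as a small in-memory index. This is only an illustration under stated assumptions: occurrence rows are supplied already numbered, a Python dictionary stands in for the storage device, and the names `Occurrence`, `build_index` and `search` are hypothetical rather than taken from the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Occurrence(NamedTuple):
    """Occurrence information for one word in one document."""
    doc_id: int
    first_order: int    # smaller nearer the root
    second_order: int   # smaller nearer the terminal nodes


def build_index(
    rows: Iterable[Tuple[str, int, int, int]]
) -> Dict[str, List[Occurrence]]:
    """Storing step: group (word, doc_id, first, second) rows by word."""
    index: Dict[str, List[Occurrence]] = defaultdict(list)
    for word, doc_id, first, second in rows:
        index[word].append(Occurrence(doc_id, first, second))
    return index


def search(index: Dict[str, List[Occurrence]], upper: str, lower: str) -> List[int]:
    """Return IDs of documents in which `upper` has the same document ID,
    a smaller first order and a larger second order than `lower`."""
    # Reading step: fetch the occurrence lists of both received words.
    lower_by_doc: Dict[int, List[Occurrence]] = defaultdict(list)
    for occ in index.get(lower, []):
        lower_by_doc[occ.doc_id].append(occ)

    hits = set()
    # Comparing step: pair up occurrences that share a document ID.
    for up in index.get(upper, []):
        for low in lower_by_doc.get(up.doc_id, []):
            if up.first_order < low.first_order and up.second_order > low.second_order:
                hits.add(up.doc_id)
                break
    return sorted(hits)


# Hand-numbered example data (hypothetical).
# Document 1: "bought" sits above "car" in the tree.
# Document 2: "car" and "bought" lie in separate sentences under the root.
rows = [
    ("bought", 1, 2, 4), ("Alice", 1, 3, 1), ("car", 1, 4, 3), ("red", 1, 5, 2),
    ("car", 2, 2, 1), ("bought", 2, 3, 2),
]
index = build_index(rows)
print(search(index, "bought", "car"))
print(search(index, "car", "bought"))
```

In this sketch only document 1 satisfies all three conditions for the pair ("bought", "car"). Document 2 contains both words but fails the order comparison, which is what distinguishes a dependency match from plain co-occurrence.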
11. A computer readable storage medium tangibly embodying a computer readable program code having computer readable non-transitory instructions which, when implemented, cause a computer to carry out the steps of a method for searching through a database having a plurality of document data each having a unique document ID, the method comprising:
storing each of the plurality of document data in a form of a structural tree starting from a root node by parsing;
storing occurrence information for each word in each of the plurality of document data when each of the document data is a parse tree with the root node for bundling a plurality of sentences, the occurrence information containing a document ID of the document data including the word, a first order that indicates a sequence number of the word originating from a root node in a structural tree, and a second order that indicates a reverse sequence number of the word originating from a terminal node to the root node in the structural tree, wherein the first order sequence number decreases in value as position proximity of the word to the root node increases, and wherein the second order reverse sequence number decreases in value as position proximity of the word to the terminal node increases;
receiving information on at least two words to be searched for;
reading the occurrence information on each of the words received;
comparing occurrence information on a first word among the received words with occurrence information on a second word among the received words; and
searching out a document ID of one of the above two kinds of occurrence information which has the same document ID as the other occurrence information, the first order smaller than the other occurrence information, and the second order larger than the other occurrence information.
Dependent claims: 12, 13, 14, 15
Specification