Device and method for retrieving documents
Abstract
The retrieval condition acquisition unit receives a logical operation expression of the keywords as a retrieval condition from a user. The matching document retrieval unit acquires a list of the document IDs corresponding to the keywords inputted, from the word index of the document information storage unit, and applies a specified logical operation to the result to acquire the matching document IDs. The related keyword calculation unit acquires the keywords extracted from the matching documents retrieved by the matching document retrieval unit as the related keywords, and calculates the degrees of relatedness of each of the matching documents. The related part extraction unit accumulates the degrees of relatedness of the related keywords as to each of the matching documents, and extracts the sentences with the appearance orders in the text kept, in the order of the sentence having a higher accumulated value, until the total length of the sentences extracted becomes longer than a predetermined length. The related part output unit displays the sentences acquired by the related part extraction unit as a retrieval result to the user. Thus, the document retrieval device of the invention extracts the related parts of documents that meet the retrieval intention of the user.
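The first two steps of the abstract (looking up document IDs in a word index, then applying a logical operation) can be sketched as follows. This is a minimal illustration only: the names `build_word_index` and `retrieve`, the whitespace tokenizer, and the restriction to a single two-keyword expression are assumptions made for the example and are not taken from the patent.

```python
from typing import Dict, Set

# Hypothetical word index: keyword -> set of IDs of documents containing it.
WordIndex = Dict[str, Set[int]]


def build_word_index(documents: Dict[int, str]) -> WordIndex:
    """Build the keyword -> document ID mapping (assumed whitespace tokenizer)."""
    index: WordIndex = {}
    for doc_id, text in documents.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return index


def retrieve(index: WordIndex, left: str, operator: str, right: str) -> Set[int]:
    """Apply one logical operator to the ID lists of two keywords."""
    a = index.get(left.lower(), set())
    b = index.get(right.lower(), set())
    if operator == "AND":
        return a & b
    if operator == "OR":
        return a | b
    if operator == "NOT":  # documents containing `left` but not `right`
        return a - b
    raise ValueError(f"unsupported operator: {operator}")


documents = {
    1: "patent search engine",
    2: "search index",
    3: "patent law",
}
index = build_word_index(documents)
matching_ids = retrieve(index, "patent", "AND", "search")
```

A full implementation would parse arbitrary nested expressions; the single-operator form is kept here only to show how set operations on ID lists yield the matching document IDs.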
16 Claims
1. A document retrieval device that retrieves a document matching a retrieval condition inputted thereto, comprising:
document information storage means for storing, in a table, plural documents each in association with words extracted from the document, the document having a document ID identifying a storage location in the storage means of a document file;
retrieval condition acquisition means for receiving the retrieval condition that includes at least a word given by a user as a retrieval condition in combination with a logical operator;
matching document retrieval means for retrieving plural documents matching the retrieval condition received by the retrieval condition acquisition means, out of the documents stored in the document information storage means and applying a specified logical operation to the result to acquire the document ID of the documents that match the retrieval condition;
related word calculation means for acquiring, as related words, the words stored in the document information storage means that are associated with plural matching documents retrieved by the matching document retrieval means, and calculating, with regard to each of the related words, a degree of relatedness among the matching documents, based on an expression with variables that includes a number of the documents containing the related word among the matching documents and a number of the documents containing the related word among the documents stored in the document information storage means;
related part extraction means for extracting a plurality of related parts from contents of the document, on the basis of the related word and the degree of relatedness which are acquired by the related word calculation means; and
related part output means for outputting the related parts acquired by the related part extraction means and displaying them in order of appearance in the documents along with a score representing a relative degree of relatedness.
Dependent claims: 2, 3, 4
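Claim 1 names the two variables of the relatedness expression (the number of matching documents containing a related word, and the number of stored documents containing it) but this section does not give the expression itself. The sketch below therefore uses an assumed, TF-IDF-like formula purely for illustration; the function name `relatedness` and the formula are hypothetical, not the patented expression.

```python
import math
from typing import Dict, Set


def relatedness(
    word_index: Dict[str, Set[int]],
    matching_ids: Set[int],
    total_docs: int,
) -> Dict[str, float]:
    """Score each word that occurs in at least one matching document.

    Assumed expression (not from the patent):
        (n_match / |matching|) * log(total_docs / n_all)
    where n_match = matching documents containing the word and
          n_all   = stored documents containing the word.
    """
    scores: Dict[str, float] = {}
    if not matching_ids:
        return scores
    for word, doc_ids in word_index.items():
        n_match = len(doc_ids & matching_ids)
        if n_match == 0:
            continue  # not a related word: absent from every matching document
        n_all = len(doc_ids)
        scores[word] = (n_match / len(matching_ids)) * math.log(total_docs / n_all)
    return scores


example_index = {
    "patent": {1, 3},
    "search": {1, 2},
    "engine": {1},
    "the": {1, 2, 3},
}
related = relatedness(example_index, matching_ids={1, 3}, total_docs=3)
```

Under this assumed formula a word found in every stored document (here `"the"`) scores zero, while a word concentrated in the matching documents scores higher, which is the qualitative behavior the two claimed counts make possible.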
5. A document retrieval device that retrieves a document related to a retrieval condition inputted thereto, comprising:
document information storage means for storing, in a table, plural documents each in association with words extracted from the document, the document having a document ID identifying a storage location in the storage means of a document file;
retrieval condition acquisition means for receiving the retrieval condition that includes at least a word given by a user as a retrieval condition in combination with a logical operator;
related word calculation means for specifying a related word whose degree of relatedness is to be judged among the words stored in the document information storage means associated with documents that match the retrieval condition, and calculating the degree of relatedness among the matching documents, based on an expression that includes a number of the documents containing the related word among matching documents and a number of the documents containing the related word among the documents stored in the document information storage means;
related document retrieval means for retrieving a document related to the retrieval condition received by the retrieval condition acquisition means, out of the documents stored in the document information storage means, on the basis of the related word and the degree of relatedness which are acquired by the related word calculation means and applying a specified logical operation to the result to acquire the document ID of the documents that match the retrieval condition;
related part extraction means for extracting a plurality of related parts from contents of the related document acquired by the related document retrieval means, on the basis of the related word and the degree of relatedness which are acquired by the related word calculation means; and
related part output means for outputting the related parts acquired by the related part extraction means and displaying them in order of appearance in the documents along with a score representing a relative degree of relatedness.
Dependent claims: 6, 7
8. A document retrieval device, comprising:
document information storage means for storing, in a table, plural documents each in association with words extracted from the document, the document having a document ID identifying a storage location in the storage means of a document file;
retrieval condition acquisition means for receiving a retrieval condition that includes at least a word given by a user as a retrieval condition in combination with a logical operator;
matching document retrieval means for retrieving plural documents matching the retrieval condition received by the retrieval condition acquisition means, out of the documents stored in the document information storage means and applying a specified logical operation to the result to acquire the document ID of the documents that match the retrieval condition;
related word calculation means for acquiring, as related words, the words stored in the document information storage means that are associated with the plural matching documents retrieved by the matching document retrieval means, calculating, with regard to each of the related words, a degree of relatedness among the matching documents, based on an expression with variables that includes a number of the documents containing the related word among the matching documents and a number of the documents containing the related word among the documents stored in the document information storage means, and acquiring the related word and the degree of relatedness; and
related part extraction means for extracting a plurality of related parts from contents of the document, wherein a related part from contents of the document are plural sentences, the device further including:
means for allocating a score to each of plural sentences constituting an input document, in accordance with a specific evaluation criterion, the score indicating a relative degree of relatedness;
means for sequentially extracting the sentences on the basis of the scores;
means for terminating the extraction of the sentences, when an accumulated quantity of the extracted sentences exceeds a specific quantity criterion; and
means for outputting the extracted sentences in a form of an output document.
Dependent claims: 9, 10, 11, 12, 13
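The sentence-level steps of claim 8 (score each sentence, extract in score order, stop once the accumulated quantity exceeds a criterion, output the result) can be sketched as below. The scoring rule (sum of the relatedness of the distinct words in a sentence), the use of character count as the "quantity", and the names `score_sentence` and `extract_related_sentences` are assumptions for illustration; the claim only requires "a specific evaluation criterion" and "a specific quantity criterion". Restoring the original sentence order follows the abstract.

```python
import re
from typing import Dict, List, Tuple


def score_sentence(sentence: str, relatedness: Dict[str, float]) -> float:
    """Assumed criterion: sum the relatedness of each distinct word in the sentence."""
    words = set(re.findall(r"\w+", sentence.lower()))
    return sum(relatedness.get(word, 0.0) for word in words)


def extract_related_sentences(
    sentences: List[str],
    relatedness: Dict[str, float],
    max_length: int,
) -> List[Tuple[str, float]]:
    """Pick sentences by descending score until the total length exceeds max_length.

    Returns (sentence, score) pairs in their original order of appearance.
    """
    scores = [score_sentence(s, relatedness) for s in sentences]
    # Highest score first; ties are broken by position in the text.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))

    chosen: List[int] = []
    total = 0
    for i in ranked:
        chosen.append(i)
        total += len(sentences[i])  # quantity measured in characters (assumption)
        if total > max_length:
            break  # termination: accumulated quantity exceeds the criterion

    return [(sentences[i], scores[i]) for i in sorted(chosen)]


sentences = [
    "Patent search uses an index.",
    "The weather is nice.",
    "An engine ranks patent documents.",
]
word_scores = {"patent": 1.0, "search": 0.5, "engine": 0.8}
summary = extract_related_sentences(sentences, word_scores, max_length=40)
```

With these inputs the third sentence is taken first and the first sentence second, at which point the accumulated length passes 40 characters and extraction stops; the two sentences are then returned in their order of appearance.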
14. A document retrieval device, comprising:
document information storage means for storing, in a table, plural documents each in association with words extracted from the document, the document having a document ID identifying a storage location in the storage means of a document file;
retrieval condition acquisition means for receiving a retrieval condition that includes at least a word given by a user as a retrieval condition in combination with a logical operator;
document retrieval means for executing a retrieval of the documents stored in the document information storage means by using the retrieval condition, and applying a specified logical operation to the result to acquire the document ID of the documents that match the retrieval condition;
related word calculation means for acquiring, as related words, the words stored in the document information storage means that are associated with the matching documents of the retrieval condition, and calculating, with regard to each of the related words, a degree of relatedness among the matching documents, based on an expression that includes at least one of a number of the documents containing the related word among the matching documents and a number of the documents containing the related word among the documents stored in the document information storage means;
related part extraction means for extracting a plurality of related parts from contents of the retrieved documents, on the basis of the related word and the degree of relatedness which are acquired by the related word calculation means; and
related part output means for outputting the related parts acquired by the related part extraction means and displaying them in order of appearance in the documents along with a score representing a relative degree of relatedness.
15. A document retrieval method, comprising the steps of:
storing, in a table, plural documents each in association with words extracted from the document, the document having a document ID identifying a storage location in the storage means of a document file;
receiving a retrieval condition that includes at least a word given by a user as a retrieval condition in combination with a logical operator;
executing a retrieval of the stored documents by using the retrieval condition, and applying a specified logical operation to the result to acquire the document ID of the documents that match the retrieval condition;
acquiring, as related words, the words stored in association with the matched documents of the retrieval condition, and calculating, with regard to each of the related words, a degree of relatedness among the matching documents, based on an expression that includes at least one of a number of the documents containing the related word among the matching documents and a number of the documents containing the related word among the stored documents;
extracting a plurality of related parts from contents of a retrieved document, on the basis of the related word and the degree of relatedness; and
outputting the related parts extracted and displaying them in order of appearance in the documents along with a score representing a relative degree of relatedness.
16. A recording medium readable by a computer, the medium storing a program of instructions for causing the computer to execute a function, comprising:
storing, in a table, plural documents each in association with words extracted from the document, the document having a document ID identifying a storage location in the storage means of a document file;
receiving a retrieval condition that includes at least a word given by a user as a retrieval condition in combination with a logical operator;
executing a retrieval of the stored documents by using the retrieval condition, and applying a specified logical operation to the result to acquire the document ID of the documents that match the retrieval condition;
acquiring, as related words, the words stored in association with plural matched documents, and calculating, with regard to each of the related words, a degree of relatedness among the matching documents, based on an expression that includes at least one of a number of the documents containing the related word among the matching documents and a number of the documents containing the related word among the stored documents;
extracting a plurality of related parts from contents of a retrieved document, on the basis of the related word and the degree of relatedness; and
outputting the related parts extracted and displaying them in order of appearance in the documents along with a score representing a relative degree of relatedness.
Specification