DOCUMENT RETRIEVAL SYSTEM AND DOCUMENT RETRIEVAL METHOD
Abstract
Document retrieval is performed taking into account the similarity between documents in their numeric data. To this end, a set E of intervals is generated such that each element of a set D of numeric values representing a feature A falls within at least one of the intervals. Each numeric value x (an element of the set D) in a document is indexed by assigning 1 to every interval that contains x and 0 to every interval that does not. Each document containing numeric values is indexed by indexing its text part with term frequencies and its numeric-value part with the numeric-value indexing scheme described above. Using the indices thus created for each document, similarities between documents are calculated with a vector space model or a probability model, and the documents are presented in order of similarity.
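The interval indexing described in the abstract can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: intervals are half-open and of equal width, and the helper names `make_intervals` and `indicator_vector` are hypothetical. Any construction of E would do as long as every element of D falls in some interval; overlapping intervals would simply put more than one 1 in the vector.

```python
from typing import List, Tuple

# Half-open interval [low, high); this boundary convention is an assumption.
Interval = Tuple[float, float]


def make_intervals(values: List[float], width: float) -> List[Interval]:
    """Build a set E of equal-width intervals so every value in D lies in one.

    Equal-width binning is an illustrative choice; the abstract only
    requires that each element of D fall inside some interval of E.
    """
    lo, hi = min(values), max(values)
    count = int((hi - lo) // width) + 1  # enough intervals to reach hi
    return [(lo + i * width, lo + (i + 1) * width) for i in range(count)]


def indicator_vector(x: float, intervals: List[Interval]) -> List[int]:
    """Index x by assigning 1 to each interval containing it and 0 to the rest."""
    return [1 if low <= x < high else 0 for low, high in intervals]


D = [3, 7, 12]
E = make_intervals(D, 5)
print(E)                       # [(3, 8), (8, 13)]
print(indicator_vector(7, E))  # [1, 0]
```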
13 Claims
1. A document retrieval system, comprising:
a document database for storing data on a plurality of documents;
indices used for indexing numeric values and texts in each of the documents stored in the document database, each of the indices used for indexing the text being a group of a term constituting the text and a frequency of the term in the document, each of the indices used for indexing the numeric value being a group of a label describing a feature represented by the numeric value, an interval including the numeric value, and a frequency of the numeric value in the document; and
an arithmetic unit for receiving a designation of a document as a retrieval input, calculating a similarity between the designated document and each of the documents stored in the document database by use of the indices, and presenting the documents in order of similarity.
Dependent claims: 2, 3, 4, 5.
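The two kinds of index in claim 1 can be pictured as keys in a single frequency table: a term key for text and a (label, interval) key for numeric values, each mapped to its frequency in the document. The sketch below assumes whitespace tokenization, fixed-width intervals, and a hypothetical input format of free text plus (label, value) pairs; `to_interval` and `build_index` are illustrative names, not taken from the patent.

```python
import math
from collections import Counter
from typing import List, Tuple


def to_interval(value: float, width: float) -> Tuple[float, float]:
    """Hypothetical fixed-width binning: return [k*width, (k+1)*width) containing value."""
    k = math.floor(value / width)
    return (k * width, (k + 1) * width)


def build_index(text: str, numerics: List[Tuple[str, float]], width: float = 10.0) -> Counter:
    """Build one document's indices as a frequency table.

    Text entries are keyed ("term", term); numeric entries are keyed
    ("num", label, interval). Each count is the frequency in the document.
    """
    index: Counter = Counter()
    for term in text.lower().split():  # naive whitespace tokenization (assumption)
        index[("term", term)] += 1
    for label, value in numerics:
        index[("num", label, to_interval(value, width))] += 1
    return index
```

For example, `build_index("Red car red", [("price", 15.0), ("price", 18.0)])` records the term "red" twice and the price interval [10.0, 20.0) twice, since both prices fall in the same interval.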
6. A document retrieval method comprising the steps of:
receiving a designation of a document as a retrieval input;
calculating a similarity between the document designated as the retrieval input and each of documents stored in a document database by use of indices of the designated document and indices of each document stored in the document database, the indices used for indexing numeric values and texts in a corresponding document, each of the indices used for indexing the text being a group of a term constituting the text and a frequency of the term in the corresponding document, each of the indices used for indexing the numeric value being a group of a label describing a feature represented by the numeric value, an interval including the numeric value, and a frequency of the numeric value in the corresponding document; and
presenting the documents stored in the document database in order of similarity.
Dependent claims: 7, 8, 9, 10.
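Claim 6 leaves the similarity measure open, and the abstract names a vector space model or a probability model. As one vector-space choice (an assumption, not necessarily the patented formula), cosine similarity over the frequency tables can rank the stored documents against the designated one. `cosine` and `rank` are hypothetical helpers; keys can be any hashable index entries, such as term keys or (label, interval) keys.

```python
import math
from typing import Dict, Hashable, List, Tuple

Vector = Dict[Hashable, float]  # index entry -> frequency


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse frequency vectors (0.0 if either is empty)."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: Vector, database: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Score every stored document against the designated one, most similar first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored
```

Because text and numeric entries share one vector in this sketch, two documents whose values fall in the same interval gain similarity exactly as they would from a shared term; weighting the two parts differently would be a separate design choice.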
11. A document retrieval method comprising the steps of:
extracting a group of a feature and a numeric value from each of a plurality of document data stored in a document database;
converting the extracted numeric value into an interval in accordance with a numeric conversion table, and then indexing the extracted numeric value with a group of the feature, the interval and a frequency, the numeric conversion table being dedicated to each feature type, and used for converting a numeric value into an interval;
indexing each text in the document with a group of a term constituting the text and a frequency of the term in the document;
calculating a similarity between document data designated as a retrieval input and each of the documents stored in the document database by use of data on the document indexed as above; and
presenting the document data stored in the document database in order of similarity.
Dependent claims: 12, 13.
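Claim 11 converts each extracted numeric value through a conversion table dedicated to its feature type. A minimal sketch follows, assuming each table is a sorted list of interval boundaries, that values beyond the outermost boundaries fall into open-ended intervals, and that feature/value pairs appear in the text as `name: number` or `name = number`. The tables, the regular expression, and all function names are hypothetical.

```python
import bisect
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical per-feature conversion tables: sorted interval boundaries.
CONVERSION_TABLES: Dict[str, List[float]] = {
    "weight": [0, 10, 50, 100, 500],
    "temperature": [-50, 0, 25, 100],
}

# Hypothetical extraction rule: "feature: number" or "feature = number".
PAIR_PATTERN = re.compile(r"([A-Za-z_]+)\s*[:=]\s*(-?\d+(?:\.\d+)?)")


def extract_pairs(text: str) -> List[Tuple[str, float]]:
    """Extract (feature, numeric value) groups from a document's text."""
    return [(name.lower(), float(num)) for name, num in PAIR_PATTERN.findall(text)]


def convert(feature: str, value: float) -> Tuple[float, float]:
    """Map a value to the half-open interval [low, high) in its feature's table.

    Values below the first boundary, or at/above the last one, fall into
    open-ended intervals (an assumption of this sketch).
    """
    bounds = CONVERSION_TABLES[feature]
    i = bisect.bisect_right(bounds, value)
    low = bounds[i - 1] if i > 0 else float("-inf")
    high = bounds[i] if i < len(bounds) else float("inf")
    return (low, high)


def index_numeric(text: str) -> Counter:
    """Index numeric values as (feature, interval) -> frequency.

    Features without a conversion table are skipped (an assumption).
    """
    index: Counter = Counter()
    for feature, value in extract_pairs(text):
        if feature in CONVERSION_TABLES:
            index[(feature, convert(feature, value))] += 1
    return index
```

With these tables, "Weight: 42 ... weight: 47" yields the single entry ("weight", (10, 50)) with frequency 2, because both values fall in the same interval of the weight table.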
Specification