Document retrieval method and system and computer readable storage medium
Abstract
A document retrieval system is provided whose document display interface makes the important portions easy to recognize, even when the displayed document was retrieved with a query expression specified as a whole document or a long sentence. When a text is registered, predetermined character strings and their location information are extracted from the text and stored in a location information file. A weight for each character string is calculated by a predetermined method and stored in a weight file. To retrieve a document, predetermined character strings are extracted from a designated query expression, and a similarity between the query expression and each text in the database is calculated using the location information and the weights acquired from the location information file and the weight file. To display the document, the character strings with high weights are extracted from those used for the retrieval, and the text is displayed with a changed display format for each portion that contains the extracted character strings.
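The registration stage described in the abstract can be pictured with a minimal sketch. Everything concrete here is an assumption made for illustration, not something the patent specifies: the "predetermined character strings" are taken to be character bigrams, the location information file is an in-memory dictionary, and the weight is an inverse-document-frequency style value. The names `extract_substrings` and `register` are hypothetical.

```python
import math
from collections import defaultdict


def extract_substrings(text, n=2):
    """Yield (substring, offset) for every character n-gram in text.

    Assumption: n-grams stand in for the "predetermined character strings".
    """
    for i in range(len(text) - n + 1):
        yield text[i:i + n], i


def register(docs, n=2):
    """Build a location table and a weight table from {doc_id: text}.

    locations: {substring: {doc_id: [offsets]}}  (stand-in for the location information file)
    weights:   {substring: float}                (stand-in for the weight file)
    """
    locations = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for gram, pos in extract_substrings(text, n):
            locations[gram][doc_id].append(pos)
    # Assumed weighting: substrings found in fewer texts get a higher weight.
    weights = {
        gram: math.log(1 + len(docs) / len(postings))
        for gram, postings in locations.items()
    }
    return locations, weights


docs = {"d1": "abcab", "d2": "bcd"}
locations, weights = register(docs)
```

In this toy run, `"ab"` occurs only in `d1` (at offsets 0 and 3), so it receives a higher weight than `"bc"`, which occurs in both texts.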
12 Claims
-
1. A document retrieval method for a text database which stores document information as character code data, comprising:
-
a document registration step of extracting a predetermined substring and location information of said substring in registration-target text data from said registration-target text data to store said substring and said location information as a location information file;
a document retrieval step of extracting a predetermined substring from a query expression and extracting a retrieving substring from a part or all of said predetermined substring to calculate a similarity by using location information of said retrieving substring acquired from said location information file and by a predetermined method, said similarity being a degree of similarity between contents of said query expression and contents of a text in said text database; and
an information-of-important-string display step of selecting an important substring from said extracted retrieving substring based on said calculated similarity to display information of said important substring among information used for calculating said similarity.
3. A document retrieval method according to claim 1, wherein
said document retrieval step includes:
a string-for-showing-information selection step of extracting predetermined substrings from said query expression to make a user select a substring, information of which is displayed, from said predetermined substrings; and
a similarity calculation step of calculating a similarity by using said location information of said substrings acquired from said location information file and by a predetermined method, said similarity being a degree of similarity between the contents of said query expression and the contents of said text in said text database;
said information-of-important-string display step includes:
a selected string acquisition step of acquiring said substring selected at said string-for-showing-information selection step; and
an information-of-selected-string display step of displaying a contribution factor to the calculation of said location information or said similarity of said substring extracted at said selected string acquisition step;
said important substring is said substring selected at said string-for-showing-information selection step; and
said information of said important substring includes said contribution factor to the calculation of said location information or said similarity of said important substring.
-
-
4. A document retrieval method according to claim 1, wherein
said document retrieval step includes:
a substring editing step of extracting a predetermined substring from said query expression to add or delete a substring to and from said substring; and
a similarity-after-editing calculation step of acquiring said location information of said substring edited at said substring editing step from said location information file to calculate a similarity by using said location information and by a predetermined method, said similarity being a degree of similarity between the contents of said query expression and the contents of said text in said text database;
said information-of-important-string display step includes:
an additive string acquisition step of acquiring said substring added at said substring editing step; and
an information-of-additive-string display step of displaying a contribution factor to the calculation of said location information or said similarity of said substring acquired at said additive string acquisition step;
said important substring is said substring added at said substring editing step; and
said information of said important substring includes the contribution factor to the calculation of said location information or said similarity of said important substring.
-
-
5. A document retrieval method according to claim 1, wherein
said information-of-important-string display step includes:
a string-accounting-for-some-of-similarity extraction step of extracting a predetermined number of substrings from said substrings in the order of the contribution to the calculation of said similarity; and
an information-string-accounting-for-some-of-similarity display step of displaying a contribution factor to the calculation of said location information or said similarity of said substring extracted at said string-accounting-for-some-of-similarity extraction step;
said important substrings are a predetermined number of substrings extracted in the order of the contribution to the calculation of said similarity at said string-accounting-for-some-of-similarity extraction step; and
said information of said important substring includes the contribution factor to the calculation of said location information or said similarity of each important substring.
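The idea of ranking substrings by their contribution to the similarity, as recited in claim 5, can be sketched as follows. The claims leave the similarity to "a predetermined method", so this sketch assumes a simple one: each query substring contributes its weight multiplied by its occurrence count in the candidate text, and the contribution factor is that value's share of the total. The function names and sample data are hypothetical.

```python
def contributions(query_grams, doc_counts, weights):
    """Return {substring: share of the (assumed) similarity it accounts for}."""
    raw = {
        g: weights.get(g, 0.0) * doc_counts.get(g, 0)
        for g in set(query_grams)
    }
    total = sum(raw.values())
    if total == 0:
        return {g: 0.0 for g in raw}
    return {g: value / total for g, value in raw.items()}


def top_contributors(contrib, k):
    """Pick the k substrings with the largest contribution (ties broken by name)."""
    return sorted(contrib.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


weights = {"re": 1.0, "tr": 2.0, "ie": 3.0}
doc_counts = {"re": 2, "tr": 1, "ie": 2}  # occurrences in one candidate text
contrib = contributions(["re", "tr", "ie", "xx"], doc_counts, weights)
top2 = top_contributors(contrib, 2)
```

Here `"ie"` accounts for 6 of the 10 score units, so it is listed first; a display step could then show these factors next to each selected substring.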
-
-
2. A document retrieval method according to claim 2, wherein
said information-of-important-string display step includes:
a very important string extraction step of calculating a degree of importance of said substring by a predetermined method to extract a predetermined number of substrings in a descending order of said degree of importance; and
an information-of-very-important-string display step of displaying a contribution factor to the calculation of said location information or said similarity of said substring extracted at said very important string extraction step;
said important substrings are a predetermined number of substrings extracted in the descending order of the degree of importance at said very important string extraction step; and
said information of said important substrings includes the contribution factor to the calculation of said location information or said similarity of said important substring.
-
-
6. A document retrieval method for a text database which stores document information as character code data, comprising:
-
a document registration step of extracting a predetermined substring and location information of said substring in registration-target text data from said registration-target text data to store said substring and said location information as a location information file;
a document retrieval step of extracting a predetermined substring from a query expression and extracting a retrieving substring from a part or all of said predetermined substring to calculate a similarity by using location information of said retrieving substring acquired from said location information file and by a predetermined method, said similarity being a degree of similarity between contents of said query expression and contents of a text in said text database; and
a document display step of displaying a text designated by a user among texts whose similarities are calculated by said document retrieval step, wherein said document display step includes a display character format change step of selecting an important substring from said extracted retrieving substring based on said calculated similarity to change a display character format of a portion which contains said important substring in said designated text.
7. A document retrieval method according to claim 6, wherein
said document registration step includes:
a location information file generation step of extracting a predetermined substring and said location information of said substring in said registration-target text data from said registration-target text data to store said substring and said location information as said location information file; and
a degree-of-importance file generation step of calculating a degree of importance of said substring by a predetermined method to store said degree of importance as a degree-of-importance file;
said document display step includes:
an important string extraction step of acquiring said degree of importance of said substring from said degree-of-importance file to extract a predetermined number of substrings in a descending order of said acquired degree of importance; and
an important-string display-format change step of changing a display format of a portion which contains said substring extracted at said important string extraction step in said designated text; and
said important string is said predetermined number of substrings extracted in the descending order of said degree of importance at said important string extraction step.
-
-
8. A document retrieval method according to claim 6, wherein
said document retrieval step includes:
a string-for-changing-display-format selection step of extracting predetermined substrings from said query expression to make a user select a substring, a display format of which is changed, from said predetermined substrings; and
a similarity calculation step of calculating a similarity by using said location information of said substring acquired from said location information file and by a predetermined method, said similarity being a degree of similarity between the contents of said query expression and the contents of said text in said text database;
said document display step includes:
a selected-string acquisition step of acquiring said substring selected by said string-for-changing-display-format selection step; and
a selected-string display-format change step of changing a display format of a portion which contains said substring acquired by said selected-string acquisition step in said designated text; and
said important substring is said substring selected at said string-for-changing-display-format selection step.
-
-
9. A document retrieval method according to claim 6, wherein
said document retrieval step includes:
a substring editing step of extracting a predetermined substring from said query expression to add or delete a substring to and from said predetermined substring; and
a similarity-after-editing calculation step of acquiring said location information of said substring edited at said substring editing step from said location information file to calculate a similarity by using said location information and by a predetermined method, said similarity being a degree of similarity between the contents of said query expression and the contents of said text in said text database;
said document display step includes:
an additive-string acquisition step of acquiring said substring added at said substring editing step; and
an additive-string display-format change step of changing a display format of a portion which contains said substring acquired at said additive-string acquisition step in said designated text; and
said important substring is said substring added at said substring editing step.
-
-
10. A document retrieval method according to claim 6, wherein
said document display step includes:
a string-accounting-for-some-of-similarity extraction step of extracting a predetermined number of substrings from said substrings in the order of the contribution to the calculation of said similarity; and
a string-accounting-for-some-of-similarity display-format change step of changing a display format of a portion which contains said substring extracted at said string-accounting-for-some-of-similarity extraction step in said designated text; and
said important substring is said predetermined number of substrings extracted at said string-accounting-for-some-of-similarity extraction step in the order of the contribution to the calculation of said similarity.
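The display-format change recited in claims 6 through 10 can be pictured with a short sketch. The claims do not say what the changed format looks like, so this example assumes the simplest possible one: each portion of the text that contains an important substring is wrapped in bracket markers, with overlapping or touching portions merged. The function name `highlight` and its marker parameters are hypothetical.

```python
def highlight(text, important, open_mark="[", close_mark="]"):
    """Wrap every occurrence of an important substring in markers.

    Overlapping or touching occurrences are merged into one marked span.
    """
    spans = []
    for s in important:
        if not s:
            continue  # skip empty strings, which would match everywhere
        start = text.find(s)
        while start != -1:
            spans.append((start, start + len(s)))
            start = text.find(s, start + 1)
    spans.sort()

    merged = []
    for a, b in spans:
        if merged and a <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))

    out, prev = [], 0
    for a, b in merged:
        out.append(text[prev:a])
        out.append(open_mark + text[a:b] + close_mark)
        prev = b
    out.append(text[prev:])
    return "".join(out)


shown = highlight("text retrieval of text", ["text", "retr"])
```

A real interface would more likely change the font, color, or weight of the marked spans; the bracket markers only make the selected portions visible in plain text.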
-
-
11. A document retrieval system for a text database which stores document information as character code data, comprising:
-
document registration means for extracting a predetermined substring and location information of said substring in registration-target text data from said registration-target text data to store said substring and said location information as a location information file;
document retrieval means for extracting a predetermined substring from a query expression and extracting a retrieving substring from a part or all of said predetermined substring to calculate a similarity by using location information of said retrieving substring acquired from said location information file and by a predetermined method, said similarity being a degree of similarity between contents of said query expression and contents of a text in said text database; and
document display means for displaying a text designated by a user among texts whose similarities are calculated by said document retrieval means, wherein said document display means includes a display character format change means for selecting an important substring from said extracted retrieving substring based on said calculated similarity to change a display character format of a portion which contains said important substring in said designated text.
-
-
12. A storage medium for a text database which stores document information as character code data, and for storing a program for configuring a document retrieval system, comprising:
-
a document registration module for extracting a predetermined substring and location information of said substring in registration-target text data from said registration-target text data to store said substring and said location information as a location information file;
a document retrieval module for extracting a predetermined substring from a query expression and extracting a retrieving substring from a part or all of said predetermined substring to calculate a similarity by using location information of said retrieving substring acquired from said location information file and by a predetermined method, said similarity being a degree of similarity between contents of said query expression and contents of a text in said text database; and
a document display module for displaying a text designated by a user among texts whose similarities are calculated by said document retrieval module, wherein said document display module includes a display character format change module for selecting an important substring from said extracted retrieving substring based on said calculated similarity to change a display character format of a portion which contains said important substring in said designated text.
-
Specification