Information search method, information search device, and storage medium for storing an information search program
Abstract
A system and a method for searching do not rely on previously compiled vocabulary information or grammatical information to perform a search. The search can accommodate new words or phrases and performs a document search in response to a user's search request. A unique character string is extracted from an input document, and a similarity search is performed using that unique character string. The unique character string is extracted by calculating and evaluating an amount of feature of a character string through comparison between its appearance frequency in the input document and its appearance frequency in the set of documents to be searched. The extracted unique character string is then used as the basis for the search. Documents found by the search are evaluated and arranged in the order of evaluation. The similarity factor of a document is evaluated using the appearance frequency of each unique character string in the input document, so that a higher evaluation is given to a document in which unique character strings with higher weight appear many times.
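The frequency comparison described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give a formula for the "amount of feature", so a smoothed log-ratio score is swapped in here, and the function names, the fixed substring length `n`, and the `threshold` value are all hypothetical.

```python
import math


def count_occurrences(text: str, s: str) -> int:
    """Count occurrences of s in text, allowing overlaps."""
    if not s:
        return 0
    count, start = 0, 0
    while True:
        idx = text.find(s, start)
        if idx == -1:
            return count
        count += 1
        start = idx + 1


def feature_amount(candidate: str, input_doc: str, corpus: list[str]) -> float:
    """Assumed score: frequency in the input document, discounted by how
    many stored documents already contain the candidate string."""
    tf = count_occurrences(input_doc, candidate)
    df = sum(1 for doc in corpus if candidate in doc)
    # Smoothed log ratio (an assumption, not the source's formula).
    return tf * math.log((len(corpus) + 1) / (df + 1))


def unique_strings(input_doc: str, corpus: list[str],
                   n: int = 4, threshold: float = 1.0) -> list[str]:
    """Return length-n substrings of input_doc whose score meets threshold,
    highest score first. No dictionary or grammar is consulted."""
    candidates = {input_doc[i:i + n] for i in range(len(input_doc) - n + 1)}
    scored = {c: feature_amount(c, input_doc, corpus) for c in candidates}
    kept = [c for c, s in scored.items() if s >= threshold]
    return sorted(kept, key=lambda c: (-scored[c], c))
```

Because candidates are raw substrings rather than dictionary words, a string the stored documents have never seen scores high simply by being absent from them, which is one way to read the abstract's remark about accommodating new words or phrases.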
19 Claims
1. A method for identifying at least one unique character string in an input document which is input into a computer system, said computer system being operable to search one or more documents which are searchably stored in a storage medium, and said unique character string is used as a search string, the method comprising:
associating and managing position information for a position in said searchably stored documents where one or more partial comparison document character strings are extracted from said searchably stored documents;
extracting a partial input character string from said input document, and determining whether said partial input character string is a candidate character string;
identifying a partial comparison document character string which matches at least a part of said candidate character string with a predetermined similarity factor or higher;
identifying position data associated with said partial comparison document character string which matches with said predetermined similarity factor or higher; and
recognizing said candidate character string as the unique character string by comparing appearance frequency information of at least a part of said candidate character string appearing in said input document with the position data and evaluating an amount of feature of said candidate character string.

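One way to picture the position information recited in claim 1 is an n-gram index that maps each partial string of the stored documents to the places where it occurs. The sketch below is illustrative only: the names `build_position_index`, `candidate_positions`, and `amount_of_feature` are hypothetical, exact matching stands in for the "predetermined similarity factor or higher", and the ratio used as the amount of feature is an assumption rather than anything the claim specifies.

```python
from collections import defaultdict


def build_position_index(docs: list[str], n: int = 2) -> dict:
    """Map every n-gram of the stored documents to its (doc_id, offset) positions."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for off in range(len(text) - n + 1):
            index[text[off:off + n]].append((doc_id, off))
    return index


def candidate_positions(index: dict, candidate: str, n: int = 2) -> list:
    """Positions where the whole candidate occurs, found by intersecting
    the postings of its consecutive n-grams after aligning their offsets."""
    if len(candidate) < n:
        return []
    grams = [candidate[k:k + n] for k in range(len(candidate) - n + 1)]
    hits = set(index.get(grams[0], []))
    for k, gram in enumerate(grams[1:], start=1):
        # Shift each posting back by k so it refers to the candidate's start.
        shifted = {(d, o - k) for d, o in index.get(gram, [])}
        hits &= shifted
    return sorted(hits)


def amount_of_feature(candidate: str, input_doc: str, index: dict, n: int = 2) -> float:
    """Assumed score: occurrences in the input document divided by
    (1 + number of positions found in the stored documents)."""
    in_count = input_doc.count(candidate)  # non-overlapping occurrences
    corpus_count = len(candidate_positions(index, candidate, n))
    return in_count / (1 + corpus_count)
```

With this layout the stored documents never need to be rescanned at query time: the corpus-side frequency of a candidate is just the length of its position list.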
3. A method for searching for a document to be searched from one or more documents searchably stored in a computer, said document to be searched having a character string similar to a partial input character string existing in an input document input in the computer, the method comprising:
extracting a partial character string from said input document, and determining whether said partial character string is a candidate character string;
evaluating an amount of feature of said candidate character string through comparison between appearance frequency information of at least a part of said candidate character string appearing in said input document and appearance frequency information of at least a part of said candidate character string appearing in said searchably stored documents to recognize said candidate character string as a unique character string; and
searching for said document to be searched from said searchably stored documents, wherein said document to be searched has a character string similar to said unique character string.

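The final searching step of claim 3, finding stored documents that contain a string similar to a unique character string, might look like the sketch below. It is a rough illustration, not the claimed method: the claim does not say how similarity is measured, so the standard-library `difflib.SequenceMatcher` ratio over sliding windows is assumed, and `best_window_similarity`, `search`, and the `min_similarity` cutoff are hypothetical.

```python
from difflib import SequenceMatcher


def best_window_similarity(doc: str, query: str) -> float:
    """Highest SequenceMatcher ratio between query and any window of doc
    that has the same length as query (or all of doc if it is shorter)."""
    if not doc or not query:
        return 0.0
    n = min(len(query), len(doc))
    return max(
        SequenceMatcher(None, doc[i:i + n], query).ratio()
        for i in range(len(doc) - n + 1)
    )


def search(docs: list[str], unique_strings: list[str],
           min_similarity: float = 0.8) -> list[int]:
    """Return ids of documents with a window similar to any unique string."""
    hits = []
    for doc_id, doc in enumerate(docs):
        if any(best_window_similarity(doc, u) >= min_similarity
               for u in unique_strings):
            hits.append(doc_id)
    return hits
```

A brute-force window scan like this is quadratic and only suitable for illustration; an index such as the position information of claim 1 would normally narrow the candidates first.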
5. A method for identifying at least one unique character string in an input document which is input into a computer system, said computer system being operable to search one or more documents stored in a storage medium, and said unique character string is used as a search string, the method comprising:
extracting a partial input character string from said input document, and determining whether said partial input character string is a candidate character string; and
evaluating an amount of feature of said candidate character string through comparison between appearance frequency information of at least a part of said candidate character string appearing in said input document and appearance frequency information of at least a part of said candidate character string appearing in said searchably stored documents to recognize said candidate character string as said unique character string.

7. A method for evaluating similarity between a comparison document and an input document which contains a first unique character string and a second unique character string input in a computer, said computer system being operable to search for said comparison document stored in a storage medium, the method comprising:
calculating a first weight value corresponding to said first unique character string from appearance frequency information of at least a part of said first unique character string of said input document;
calculating a second weight value corresponding to said second unique character string from appearance frequency information of at least a part of said second unique character string of said input document;
calculating a first appearance frequency value of at least a part of said first unique character string appearing in said comparison document;
calculating a second appearance frequency value of at least a part of said second unique character string appearing in said comparison document; and
calculating a similarity factor between said input document and said comparison document from the first appearance frequency value taking said first weight value into account and the second appearance frequency value taking said second weight value into account.

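The weighted evaluation of claim 7 can be illustrated with a small sketch. The assumptions are deliberate simplifications: each weight is taken to be the raw count of the unique string in the input document, and the similarity factor is the weighted sum of counts in the comparison document. The claim only requires that the weights be "taken into account", so other combinations are equally possible. All function names are hypothetical.

```python
def weight_values(input_doc: str, unique_strings: list[str]) -> dict[str, int]:
    """Assumed weight: number of (non-overlapping) occurrences of each
    unique character string in the input document."""
    return {u: input_doc.count(u) for u in unique_strings}


def similarity_factor(input_doc: str, comparison_doc: str,
                      unique_strings: list[str]) -> int:
    """Weighted sum of appearance frequencies in the comparison document."""
    weights = weight_values(input_doc, unique_strings)
    return sum(weights[u] * comparison_doc.count(u) for u in unique_strings)


def rank_documents(input_doc: str, docs: list[str],
                   unique_strings: list[str]) -> list[int]:
    """Document ids ordered from highest to lowest similarity factor."""
    scores = [similarity_factor(input_doc, d, unique_strings) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])
```

For an input document "ab ab cd" with unique strings "ab" and "cd", the weights are 2 and 1, so a comparison document that repeats "ab" outranks one that repeats "cd" the same number of times.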
8. A method for evaluating similarity between a comparison document and a unique character string input in a computer system, said computer system being operable to search for said comparison document stored in a storage medium, the method comprising:
calculating a weight value corresponding to said unique character string from appearance frequency information of at least a part of said unique character string appearing in an input document; and
calculating a similarity factor between said unique character string and said comparison document from the appearance frequency information of at least a part of said unique character string appearing in said comparison document and said weight value.

9. An apparatus for identifying at least one unique character string in an input document which is input into a computer system, said computer system containing one or more documents which are searchably stored by the computer, the apparatus comprising:
a storage device for storing a position information file which associates and manages position information for a position in said searchably stored documents where one or more partial comparison document character strings are extracted from said searchably stored documents;
means for extracting a candidate character string from said input document;
means for identifying a partial comparison document character string which matches part of said candidate character string with a predetermined similarity factor or higher;
means for identifying position data which is associated to said partial comparison document character string having the predetermined similarity factor or higher in said position information file; and
means for recognizing said candidate character string as the unique character string by comparing appearance frequency information of at least a part of said candidate character string appearing in said input document with said position information, and evaluating an amount of feature of said candidate character string.

11. An apparatus for searching for a document to be searched from one or more documents searchably stored in a computer, said document to be searched having a character string similar to a partial input character string which exists in an input document input in the computer, the apparatus comprising:
an input device for identifying said input document and instructing execution of a search;
means for detecting from said input device that said input document is identified and that said instruction of a search is input;
means for extracting a candidate character string from said input document in response to the detection that said input document is identified and that said instruction of a search is input;
means for calculating an amount of feature of said candidate character string through comparison between appearance frequency information of at least a part of said candidate character string appearing in said input document and appearance frequency information of at least a part of said candidate character string appearing in said searchably stored documents;
means for determining whether said candidate character string is a unique character string by evaluating said amount of feature;
means for searching for the document to be searched from said searchably stored documents, wherein said document to be searched has a character string similar to said unique character string; and
a display device for displaying the document to be searched having a character string similar to said unique character string.

13. An apparatus for identifying at least one unique character string in an input document which is input into a computer system, said computer system containing one or more documents which are searchably stored by the computer, and said unique character string is used as a search string, the apparatus comprising:
means for extracting a candidate character string from said input document; and
means for determining whether said candidate character string is a unique character string by evaluating an amount of feature of said candidate character string through comparison between appearance frequency information of at least a part of said candidate character string appearing in said input document and appearance frequency information of at least a part of said candidate character string appearing in said searchably stored documents.

14. An apparatus for evaluating similarity between a comparison document and an input document containing a unique character string input into a computer system, said computer system containing a comparison document searchably stored by the computer, the apparatus comprising:
means for calculating a weight value corresponding to said unique character string from appearance frequency information of at least a part of said unique character string appearing in said input document; and
means for calculating a similarity factor between said input document and said comparison document from the appearance frequency information of at least a part of said unique character string appearing in said comparison document and said weight value.

15. A storage medium readable by a computer for storing a program operable to identify a document input into a computer system based on an input document, said computer system containing one or more documents which are searchably stored by the computer, the program comprising:
program code means for directing said computer to extract a partial character string from said input document and determining whether the partial character string is a candidate character string; and
program code means for directing said computer to determine whether the candidate character string is a unique character string by evaluating an amount of feature of said candidate character string through comparison between appearance frequency information of at least a part of said candidate character string appearing in said input document and appearance frequency information of at least a part of said candidate character string appearing in said searchably stored documents.

18. A storage medium readable by a computer for storing a program which is operable to evaluate similarity between a comparison document and an input document containing a unique character string input into a computer system, said comparison document being searchably stored by the computer, the program comprising:
program code means for directing said computer to calculate a weight value corresponding to said unique character string from appearance frequency information of at least a part of said unique character string appearing in said input document; and
program code means for directing said computer to calculate a similarity factor between said input document and said comparison document from the appearance frequency information of at least a part of said unique character string appearing in said comparison document and the weight value.

19. A medium readable by a computer for storing a program operable to identify at least one unique character string in an input document which is input into a computer system, said computer system being operable to search one or more documents which are searchably stored in a storage medium, and said unique character string is used as a search string, the program comprising:
program code means for associating and managing position information for a position in said searchably stored documents where one or more partial comparison document character strings are extracted from said searchably stored documents;
program code means for extracting a partial input character string from said input document, and determining whether said partial input character string is a candidate character string;
program code means for identifying a partial comparison document character string which matches at least a part of said candidate character string with a predetermined similarity factor or higher;
program code means for identifying position data associated with said partial comparison document character string which matches with said predetermined similarity factor or higher; and
program code means for recognizing said candidate character string as the unique character string by comparing appearance frequency information of at least a part of said candidate character string appearing in said input document with the position data and evaluating an amount of feature of said candidate character string.

Specification