Search apparatus, search program, and search method
Abstract
The search apparatus of the present invention searches for a document corresponding to a specified search term from among a plurality of documents. The apparatus includes: a search term pair generation unit for generating at least one search term pair, each pair including a first search string and a second search string that are both included in the search term and each have a length equal to the number of index characters, the second search string being located at a position shifted by a predetermined number of offset characters relative to the first search string; a search unit for searching, for each search term pair, an index database for a document that has both the first search string and the second search string as indexing terms; a calculation unit for calculating a score of each document on the basis of the frequency of occurrence, in that document, of the first search string and the second search string of each search term pair; and a selection unit for selecting, on the basis of the respective scores of the plurality of documents, a document to be output as a search result.
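As a rough illustration of the units summarized in the abstract, the following Python sketch builds a character n-gram index, generates search term pairs, and scores documents that contain both strings of a pair. The values N = 2 and OFFSET = 2, the sample documents, all function names, the choice to sum the two occurrence counts, and the top-k selection rule are assumptions made for illustration only; the text above does not fix any of them.

```python
from collections import Counter, defaultdict

N = 2       # assumed number of index characters
OFFSET = 2  # assumed number of offset characters


def build_index(docs, n=N):
    """Map each n-character string to per-document occurrence counts."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for i in range(len(text) - n + 1):
            index[text[i:i + n]][doc_id] += 1
    return index


def search_term_pairs(term, n=N, offset=OFFSET):
    """Pair each n-gram of the term with the n-gram `offset` characters later."""
    last = len(term) - n - offset
    return [(term[i:i + n], term[i + offset:i + offset + n])
            for i in range(last + 1)]


def score_documents(index, term, n=N, offset=OFFSET):
    """For each pair, credit documents that have both strings as indexing terms."""
    scores = Counter()
    for first, second in search_term_pairs(term, n, offset):
        first_hits = index.get(first, {})
        second_hits = index.get(second, {})
        for doc_id in first_hits.keys() & second_hits.keys():
            # Assumed frequency-based score: sum of both occurrence counts.
            scores[doc_id] += first_hits[doc_id] + second_hits[doc_id]
    return scores


def select_documents(scores, top_k=2):
    """Keep the top_k highest-scoring documents (selection rule assumed)."""
    return [doc_id for doc_id, _ in scores.most_common(top_k)]


docs = {"d1": "search engine", "d2": "sea urchin", "d3": "research"}
index = build_index(docs)
scores = score_documents(index, "search")
print(select_documents(scores))
```

With these assumed settings, the search term "search" yields the pairs ("se", "ar"), ("ea", "rc"), and ("ar", "ch"). A document that matches only one pair, such as "sea urchin", scores lower than documents that match all three.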
10 Claims
1. A search apparatus for searching for a document corresponding to a specified search term from among a plurality of documents, the search apparatus comprising:
an indexing term storage unit for storing an index database having a plurality of character strings each of which is composed of successive characters in each of a plurality of documents with a length equal to a predetermined number of index characters, and is used as an indexing term for that document;
a search term pair generation unit for generating at least one search term pair including a first search string with a length equal to the number of index characters and a second search string with a length equal to the number of index characters located at a position shifted by a predetermined number of offset characters relative to the first search string, which are included in the search term;
a search unit for searching the index database for a document having both of the first search string and the second search string as indexing terms, for each of the search term pairs;
a calculation unit for calculating a score indicating a degree at which each document is included in a search result on the basis of a frequency of occurrence of the first search string and the second search string of each of the search term pairs included in each document or of whether or not the first search string and the second search string are included in each document;
a selection unit for selecting a document to be outputted as the search result from among the plurality of documents, on the basis of the respective scores of the plurality of documents; and
an output unit for outputting the document selected by the selection unit as the search result.
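Claim 1 allows the score to rest either on how often the two search strings occur in a document or simply on whether both are present. The sketch below contrasts the two bases for a single document. The function names, the `use_frequency` flag, and the specific formulas (sum of counts versus one point per matched pair) are hypothetical choices for illustration, not taken from the claim.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n successive characters in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def pair_score(counts, pairs, use_frequency=True):
    """Score one document against a list of (first, second) search string pairs."""
    score = 0
    for first, second in pairs:
        # A pair contributes only if the document contains both strings.
        if counts[first] and counts[second]:
            if use_frequency:
                score += counts[first] + counts[second]  # frequency basis (assumed formula)
            else:
                score += 1  # presence basis: one point per matched pair
    return score


counts = ngram_counts("banana band", 2)
pairs = [("ba", "na"), ("an", "nd"), ("na", "xx")]
print(pair_score(counts, pairs, use_frequency=True))
print(pair_score(counts, pairs, use_frequency=False))
```

The frequency basis rewards documents in which the search strings recur, while the presence basis treats every matching document equally per pair.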
5. A search apparatus for searching for a document corresponding to a specified search term from among a plurality of documents, the search apparatus comprising:
an indexing term storage unit for storing an index database having a plurality of character strings each of which is composed of successive characters in each of a plurality of documents with a length equal to a predetermined number of index characters, and is used as an indexing term for that document;
a generation unit for generating at least one search string with a length equal to the number of index characters, the search string being included in the search term;
a partial search unit for searching the index database for a document which has each of the search strings as an indexing term;
a calculation unit for calculating a score indicating a degree at which each document is included in a search result, on the basis of a frequency of occurrence of each of the search strings included in each document or of whether or not each of the search strings is included in each document;
a threshold generation unit for generating a threshold of the score which becomes larger as the search term becomes longer, and which becomes smaller as the search term becomes shorter;
a selection unit for selecting a document having the score higher than the threshold as a document to be outputted as the search result, from among the plurality of documents; and
an output unit for outputting the document selected by the selection unit.
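The length-dependent threshold of claim 5 can be sketched as follows. Here the threshold is assumed to be a fixed fraction (`ratio`) of the number of search strings in the term, which makes it grow with longer terms and shrink with shorter ones. The function names, the 0.5 ratio, the presence-based score, and the sample documents are all illustrative assumptions, and the documents are scanned directly rather than through a stored index to keep the example short.

```python
def search_strings(term, n):
    """Every n-character substring of the search term."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def threshold(term, n, ratio=0.5):
    """Assumed rule: threshold proportional to the number of search strings."""
    return ratio * len(search_strings(term, n))


def select(docs, term, n=2, ratio=0.5):
    """Return ids of documents whose presence-based score exceeds the threshold."""
    strings = search_strings(term, n)
    limit = threshold(term, n, ratio)
    selected = []
    for doc_id, text in docs.items():
        grams = {text[i:i + n] for i in range(len(text) - n + 1)}
        # Presence basis: one point for each search string found in the document.
        score = sum(1 for s in strings if s in grams)
        if score > limit:
            selected.append(doc_id)
    return selected


docs = {"a": "search engine", "b": "sea urchin", "c": "chart"}
print(select(docs, "search"))
print(select(docs, "sea"))
```

Under these assumptions a short term such as "sea" has a low threshold, so a document needs few matching strings, whereas "search" requires more matches before a document is selected.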
9. A search program for searching for a document corresponding to a specified search term from among a plurality of documents by a computer, the search program causing the computer to function as:
an indexing term storage unit for storing an index database having a plurality of character strings each of which is composed of successive characters in each of a plurality of documents with a length equal to a predetermined number of index characters, and is used as an indexing term for that document;
a search term pair generation unit for generating at least one search term pair including a first search string with a length equal to the number of index characters and a second search string with a length equal to the number of index characters located at a position shifted by a predetermined number of offset characters relative to the first search string, which are included in the search term;
a search unit for searching the index database for a document having both of the first search string and the second search string as indexing terms, for each of the search term pairs;
a calculation unit for calculating a score indicating a degree at which each document is included in a search result on the basis of a frequency of occurrence of the first search string and the second search string of each of the search term pairs included in each document or of whether or not the first search string and the second search string are included in each document;
a selection unit for selecting a document to be outputted as the search result from among the plurality of documents, on the basis of the respective scores of the plurality of documents; and
an output unit for outputting the document selected by the selection unit as the search result.
10. A search method for searching for a document corresponding to a specified search term from among a plurality of documents by a computer, wherein the computer stores an index database having a plurality of character strings each of which is composed of successive characters in each of a plurality of documents with a length equal to a predetermined number of index characters, and is used as an indexing term for that document, the method comprising:
a search term pair generation step of generating, by the computer, at least one search term pair including a first search string with a length equal to the number of index characters and a second search string with a length equal to the number of index characters located at a position shifted by a predetermined number of offset characters relative to the first search string, which are included in the search term;
a search step of searching, by the computer, the index database for a document having both of the first search string and the second search string as indexing terms, for each of the search term pairs;
a calculation step of calculating, by the computer, a score indicating a degree at which each document is included in a search result on the basis of a frequency of occurrence of the first search string and the second search string of each of the search term pairs included in each document or of whether or not the first search string and the second search string are included in each document;
a selection step of selecting, by the computer, a document to be outputted as the search result, on the basis of the respective scores of the plurality of documents; and
an output step of outputting the document selected by the selection step.
Specification