Searching apparatus and searching method
Abstract
A searching apparatus includes: a memory unit which stores transposed indexes representing the appearing positions and appearing frequencies of all n-grams in plural pieces of document data subjected to searching; an n-gram extracting unit which extracts all n-grams extractable from a searching character string; a smallest-frequency deriving unit which refers to the appearing frequencies represented by the transposed indexes and derives the n-gram with the smallest appearing frequency among all of the extracted n-grams; a searching n-gram selecting unit which selects, from all of the extracted n-grams, a plurality of searching n-grams which together form the searching character string and include the n-gram with the smallest appearing frequency; and a document specifying unit which specifies, based on the selected searching n-grams and their appearing positions represented by the transposed indexes, the document data including the searching character string among the plural pieces of document data.
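The memory unit and the smallest-frequency deriving unit described in the abstract can be illustrated with a minimal sketch. It assumes that the "transposed index" is what is more commonly called an inverted index, and that the "appearing frequency" of an n-gram is its total number of occurrences across all documents; the source does not define either in this section. All function names and the sample documents are hypothetical.

```python
from collections import defaultdict

def extract_ngrams(text, n):
    """Return every (offset, n-gram) pair in text, including overlapping ones."""
    return [(i, text[i:i + n]) for i in range(len(text) - n + 1)]

def build_index(docs, n):
    """Build the index: n-gram -> {doc_id: [offsets]}, plus n-gram -> count.

    Assumption: frequency is the total occurrence count over all documents.
    """
    positions = defaultdict(lambda: defaultdict(list))
    frequency = defaultdict(int)
    for doc_id, text in docs.items():
        for offset, gram in extract_ngrams(text, n):
            positions[gram][doc_id].append(offset)
            frequency[gram] += 1
    return positions, frequency

def rarest_ngram(query, n, frequency):
    """Return the (offset, n-gram) of the query with the smallest frequency.

    N-grams absent from the index count as frequency 0; ties go to the
    earliest offset.
    """
    return min(extract_ngrams(query, n),
               key=lambda pair: frequency.get(pair[1], 0))

# Hypothetical sample data.
docs = {1: "abcabd", 2: "bcdabc"}
positions, frequency = build_index(docs, 2)
print(rarest_ngram("abcd", 2, frequency))  # (2, 'cd')
```

In this sample, "ab" and "bc" each occur three times while "cd" occurs once, so "cd" is the n-gram a search would want to include because it narrows the candidate positions the most.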
8 Claims
1. A searching apparatus comprising:
a memory unit which stores, for each of a plurality of n-grams (where n is a natural number) extracted from plural pieces of document data subjectable to searching, a transposed index representing appearing positions in the plural pieces of document data and an appearing frequency, wherein each n-gram is a character string including n number of characters;
an n-gram extracting unit which extracts all n-grams which are extractable from a searching character string;
a smallest-frequency deriving unit which refers to the appearing frequencies of the plurality of n-grams represented by the transposed indexes, and which derives an n-gram with a smallest appearing frequency among all of the n-grams extracted by the n-gram extracting unit;
a searching n-gram selecting unit which:
(a) divides the searching character string n-gram by n-gram from a first character of the searching character string so that one n-gram does not overlap with another n-gram, and selects the divided n-grams from among all of the n-grams extracted by the n-gram extracting unit,
(b) additionally selects an n-gram including a last character of the searching character string when the selected n-grams do not form the searching character string, and
(c) additionally selects the n-gram with the smallest appearing frequency when the n-gram with the smallest appearing frequency derived by the smallest-frequency deriving unit is not included in the n-grams selected through (a) and (b), and
a document specifying unit which specifies, based on the plurality of searching n-grams selected by the searching n-gram selecting unit and based on the appearing positions of the searching n-grams represented by the transposed indexes, document data including the searching character string among the plural pieces of document data.
Dependent claims: 2
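The selection in (a) through (c) of claim 1 might be sketched as follows. This is one illustrative reading, not the patented implementation: it assumes that "not included" in (c) is judged by comparing n-gram strings, that n-grams missing from the frequency table count as frequency 0, and that ties are broken by the earliest offset. The function name and the frequency values are hypothetical.

```python
def select_from_start(query, n, frequency):
    """Return the selected n-grams as a sorted list of (offset, n-gram)."""
    last_start = len(query) - n
    if last_start < 0:
        return []  # query shorter than n: no n-gram can be extracted
    grams = {i: query[i:i + n] for i in range(last_start + 1)}

    # (a) Non-overlapping n-grams counted from the first character.
    chosen = set(range(0, last_start + 1, n))

    # (b) If those do not cover the whole query, add the n-gram that
    #     ends on the last character.
    if len(query) % n != 0:
        chosen.add(last_start)

    # (c) Add the rarest n-gram unless its string is already selected.
    rarest = min(grams, key=lambda i: frequency.get(grams[i], 0))
    if grams[rarest] not in {grams[i] for i in chosen}:
        chosen.add(rarest)

    return [(i, grams[i]) for i in sorted(chosen)]

# Hypothetical frequencies for the 3-grams of "abcdefg".
frequency = {"abc": 9, "bcd": 7, "cde": 1, "def": 8, "efg": 6}
print(select_from_start("abcdefg", 3, frequency))
# [(0, 'abc'), (2, 'cde'), (3, 'def'), (4, 'efg')]
```

Here (a) picks "abc" and "def", (b) adds "efg" because seven characters are not a multiple of three, and (c) adds "cde" as the rarest n-gram; "bcd" is never looked up.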
3. A searching apparatus comprising:
a memory unit which stores, for each of a plurality of n-grams (where n is a natural number) extracted from plural pieces of document data subjectable to searching, a transposed index representing appearing positions in the plural pieces of document data and an appearing frequency, wherein each n-gram is a character string including n number of characters;
an n-gram extracting unit which extracts all n-grams which are extractable from a searching character string;
a smallest-frequency deriving unit which refers to the appearing frequencies of the plurality of n-grams represented by the transposed indexes, and which derives an n-gram with a smallest appearing frequency among all of the n-grams extracted by the n-gram extracting unit;
a searching n-gram selecting unit which:
(a) selects an n-gram including a first or last character of the searching character string among all of the n-grams extracted by the n-gram extracting unit,
(b) additionally selects the n-gram with the smallest appearing frequency derived by the smallest-frequency deriving unit, and
(c) divides the searching character string n-gram by n-gram so that one n-gram does not overlap with another n-gram with reference to a position of the n-gram with the smallest appearing frequency in the searching character string in a direction frontward or backward of that position, and additionally selects a divided n-gram not selected through (a); and
a document specifying unit which specifies, based on the plurality of searching n-grams selected by the searching n-gram selecting unit and based on the appearing positions of the searching n-grams represented by the transposed indexes, document data including the searching character string among the plural pieces of document data.
Dependent claims: 4
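Claim 3 anchors the division on the rarest n-gram instead of the first character. A rough sketch of one possible reading follows. It assumes that both end n-grams are selected in (a) so that the selection covers the whole query (the claim itself says "first or last"), that the division in (c) steps n characters at a time both backward and forward from the rarest n-gram, and that unknown n-grams count as frequency 0. The function name and frequency values are hypothetical.

```python
def select_around_rarest(query, n, frequency):
    """Return the selected n-grams as a sorted list of (offset, n-gram)."""
    last_start = len(query) - n
    if last_start < 0:
        return []  # query shorter than n: no n-gram can be extracted
    grams = {i: query[i:i + n] for i in range(last_start + 1)}

    # (a) N-grams containing the first and the last character
    #     (assumption: both ends are taken).
    chosen = {0, last_start}

    # (b) The rarest n-gram; ties go to the earliest offset.
    rarest = min(grams, key=lambda i: frequency.get(grams[i], 0))
    chosen.add(rarest)

    # (c) Non-overlapping n-grams aligned with the rarest one: every
    #     offset that differs from it by a multiple of n.
    chosen.update(range(rarest % n, last_start + 1, n))

    return [(i, grams[i]) for i in sorted(chosen)]

# Hypothetical frequencies for the 3-grams of "abcdefgh".
frequency = {"abc": 9, "bcd": 7, "cde": 1, "def": 8, "efg": 6, "fgh": 5}
print(select_around_rarest("abcdefgh", 3, frequency))
# [(0, 'abc'), (2, 'cde'), (5, 'fgh')]
```

In this example the rarest n-gram "cde" sits at offset 2, so stepping forward by three reaches "fgh" at offset 5, and the first n-gram "abc" covers the two leading characters left over in front.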
5. A searching method using a searching apparatus which stores, for each of a plurality of n-grams (where n is a natural number) extracted from plural pieces of document data subjectable to searching, a transposed index representing appearing positions in the plural pieces of document data and an appearing frequency, wherein each n-gram is a character string including n number of characters, the searching method comprising:
an n-gram extracting step of extracting all n-grams which are extractable from a searching character string;
a smallest-frequency deriving step of referring to the appearing frequencies of the plurality of n-grams represented by the transposed indexes, and deriving an n-gram with a smallest appearing frequency among all of the n-grams extracted through the n-gram extracting step;
a searching n-gram selecting step of:
(a) dividing the searching character string n-gram by n-gram from a first character of the searching character string so that one n-gram does not overlap with another n-gram, and selecting the divided n-grams from among all of the n-grams extracted through the n-gram extracting step,
(b) additionally selecting an n-gram including a last character of the searching character string when the selected n-grams do not form the searching character string, and
(c) additionally selecting the n-gram with the smallest appearing frequency when the n-gram with the smallest appearing frequency derived through the smallest-frequency deriving step is not included in the n-grams selected through (a) and (b); and
a document specifying step of specifying, based on the plurality of searching n-grams selected through the searching n-gram selecting step and based on the appearing positions of the searching n-grams represented by the transposed indexes, document data including the searching character string among the plural pieces of document data.
Dependent claims: 6
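The document specifying step can be pictured as a position check: a document contains the searching character string if every selected n-gram appears at the same relative offset it has in the query. The sketch below is an assumption about how such a check could be written, not the method disclosed in the specification. It relies on the selected n-grams jointly covering every character of the query, and all names and sample documents are hypothetical.

```python
from collections import defaultdict

def build_positions(docs, n):
    """Build n-gram -> {doc_id: set of offsets} for the given documents."""
    positions = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for i in range(len(text) - n + 1):
            positions[text[i:i + n]][doc_id].add(i)
    return positions

def matching_documents(selected, positions):
    """Return sorted ids of documents where all selected n-grams line up.

    selected is a list of (offset_in_query, n-gram) pairs.
    """
    if not selected:
        return []
    # Only documents containing every selected n-gram can match.
    doc_sets = [set(positions.get(gram, {})) for _, gram in selected]
    candidates = set.intersection(*doc_sets)

    anchor_offset, anchor_gram = selected[0]
    hits = []
    for doc_id in sorted(candidates):
        for pos in positions[anchor_gram][doc_id]:
            start = pos - anchor_offset  # where the query would begin
            if all(start + off in positions[gram][doc_id]
                   for off, gram in selected):
                hits.append(doc_id)
                break
    return hits

# Hypothetical documents; the query is "abcdefg" with n = 3.
docs = {1: "xxabcdefgxx", 2: "abcxxdefgxx", 3: "efgabcdef"}
selected = [(0, "abc"), (3, "def"), (4, "efg")]
print(matching_documents(selected, build_positions(docs, 3)))  # [1]
```

All three documents contain "abc", "def" and "efg", but only document 1 has them at offsets consistent with the query, which is why the appearing positions and not just the presence of each n-gram are consulted.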
7. A searching method using a searching apparatus which stores, for each of a plurality of n-grams (where n is a natural number) extracted from plural pieces of document data subjectable to searching, a transposed index representing appearing positions in the plural pieces of document data and an appearing frequency, wherein each n-gram is a character string including n number of characters, the searching method comprising:
an n-gram extracting step of extracting all n-grams which are extractable from a searching character string;
a smallest-frequency deriving step of referring to the appearing frequencies of the plurality of n-grams represented by the transposed indexes, and deriving an n-gram with a smallest appearing frequency among all of the n-grams extracted through the n-gram extracting step;
a searching n-gram selecting step of:
(a) selecting an n-gram including a first or last character of the searching character string among all of the n-grams extracted through the n-gram extracting step,
(b) additionally selecting the n-gram with the smallest appearing frequency derived through the smallest-frequency deriving step, and
(c) dividing the searching character string n-gram by n-gram so that one n-gram does not overlap with another n-gram with reference to a position of the n-gram with the smallest appearing frequency in the searching character string in a direction frontward or backward of that position, and additionally selecting a divided n-gram not selected through (a); and
a document specifying step of specifying, based on the plurality of searching n-grams selected through the searching n-gram selecting step and based on the appearing positions of the searching n-grams represented by the transposed indexes, document data including the searching character string among the plural pieces of document data.
Dependent claims: 8
Specification