SEARCHING APPARATUS AND SEARCHING METHOD
Abstract
A searching apparatus includes a memory unit which stores transposed indexes representing appearing positions of all n-grams in plural pieces of document data subjected to searching and appearing frequencies, an n-gram extracting unit that extracts all n-grams extractable from a searching character string, a smallest-frequency deriving unit which refers to the appearing frequency of the n-gram represented by the transposed index, and derives an n-gram with the smallest appearing frequency among all of the extracted n-grams, a searching n-gram selecting unit that selects, from all extracted n-grams, a plurality of searching n-grams which form the searching character string and include the n-gram with the smallest appearing frequency, and a document specifying unit that specifies, based on the plurality of selected searching n-grams and the appearing position of the searching n-gram represented by the transposed index, document data including the searching character string among the plural pieces of document data.
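The memory unit described above can be illustrated with a minimal sketch. It assumes character-level n-grams, an "appearing position" recorded as a (document id, character offset) pair, and an "appearing frequency" counted as the total number of occurrences across all documents. The function name `build_transposed_index` and the sample documents are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def build_transposed_index(documents, n=2):
    """Build a transposed (inverted) index over character n-grams.

    Returns two mappings:
      positions[gram][doc_id] -> list of character offsets of gram in that document
      frequency[gram]         -> total number of occurrences across all documents
    """
    positions = defaultdict(lambda: defaultdict(list))
    frequency = defaultdict(int)
    for doc_id, text in enumerate(documents):
        # Slide a window of n characters over the document.
        for pos in range(len(text) - n + 1):
            gram = text[pos:pos + n]
            positions[gram][doc_id].append(pos)
            frequency[gram] += 1
    return positions, frequency


# Hypothetical sample data, for illustration only.
documents = ["abcab", "bcd"]
positions, frequency = build_transposed_index(documents, n=2)
```

Storing the frequency next to the position lists lets a later step compare how rare each n-gram is without walking the position lists themselves.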
15 Claims
1. A searching apparatus comprising:
a memory unit which stores, for each of n-grams (where n is a natural number) extracted from plural pieces of document data subjected to searching, a transposed index representing an appearing position in the plural pieces of document data and an appearing frequency therein, the n-gram being a character string including n number of characters;
an n-gram extracting unit that extracts all n-grams which are extractable from a searching character string;
a smallest-frequency deriving unit which refers to the appearing frequency of the n-gram represented by the transposed index, and which derives an n-gram with a smallest appearing frequency among all of the n-grams extracted by the n-gram extracting unit;
a searching n-gram selecting unit that selects, from all of the n-grams extracted by the n-gram extracting unit, a plurality of searching n-grams which form the searching character string and which include the n-gram with the smallest appearing frequency derived by the smallest-frequency deriving unit; and
a document specifying unit that specifies, based on the plurality of searching n-grams selected by the searching n-gram selecting unit and the appearing position of the searching n-gram represented by the transposed index, document data including the searching character string among the plural pieces of document data.
Dependent claims: 2, 3, 4, 5
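The extracting, smallest-frequency deriving, and selecting units of claim 1 could be sketched as follows. The claim does not say how the searching n-grams are chosen beyond requiring that they form the searching character string and include the rarest n-gram. The strategy here (step outward from the rarest n-gram in strides of n, clamping at both ends of the string) is one assumed possibility. The function names and the frequency table are hypothetical.

```python
def extract_ngrams(query, n):
    """Return every (offset, n-gram) pair extractable from the query string."""
    return [(i, query[i:i + n]) for i in range(len(query) - n + 1)]


def select_search_ngrams(query, n, frequency):
    """Pick n-grams that together cover the query and include the rarest one.

    `frequency` maps an n-gram to its appearing frequency in the index;
    an n-gram missing from the table is treated as frequency 0.
    """
    grams = extract_ngrams(query, n)
    if not grams:
        return []
    # Smallest-frequency derivation: the n-gram with the lowest frequency.
    rare_offset, _ = min(grams, key=lambda item: frequency.get(item[1], 0))

    last = len(query) - n  # largest valid n-gram offset in the query
    offsets = {rare_offset}
    # Step left from the rarest n-gram, clamping at the start of the query.
    offset = rare_offset
    while offset > 0:
        offset = max(offset - n, 0)
        offsets.add(offset)
    # Step right from the rarest n-gram, clamping at the end of the query.
    offset = rare_offset
    while offset < last:
        offset = min(offset + n, last)
        offsets.add(offset)
    return [(o, query[o:o + n]) for o in sorted(offsets)]


# Hypothetical frequencies, for illustration only.
frequency = {"ab": 50, "bc": 40, "cd": 3, "de": 20}
selected = select_search_ngrams("abcde", 2, frequency)
# selected == [(0, "ab"), (2, "cd"), (3, "de")]
```

In this example "bc" is skipped: the three selected bigrams already cover all five characters, and the rare "cd" is among them, so fewer position lists need to be consulted.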
6. A searching method using a searching apparatus which stores, for each of n-grams (where n is a natural number) extracted from plural pieces of document data subjected to searching, a transposed index representing an appearing position in the plural pieces of document data and an appearing frequency therein, the n-gram being a character string including n number of characters, the searching method comprising:
an n-gram extracting step of extracting all n-grams which are extractable from a searching character string;
a smallest-frequency deriving step of referring to the appearing frequency of the n-gram represented by the transposed index, and of deriving an n-gram with a smallest appearing frequency among all of the n-grams extracted through the n-gram extracting step;
a searching n-gram selecting step of selecting, from all of the n-grams extracted through the n-gram extracting step, a plurality of searching n-grams which form the searching character string and which include the n-gram with the smallest appearing frequency derived through the smallest-frequency deriving step; and
a document specifying step of specifying, based on the plurality of searching n-grams selected through the searching n-gram selecting step and the appearing position of the searching n-gram represented by the transposed index, document data including the searching character string among the plural pieces of document data.
Dependent claims: 7, 8, 9, 10
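The document specifying step of claim 6 might look like the sketch below. It assumes the selected searching n-grams are given as (offset in the searching character string, n-gram) pairs that together cover the whole string, and that the transposed index maps each n-gram to per-document offset lists. Starting from the n-gram with the fewest recorded positions is an assumed optimization. The helper `build_positions`, the function `specify_documents`, and the sample documents are all hypothetical.

```python
def build_positions(documents, n):
    """Hypothetical helper: map n-gram -> {doc_id: [character offsets]}."""
    positions = {}
    for doc_id, text in enumerate(documents):
        for pos in range(len(text) - n + 1):
            gram = text[pos:pos + n]
            positions.setdefault(gram, {}).setdefault(doc_id, []).append(pos)
    return positions


def specify_documents(selected, positions):
    """Return sorted ids of documents in which every selected n-gram
    appears at its expected offset relative to a common start position."""
    if not selected:
        return []

    def posting_count(item):
        # Total number of recorded positions for this n-gram.
        return sum(len(p) for p in positions.get(item[1], {}).values())

    # Use the n-gram with the fewest positions to generate candidate starts.
    anchor_offset, anchor_gram = min(selected, key=posting_count)
    hits = set()
    for doc_id, anchor_positions in positions.get(anchor_gram, {}).items():
        for pos in anchor_positions:
            start = pos - anchor_offset  # where the query would begin
            if start < 0:
                continue
            if all(start + off in positions.get(gram, {}).get(doc_id, ())
                   for off, gram in selected):
                hits.add(doc_id)
                break  # one match is enough for this document
    return sorted(hits)


# Hypothetical sample data: only document 0 contains "abcde".
docs = ["xabcde", "abxcde", "deabc"]
index = build_positions(docs, 2)
selected = [(0, "ab"), (2, "cd"), (3, "de")]
result = specify_documents(selected, index)
# result == [0]
```

Because the selected n-grams cover every character of the searching character string, a document in which all of them line up at consistent offsets necessarily contains the string itself.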
11. A recording medium storing a computer program that allows a computer including a memory unit which stores, for each of n-grams (where n is a natural number) extracted from plural pieces of document data subjected to searching, a transposed index representing an appearing position in the plural pieces of document data and an appearing frequency therein, the n-gram being a character string including n number of characters, to function as:
an n-gram extracting unit that extracts all n-grams which are extractable from a searching character string;
a smallest-frequency deriving unit which refers to the appearing frequency of the n-gram represented by the transposed index, and which derives an n-gram with a smallest appearing frequency among all of the n-grams extracted by the n-gram extracting unit;
a searching n-gram selecting unit that selects, from all of the n-grams extracted by the n-gram extracting unit, a plurality of searching n-grams which form the searching character string and which include the n-gram with the smallest appearing frequency derived by the smallest-frequency deriving unit; and
a document specifying unit that specifies, based on the plurality of searching n-grams selected by the searching n-gram selecting unit and the appearing position of the searching n-gram represented by the transposed index, document data including the searching character string among the plural pieces of document data.
Dependent claims: 12, 13, 14, 15
Specification