SEARCH APPARATUS, SEARCH METHOD, AND RECORDING MEDIUM STORING PROGRAM
First Claim
1. A search apparatus comprising:
an abstract matrix storage unit that, when information which is created from a plurality of regions obtained by dividing a matrix representing a co-occurrence relationship between a word set and a document set and which also represents a subset included in the document set is provided, stores information which enables calculation or estimation of a frequency of a word in each of the plurality of regions as abstract information;
a region upper limit calculation unit that, when the information representing the subset is input, examines a relationship between the information representing the subset and the plurality of regions, refers to the abstract information for each of the plurality of regions from the obtained result, and calculates, for each of the plurality of regions, an upper limit of the frequency of the word included in each of the plurality of regions for the subset;
a word frequency calculation unit that adds the upper limit of the frequency for each of the plurality of regions by each region with a common word, and specifies the obtained added value as the upper limit of the frequency of the word for each region with the common word; and
a document frequency reference unit that obtains a region to be searched according to the upper limit of the frequency of the word for each region with the common word, further specifies a specified number of words in order of higher frequency according to the obtained region to be searched, and outputs the specified word as a word characteristic to the subset.
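One way to read the two bound-computing elements of this claim is sketched below. The notation is an assumption for illustration and does not appear in the claim: each region pairs a word group $W_g$ with a document group $D_j$ (the document groups are taken to partition the document set), $S$ is the input subset, $f_S(w)$ is the number of documents in $S$ containing word $w$, and the abstract information is assumed to hold $m_{g,j}$, the largest per-word document count inside region $(g,j)$.

```latex
% Region upper limit (region upper limit calculation unit), under the assumptions above:
u_{g,j}(S) = \min\bigl(\lvert S \cap D_j \rvert,\; m_{g,j}\bigr)

% Summed over the regions that share the word group W_g (word frequency calculation unit):
U_g(S) = \sum_{j} u_{g,j}(S)

% The sum is a valid upper limit for every word in the group:
f_S(w) = \sum_{j} \bigl\lvert \{\, d \in S \cap D_j : w \in d \,\} \bigr\rvert
       \;\le\; \sum_{j} \min\bigl(\lvert S \cap D_j \rvert,\; m_{g,j}\bigr)
       = U_g(S) \qquad \text{for all } w \in W_g
```

Under this reading, a word group whose $U_g(S)$ is below the frequency of the words already found cannot contribute a more frequent word, which is what allows the document frequency reference unit to restrict the regions it searches.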
Abstract
Provided are a search apparatus, a search method, and a program that can improve search speed even when the object to be searched is a large-scale document set. The search apparatus includes: an abstract matrix storage unit 1 that, when information which is created from a plurality of regions of a matrix representing a co-occurrence relationship between a word set and a document set and which represents a subset of the document set is provided, stores abstract information which enables calculation of a frequency of a word in each region; a region upper limit calculation unit 2 that, in response to an input of the information representing the subset, examines the relationship between the subset and each region, refers to the abstract information of each region, and calculates, for each region, an upper limit of the frequency of the word for the subset; a word frequency calculation unit 3 that adds the upper limits of the frequency across the regions with a common word, and specifies the added value as the upper limit of the frequency of the word for those regions; and a document frequency reference unit 4 that obtains a region to be searched from that upper limit, and specifies a specified number of words in descending order of frequency according to this region.
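The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: "frequency" is taken to mean the number of subset documents containing a word, a region is taken to be a (word group, document group) block, and the abstract information is taken to be the largest per-word document count in each block. The names `build_abstract` and `top_words` and the sample data are hypothetical.

```python
import heapq


def build_abstract(doc_words, word_groups, doc_groups):
    """Assumed abstract information: for each region (word group x document
    group), the largest number of documents in the region sharing one word."""
    abstract = {}
    for gi, words in enumerate(word_groups):
        for di, docs in enumerate(doc_groups):
            counts = [sum(1 for d in docs if w in doc_words[d]) for w in words]
            abstract[(gi, di)] = max(counts, default=0)
    return abstract


def top_words(doc_words, word_groups, doc_groups, abstract, subset, k):
    """Return the k most frequent words in `subset` and the word groups searched."""
    subset = set(subset)
    bounds = []
    for gi in range(len(word_groups)):
        bound = 0
        for di, docs in enumerate(doc_groups):
            # Per-region upper limit: a word cannot occur in more subset
            # documents than the region overlaps, nor more than the stored maximum.
            overlap = len(subset & set(docs))
            bound += min(overlap, abstract[(gi, di)])
        # Sum over all regions that share the same word group.
        bounds.append((bound, gi))

    # Visit word groups in descending order of their upper limit and stop
    # once the k-th best exact frequency already reaches the next bound.
    bounds.sort(reverse=True)
    best = []      # min-heap of (frequency, word), at most k entries
    searched = []  # word groups whose exact frequencies were computed
    for bound, gi in bounds:
        if len(best) == k and best[0][0] >= bound:
            break
        searched.append(gi)
        for w in word_groups[gi]:
            freq = sum(1 for d in subset if w in doc_words[d])
            if len(best) < k:
                heapq.heappush(best, (freq, w))
            elif (freq, w) > best[0]:
                heapq.heapreplace(best, (freq, w))
    ranked = [(w, f) for f, w in sorted(best, reverse=True)]
    return ranked, searched


# Hypothetical sample data: document id -> set of words it contains.
doc_words = {
    0: {"apple", "banana"},
    1: {"apple", "cherry"},
    2: {"apple", "banana", "date"},
    3: {"elder", "fig"},
    4: {"elder", "fig", "grape"},
    5: {"elder", "apple"},
}
word_groups = [["apple", "banana"], ["cherry", "date"], ["elder", "fig", "grape"]]
doc_groups = [[0, 1, 2], [3, 4, 5]]

abstract = build_abstract(doc_words, word_groups, doc_groups)
words, searched = top_words(doc_words, word_groups, doc_groups, abstract, {0, 1, 2, 5}, 2)
print(words)     # [('apple', 4), ('banana', 2)]
print(searched)  # [0]
```

In this sample only the first word group is examined exactly: its summed upper limit is 4, while the other two groups are bounded by 1, so they cannot displace a word that already occurs in 2 subset documents. Words tied at the k-th frequency may be resolved arbitrarily by this sketch.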
23 Citations
16 Claims
1. A search apparatus comprising:
an abstract matrix storage unit that, when information which is created from a plurality of regions obtained by dividing a matrix representing a co-occurrence relationship between a word set and a document set and which also represents a subset included in the document set is provided, stores information which enables calculation or estimation of a frequency of a word in each of the plurality of regions as abstract information;
a region upper limit calculation unit that, when the information representing the subset is input, examines a relationship between the information representing the subset and the plurality of regions, refers to the abstract information for each of the plurality of regions from the obtained result, and calculates, for each of the plurality of regions, an upper limit of the frequency of the word included in each of the plurality of regions for the subset;
a word frequency calculation unit that adds the upper limit of the frequency for each of the plurality of regions by each region with a common word, and specifies the obtained added value as the upper limit of the frequency of the word for each region with the common word; and
a document frequency reference unit that obtains a region to be searched according to the upper limit of the frequency of the word for each region with the common word, further specifies a specified number of words in order of higher frequency according to the obtained region to be searched, and outputs the specified word as a word characteristic to the subset.
Dependent claims: 2, 3, 4, 5
6. A search method comprising:
(a) when information which is created from a plurality of regions obtained by dividing a matrix representing a co-occurrence relationship between a word set and a document set and which also represents a subset included in the document set is provided, storing information which enables calculation or estimation of a frequency of a word in each of the plurality of regions as abstract information;
(b) when the information representing the subset is input, examining a relationship between the information representing the subset and the plurality of regions, referring to the abstract information for each of the plurality of regions from the obtained result, and calculating, for each of the plurality of regions, an upper limit of the frequency of the word included in each of the plurality of regions for the subset;
(c) adding the upper limit of the frequency for each of the plurality of regions by each region with the common word, and specifying the obtained added value as the upper limit of the frequency of the word for each region with the common word; and
(d) obtaining a region to be searched according to the upper limit of the frequency of the word for each region with the common word, further specifying a specified number of words in order of higher frequency according to the obtained region to be searched, and outputting the specified word as a word characteristic to the subset.
Dependent claims: 7, 8, 9, 10
11. A recording medium storing a program for causing a computer to execute:
(a) a process that, when information which is created from a plurality of regions obtained by dividing a matrix representing a co-occurrence relationship between a word set and a document set and which also represents a subset included in the document set is provided, stores information which enables calculation or estimation of a frequency of a word in each of the plurality of regions as abstract information;
(b) a process that, when the information representing the subset is input, examines a relationship between the information representing the subset and the plurality of regions, refers to the abstract information for each of the plurality of regions from the obtained result, and calculates, for each of the plurality of regions, an upper limit of the frequency of the word included in each of the plurality of regions for the subset;
(c) a process that adds the upper limit of the frequency for each of the plurality of regions by each region with the common word, and specifies the obtained added value as the upper limit of the frequency of the word by each region with the common word; and
(d) a process that obtains a region to be searched according to the upper limit of the frequency of the word for each region with the common word, further specifies a specified number of words in order of higher frequency according to the obtained region to be searched, and outputs the specified word as a word characteristic to the subset.
Dependent claims: 12, 13, 14, 15
16. A search apparatus comprising:
an abstract matrix storage that, when information which is created from a plurality of regions obtained by dividing a matrix representing a co-occurrence relationship between a word set and a document set and which also represents a subset included in the document set is provided, stores information which enables calculation or estimation of a frequency of a word in each of the plurality of regions as abstract information;
a region upper limit calculation that, when the information representing the subset is input, examines a relationship between the information representing the subset and the plurality of regions, refers to the abstract information for each of the plurality of regions from the obtained result, and calculates, for each of the plurality of regions, an upper limit of the frequency of the word included in each of the plurality of regions for the subset;
a word frequency calculation that adds the upper limit of the frequency for each of the plurality of regions by each region with a common word, and specifies the obtained added value as the upper limit of the frequency of the word for each region with the common word; and
a document frequency reference that obtains a region to be searched according to the upper limit of the frequency of the word for each region with the common word, further specifies a specified number of words in order of higher frequency according to the obtained region to be searched, and outputs the specified word as a word characteristic to the subset.
Specification