Search apparatus, search method, and non-transitory computer readable medium storing program that input a query representing a subset of a document set stored to a document database and output a keyword that often appears in the subset

  • US 8,892,574 B2
  • Filed: 11/06/2009
  • Issued: 11/18/2014
  • Est. Priority Date: 11/26/2008
  • Status: Active Grant
First Claim

1. A search apparatus including a central processing unit, comprising:

    a cluster creation unit that creates a plurality of regions from a word document matrix specifying a co-occurrence relationship between a word set and a document set, by dividing the word set and the document set into a plurality of subsets;

    a region abstract creation unit that calculates, for each of the plurality of regions, a region frequency representing a number of documents including a word in each region, creates an abstract matrix specifying each region frequency for each region, and stores the created abstract matrix into an abstract matrix storage unit;

    a region upper limit calculation unit that, when information representing at least one subset of the plurality of subsets is input, examines a relationship between the information representing the at least one subset of the plurality of subsets and the plurality of regions, refers to abstract information for each of the plurality of regions from the obtained result of the relationship, and calculates, for each of the plurality of regions, an upper limit value of the frequency of the word included in each of the plurality of regions for the at least one subset of the plurality of subsets;

    a word frequency calculation unit that sums the upper limit value of the frequency of the word for each region with a common word in the plurality of regions, and specifies the summed value as the upper limit value of the frequency of the word for each region with the common word; and

    a document frequency reference unit that determines a region to be searched in the plurality of regions according to the upper limit value of the frequency of the word for each region with the common word, and further specifies a top number of words with high frequency according to the determined region to be searched, and outputs the specified top number of words with high frequency as characteristic words in the at least one subset of the plurality of subsets.
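
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not state: the word document matrix is a dict mapping each word to the set of document ids containing it, the document subsets partition the document set, and the "region frequency" is taken to be the largest per-word document count inside a region. The names `build_abstract` and `top_words` are hypothetical.

```python
def build_abstract(matrix, word_clusters, doc_clusters):
    """Abstract matrix: one number per (word cluster, document cluster) region.

    Assumption: the region frequency is the maximum, over the words of the
    region, of the number of documents in the region that contain the word.
    """
    abstract = {}
    for wi, words in enumerate(word_clusters):
        for di, docs in enumerate(doc_clusters):
            abstract[(wi, di)] = max(
                (len(matrix[w] & docs) for w in words), default=0
            )
    return abstract


def top_words(matrix, word_clusters, doc_clusters, abstract, query_docs, k):
    """Return the k most frequent words in query_docs and the number of
    word clusters that had to be scanned exactly."""
    # How many queried documents fall into each document cluster.
    overlap = [len(query_docs & docs) for docs in doc_clusters]

    # Per-region upper limit, summed over the regions that share the same
    # words, gives an upper limit for every word in that word cluster.
    bounds = []
    for wi in range(len(word_clusters)):
        upper = sum(
            min(abstract[(wi, di)], overlap[di])
            for di in range(len(doc_clusters))
        )
        bounds.append((upper, wi))
    bounds.sort(reverse=True)  # most promising word clusters first

    exact = []   # (word, exact frequency in query_docs)
    scanned = 0
    for upper, wi in bounds:
        exact.sort(key=lambda wf: (-wf[1], wf[0]))
        # Stop once the k-th best exact count already meets the upper limit
        # of every remaining word cluster.
        if len(exact) >= k and exact[k - 1][1] >= upper:
            break
        scanned += 1
        for w in word_clusters[wi]:
            exact.append((w, len(matrix[w] & query_docs)))

    exact.sort(key=lambda wf: (-wf[1], wf[0]))
    return exact[:k], scanned


# Toy data (hypothetical): 4 words, 6 documents, 2 x 2 regions.
matrix = {
    "cat": {1, 2, 3},
    "dog": {1, 2},
    "gpu": {4, 5, 6},
    "cpu": {5, 6},
}
word_clusters = [["cat", "dog"], ["gpu", "cpu"]]
doc_clusters = [{1, 2, 3}, {4, 5, 6}]

abstract = build_abstract(matrix, word_clusters, doc_clusters)
result, scanned = top_words(
    matrix, word_clusters, doc_clusters, abstract, {1, 2, 3}, k=2
)
print(result, scanned)
```

In this toy run the query covers documents 1 to 3, so the word cluster containing "gpu" and "cpu" gets an upper limit of 0 and is never scanned. Under the stated assumptions the bound is safe because a word's count in the query can exceed neither its count in a region nor the number of queried documents in that region.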
