Information processing apparatus, method, and computer-readable recording medium for performing full text retrieval of documents

  • US 8,180,781 B2
  • Filed: 05/28/2009
  • Issued: 05/15/2012
  • Est. Priority Date: 05/28/2008
  • Status: Active Grant
First Claim

1. An information processing apparatus for creating a retrieval result displaying a list of retrieval documents, comprising:

  • a computer memory that stores a feature word file database configured to register, for each of a plurality of stored documents, document identification identifying the document, feature words extracted from full text data of the document, and weight values indicating weights of the feature words in which the feature words and the weight values are corresponded to the document identification;

    a computer processor;

    a document retrieval part, executable by the computer processor, configured to retrieve the retrieval documents, from among the plurality of stored documents, corresponding to a retrieval condition by conducting a full text retrieval of documents;

    a document scoring part, executable by the computer processor, configured to order the retrieval documents by scores indicating degrees of relevance to the retrieval condition;

    a document grouping part, executable by the computer processor, configured to group the retrieval documents into a plurality of groups based on an average rate of change of all the scores such that the groups are divided at a point where a difference in respective scores between two retrieval documents is greater than the average rate of change of all the scores; and

    a document clustering part, executable by the computer processor, configured to conduct a clustering process with respect to the retrieval documents based on the feature words and the weight values of the feature words acquired from the feature word file database, by using the document identifications of the retrieval documents as keys, wherein the average rate of change of the scores indicates a clustering accuracy, and the document clustering part conducts the clustering process with respect to the retrieval documents in a group, for each of the plurality of groups to which the retrieval documents are grouped by the document grouping part;

    wherein the feature words are extracted based on first values indicating appearance frequencies of words obtained from the full text data and second values indicating appearance frequencies of morpheme occurrences obtained when a morphological analysis is conducted, wherein a morpheme is a smallest semantically meaningful unit in a language, and the morphological analysis analyzing behavior and combination of morphemes.
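
The grouping step recited for the document grouping part can be illustrated with a short sketch. This is only one possible reading of the claim language, not the patented implementation: it assumes that "average rate of change of all the scores" means the mean difference between adjacent scores once the documents are ordered by score. The function name `group_by_average_change` and the `(document_id, score)` tuple layout are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

# Hypothetical record layout: (document identification, relevance score).
ScoredDoc = Tuple[str, float]


def group_by_average_change(docs: List[ScoredDoc]) -> List[List[ScoredDoc]]:
    """Split score-ordered documents wherever the drop between two
    neighbouring scores exceeds the average drop over the whole list.

    Assumption: the "average rate of change" is taken to be the mean of
    the differences between adjacent scores in descending order.
    """
    # Order the retrieval documents by score, highest first.
    ranked = sorted(docs, key=lambda d: d[1], reverse=True)
    if len(ranked) < 2:
        return [ranked] if ranked else []

    # Score drop between each document and the next one in the ranking.
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    average_change = sum(gaps) / len(gaps)

    groups: List[List[ScoredDoc]] = [[ranked[0]]]
    for gap, doc in zip(gaps, ranked[1:]):
        if gap > average_change:
            # Drop is larger than average: start a new group here.
            groups.append([doc])
        else:
            groups[-1].append(doc)
    return groups


# Illustrative data only; the scores are made up for this sketch.
example = [
    ("d1", 0.95), ("d2", 0.93), ("d3", 0.90),
    ("d4", 0.60), ("d5", 0.58), ("d6", 0.20),
]
example_groups = group_by_average_change(example)
```

With the illustrative scores above, the adjacent drops are roughly 0.02, 0.03, 0.30, 0.02, and 0.38, giving an average of 0.15, so the sketch divides the list after `d3` and after `d5`. Under the claim, the clustering process would then be run separately on each resulting group, using the feature words and weight values looked up by document identification.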
