
Document similarity calculation apparatus, clustering apparatus, and document extraction apparatus

  • US 7,451,139 B2
  • Filed: 10/28/2002
  • Issued: 11/11/2008
  • Est. Priority Date: 03/07/2002
  • Status: Expired due to Fees
First Claim

1. A clustering apparatus comprising:

  • a memory;

    a Central Processing Unit;

    a similarity calculation unit which respectively calculates a similarity as a relative value between documents, with respect to combinations of a plurality of documents, using a document vector and a significance of a word included in a document;

    a conversion unit which converts similarity calculated by the similarity calculation unit to an absolute value by normalization; and

    a clustering unit which executes clustering of a plurality of documents, based on similarity of the absolute value;

    wherein the absolute value is a sum of a ratio between a similarity having a highest value and a similarity to be converted and a ratio between a mean value of similarities and the similarity to be converted, or the absolute value is a ratio between the similarity having the highest value among the similarities not be converted and the similarity to be converted, said normalization being carried out in accordance with a following equation;

    normalized similarity = α × (similarity of target document / similarity of document of first place) + β × (similarity of target document / mean value of the similarities),

    wherein α and β are coefficients, and wherein α is 0 and β is 1, and a number of higher ranking documents is 1, the above equation can be expressed as following equation;

    Normalized similarity = (similarity of target document / highest similarity in documents other than relevant document),

    and wherein a result of said normalization identifying at least one of said plurality of documents relative to the relevant document is output.
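The normalization recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `normalize_similarities` and the parameters `alpha`, `beta`, and `top_n` are hypothetical. It also assumes that "mean value of the similarities" means the mean over the `top_n` highest-ranked similarities, a reading suggested by the claim's statement that the equation reduces to a ratio against the highest similarity when the number of higher ranking documents is 1. The input is assumed to hold the raw similarities of the documents other than the relevant document.

```python
from statistics import mean


def normalize_similarities(similarities, alpha=0.0, beta=1.0, top_n=1):
    """Convert relative similarities to normalized values (illustrative sketch).

    similarities: raw similarity scores of candidate documents against the
                  relevant document (the relevant document itself excluded).
    alpha, beta:  the coefficients named in the claim.
    top_n:        assumed number of higher-ranking documents used for the mean.
    """
    if not similarities:
        return []
    if top_n < 1:
        raise ValueError("top_n must be at least 1")

    ranked = sorted(similarities, reverse=True)
    highest = ranked[0]              # similarity of the first-place document
    top_mean = mean(ranked[:top_n])  # assumed: mean over the top_n documents

    if highest == 0 or top_mean == 0:
        raise ValueError("cannot normalize when the reference similarity is zero")

    # normalized = alpha * (s / first place) + beta * (s / mean)
    return [alpha * s / highest + beta * s / top_mean for s in similarities]


# With alpha = 0, beta = 1 and top_n = 1, each score is divided by the highest one.
scores = [2.0, 4.0, 1.0]
print(normalize_similarities(scores))
```

With the default arguments, the top-ranked document always receives a normalized similarity of 1, so scores become comparable across different relevant documents, which is what a clustering step based on absolute values would rely on.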
