Apparatus for retrieving similar documents and apparatus for extracting relevant keywords

  • US 6,671,683 B2
  • Filed: 06/28/2001
  • Issued: 12/30/2003
  • Est. Priority Date: 06/28/2000
  • Status: Expired due to Term
First Claim

1. A similar document retrieving apparatus applicable to a document database D which stores N document data containing a total of M kinds of keywords and is machine processible, for designating a retrieval condition consisting of a document group including at least one document x_1, ..., x_r selected from said document database D and for retrieving documents similar to said document group of said retrieval condition from said document database D, said similar document retrieving apparatus comprising:

  • keyword frequency-of-occurrence calculating means for calculating a keyword frequency-of-occurrence data F which represents a frequency-of-occurrence f_dt of each keyword t appearing in each document d stored in said document database D;

  • document length calculating means for calculating a document length data L which represents a length l_d of said each document d;

  • keyword weight calculating means for calculating a keyword weight data W which represents a weight w_t of each keyword t of said M kinds of keywords appearing in said document database D;

  • document profile vector producing means for producing a M-dimensional document profile vector P_d having components respectively representing a relative frequency-of-occurrence p_dt of each keyword t in the concerned document d;

  • document principal component analyzing means for performing a principal component analysis on a document profile vector group of a document group in said document database D and for obtaining a predefined (K)-dimensional document feature vector U_d corresponding to said document profile vector P_d for said each document d; and

  • similar document retrieving means for receiving said retrieval condition consisting of the document group including at least one document x_1, ..., x_r selected from said document database D, calculating a similarity between each document d and said retrieval condition based on a document feature vector of said received document group and the document feature vector of each document d in said document database D, and outputting a designated number of similar documents in order of the calculated similarity.
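
The pipeline recited in the claim (keyword frequencies, document lengths, keyword weights, profile vectors, principal component analysis, similarity ranking) can be illustrated with a minimal Python sketch. The claim does not fix any formulas, so everything concrete here is an assumption made for illustration: document length is taken as the total keyword count, the keyword weight is an IDF-style logarithm, the principal components are found by power iteration on the scatter matrix of weighted profiles, and similarity is the cosine between the mean feature vector of the query documents and each document's feature vector. All function names are hypothetical and are not taken from the patent.

```python
import math
import random
from collections import Counter

def profile_vectors(docs, vocab):
    """Relative keyword frequency per document (assumed: count / document length)."""
    profiles = []
    for tokens in docs:
        counts = Counter(tokens)                      # frequency of occurrence
        length = sum(counts[t] for t in vocab) or 1   # assumed document length
        profiles.append([counts[t] / length for t in vocab])
    return profiles

def keyword_weights(docs, vocab):
    """Assumed IDF-style weight: log(1 + N / document frequency)."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    weights = []
    for t in vocab:
        df = sum(1 for s in doc_sets if t in s)
        weights.append(math.log(1 + n / df) if df else 0.0)
    return weights

def top_components(cov, k, iters=300):
    """Top-k eigenvectors of a symmetric matrix via power iteration with deflation."""
    m = len(cov)
    cov = [row[:] for row in cov]                     # work on a copy
    rng = random.Random(0)                            # fixed seed for repeatability
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
        for _ in range(iters):
            w = [sum(c * x for c, x in zip(row, v)) for row in cov]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                          # no variance left to capture
                break
            v = [x / norm for x in w]
        cv = [sum(c * x for c, x in zip(row, v)) for row in cov]
        lam = sum(a * b for a, b in zip(v, cv))       # eigenvalue estimate
        components.append(v)
        for i in range(m):                            # deflate the found component
            for j in range(m):
                cov[i][j] -= lam * v[i] * v[j]
    return components

def feature_vectors(docs, k):
    """K-dimensional feature vector per document from weighted profile vectors."""
    vocab = sorted({t for d in docs for t in d})
    weights = keyword_weights(docs, vocab)
    x = [[w * p for w, p in zip(weights, row)] for row in profile_vectors(docs, vocab)]
    mean = [sum(col) / len(x) for col in zip(*x)]
    xc = [[a - b for a, b in zip(row, mean)] for row in x]
    m = len(vocab)
    cov = [[sum(r[i] * r[j] for r in xc) for j in range(m)] for i in range(m)]
    comps = top_components(cov, k)
    return [[sum(a * b for a, b in zip(row, c)) for c in comps] for row in xc]

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def retrieve_similar(features, query_ids, top_n):
    """Rank all documents against the mean feature vector of the query documents."""
    dims = range(len(features[0]))
    q = [sum(features[i][j] for i in query_ids) / len(query_ids) for j in dims]
    ranked = sorted(range(len(features)),
                    key=lambda d: cosine(features[d], q), reverse=True)
    return ranked[:top_n]

docs = [
    ["cat", "dog", "pet", "cat"],
    ["dog", "pet", "cat", "vet"],
    ["stock", "bond", "market", "stock"],
    ["bond", "market", "stock", "fund"],
]
features = feature_vectors(docs, k=2)
print(retrieve_similar(features, query_ids=[0], top_n=2))
```

In this sketch the query documents themselves remain eligible results, since the claim retrieves from the same database D that supplies the retrieval condition. A practical system would likely replace the pure-Python power iteration with a library eigensolver or SVD.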
