
Method and vector analysis for a document

  • US 8,171,026 B2
  • Filed: 04/16/2009
  • Issued: 05/01/2012
  • Est. Priority Date: 11/20/2000
  • Status: Expired due to Fees
First Claim

1. A method for determining similarity between two input documents, comprising:

  • detecting terms that occur in each of said input documents;

    segmenting each of said input documents into document segments, each segment being a predetermined part of said input document;

    generating document segment vectors, each vector including as its element values according to occurrence frequencies of said terms occurring in said respective document segments, where an n-th document segment vector for a first one of said input documents Sn (n=1, . . . , N) is represented by (sn1, sn2, sn3, . . . , snk) and an m-th document segment vector for a second one of said input documents Tm (m=1, . . . , M) is represented by (tm1, tm2, tm3, . . . , tmk), where sni represents the occurrence frequency of an i-th term in an n-th document segment, and tmi represents the occurrence frequency of an i-th term in an m-th document segment;

    calculating by a processing device, for each of the two input documents, a squared inner product for all combinations of said document segment vectors contained in each input document, where the squared inner product is represented by
    $$S_n^{t} T_m = \sum_{k=1}^{K} s_{nk} t_{mk};$$ and

    determining said similarity between the two input documents based on a sum of said squared inner products.
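
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices are assumptions made only for illustration: a segment is taken to be a sentence, terms are lowercase alphanumeric tokens, the vocabulary is the union of terms from both documents, and the final normalization in the hypothetical `similarity` function (dividing by the square root of the product of each document's own sum of squared inner products) is one plausible way to base a score on those sums. The claim itself only requires that the similarity be based on a sum of squared inner products.

```python
import math
import re
from collections import Counter


def segment(text):
    # Assumption: a "document segment" is a sentence ending in ., ! or ?
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def terms(segment_text):
    # Assumption: a "term" is a lowercase alphanumeric token.
    return re.findall(r"[a-z0-9]+", segment_text.lower())


def segment_vectors(doc_segments, vocabulary):
    # One vector per segment; element i is the count of vocabulary[i].
    vectors = []
    for seg in doc_segments:
        counts = Counter(terms(seg))
        vectors.append([counts[t] for t in vocabulary])
    return vectors


def sum_squared_inner_products(vecs_a, vecs_b):
    # Sum of (a . b)^2 over all combinations of segment vectors.
    total = 0
    for a in vecs_a:
        for b in vecs_b:
            dot = sum(x * y for x, y in zip(a, b))
            total += dot * dot
    return total


def similarity(doc_s, doc_t):
    segs_s, segs_t = segment(doc_s), segment(doc_t)
    vocabulary = sorted({t for seg in segs_s + segs_t for t in terms(seg)})
    S = segment_vectors(segs_s, vocabulary)
    T = segment_vectors(segs_t, vocabulary)
    cross = sum_squared_inner_products(S, T)
    # Assumption: normalize by each document's own sum so that
    # identical documents score 1.0.
    norm = math.sqrt(
        sum_squared_inner_products(S, S) * sum_squared_inner_products(T, T)
    )
    return cross / norm if norm else 0.0
```

Under these assumptions, two identical documents score 1.0 and two documents with no terms in common score 0.0.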
