Techniques for computing similarity measurements between segments representative of documents

  • US 8,166,049 B2
  • Filed: 05/28/2009
  • Issued: 04/24/2012
  • Est. Priority Date: 05/29/2008
  • Status: Active Grant
First Claim

1. In a system for navigating a document repository in which each document in the document repository comprises at least one segment, a method for computing similarity measurements between various ones of a plurality of segments comprising:

  • populating a matrix representative of the plurality of segments in which each segment of the plurality of segments is represented by keyword frequency data spanning a plurality of keywords, the matrix comprising a plurality of sub-matrices in which each sub-matrix of the plurality of sub-matrices corresponds to a non-overlapping portion of the plurality of keywords;

    identifying a first keyword of the plurality of keywords as being a synonym of a second keyword of the plurality of keywords; and

    adding first keyword frequency data corresponding to the first keyword to second keyword frequency data corresponding to the second keyword to provide modified second keyword frequency data;

    for each sub-matrix of the plurality of sub-matrices, calculating a sub-matrix dot product between a first segment of the plurality of segments and a second segment of the plurality of segments, the sub-matrix dot product spanning at least a portion of the non-overlapping portion of the plurality of keywords, to provide a plurality of sub-matrix dot products, wherein the plurality of sub-matrix dot products comprise dot products of the modified second keyword frequency data and the first keyword frequency data; and

    summing the plurality of sub-matrix dot products to provide a similarity measurement between the first segment and the second segment.
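
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`merge_synonym`, `partition_keywords`, `similarity`), the dictionary-based storage, the fixed chunk size, and the sample keywords and counts are all assumptions made for illustration. The claim also does not say whether the first keyword's own frequency data is retained after merging; this sketch leaves it unchanged.

```python
from typing import Dict, List

Freqs = Dict[str, int]  # keyword -> frequency for one segment (assumed layout)


def merge_synonym(freqs: Freqs, first: str, second: str) -> Freqs:
    """Add the first keyword's frequency to the second keyword's frequency.

    Returns a copy holding the "modified second keyword frequency data".
    The first keyword's entry is left as-is (an assumption of this sketch).
    """
    merged = dict(freqs)
    merged[second] = merged.get(second, 0) + merged.get(first, 0)
    return merged


def partition_keywords(keywords: List[str], chunk_size: int) -> List[List[str]]:
    """Split the keyword list into non-overlapping portions (one per sub-matrix)."""
    return [keywords[i:i + chunk_size] for i in range(0, len(keywords), chunk_size)]


def similarity(seg_a: Freqs, seg_b: Freqs, chunks: List[List[str]]) -> int:
    """Compute one dot product per keyword portion, then sum the partial results."""
    partials = [
        sum(seg_a.get(k, 0) * seg_b.get(k, 0) for k in chunk)
        for chunk in chunks
    ]
    return sum(partials)


# Hypothetical data: "automobile" is treated as a synonym of "car".
keywords = ["car", "automobile", "engine", "road"]
seg_a = {"car": 2, "engine": 1}
seg_b = {"automobile": 3, "road": 1}

chunks = partition_keywords(keywords, chunk_size=2)
merged_a = merge_synonym(seg_a, first="automobile", second="car")
merged_b = merge_synonym(seg_b, first="automobile", second="car")

raw_score = similarity(seg_a, seg_b, chunks)           # no shared keywords
merged_score = similarity(merged_a, merged_b, chunks)  # synonyms now overlap
print(raw_score, merged_score)  # prints: 0 6
```

In this toy data the two segments share no literal keyword, so the unmerged score is 0. Once the synonym counts are folded together, the "car" column contributes 2 × 3 = 6. Because each partial dot product depends only on its own keyword portion, the portions could be computed independently and summed afterwards.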
