
Method and apparatus for document clustering and document sketching

  • US 7,433,869 B2
  • Filed: 06/29/2006
  • Issued: 10/07/2008
  • Est. Priority Date: 07/01/2005
  • Status: Active Grant
First Claim

1. A method for computing the sketch for a document, comprising the steps of:

    using a sentence in a document as a logical delimiter or window from which significant words are extracted based upon semantics of each word in the sentence and each word's relationship to other words in the sentence;

    computing a weight for said extracted words;

    extracting the top-k of said words based on their weight in the document, wherein k represents a numerical value;

    lexicographically sorting words in a phrase to capture content of the sentence before computing a sketch;

    computing a hash of all pair-wise permutations for said significant words;

    sorting said computed hashes; and

    choosing the top-m hashes to represent the document, wherein m represents a numerical value.
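The claim does not say how semantics drive word selection, how the weights are computed, which hash function is used, or what "top" means for hashes. The Python sketch below fills those gaps with assumptions, and none of them come from the patent. A hypothetical stopword filter (`STOPWORDS`) stands in for semantic selection. Raw term frequency serves as the weight. A truncated SHA-1 digest serves as the hash, and the m numerically smallest hashes are treated as the "top-m". "Pair-wise permutations" is read here as unordered pairs of lexicographically sorted words, so each pair is hashed once. The names `split_sentences`, `significant_words`, `pair_hash`, and `compute_sketch` are illustrative only.

```python
import hashlib
import re
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for the claim's semantics-based selection:
# the claim does not publish its filter, so this sketch just drops stopwords.
STOPWORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the to was with".split()
)


def split_sentences(text: str) -> list[str]:
    """Use each sentence as the extraction window (naive regex split)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def significant_words(sentence: str) -> list[str]:
    """Lowercase alphabetic tokens that are not in the assumed stopword list."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def pair_hash(a: str, b: str) -> int:
    """Deterministic 64-bit hash of a word pair (builtin hash() is salted per process)."""
    digest = hashlib.sha1(f"{a} {b}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def compute_sketch(text: str, k: int = 10, m: int = 8) -> list[int]:
    """Sketch a document as its m smallest word-pair hashes (assumed reading of top-m)."""
    sentences = [significant_words(s) for s in split_sentences(text)]
    # Weight: raw term frequency across the whole document (an assumption).
    weights = Counter(w for words in sentences for w in words)
    # Top-k words by weight; ties are broken alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    top_k = {w for w, _ in ranked[:k]}
    hashes = set()
    for words in sentences:
        # Keep the sentence's top-k words, deduplicate, and sort them
        # lexicographically so word order inside the sentence does not matter.
        kept = sorted(set(words) & top_k)
        for a, b in combinations(kept, 2):
            hashes.add(pair_hash(a, b))
    # Sort the hashes and keep the first m as the document's sketch.
    return sorted(hashes)[:m]
```

For example, "The cat chased the mouse. The mouse hid from the cat." and "The mouse was chased by the cat. From the cat the mouse hid." produce the same sketch under these assumptions. After filtering and sorting, both reduce to the same per-sentence word sets. This illustrates why sorting the words within a window before hashing makes the sketch insensitive to word order.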

  • 7 Assignments