
Detecting quoted text

  • US 7,222,299 B1
  • Filed: 12/19/2003
  • Issued: 05/22/2007
  • Est. Priority Date: 12/19/2003
  • Status: Active Grant
First Claim

1. A method for detecting quoted text within a document, the method comprising:

    generating a first set of hash values for a first sequence of words within a first document;

    generating a second set of hash values for a second sequence of words within a second document;

    comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document; and

    identifying additional portions, on a character-by-character basis, of the block of quoted text by comparing additional text contiguous to the first sequence of words to additional text contiguous to the second sequence of words;

    wherein comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document comprises:

    identifying a first sequence of hash values of the first set of hash values that match a second sequence of hash values of the second set of hash values, wherein a length of the first sequence is above a predefined threshold; and

    identifying text within the first document corresponding to the first sequence of hash values as a first portion of the block of quoted text.
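
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not name a hash function, a tokenizer, or a threshold value, so the use of `zlib.crc32`, whitespace-delimited words, the longest-common-run search, and the names `word_spans`, `find_quoted_block`, and `threshold` are all assumptions made for illustration.

```python
import re
import zlib


def word_spans(text):
    """Return (word, start, end) for each whitespace-delimited word (assumed tokenizer)."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def find_quoted_block(first, second, threshold=3):
    """Return the text in `first` that appears to be quoted from `second`, or None.

    Hypothetical helper: one hash value per word, CRC-32 chosen only for brevity.
    """
    a, b = word_spans(first), word_spans(second)
    # Generate a set of hash values for each document's sequence of words.
    ha = [zlib.crc32(w.encode("utf-8")) for w, _, _ in a]
    hb = [zlib.crc32(w.encode("utf-8")) for w, _, _ in b]

    # Compare the hash sequences: find the longest run of consecutive
    # matching hash values (longest common contiguous subsequence).
    best_len, best_i, best_j = 0, 0, 0
    prev = [0] * (len(hb) + 1)
    for i in range(1, len(ha) + 1):
        cur = [0] * (len(hb) + 1)
        for j in range(1, len(hb) + 1):
            if ha[i - 1] == hb[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_i, best_j = cur[j], i - cur[j], j - cur[j]
        prev = cur

    # The run length must be above the predefined threshold.
    if best_len <= threshold:
        return None

    # Character offsets of the matched word run in each document.
    s1, e1 = a[best_i][1], a[best_i + best_len - 1][2]
    s2, e2 = b[best_j][1], b[best_j + best_len - 1][2]

    # Identify additional portions character by character, comparing the
    # text contiguous to the matched run on both sides.
    while s1 > 0 and s2 > 0 and first[s1 - 1] == second[s2 - 1]:
        s1, s2 = s1 - 1, s2 - 1
    while e1 < len(first) and e2 < len(second) and first[e1] == second[e2]:
        e1, e2 = e1 + 1, e2 + 1

    return first[s1:e1]


if __name__ == "__main__":
    reply = "She wrote: the quick brown fox jumps over the lazy dog, and left."
    original = "Earlier, the quick brown fox jumps over the lazy dog. Done"
    print(repr(find_quoted_block(reply, original)))
```

In this sketch the word-level hash comparison only finds whole words that match, so a word such as "dog," (with trailing punctuation) fails to match "dog." at the hash stage. The character-by-character extension is what recovers the partial word at the edge of the quoted block.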
