Detecting quoted text
Abstract
A method and apparatus for detecting quoted text within a document, such as an email message or email thread, is described. A text comparison is performed to identify a block of quoted text within the document. The boundaries of the block of quoted text are identified by performing a character-by-character analysis on text surrounding the identified block of quoted text. The block of quoted text is elided so that an individual can easily identify the block of quoted text as having previously been viewed.
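The elision step mentioned in the abstract can be pictured with a small Python sketch. This is only an illustration, not the patented implementation: the function name `elide_quoted`, the `(start, end)` character-span representation, and the placeholder string are all assumptions introduced here.

```python
def elide_quoted(text, spans, marker="[quoted text hidden]"):
    """Replace each (start, end) character span of `text` with `marker`.

    `spans` is assumed to hold the quoted blocks that an earlier detection
    step has already found; overlapping spans are merged into one marker.
    """
    pieces = []
    cursor = 0  # index of the first character not yet copied or elided
    for start, end in sorted(spans):
        if start < cursor:
            # Span overlaps one already elided: just extend the elided region.
            cursor = max(cursor, end)
            continue
        pieces.append(text[cursor:start])  # keep the unquoted text
        pieces.append(marker)              # stand-in for the quoted block
        cursor = end
    pieces.append(text[cursor:])           # keep whatever follows the last span
    return "".join(pieces)
```

Called on a reply whose quoted line has been located, the function leaves the new text intact and collapses the quoted line to the marker, which a mail client could render as an expandable control.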
39 Claims
1. A method for detecting quoted text within a document, the method comprising:
generating a first set of hash values for a first sequence of words within a first document;
generating a second set of hash values for a second sequence of words within a second document;
comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document; and
identifying additional portions, on a character-by-character basis, of the block of quoted text by comparing additional text contiguous to the first sequence of words to additional text contiguous to the second sequence of words;
wherein comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document comprises;
identifying a first sequence of hash values of the first set of hash values that match a second sequence of hash values of the second set of hash values, wherein a length of the first sequence is above a predefined threshold; and
identifying text within the first document corresponding to the first sequence of hash values as a first portion of the block of quoted text.
Dependent claims: 2-12.

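The steps recited in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the choice of CRC-32 with one hash per word, whitespace tokenization, and the `min_run` threshold value are all hypothetical. The sketch hashes each word, finds the longest run of consecutive matching hashes, requires that run to exceed the threshold, and then grows the match one character at a time in both directions.

```python
import re
import zlib

def tokenize(text):
    """Return (word, start, end) for each run of non-whitespace characters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def word_hashes(tokens):
    """One CRC-32 hash per word (the hash function is an assumption)."""
    return [zlib.crc32(word.encode("utf-8")) for word, _, _ in tokens]

def longest_matching_run(h1, h2):
    """Longest run of consecutive equal hashes: (start in h1, start in h2, length)."""
    best = (0, 0, 0)
    prev = [0] * (len(h2) + 1)
    for i in range(1, len(h1) + 1):
        cur = [0] * (len(h2) + 1)
        for j in range(1, len(h2) + 1):
            if h1[i - 1] == h2[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best

def find_quoted_block(doc1, doc2, min_run=2):
    """Return the (start, end) character span in doc1 quoted from doc2, or None."""
    t1, t2 = tokenize(doc1), tokenize(doc2)
    i, j, n = longest_matching_run(word_hashes(t1), word_hashes(t2))
    if n <= min_run:  # the run length must be above the threshold
        return None
    s1, e1 = t1[i][1], t1[i + n - 1][2]
    s2, e2 = t2[j][1], t2[j + n - 1][2]
    # Character-by-character extension to the left of the matched words.
    while s1 > 0 and s2 > 0 and doc1[s1 - 1] == doc2[s2 - 1]:
        s1, s2 = s1 - 1, s2 - 1
    # Character-by-character extension to the right of the matched words.
    while e1 < len(doc1) and e2 < len(doc2) and doc1[e1] == doc2[e2]:
        e1, e2 = e1 + 1, e2 + 1
    return s1, e1
```

In this sketch the word-level hashes only locate the interior of the quote; words at the edges that carry extra punctuation (such as a quotation mark) fail the hash comparison, and the character-level pass recovers the parts of them that do match.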
13. A method for detecting quoted text within a document, the method comprising:
generating a first set of hash values for a first sequence of words within a first document;
generating a second set of hash values for a second sequence of words within a second document;
comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document; and
identifying additional portions, on a character-by-character basis, of the block of quoted text by comparing additional text contiguous to the first sequence of words to additional text contiguous to the second sequence of words;
wherein generating the first set of hash values comprises generating a first plurality of hash values for a plurality of overlapping subsequences of the first sequence of words;
wherein generating the second set of hash values comprises generating a second plurality of hash values for a plurality of overlapping subsequences of the second sequence of words; and
wherein comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document comprises;
identifying a first sequence of hash values of the first set of hash values that match a second sequence of hash values of the second set of hash values, wherein a length of the first sequence is above a predefined threshold; and
identifying text within the first document corresponding to the first sequence of hash values as a first portion of the block of quoted text.

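Claim 13 adds that the hash values are generated over overlapping subsequences of words. One common way to realize this is to hash every window of `k` consecutive words; the sketch below assumes that reading. The window size `k`, the CRC-32 hash, and the names `shingle_hashes` and `words_spanned` are hypothetical choices made for illustration only.

```python
import zlib

def shingle_hashes(words, k=3):
    """Hash every overlapping k-word window; entry i covers words[i:i + k].

    Returns an empty list when there are fewer than k words.
    """
    return [
        zlib.crc32(" ".join(words[i:i + k]).encode("utf-8"))
        for i in range(len(words) - k + 1)
    ]

def words_spanned(run_length, k=3):
    """Number of words covered by `run_length` consecutive matching windows."""
    return run_length + k - 1 if run_length > 0 else 0

first = "we should meet at noon on friday".split()
second = "ok so we should meet at noon then".split()
h1, h2 = shingle_hashes(first), shingle_hashes(second)

# The first three windows of `first` match windows 2..4 of `second`,
# which together cover the five words "we should meet at noon".
shared_run = h1[0:3] == h2[2:5]
covered = first[0:words_spanned(3)]
```

Because adjacent windows share `k - 1` words, a single changed word breaks `k` consecutive hashes, so a run of matching window hashes is stronger evidence of a verbatim quote than the same number of isolated matching words.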
14. A system for detecting quoted text, comprising:
a hashing module to generate a first set of hash values for a first sequence of words within a first document and to generate a second set of hash values for a second sequence of words within a second document; and
a comparator module to compare the first set of hash values to the second set of hash values to identify matching hash values, which correspond to at least a portion of a block of quoted text within the first document, and to identify additional portions, on a character-by-character basis, of the block of quoted text by comparing additional text contiguous to the first sequence of words to additional text contiguous to the second sequence of words;
wherein the comparison module is configured to identify a first sequence of hash values of the first set of hash values that match a second sequence of hash values of the second set of hash values, wherein a length of the first sequence is above a predefined threshold, and to identify text within the first document corresponding to the first sequence of hash values as a first portion of the block of quoted text.
Dependent claims: 15-25.

26. A system for detecting quoted text, comprising:
a hashing module to generate a first set of hash values for a first sequence of words within a first document and to generate a second set of hash values for a second sequence of words within a second document; and
a comparator module to compare the first set of hash values to the second set of hash values to identify matching hash values, which correspond to at least a portion of a block of quoted text within the first document, and to identify additional portions, on a character-by-character basis, of the block of quoted text by comparing additional text contiguous to the first sequence of words to additional text contiguous to the second sequence of words;
wherein the hash module is configured to generate the first set of hash values by generating a first plurality of hash values for a plurality of overlapping subsequences of the first sequence of words, and to generate the second set of hash values by generating a second plurality of hash values for a plurality of overlapping subsequences of the second sequence of words; and
wherein the comparison module is configured to identify a first sequence of hash values of the first set of hash values that match a second sequence of hash values of the second set of hash values, wherein a length of the first sequence is above a predefined threshold, and to identify text within the first document corresponding to the first sequence of hash values as a first portion of the block of quoted text.

27. A computer program product embodied on a computer readable medium for enabling a detection of quoted text within a first message, the computer program product comprising computer instructions for:
generating a first set of hash values for a first sequence of words within a first document;
generating a second set of hash values for a second sequence of words within a second document;
comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document; and
identifying additional portions, on a character-by-character basis, of the block of quoted text by comparing additional text contiguous to the first sequence of words to additional text contiguous to the second sequence of words;
wherein the instructions for comparing include instructions for;
identifying a first sequence of hash values of the first set of hash values that match a second sequence of hash values of the second set of hash values, wherein a length of the first sequence is above a predefined threshold; and
identifying text within the first document corresponding to the first sequence of hash values as a first portion of the block of quoted text.
Dependent claims: 28-38.

39. A computer program product embodied on a computer readable medium for enabling a detection of quoted text within a first message, the computer program product comprising computer instructions for:
generating a first set of hash values for a first sequence of words within a first document;
generating a second set of hash values for a second sequence of words within a second document;
comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document; and
identifying additional portions, on a character-by-character basis, of the block of quoted text by comparing additional text contiguous to the first sequence of words to additional text contiguous to the second sequence of words;
wherein the instructions for the generating the first and second sets of hash values include instructions for generating the first set of hash values by generating a first plurality of hash values for a plurality of overlapping subsequences of the first sequence of words, and generating the second set of hash values by generating a second plurality of hash values for a plurality of overlapping subsequences of the second sequence of words; and
wherein the instructions for comparing include instructions for;
identifying a first sequence of hash values of the first set of hash values that match a second sequence of hash values of the second set of hash values, wherein a length of the first sequence is above a predefined threshold; and
identifying text within the first document corresponding to the first sequence of hash values as a first portion of the block of quoted text.

Specification