Detecting quoted text

US 7,222,299 B1
Filed: 12/19/2003
Issued: 05/22/2007
Est. Priority Date: 12/19/2003
Status: Active Grant

Abstract

A method and apparatus for detecting quoted text within a document, such as an email message or email thread, is described. A text comparison is performed to identify a block of quoted text within the document. The boundaries of the block of quoted text are identified by performing a character-by-character analysis on text surrounding the identified block of quoted text. The block of quoted text is elided so that an individual can easily identify the block of quoted text as having previously been viewed.
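
The eliding step described in the abstract can be illustrated with a minimal Python sketch. It assumes the quoted block has already been located as a pair of character offsets. The function name `elide_block` and the placeholder marker are hypothetical and are not taken from the patent.

```python
def elide_block(text: str, start: int, end: int,
                marker: str = "[quoted text hidden]") -> str:
    """Replace text[start:end] with a short marker.

    `start` and `end` are character offsets of a previously
    identified block of quoted text (an assumption of this sketch).
    """
    if not (0 <= start <= end <= len(text)):
        raise ValueError("span is outside the document")
    return text[:start] + marker + text[end:]


message = "Thanks!\n> see you at 5\nBob"
start = message.index("> see")
end = message.index("\nBob")
print(elide_block(message, start, end))
```

A real viewer would more likely keep the hidden text available behind an expandable control than discard it. The sketch only shows the substitution itself.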


39 Claims

1. A method for detecting quoted text within a document, the method comprising:
- generating a first set of hash values for a first sequence of words within a first document;
  
  generating a second set of hash values for a second sequence of words within a second document;
  
  comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document; and
  
  identifying additional portions, on a character-by-character basis, of the block of quoted text by comparing additional text contiguous to the first sequence of words to additional text contiguous to the second sequence of words;
  
  wherein comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document comprises:
  
  identifying a first sequence of hash values of the first set of hash values that match a second sequence of hash values of the second set of hash values, wherein a length of the first sequence is above a predefined threshold; and
  
  identifying text within the first document corresponding to the first sequence of hash values as a first portion of the block of quoted text.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The method of claim 1 further comprising eliding the block of quoted text from the first document.
  - 3. The method of claim 1 further comprising highlighting the block of quoted text within the first document.
  - 4. The method of claim 3 wherein the block of quoted text is highlighted by causing the block of quoted text to be displayed in a color different from other text within the first document.
  - 5. The method of claim 3 wherein the block of quoted text is highlighted by causing the block of quoted text to be indented within the first document.
  - 6. The method of claim 1 wherein the first document includes an email thread.
  - 7. The method of claim 6 further comprising identifying an email header associated with the block of quoted text.
  - 8. The method of claim 7 wherein the email header is identified by scanning text preceding the block of quoted text for particular attribution strings.
  - 9. The method of claim 1 wherein the first set of hash values is generated using a rolling checksum function.
  - 10. The method of claim 1 wherein the first set of hash values is generated only from letters or digits found within the first document.
  - 11. The method of claim 1 wherein the first set of hash values is generated using N sequential words within the first document.
  - 12. The method of claim 1 wherein at least a portion of the block of quoted text is identified by merging two previously identified blocks of quoted text into a single block of quoted text.
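
The steps of claim 1, together with the options in claims 10 and 11, can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation. The function names, the use of Python's built-in `hash` on word tuples, the lowercase normalization, and the defaults `n=3` and `min_run=2` are all hypothetical choices.

```python
import re
from typing import List, Optional, Tuple

WORD = re.compile(r"[A-Za-z0-9]+")  # letters and digits only (cf. claim 10)


def word_spans(text: str) -> List[Tuple[str, int, int]]:
    """Return (lowercased word, start offset, end offset) for each word."""
    return [(m.group().lower(), m.start(), m.end()) for m in WORD.finditer(text)]


def shingle_hashes(words: List[Tuple[str, int, int]], n: int) -> List[int]:
    """Hash every window of n sequential words (cf. claim 11)."""
    return [hash(tuple(w for w, _, _ in words[i:i + n]))
            for i in range(len(words) - n + 1)]


def find_quoted_span(first: str, second: str, n: int = 3,
                     min_run: int = 2) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets in `first` of the longest quoted block."""
    w1, w2 = word_spans(first), word_spans(second)
    h1, h2 = shingle_hashes(w1, n), shingle_hashes(w2, n)
    index = {}
    for j, h in enumerate(h2):
        index.setdefault(h, []).append(j)
    best = None  # (run length, position in h1, position in h2)
    for i, h in enumerate(h1):
        for j in index.get(h, ()):
            k = 0
            while i + k < len(h1) and j + k < len(h2) and h1[i + k] == h2[j + k]:
                k += 1
            # keep only runs whose length is above the threshold
            if k > min_run and (best is None or k > best[0]):
                best = (k, i, j)
    if best is None:
        return None
    k, i, j = best
    # a run of k window hashes covers words i .. i + k + n - 2
    s1, e1 = w1[i][1], w1[i + k + n - 2][2]
    s2, e2 = w2[j][1], w2[j + k + n - 2][2]
    # grow the block character by character in both directions
    while s1 > 0 and s2 > 0 and first[s1 - 1] == second[s2 - 1]:
        s1 -= 1
        s2 -= 1
    while e1 < len(first) and e2 < len(second) and first[e1] == second[e2]:
        e1 += 1
        e2 += 1
    return s1, e1


original = "Let's meet at noon on Friday to review the draft."
reply = "Sounds good.\n> Let's meet at noon on Friday to review the draft.\nSee you then."
span = find_quoted_span(reply, original)
print(reply[span[0]:span[1]])
```

The hash comparison finds the bulk of the quoted block cheaply at word granularity. The character-level pass then recovers punctuation at the edges (here the final period) that the word hashes ignore. A production version would also need to guard against hash collisions, which this sketch does not do.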

13. A method for detecting quoted text within a document, the method comprising:
- generating a first set of hash values for a first sequence of words within a first document;
  
  generating a second set of hash values for a second sequence of words within a second document;
  
  comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document; and
  
  identifying additional portions, on a character-by-character basis, of the block of quoted text by comparing additional text contiguous to the first sequence of words to additional text contiguous to the second sequence of words;
  
  wherein generating the first set of hash values comprises generating a first plurality of hash values for a plurality of overlapping subsequences of the first sequence of words;
  
  wherein generating the second set of hash values comprises generating a second plurality of hash values for a plurality of overlapping subsequences of the second sequence of words; and
  
  wherein comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document comprises:
  
  identifying a first sequence of hash values of the first set of hash values that match a second sequence of hash values of the second set of hash values, wherein a length of the first sequence is above a predefined threshold; and
  
  identifying text within the first document corresponding to the first sequence of hash values as a first portion of the block of quoted text.
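
Claim 13 hashes overlapping subsequences of words, and claim 9 mentions a rolling checksum. One way the two could fit together is a polynomial rolling hash over per-word codes, sketched below. The constants `BASE` and `MOD`, the use of `zlib.crc32` as the per-word code, and the function name are assumptions made for illustration. The patent text shown here does not specify a particular checksum.

```python
import zlib
from typing import List

BASE = 257               # hypothetical polynomial base
MOD = (1 << 61) - 1      # hypothetical large prime modulus


def rolling_window_hashes(words: List[str], n: int) -> List[int]:
    """Hash every overlapping window of n words in O(len(words)) time."""
    if n <= 0:
        raise ValueError("n must be positive")
    codes = [zlib.crc32(w.encode("utf-8")) for w in words]
    if len(codes) < n:
        return []
    top = pow(BASE, n - 1, MOD)  # weight of the word leaving the window
    h = 0
    for c in codes[:n]:
        h = (h * BASE + c) % MOD
    hashes = [h]
    for i in range(n, len(codes)):
        # drop the oldest word, shift, and append the newest word
        h = ((h - codes[i - n] * top) * BASE + codes[i]) % MOD
        hashes.append(h)
    return hashes


hashes = rolling_window_hashes("a b c a b c".split(), 3)
print(len(hashes), hashes[0] == hashes[3])
```

Each step reuses the previous window's value instead of rehashing all n words, so the overlapping windows cost roughly one update per word.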

14. A system for detecting quoted text, comprising:
- a hashing module to generate a first set of hash values for a first sequence of words within a first document and to generate a second set of hash values for a second sequence of words within a second document; and
  
  a comparator module to compare the first set of hash values to the second set of hash values to identify matching hash values, which correspond to at least a portion of a block of quoted text within the first document, and to identify additional portions, on a character-by-character basis, of the block of quoted text by comparing additional text contiguous to the first sequence of words to additional text contiguous to the second sequence of words;
  
  wherein the comparison module is configured to identify a first sequence of hash values of the first set of hash values that match a second sequence of hash values of the second set of hash values, wherein a length of the first sequence is above a predefined threshold, and to identify text within the first document corresponding to the first sequence of hash values as a first portion of the block of quoted text.
- Dependent claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
  - 15. The system of claim 14, further comprising a text elider module to elide the block of quoted text from the first document.
  - 16. The system of claim 14, further comprising a text highlighter to highlight the block of quoted text within the first document.
  - 17. The system of claim 16, wherein the block of quoted text is highlighted by causing the block of quoted text to be displayed in a color different from other text within the first document.
  - 18. The system of claim 16, wherein the block of quoted text is highlighted by causing the block of quoted text to be indented within the first document.
  - 19. The system of claim 14, wherein the first document includes an email thread.
  - 20. The system of claim 19, wherein the comparator module is configured to identify an email header associated with the block of quoted text.
  - 21. The system of claim 20, wherein the comparator module is configured to identify the email header by scanning text preceding the block of quoted text for particular attribution strings.
  - 22. The system of claim 14, wherein the hash module is configured to generate the first set of hash values using a rolling checksum function.
  - 23. The system of claim 14, wherein the hash module is configured to generate the first set of hash values only from letters or digits found within the first document.
  - 24. The system of claim 14, wherein the hash module is configured to generate the first set of hash values using N sequential words within the document.
  - 25. The system of claim 14, wherein the comparison module is configured to identify at least a portion of the block of quoted text by merging two previously identified blocks of quoted text into a single block of quoted text.
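
Claims 20 and 21 describe finding an email header by scanning the text preceding a quoted block for attribution strings. A minimal sketch of such a scan follows. The two patterns, the three-line lookback, and the function name are hypothetical. The claims do not list which attribution strings are used.

```python
import re

# Illustrative attribution patterns only; real mail clients vary widely.
ATTRIBUTION_PATTERNS = [
    re.compile(r"^On .+ wrote:$"),
    re.compile(r"^-+\s*Original Message\s*-+$", re.IGNORECASE),
]


def find_attribution_start(text: str, block_start: int, max_lines: int = 3) -> int:
    """Return the offset of an attribution line just before the block.

    Falls back to `block_start` when none of the last `max_lines`
    lines preceding the block looks like an attribution.
    """
    preceding = text[:block_start]
    lines = preceding.splitlines(keepends=True)
    offset = len(preceding)
    for line in reversed(lines[-max_lines:]):
        offset -= len(line)
        if any(p.match(line.strip()) for p in ATTRIBUTION_PATTERNS):
            return offset
    return block_start


text = "Sounds good.\n\nOn Mon, Alice wrote:\n> Let's meet at noon.\n"
block_start = text.index("Let's")
header_start = find_attribution_start(text, block_start)
print(repr(text[header_start:block_start]))
```

Extending the block back to `header_start` lets the header and its quote marker be elided or highlighted together with the quoted text.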

26. A system for detecting quoted text, comprising:
- a hashing module to generate a first set of hash values for a first sequence of words within a first document and to generate a second set of hash values for a second sequence of words within a second document; and
  
  a comparator module to compare the first set of hash values to the second set of hash values to identify matching hash values, which correspond to at least a portion of a block of quoted text within the first document, and to identify additional portions, on a character-by-character basis, of the block of quoted text by comparing additional text contiguous to the first sequence of words to additional text contiguous to the second sequence of words;
  
  wherein the hash module is configured to generate the first set of hash values by generating a first plurality of hash values for a plurality of overlapping subsequences of the first sequence of words, and to generate the second set of hash values by generating a second plurality of hash values for a plurality of overlapping subsequences of the second sequence of words; and
  
  wherein the comparison module is configured to identify a first sequence of hash values of the first set of hash values that match a second sequence of hash values of the second set of hash values, wherein a length of the first sequence is above a predefined threshold, and to identify text within the first document corresponding to the first sequence of hash values as a first portion of the block of quoted text.

27. A computer program product embodied on a computer readable medium for enabling a detection of quoted text within a first message, the computer program product comprising computer instructions for:
- generating a first set of hash values for a first sequence of words within a first document;
  
  generating a second set of hash values for a second sequence of words within a second document;
  
  comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document; and
  
  identifying additional portions, on a character-by-character basis, of the block of quoted text by comparing additional text contiguous to the first sequence of words to additional text contiguous to the second sequence of words;
  
  wherein the instructions for comparing include instructions for:
  
  identifying a first sequence of hash values of the first set of hash values that match a second sequence of hash values of the second set of hash values, wherein a length of the first sequence is above a predefined threshold; and
  
  identifying text within the first document corresponding to the first sequence of hash values as a first portion of the block of quoted text.
- Dependent claims (28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38)
  - 28. The computer program product of claim 27, further comprising computer instructions for eliding the block of quoted text from the first document.
  - 29. The computer program product of claim 27, further comprising computer instructions for highlighting the block of quoted text within the first document.
  - 30. The computer program product of claim 29, including computer instructions for highlighting the block of quoted text by causing the block of quoted text to be displayed in a color different from other text within the first document.
  - 31. The computer program product of claim 29, including computer instructions for highlighting the block of quoted text by causing the block of quoted text to be indented within the first document.
  - 32. The computer program product of claim 27, wherein the first document includes an email thread.
  - 33. The computer program product of claim 32, further comprising identifying an email header associated with the block of quoted text.
  - 34. The computer program product of claim 33, including computer instructions for identifying the email header by scanning text preceding the block of quoted text for particular attribution strings.
  - 35. The computer program product of claim 27, including computer instructions for generating the first set of hash values using a rolling checksum function.
  - 36. The computer program product of claim 27, including computer instructions for generating the first set of hash values only from letters or digits found within the first document.
  - 37. The computer program product of claim 27, including computer instructions for generating the first set of hash values using N sequential words within the first document.
  - 38. The computer program product of claim 27, including computer instructions for identifying at least a portion of the block of quoted text by merging two previously identified blocks of quoted text into a single block of quoted text.
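
Claims 12, 25, and 38 mention merging two previously identified blocks into one. The claims do not say when two blocks qualify for merging, so the sketch below assumes a simple rule: blocks are merged when the gap between them is at most `max_gap` characters. Both the rule and the function name are hypothetical.

```python
from typing import List, Tuple


def merge_blocks(spans: List[Tuple[int, int]], max_gap: int = 2) -> List[Tuple[int, int]]:
    """Merge (start, end) spans that overlap or lie within max_gap characters."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] <= max_gap:
            # close enough to the previous block: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_blocks([(10, 20), (0, 5), (21, 30)]))
```

A small gap tolerance of this kind would let two matched runs separated only by a line break or a quote marker be treated as a single block.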

39. A computer program product embodied on a computer readable medium for enabling a detection of quoted text within a first message, the computer program product comprising computer instructions for:
- generating a first set of hash values for a first sequence of words within a first document;
  
  generating a second set of hash values for a second sequence of words within a second document;
  
  comparing the first set of hash values to the second set of hash values to identify matching hash values corresponding to at least a portion of a block of quoted text within the first document; and
  
  identifying additional portions, on a character-by-character basis, of the block of quoted text by comparing additional text contiguous to the first sequence of words to additional text contiguous to the second sequence of words;
  
  wherein the instructions for the generating the first and second sets of hash values include instructions for generating the first set of hash values by generating a first plurality of hash values for a plurality of overlapping subsequences of the first sequence of words, and generating the second set of hash values by generating a second plurality of hash values for a plurality of overlapping subsequences of the second sequence of words; and
  
  wherein the instructions for comparing include instructions for:
  
  identifying a first sequence of hash values of the first set of hash values that match a second sequence of hash values of the second set of hash values, wherein a length of the first sequence is above a predefined threshold; and
  
  identifying text within the first document corresponding to the first sequence of hash values as a first portion of the block of quoted text.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Buchheit, Paul; Lim, Jing Yee
Primary Examiner(s): Huynh, Cong-Lac
Application Number: US10/740,994
Time in Patent Office: 1,250 Days
Field of Search: 715/500, 715/530, 715/531, 715/752
US Class Current: 715/273
CPC Class Codes:
G06F 40/117 Tagging; Marking up details...
G06Q 10/00 Administration; Management
