SYSTEM AND METHOD FOR DETECTING CONTENT SIMILARITY WITHIN EMAILS DOCUMENTS EMPLOYING SELECTIVE TRUNCATION
First Claim
1. A method, comprising:
generating a first token value dependent on a first subset of characters at a beginning portion of a first email document;
generating a second token value dependent on a second subset of characters at an ending portion of the first email document;
depending upon the first and second token values, selectively generating one or more hash values corresponding to a sequence of characters between the first subset and the second subset;
generating a third token value dependent on a third subset of characters at a beginning portion of a second email document;
generating a fourth token value dependent on a fourth subset of characters at an ending portion of the second email document;
depending upon the third and fourth token values, selectively generating one or more hash values corresponding to a sequence of characters between the third subset and the fourth subset; and
comparing the one or more hash values corresponding to the sequence of characters between the first subset and the second subset with the one or more hash values corresponding to the sequence of characters between the third subset and the fourth subset.
Abstract
A system and method for detecting content similarity between different email documents using selective truncation are disclosed. In one embodiment, a method comprises generating a first token value dependent on a first subset of characters at a beginning portion of a first email document, generating a second token value dependent on a second subset of characters at an ending portion of the first email document, and, depending upon the first and second token values, selectively generating one or more hash values corresponding to a sequence of characters between the first subset and the second subset. The method further comprises generating a third token value dependent on a third subset of characters at a beginning portion of a second email document, generating a fourth token value dependent on a fourth subset of characters at an ending portion of the second email document, and, depending upon the third and fourth token values, selectively generating one or more hash values corresponding to a sequence of characters between the third subset and the fourth subset. Finally, the method comprises comparing the one or more hash values corresponding to the sequence of characters between the first subset and the second subset with the one or more hash values corresponding to the sequence of characters between the third subset and the fourth subset.
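The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the subset size `EDGE`, the chunk size `CHUNK`, the use of CRC-32 for token values and SHA-1 for hash values, the `skip_tokens` policy, and the Jaccard-style comparison are all assumptions chosen for illustration, since the text above does not specify any of them.

```python
import hashlib
import zlib

EDGE = 16   # assumed size of the beginning and ending character subsets
CHUNK = 32  # assumed size of each hashed chunk of the middle sequence


def token(chars: str) -> int:
    """Token value that depends only on the given subset of characters."""
    return zlib.crc32(chars.encode("utf-8"))


def middle_hashes(doc: str, skip_tokens: frozenset = frozenset()) -> set:
    """Selectively hash the characters between the leading and trailing subsets."""
    if len(doc) <= 2 * EDGE:
        return set()  # nothing lies between the two subsets
    head, middle, tail = doc[:EDGE], doc[EDGE:-EDGE], doc[-EDGE:]
    # Hypothetical policy: generate no hashes if either token is on a skip list.
    if token(head) in skip_tokens or token(tail) in skip_tokens:
        return set()
    # One hash value per fixed-size chunk of the middle sequence.
    return {
        hashlib.sha1(middle[i:i + CHUNK].encode("utf-8")).hexdigest()
        for i in range(0, len(middle), CHUNK)
    }


def similarity(doc_a: str, doc_b: str, skip_tokens: frozenset = frozenset()) -> float:
    """Compare the two documents' hash sets (Jaccard overlap, an assumed metric)."""
    a = middle_hashes(doc_a, skip_tokens)
    b = middle_hashes(doc_b, skip_tokens)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

In this sketch, two emails that differ only in their first and last 16 characters produce identical hash sets, because the beginning and ending subsets are truncated before hashing. Fixed-offset chunking is sensitive to insertions in the middle of a message; it is used here only to keep the example short.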
17 Citations
20 Claims
1. A method, comprising:
generating a first token value dependent on a first subset of characters at a beginning portion of a first email document;
generating a second token value dependent on a second subset of characters at an ending portion of the first email document;
depending upon the first and second token values, selectively generating one or more hash values corresponding to a sequence of characters between the first subset and the second subset;
generating a third token value dependent on a third subset of characters at a beginning portion of a second email document;
generating a fourth token value dependent on a fourth subset of characters at an ending portion of the second email document;
depending upon the third and fourth token values, selectively generating one or more hash values corresponding to a sequence of characters between the third subset and the fourth subset; and
comparing the one or more hash values corresponding to the sequence of characters between the first subset and the second subset with the one or more hash values corresponding to the sequence of characters between the third subset and the fourth subset.
Dependent claims: 2, 3, 4, 5
6. A computer-readable memory medium, storing program instructions that are computer-executable to:
generate a first token value dependent on a first subset of characters at a beginning portion of a first email document;
generate a second token value dependent on a second subset of characters at an ending portion of the first email document;
depending upon the first and second token values, selectively generate one or more hash values corresponding to a sequence of characters between the first subset and the second subset;
generate a third token value dependent on a third subset of characters at a beginning portion of a second email document;
generate a fourth token value dependent on a fourth subset of characters at an ending portion of the second email document;
depending upon the third and fourth token values, selectively generate one or more hash values corresponding to a sequence of characters between the third subset and the fourth subset; and
compare the one or more hash values corresponding to the sequence of characters between the first subset and the second subset with the one or more hash values corresponding to the sequence of characters between the third subset and the fourth subset.
Dependent claims: 7, 8, 9, 10, 11, 12, 13
14. A system, comprising:
one or more processors; and
memory storing program instructions that are executable by the one or more processors to:
generate a first token value dependent on a first subset of characters at a beginning portion of a first email document;
generate a second token value dependent on a second subset of characters at an ending portion of the first email document;
depending upon the first and second token values, selectively generate one or more hash values corresponding to a sequence of characters between the first subset and the second subset;
generate a third token value dependent on a third subset of characters at a beginning portion of a second email document;
generate a fourth token value dependent on a fourth subset of characters at an ending portion of the second email document;
depending upon the third and fourth token values, selectively generate one or more hash values corresponding to a sequence of characters between the third subset and the fourth subset; and
compare the one or more hash values corresponding to the sequence of characters between the first subset and the second subset with the one or more hash values corresponding to the sequence of characters between the third subset and the fourth subset.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification