Reliability of duplicate document detection algorithms
Abstract
In a single-signature duplicate document system, a secondary set of attributes is used in addition to a primary set of attributes so as to improve the precision of the system. When the projection of a document onto the primary set of attributes is below a threshold, then a secondary set of attributes is used to supplement the primary lexicon so that the projection is above the threshold.
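The fallback described in the abstract can be illustrated with a minimal Python sketch. It assumes that attributes are whitespace-separated tokens, that a lexicon is a set of tokens, and that the size of a projection is measured by counting matched attributes; the abstract fixes none of these details, and the names `project`, `project_with_fallback`, and the sample lexicons are hypothetical.

```python
def project(tokens, lexicon):
    """Return the document's tokens that appear in the lexicon, sorted."""
    return sorted(set(tokens) & lexicon)


def project_with_fallback(tokens, primary, secondary, threshold):
    """Project onto the primary lexicon; if the projection is too small,
    supplement the primary lexicon with the secondary one and project again.

    Assumption: 'below a threshold' means fewer matched attributes than
    `threshold`. The source does not define the measure.
    """
    projection = project(tokens, primary)
    if len(projection) < threshold:
        projection = project(tokens, primary | secondary)
    return projection


# Hypothetical lexicons and message, for illustration only.
primary = {"viagra", "lottery", "winner"}
secondary = {"free", "click", "offer"}
tokens = "click here for your free lottery offer".split()

print(project_with_fallback(tokens, primary, secondary, threshold=3))
# -> ['click', 'free', 'lottery', 'offer']
```

In this sketch the supplemented projection is not guaranteed to reach the threshold; a fuller implementation would need to decide how to handle a document that still matches too few attributes.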
24 Claims
1. A method, comprising:
receiving an electronic message that is addressed to a user;
determining, by at least one processor, one or more attributes of the electronic message;
determining, by the at least one processor, an intersection between the determined one or more attributes of the electronic message and a first lexicon of attributes that are associated with spam electronic messages;
determining, by the at least one processor, whether the intersection exceeds a precision threshold, the precision threshold indicating the reliability of the intersection between the determined one or more attributes of the electronic message and the first lexicon of attributes; and
if the intersection exceeds the precision threshold,
determining an electronic message signature based on the intersection, and
comparing the electronic message signature to each of a plurality of signatures associated with spam electronic messages.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
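The steps recited in claim 1 can be sketched in Python as follows. This is one possible reading under stated assumptions, not the patented implementation: attributes are taken to be lowercased whitespace-separated tokens, "exceeds" is read as a strict count comparison, the signature is a SHA-1 hash of the sorted intersection, and comparison against the stored signatures is an exact match. The names `message_attributes`, `signature`, and `is_known_spam` are hypothetical.

```python
import hashlib


def message_attributes(text):
    """Assumed attribute extraction: lowercased whitespace tokens."""
    return set(text.lower().split())


def signature(attributes):
    """Assumed signature: SHA-1 over the sorted attributes joined by spaces."""
    canonical = " ".join(sorted(attributes))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def is_known_spam(text, lexicon, precision_threshold, spam_signatures):
    """Return True/False for a signature match, or None when the
    intersection is too small to be considered reliable."""
    intersection = message_attributes(text) & lexicon
    if len(intersection) <= precision_threshold:
        # The claim does not say what happens in this case;
        # this sketch simply declines to compute a signature.
        return None
    # Set membership stands in for comparing against each stored signature.
    return signature(intersection) in spam_signatures


# Hypothetical lexicon and signature store, for illustration only.
lexicon = {"lottery", "winner", "claim", "prize"}
known = {signature({"lottery", "winner", "prize"})}

print(is_known_spam("You are a lottery WINNER collect your prize today",
                    lexicon, 2, known))   # -> True
print(is_known_spam("lottery news", lexicon, 2, known))  # -> None
```

Hashing only the intersection means that two messages differing in words outside the lexicon still produce the same signature, which is what lets a single signature cover near-duplicate messages.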
9. A computer system, comprising:
one or more processors; and
a memory having instructions that, when executed by the one or more processors, cause the one or more processors to perform the operations of:
receiving an electronic message that is addressed to a user;
determining one or more attributes of the electronic message;
determining an intersection between the determined one or more attributes of the electronic message and a first lexicon of attributes that are associated with spam electronic messages;
determining whether the intersection exceeds a precision threshold, the precision threshold indicating the reliability of the intersection between the determined one or more attributes of the electronic message and the first lexicon of attributes; and
if the intersection exceeds the precision threshold,
determining an electronic message signature based on the intersection, and
comparing the electronic message signature to each of a plurality of signatures associated with spam electronic messages.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A non-transitory computer-readable medium storing instructions, the instructions operable to cause one or more computer processors to perform operations, comprising:
receiving an electronic message that is addressed to a user;
determining one or more attributes of the electronic message;
determining an intersection between the determined one or more attributes of the electronic message and a first lexicon of attributes that are associated with spam electronic messages;
determining whether the intersection exceeds a precision threshold, the precision threshold indicating the reliability of the intersection between the determined one or more attributes of the electronic message and the first lexicon of attributes; and
if the intersection exceeds the precision threshold,
determining an electronic message signature based on the intersection, and
comparing the electronic message signature to each of a plurality of signatures associated with spam electronic messages.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
Specification