System for determining degrees of similarity in email message information
First Claim
1. A method for classifying email messages, the method comprising:
providing multiple independent modules each of which is configured to analyze email messages;
receiving an email message;
using a plurality of the independent modules to determine a level of sameness of the received email message with one or more prior email messages, wherein each module being used determines a level of sameness in a different manner than the other modules being used, and wherein at least some of the modules being used are each assigned a non-zero weight indicative of the module's performance level;
determining an overall level of sameness for the received email message by combining results of at least two of the plurality of independent modules using the non-zero weights assigned to the modules;
evaluating the performance level for each of the independent modules that were used to determine the level of sameness for the received email message;
comparing the performance levels evaluated for the independent modules that were used to determine the level of sameness for the received email message;
adjusting the non-zero weights of at least two of the modules in response to comparing the performance levels, including increasing the non-zero weight of at least one of the modules and reducing the non-zero weight of at least another one of the modules; and
using the overall level of sameness determined for the received email message to classify the received email message into a category.
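The steps recited in this claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Module` class, the two example modules (`exact_match`, `word_overlap`), the weighted-average combination, the fixed adjustment `step`, the weight `floor`, and the 0.8 classification threshold are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical module signature: (received, prior) -> sameness in [0, 1].
SamenessFn = Callable[[str, str], float]


@dataclass
class Module:
    name: str
    score: SamenessFn
    weight: float  # non-zero weight indicative of the module's performance


def exact_match(received: str, prior: str) -> float:
    """Assumed module 1: 1.0 only when the two texts are identical."""
    return 1.0 if received == prior else 0.0


def word_overlap(received: str, prior: str) -> float:
    """Assumed module 2: Jaccard overlap of the two word sets."""
    a, b = set(received.split()), set(prior.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def overall_sameness(modules: List[Module], received: str, prior: str) -> float:
    """Combine module results as a weighted average (one possible combination)."""
    total = sum(m.weight for m in modules)
    return sum(m.weight * m.score(received, prior) for m in modules) / total


def adjust_weights(modules: List[Module], performance: Dict[str, float],
                   step: float = 0.1, floor: float = 0.01) -> None:
    """Compare performance levels; raise the best module's weight and
    lower the worst module's weight, keeping every weight non-zero."""
    best = max(modules, key=lambda m: performance[m.name])
    worst = min(modules, key=lambda m: performance[m.name])
    if performance[best.name] == performance[worst.name]:
        return  # no module outperforms another; leave weights unchanged
    best.weight += step
    worst.weight = max(floor, worst.weight - step)


def classify(sameness: float, threshold: float = 0.8) -> str:
    """Map the overall level of sameness to a category (assumed threshold)."""
    return "bulk" if sameness >= threshold else "not bulk"


modules = [Module("exact", exact_match, 1.0), Module("overlap", word_overlap, 1.0)]
received, prior = "buy cheap pills now", "buy cheap pills today"
before = overall_sameness(modules, received, prior)

# Performance levels would come from an external evaluation; these are made up.
adjust_weights(modules, {"exact": 0.5, "overlap": 0.9})
after = overall_sameness(modules, received, prior)
category = classify(after)
```

In this sketch the better-performing module gains influence on the combined score after adjustment, while the floor keeps the reduced weight from reaching zero, consistent with the claim's requirement that the weights remain non-zero.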
Abstract
Similarity of email message characteristics is used to detect bulk and spam email. A determination of “sameness” for purposes of both bulk and spam classifications can use any number and type of evaluation modules. Each module can include one or more rules, tests, processes, algorithms, or other functionality. For example, one type of module may be a word count of email message text. Another module can use a weighting factor based on groups of multiple words and their perceived meanings. In general, any type of module that performs a similarity analysis can be used. A preferred embodiment of the invention uses statistical analysis, such as Bayesian analysis, to measure the performance of different modules against a known standard, such as human manual matching. Modules that are performing worse than other modules can be valued less than modules having better performance. In this manner, a high degree of reliability can be achieved. To improve performance, if a message is determined to be the same as a previous message, the previous computations and results for that previous message can be re-used. Users can be provided with options to customize or regulate bulk and spam classification and subsequent actions on how to handle the classified email messages.
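Two ideas from the abstract, a word-count module and the re-use of results for a message already seen, can be sketched in Python as follows. This is a hedged illustration, not the method of the specification. The cosine comparison of word counts, the `ResultCache` class, and the use of a hash over whitespace- and case-normalized text as the test for "the same message" are assumptions made for this example.

```python
import hashlib
import math
from collections import Counter
from typing import Dict, Optional


def word_count_sameness(a: str, b: str) -> float:
    """Assumed word-count module: cosine similarity of word-count vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


class ResultCache:
    """Hypothetical store that re-uses a prior result for a repeated message."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    @staticmethod
    def _key(message: str) -> str:
        # Assumption: messages count as "the same" when their lowercased,
        # whitespace-collapsed text is identical.
        normalized = " ".join(message.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, message: str) -> Optional[str]:
        """Return the stored category, or None if the message is new."""
        return self._results.get(self._key(message))

    def put(self, message: str, category: str) -> None:
        """Remember the category computed for this message."""
        self._results[self._key(message)] = category


cache = ResultCache()
cache.put("Hello   World", "spam")
reused = cache.get("hello world")      # same message after normalization
missing = cache.get("something else")  # not seen before
```

A real system would likely decide sameness with the combined module results rather than an exact hash; the hash is used here only to keep the re-use step short.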
28 Claims
1. A method for classifying email messages, the method comprising:
providing multiple independent modules each of which is configured to analyze email messages;
receiving an email message;
using a plurality of the independent modules to determine a level of sameness of the received email message with one or more prior email messages, wherein each module being used determines a level of sameness in a different manner than the other modules being used, and wherein at least some of the modules being used are each assigned a non-zero weight indicative of the module's performance level;
determining an overall level of sameness for the received email message by combining results of at least two of the plurality of independent modules using the non-zero weights assigned to the modules;
evaluating the performance level for each of the independent modules that were used to determine the level of sameness for the received email message;
comparing the performance levels evaluated for the independent modules that were used to determine the level of sameness for the received email message;
adjusting the non-zero weights of at least two of the modules in response to comparing the performance levels, including increasing the non-zero weight of at least one of the modules and reducing the non-zero weight of at least another one of the modules; and
using the overall level of sameness determined for the received email message to classify the received email message into a category.
Dependent claims: 2 through 25.
26. An apparatus for classifying email messages, the apparatus comprising:
a processor for executing instructions included in a machine-readable medium, the machine-readable medium including:
one or more instructions for providing multiple independent modules each of which is configured to analyze email messages;
one or more instructions for receiving an email message;
one or more instructions for using a plurality of the independent modules to determine a level of sameness of the received email message with one or more prior email messages, wherein each module being used determines a level of sameness in a different manner than the other modules being used, and wherein at least some of the modules being used are each assigned a non-zero weight indicative of the module's performance level;
one or more instructions for determining an overall level of sameness for the received email message by combining results of at least two of the plurality of independent modules using the non-zero weights assigned to the modules;
one or more instructions for evaluating the performance level for each of the independent modules that were used to determine the level of sameness for the received email message;
one or more instructions for comparing performance levels evaluated for the independent modules that were used to determine the level of sameness for the received email message;
one or more instructions for adjusting the non-zero weights of at least two of the modules in response to comparing the performance levels, including increasing the non-zero weight of at least one of the modules and reducing the non-zero weight of at least another one of the modules; and
one or more instructions for using the overall level of sameness determined for the received email message to classify the received email message into a category.
27. A machine-readable storage medium including instructions executable by a processor for classifying email messages, the machine-readable storage medium including:
one or more instructions for providing multiple independent modules each of which is configured to analyze email messages;
one or more instructions for receiving an email message;
one or more instructions for using a plurality of the independent modules to determine a level of sameness of the received email message with one or more prior email messages, wherein each module being used determines a level of sameness in a different manner than the other modules being used, and wherein at least some of the modules being used are each assigned a non-zero weight indicative of the module's performance level;
one or more instructions for determining an overall level of sameness for the received email message by combining results of at least two of the plurality of independent modules using the non-zero weights assigned to the modules;
one or more instructions for evaluating the performance level for each of the independent modules that were used to determine the level of sameness for the received email message;
one or more instructions for comparing performance levels evaluated for the independent modules that were used to determine the level of sameness for the received email message;
one or more instructions for adjusting the non-zero weights of at least two of the modules in response to comparing the performance levels, including increasing the non-zero weight of at least one of the modules and reducing the non-zero weight of at least another one of the modules; and
one or more instructions for using the overall level of sameness determined for the received email message to classify the received email message into a category.
28. An apparatus for classifying email messages, the apparatus comprising:
means for providing multiple independent modules each of which is configured to analyze email messages;
means for receiving an email message;
means for using a plurality of the independent modules to determine a level of sameness of the received email message with one or more prior email messages, wherein each module being used determines a level of sameness in a different manner than the other modules being used, and wherein at least some of the modules being used are each assigned a non-zero weight indicative of the module's performance level;
means for determining an overall level of sameness for the received email message by combining results of at least two of the plurality of independent modules using the non-zero weights assigned to the modules;
means for evaluating the performance level for each of the independent modules that were used to determine the level of sameness for the received email message;
means for comparing performance levels evaluated for the independent modules that were used to determine the level of sameness for the received email message;
means for adjusting the non-zero weights of at least two of the modules in response to comparing the performance levels, including increasing the non-zero weight of at least one of the modules and reducing the non-zero weight of at least another one of the modules; and
means for using the overall level of sameness determined for the received email message to classify the received email message into a category.
Specification