Approximate matching of strings for message filtering
Abstract
A method of determining whether a guarded term is represented in a message comprises associating a portion of the message with the guarded term and evaluating a cost of the association. A method of generating a collection of guarded terms that represents an original term comprises generating a plurality of variations of the original term, evaluating similarity of each of the plurality of variations with respect to the original term and determining whether the similarity meets a predetermined criterion.
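
The second method in the abstract (generate variations of an original term, score each against the original, keep those meeting a criterion) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the patent: the `LOOKALIKES` table, the three single-edit mutation types, the use of `difflib.SequenceMatcher` as the similarity measure, and the `0.8` cutoff.

```python
from difflib import SequenceMatcher

# Hypothetical look-alike substitutions; the source does not specify any table.
LOOKALIKES = {"a": "@4", "e": "3", "i": "1!", "l": "1", "o": "0", "s": "$5"}


def generate_variations(term):
    """Yield single-edit variations: look-alike substitution, deletion, adjacent swap."""
    seen = set()
    for i, ch in enumerate(term):
        candidates = [term[:i] + sub + term[i + 1:] for sub in LOOKALIKES.get(ch, "")]
        candidates.append(term[:i] + term[i + 1:])  # delete one character
        if i + 1 < len(term):
            # swap this character with the next one
            candidates.append(term[:i] + term[i + 1] + ch + term[i + 2:])
        for candidate in candidates:
            if candidate and candidate != term and candidate not in seen:
                seen.add(candidate)
                yield candidate


def similarity(a, b):
    """Similarity in [0, 1]; an assumed stand-in for the patent's unspecified measure."""
    return SequenceMatcher(None, a, b).ratio()


def guarded_collection(term, min_similarity=0.8):
    """Keep only the variations whose similarity to the original meets the criterion."""
    return sorted(v for v in generate_variations(term)
                  if similarity(v, term) >= min_similarity)
```

Calling `guarded_collection("viagra")` would return the subset of generated variations that stay close enough to the original term; raising `min_similarity` shrinks the collection, lowering it admits more distant variations.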
20 Claims
1. A method of identifying strings in an e-mail message, the method comprising:
receiving an e-mail message; and
executing instructions stored in memory, wherein execution of the instructions by a processor:
identifies a text string in the e-mail message;
determines that the identified text string in the e-mail message is not a safe string, wherein safe strings are predetermined strings stored in a database of acceptable terms and identified as legitimately present in e-mail messages;
associates the text string with a guarded term from a database of guarded terms stored in memory, the guarded term being a string of special interest to a user;
evaluates a cost of the association of the identified text string that dictates a probability that the identified text string is a mutation of the associated guarded term, wherein the evaluation compares similarities and differences between the identified text string and the guarded term, and wherein the evaluation assigns different penalties for the cost based on whether the mutation includes regular characters or special characters;
matches the identified text string with the guarded term when the cost of association of the identified text string meets a predetermined threshold; and
characterizes the e-mail message based on the matching between the identified text string and the guarded term.
Dependent claims: 2-12.
13. A system for identifying strings in an e-mail message, the system comprising:
a processor; and
memory storing a database of guarded terms and executable instructions, wherein the guarded terms are strings of special interest to a user, and wherein execution of the instructions by the processor:
identifies a text string in the e-mail message,
determines that the identified text string in the e-mail message is not a safe string, wherein safe strings are predetermined strings stored in a database of acceptable terms and identified as legitimately present in e-mail messages,
associates the text string with a guarded term from the database of guarded terms,
evaluates a cost of the association of the identified text string that dictates a probability that the identified text string is a mutation of the associated guarded term, wherein the evaluation compares similarities and differences between the identified text string and the guarded term, and wherein the evaluation assigns different penalties for the cost based on whether the mutation includes regular characters or special characters,
matches the identified text string with the guarded term when the cost of association of the identified text string meets a predetermined threshold, and
characterizes the e-mail message based on the matching between the identified text string and the guarded term.
Dependent claims: 14-18.
19. A non-transitory computer-readable storage medium, having embodied thereon a program, the program being executable by a processor to perform a method for identifying strings in an e-mail message, the method comprising:
identifying a text string in the e-mail message;
determining that the identified text string in the e-mail message is not a safe string, wherein safe strings are predetermined strings stored in a database of acceptable terms and identified as legitimately present in e-mail messages;
associating the text string with a guarded term from a database of guarded terms stored in memory, the guarded term being a string of special interest to a user;
evaluating a cost of the association of the identified text string that dictates a probability that the identified text string is a mutation of the associated guarded term, wherein the evaluation compares similarities and differences between the identified text string and the guarded term, and wherein the evaluation assigns different penalties for the cost based on whether the mutation includes regular characters or special characters;
matching the identified text string with the guarded term when the cost of association of the identified text string meets a predetermined threshold; and
characterizing the e-mail message based on the matching between the identified text string and the guarded term.
Dependent claims: 20.
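
The steps recited in independent claims 1, 13, and 19 (safe-string check, association with a guarded term, cost evaluation with different penalties for regular and special characters, threshold match, characterization of the message) can be sketched as follows. This is an illustrative reading, not the patented implementation. The assumptions are: the cost is a weighted edit distance, "meets a predetermined threshold" means the cost is at or below it, tokens are whitespace-separated, and the term lists, penalty values, and labels are all hypothetical. The probability that the claims tie to the cost is not modeled.

```python
SAFE_STRINGS = {"roles"}             # hypothetical database of acceptable terms
GUARDED_TERMS = ["viagra", "rolex"]  # hypothetical database of guarded terms

REGULAR_PENALTY = 1.0   # assumed penalty when only letters or digits are involved
SPECIAL_PENALTY = 0.25  # assumed penalty when a special character is involved


def penalty(*chars):
    """Return the cheaper penalty if any character involved is non-alphanumeric."""
    return SPECIAL_PENALTY if any(not c.isalnum() for c in chars) else REGULAR_PENALTY


def association_cost(text, term):
    """Weighted edit distance between a text string and a guarded term."""
    prev = [0.0]
    for c in term:                       # cost of building term from an empty string
        prev.append(prev[-1] + penalty(c))
    for a in text:
        cur = [prev[0] + penalty(a)]     # cost of dropping every text character so far
        for j, b in enumerate(term, 1):
            substitute = prev[j - 1] + (0.0 if a == b else penalty(a, b))
            drop_from_text = prev[j] + penalty(a)
            insert_from_term = cur[j - 1] + penalty(b)
            cur.append(min(substitute, drop_from_text, insert_from_term))
        prev = cur
    return prev[-1]


def characterize(message, threshold=1.5):
    """Label a message and list (text string, guarded term) matches."""
    matches = []
    for token in message.lower().split():
        if token in SAFE_STRINGS:        # safe strings are never treated as mutations
            continue
        for term in GUARDED_TERMS:
            if association_cost(token, term) <= threshold:
                matches.append((token, term))
    return ("suspect" if matches else "clean"), matches
```

Under these assumed penalties, a string padded with punctuation stays cheap to associate with its guarded term, while each letter-for-letter change costs a full unit. The safe list matters for near neighbours: "roles" is one substitution away from "rolex" and would match at this threshold if it were not skipped first.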
Specification