Advanced spam detection techniques
Abstract
The subject invention provides for an advanced and robust system and method that facilitates detecting spam. The system and method include components as well as other operations which enhance or promote finding characteristics that are difficult for the spammer to avoid and finding characteristics in non-spam that are difficult for spammers to duplicate. Exemplary characteristics include examining origination features in pairs; analyzing character and/or number sequences, strings, and sub-strings; detecting various entropy levels of one or more character sequences, strings, and/or sub-strings; as well as analyzing message and/or feature sizes.
75 Claims
1. A spam detection system comprising:
a component that identifies features relating to at least a portion of origination information of a message; and
a component that combines the features into useful pairs for use in connection with training a machine learning filter to facilitate detecting spam.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
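As a rough illustration of the pairing step recited in claim 1, the following sketch combines individual origination features into pairwise features. The feature names (`from_domain`, `helo_domain`, `sender_ip_prefix`) and the function name `pair_features` are assumptions made for this example; the claim does not say which origination features are paired or how a pair is encoded.

```python
from itertools import combinations


def pair_features(origin: dict[str, str]) -> list[str]:
    """Combine individual origination features into pairwise features.

    `origin` maps a hypothetical feature name (e.g. "from_domain") to its
    value. Each unordered pair of features yields one combined feature.
    """
    items = sorted(origin.items())  # sort for a deterministic feature order
    singles = [f"{name}={value}" for name, value in items]
    return [f"{a}&{b}" for a, b in combinations(singles, 2)]


# Hypothetical origination data extracted from one message.
origin = {
    "from_domain": "example.com",
    "helo_domain": "mail.example.net",
    "sender_ip_prefix": "192.0.2",
}
features = pair_features(origin)
```

A pair such as `from_domain=example.com&sender_ip_prefix=192.0.2` can carry information that neither feature carries alone, for instance when a domain is legitimate only in combination with certain sending addresses.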
13. A spam detection system comprising:
a component that analyzes a portion of a message via searching for particular character sequences that are indicative of spam, wherein the particular sequences are not restricted to whole words; and
a component that generates features relating to the character sequences of any length.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
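One possible reading of claim 13 is character n-gram extraction that ignores word boundaries. The sketch below is a minimal example under that assumption; the function name `char_ngrams` and the length bounds `min_n` and `max_n` are hypothetical, chosen only to keep the feature count manageable (the claim itself refers to sequences of any length).

```python
def char_ngrams(text: str, min_n: int = 2, max_n: int = 4) -> set[str]:
    """Return every character sequence of length min_n..max_n found in `text`.

    Sequences may span spaces and punctuation, so they are not restricted
    to whole words.
    """
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


# An obfuscated word still shares sub-sequences such as "agr" and "gra"
# with its plain spelling.
grams = char_ngrams("v1agra now", min_n=3, max_n=3)
```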
25. A spam detection system comprising:
a component that analyzes a portion of a message via searching for instances of a string of random characters that are indicative of the message being spam.
Dependent claims: 26, 27, 28, 29, 30, 31, 32
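The abstract mentions detecting entropy levels of character sequences, which suggests one way to look for random-character strings as in claim 25. The sketch below uses per-character Shannon entropy as a crude proxy for randomness; the threshold of 3.0 bits, the minimum length of 8, and the names `char_entropy` and `looks_random` are assumptions for illustration, not values taken from the patent.

```python
import math
from collections import Counter


def char_entropy(s: str) -> float:
    """Shannon entropy of the character distribution of `s`, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    total = len(s)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_random(token: str, threshold: float = 3.0, min_len: int = 8) -> bool:
    """Flag long tokens whose entropy exceeds a hypothetical threshold."""
    return len(token) >= min_len and char_entropy(token) > threshold
```

For example, `looks_random("xq7zk2vp9w")` is true under these settings, while `looks_random("aaaaaaaaaa")` is not. Ordinary words with many distinct letters can also score high, so a practical filter would likely treat the entropy as one feature among many rather than as a verdict.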
33. A spam detection system comprising:
a component that analyzes substantially all features of a message header in connection with training a machine learning spam filter.
Dependent claims: 34, 35, 36, 37, 38, 39, 40, 41
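A minimal sketch of turning every header line into features, as claim 33 broadly describes, might look like the following. It relies on Python's standard `email` package; the feature encoding (`has:<name>` and `<name>=<value>`) and the function name `header_features` are assumptions made for this example.

```python
from email import message_from_string


def header_features(raw_message: str) -> list[str]:
    """Emit a presence feature and a name=value feature for each header line."""
    msg = message_from_string(raw_message)
    feats = []
    for name, value in msg.items():  # items() yields headers in message order
        key = name.lower()
        feats.append(f"has:{key}")
        feats.append(f"{key}={value.strip()}")
    return feats


# Hypothetical message used only to exercise the function.
raw = (
    "From: alice@example.com\n"
    "X-Mailer: ExampleMailer 1.0\n"
    "Subject: hello\n"
    "\n"
    "body text\n"
)
feats = header_features(raw)
```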
42. A method that facilitates generating features for use in spam detection comprising:
receiving at least one message;
parsing at least a portion of a message to generate one or more features;
combining at least two features into pairs, whereby each pair of features creates at least one additional feature, the features of each pair coinciding with one another; and
using the pairs of features to train a machine learning spam filter.
Dependent claims: 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 68
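The final step of claim 42 trains a filter on the paired features. The claim does not name a learning algorithm, so the sketch below substitutes a simple smoothed log-odds scorer purely for illustration; the function names `train` and `score` and the sample data are hypothetical.

```python
import math
from collections import Counter


def train(examples: list[tuple[list[str], bool]]) -> dict[str, float]:
    """Learn a log-odds weight per feature from (features, is_spam) examples.

    Uses add-one smoothing; a positive weight means the feature leans spam.
    """
    spam, ham = Counter(), Counter()
    for feats, is_spam in examples:
        (spam if is_spam else ham).update(set(feats))
    n_spam = sum(1 for _, s in examples if s)
    n_ham = len(examples) - n_spam
    vocab = set(spam) | set(ham)
    return {
        f: math.log((spam[f] + 1) / (n_spam + 2))
        - math.log((ham[f] + 1) / (n_ham + 2))
        for f in vocab
    }


def score(weights: dict[str, float], feats: list[str]) -> float:
    """Sum the weights of known features; unknown features contribute 0."""
    return sum(weights.get(f, 0.0) for f in set(feats))


# Hypothetical paired features: each sender domain appears in both spam and
# non-spam, so only the domain/IP pair separates the two classes.
examples = [
    (["from=a.example&ip=192.0.2"], False),
    (["from=a.example&ip=198.51.100"], True),
    (["from=b.example&ip=198.51.100"], True),
    (["from=b.example&ip=203.0.113"], False),
]
weights = train(examples)
```

In this toy data set a feature for the sender domain alone would be uninformative, which is one motivation for training on pairs.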
53. A method that facilitates generating features for use in spam detection comprising:
receiving one or more messages;
walking through at least a portion of the message to create features for each run of characters of any run length; and
training a machine learning filter using at least a portion of the created features.
Dependent claims: 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 69
66. A method that facilitates generating features for use in spam detection comprising:
receiving one or more messages;
analyzing substantially all features of a message header; and
training a machine learning filter using the analyzed features.
Dependent claims: 67
70. A computer readable medium having stored thereon the following computer executable components:
a component that identifies features relating to at least a portion of origination information of a message; and
a component that combines the features into useful pairs for use in connection with training a machine learning filter to facilitate detecting spam.
Dependent claims: 71, 72
73. A system that facilitates generating features for use in spam detection comprising:
a means for receiving at least one message;
a means for parsing at least a portion of a message to generate one or more features;
a means for combining at least two features into pairs, whereby each pair of features creates at least one additional feature, the features of each pair coinciding with one another; and
a means for using the pairs of features to train a machine learning spam filter.
74. A system that facilitates generating features for use in spam detection comprising:
a means for receiving one or more messages;
a means for walking through at least a portion of the message to create features for each run of characters of any run length; and
a means for training a machine learning filter using at least a portion of the created features.
Dependent claims: 75
Specification