Origination/destination features and lists for spam prevention
First Claim
1. A system that facilitates extracting data in connection with spam processing, comprising:
a memory;
a processor coupled to the memory;
a component that receives an item and extracts a set of features associated with an origination of a message or part thereof and/or information that enables an intended recipient to contact, respond or receive in connection with the message, wherein the component that receives the item determines a last trusted server IP address to distinguish between legitimate and fake prepended server IP addresses and the last trusted server IP address is extracted as a feature from the item;
and a component that employs a subset of the extracted features in connection with building a filter by adding the subset of the extracted features to a training set of data utilized for training and updating the filter, wherein the filter determines a probability that the message is spam when the subset of the extracted features passes through the filter.
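One plausible reading of the last-trusted-server step is sketched below in Python. Each relay prepends its Received header, so the topmost header is written by the recipient's own server, while headers further down may have been fabricated by the sender. The sketch assumes a hypothetical set TRUSTED_RELAYS of internal relay addresses and a simplified header format in which the connecting host's IPv4 address appears in square brackets. The function name last_trusted_ip is illustrative and is not taken from the patent.

```python
from __future__ import annotations

import re
from email import message_from_string

# Hypothetical configuration: addresses of relays operated by the recipient's organization.
TRUSTED_RELAYS = frozenset({"10.0.0.5", "10.0.0.6"})

# Simplification: the first bracketed IPv4 literal in a Received header is taken
# to be the address of the host that connected to the server writing that header.
_BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def last_trusted_ip(raw_message: str, trusted: frozenset[str] = TRUSTED_RELAYS) -> str | None:
    """Return the connecting IP recorded by the last trusted server, if any.

    Headers are read top-down. While the connecting host is itself a trusted relay,
    the next header down was also written by a trusted server, so keep going. The
    first untrusted connecting IP is the address the last trusted server observed;
    anything below it may have been prepended (forged) by the sender.
    """
    msg = message_from_string(raw_message)
    for header in msg.get_all("Received", []):
        match = _BRACKETED_IP.search(header)
        if match is None:
            return None  # cannot tell who connected; stop rather than trust lower headers
        ip = match.group(1)
        if ip not in trusted:
            return ip
    return None  # every hop was internal, or there were no Received headers


raw = (
    "Received: from relay.example ([10.0.0.6]) by mx.example\n"
    "Received: from sender.example ([203.0.113.7]) by relay.example\n"
    "Received: from fake.example ([198.51.100.9]) by sender.example\n"
    "Subject: offer\n"
    "\n"
    "body\n"
)
print(last_trusted_ip(raw))  # 203.0.113.7
```

In the usage example the third header names 198.51.100.9, but because it sits below the first untrusted hop it is treated as unverifiable and ignored.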
2 Assignments
0 Petitions
Abstract
The present invention involves a system and method that facilitate extracting data from messages for spam filtering. The extracted data can be in the form of features, which can be employed in connection with machine learning systems to build improved filters. Data associated with origination information as well as other information embedded in the body of the message that allows a recipient of the message to contact and/or respond to the sender of the message can be extracted as features. The features, or a subset thereof, can be normalized and/or deobfuscated prior to being employed as features of the machine learning systems. The (deobfuscated) features can be employed to populate a plurality of feature lists that facilitate spam detection and prevention. Exemplary features include an email address, an IP address, a URL, an embedded image pointing to a URL, and/or portions thereof.
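The normalization and deobfuscation described in the abstract can be illustrated with a short Python sketch. It assumes hypothetical feature names (url-host:, img-host:, email:, email-domain:) and simple regular expressions that stand in for a real MIME/HTML parser. It undoes three common tricks: percent-encoded host names, hosts written as a single decimal integer, and user-info prefixes such as http://www.bank.example@evil.example/ (urlsplit reports only evil.example as the hostname). "Portions" are modeled here as parent domains and /24, /16 and /8 address prefixes; this is one illustrative choice, not necessarily the one the patent uses.

```python
from __future__ import annotations

import ipaddress
import re
from urllib.parse import unquote, urlsplit

# Hypothetical, deliberately simple patterns; real mail needs a MIME/HTML parser.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IMG_SRC_RE = re.compile(r"<img[^>]+src=[\"']([^\"']+)[\"']", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def normalize_host(host: str) -> str:
    """Undo simple obfuscations: %-escapes, letter case, integer-encoded IPv4."""
    host = unquote(host).lower().rstrip(".")
    if host.isdecimal():  # e.g. 3405803777 is 203.0.113.1
        try:
            host = str(ipaddress.IPv4Address(int(host)))
        except ValueError:
            pass  # out of IPv4 range; keep the digits as-is
    return host


def host_portions(host: str) -> list[str]:
    """The host plus coarser portions: parent domains, or /24, /16, /8 for IPv4."""
    try:
        ip = ipaddress.IPv4Address(host)
    except ipaddress.AddressValueError:
        labels = host.split(".")
        # Stops at two labels; public-suffix rules (e.g. co.uk) are ignored here.
        return [".".join(labels[i:]) for i in range(max(len(labels) - 1, 1))]
    prefixes = (ipaddress.IPv4Network(f"{ip}/{n}", strict=False) for n in (24, 16, 8))
    return [host] + [str(p) for p in prefixes]


def extract_features(body: str) -> set[str]:
    """Collect normalized URL, image-URL and e-mail features from a message body."""
    features: set[str] = set()
    sources = (("url-host", URL_RE.findall(body)), ("img-host", IMG_SRC_RE.findall(body)))
    for kind, urls in sources:
        for url in urls:
            try:
                host = urlsplit(url).hostname  # lowercased, user-info and port dropped
            except ValueError:
                continue  # malformed URL, e.g. a bad IPv6 literal
            if host:
                features.update(f"{kind}:{p}" for p in host_portions(normalize_host(host)))
    for addr in EMAIL_RE.findall(body):
        addr = addr.lower()
        features.add(f"email:{addr}")
        domain = addr.rsplit("@", 1)[1]
        features.update(f"email-domain:{p}" for p in host_portions(domain))
    return features
```

Emitting both the full host and its coarser portions lets a learner generalize: a spammer who rotates subdomains or neighboring addresses still shares the parent-domain or prefix features.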
213 Citations
20 Claims
15. A method that facilitates extracting data in connection with spam processing, comprising:
receiving a message;
extracting a set of features associated with an origination of the message or part thereof and/or information that enables an intended recipient to contact, respond or receive in connection with the message;
wherein a last trusted server IP address is an extracted feature from the message and a determination is made to distinguish between legitimate and fake prepended server IP addresses; and
employing a subset of the extracted features in connection with building a filter by adding the subset of the extracted features to a training set of data utilized for training and updating the filter, wherein the filter determines a probability of the message being spam when the subset of the extracted features passes through the filter.
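The claim does not name a learning algorithm. As one simple stand-in, the Python sketch below trains a simplified naive Bayes model on sets of extracted features and returns a spam probability. The class FeatureFilter and the feature strings in the usage example are hypothetical. The model uses add-one smoothing, scores only the features present in a message, and skips features never seen in training; a production filter might instead use logistic regression or a support vector machine.

```python
from __future__ import annotations

import math
from collections import Counter


class FeatureFilter:
    """Hypothetical feature-based spam filter: a simplified naive Bayes model."""

    def __init__(self) -> None:
        self.messages = Counter()  # labelled messages per class (True = spam)
        self.feature_counts = {True: Counter(), False: Counter()}
        self.vocab: set[str] = set()

    def add(self, features: set[str], is_spam: bool) -> None:
        """Add one labelled message's feature subset to the training data."""
        self.messages[is_spam] += 1
        self.feature_counts[is_spam].update(features)
        self.vocab |= features

    def spam_probability(self, features: set[str]) -> float:
        """P(spam | features), scoring only features that occur in the message."""
        n_spam, n_ham = self.messages[True], self.messages[False]
        # Add-one smoothed class prior, expressed as log odds.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for f in features & self.vocab:  # features never seen in training are skipped
            p_spam = (self.feature_counts[True][f] + 1) / (n_spam + 2)
            p_ham = (self.feature_counts[False][f] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically stable logistic function.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)


flt = FeatureFilter()
flt.add({"ip:203.0.113.7", "url-host:example.com"}, is_spam=True)
flt.add({"ip:203.0.113.7"}, is_spam=True)
flt.add({"ip:192.0.2.1", "url-host:example.org"}, is_spam=False)
print(round(flt.spam_probability({"ip:203.0.113.7"}), 3))  # 0.771
```

Because training is just counting, "updating the filter" amounts to calling add for each newly labelled message; no retraining pass over old data is needed.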
20. A system that facilitates extracting data in connection with spam processing, comprising:
a memory;
a processor coupled to the memory;
a means for receiving a message;
a means for extracting a set of features associated with an origination of the message or part thereof and/or information that enables an intended recipient to contact, respond or receive in connection with the message;
the means for extracting the set of features that determines a last trusted server IP address that distinguishes between legitimate and fake prepended server IP addresses, wherein the last trusted server IP address is extracted as a feature;
and a means for employing a subset of the extracted features in connection with building a filter, wherein the filter determines a probability of the message being spam.
Specification