Training filters for IP address and URL learning
Abstract
The subject invention provides for an intelligent quarantining system and method that facilitates detecting and preventing spam. In particular, the invention employs a machine learning filter specifically trained using origination features such as an IP address as well as destination features such as a URL. Moreover, the system and method involve training a plurality of filters using specific feature data for each filter. The filters are trained independently of each other, so that one feature may not unduly influence another feature in determining whether a message is spam. Because multiple filters are trained and available to scan messages either individually or in combination (at least two filters), the filtering or spam detection process can be generalized to new messages having slightly modified features (e.g., IP address). The invention also involves locating the appropriate IP addresses or URLs in a message as well as guiding filters to weigh origination or destination features more than text-based features.
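The independent training described in the abstract can be illustrated with a minimal sketch. It assumes each filter is a table of smoothed per-feature log-odds and that the two scores are combined by taking the maximum; the helper names (`train_filter`, `score`), the training data, and the combination rule are all hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def train_filter(examples, smoothing=1.0):
    """Train one feature-specific filter as smoothed per-feature log-odds of spam.

    examples: list of (feature_set, is_spam) pairs for a single feature type.
    """
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for features, is_spam in examples:
        if is_spam:
            n_spam += 1
            spam_counts.update(features)
        else:
            n_ham += 1
            ham_counts.update(features)
    weights = {}
    for feature in set(spam_counts) | set(ham_counts):
        p_spam = (spam_counts[feature] + smoothing) / (n_spam + 2 * smoothing)
        p_ham = (ham_counts[feature] + smoothing) / (n_ham + 2 * smoothing)
        weights[feature] = math.log(p_spam / p_ham)
    return weights


def score(weights, features):
    """Sum the weights of the features present; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)


# Hypothetical training messages (documentation-range IPs, example domains).
training = [
    {"ip": {"203.0.113.7"}, "url": {"spam.example"}, "spam": True},
    {"ip": {"203.0.113.7"}, "url": {"pills.example"}, "spam": True},
    {"ip": {"198.51.100.2"}, "url": {"news.example"}, "spam": False},
    {"ip": {"198.51.100.2"}, "url": {"docs.example"}, "spam": False},
]

# Each filter sees only its own feature type, so IP evidence cannot
# change URL weights and vice versa.
ip_filter = train_filter([(m["ip"], m["spam"]) for m in training])
url_filter = train_filter([(m["url"], m["spam"]) for m in training])

# A new message from a known spam IP that links to a never-seen URL.
new_message = {"ip": {"203.0.113.7"}, "url": {"brand-new.example"}}
ip_score = score(ip_filter, new_message["ip"])
url_score = score(url_filter, new_message["url"])
combined = max(ip_score, url_score)  # assumed combination rule
```

In this sketch the URL filter has nothing to say about the modified URL, yet the separately trained IP filter still yields a positive (spam-leaning) score, which is the kind of generalization to slightly modified features that the abstract describes.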
51 Claims
1. A system that facilitates spam detection comprising:
a component that receives an item and extracts a set of features associated with an origination of a message or part thereof and/or information that enables an intended recipient to contact or respond to the message; and
a component that analyzes a subset of the extracted features in connection with building and employing a plurality of feature-specific filters that are independently trained to mitigate undue influence of at least one feature type over another in the message, the subset of extracted features comprising of at least one of a URL and an IP address, and the plurality of filters comprising at least a first feature-specific filter.
Dependent claims: 2, 3, 4, 5, 8, 9, 10, 11, 14, 15, 16, 17, 18, 19, 46

6. A system that facilitates spam detection comprising:
a component that receives an item and extracts a set of features associated with an origination of a message or part thereof and/or information that enables an intended recipient to contact or respond to the message;
at least one filter that is used when one of the IP address of the message or at least some part of at least one of the URLs in the message is unknown.
Dependent claims: 7, 12, 13

20. A system that facilitates spam detection comprising:
a component that receives an item and extracts a set of features associated with an origination of a message or part thereof and/or information that enables an intended recipient to contact or respond to the message;
at least one filter that is used when one of the IP address of the message or at least some part of at least one of the URLs in the message is known.
Dependent claims: 21, 22
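Claims 6 and 20 distinguish a filter used when the IP address (or part of a URL) is unknown from one used when it is known. One way this dispatch might look is sketched below, assuming "known" means the IP address appeared in training and that the fallback filter scores text features only. The weight tables, field names, and function names are hypothetical.

```python
# Hypothetical weights learned for IP addresses seen in training.
KNOWN_IP_WEIGHTS = {"203.0.113.7": 2.0, "198.51.100.2": -1.5}
# Hypothetical weights for text features, used as the fallback.
TEXT_WEIGHTS = {"free": 0.8, "meeting": -0.6}


def known_ip_filter(message):
    """Score from the learned IP weight; used only when the IP is known."""
    return KNOWN_IP_WEIGHTS[message["ip"]]


def unknown_ip_filter(message):
    """Score from text features only; used when the IP was never seen."""
    return sum(TEXT_WEIGHTS.get(word, 0.0) for word in message["words"])


def score_message(message):
    """Return which filter was applied and the score it produced."""
    if message["ip"] in KNOWN_IP_WEIGHTS:
        return "known", known_ip_filter(message)
    return "unknown", unknown_ip_filter(message)
```

For example, `score_message({"ip": "192.0.2.9", "words": ["free"]})` falls through to the unknown-IP filter because that address has no learned weight.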

23. A machine learning method that optimizes an objective function of the form
$$\mathrm{OBJECTIVE}(\mathrm{MAXSCORE}(m_1), \mathrm{MAXSCORE}(m_2), \ldots, \mathrm{MAXSCORE}(m_k), w_1 \ldots w_n)$$
where
$$\mathrm{MAXSCORE}(m_k) = \mathrm{MAX}(\mathrm{SCORE}(IP_{k,1}), \mathrm{SCORE}(IP_{k,2}), \ldots, \mathrm{SCORE}(IP_{k,k_1}))$$
where $m_k$ = messages;
$IP_{k,i}$ represents the presence of some property(s) of $m_k$; and
$\mathrm{SCORE}(IP_{k,i})$ = the sum of the weights of the features of $IP_{k,i}$.
Dependent claims: 24, 25, 26, 27, 28, 29, 30
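The inner terms of the objective in claim 23 can be worked through in a short sketch. It assumes, as one reading only, that each IP(k,i) is a candidate IP address found in message k, represented by a list of feature names; the weights and feature names are hypothetical, and the outer OBJECTIVE function is left unspecified, as in the claim.

```python
def score(feature_weights, features):
    """SCORE(IP_k,i): the sum of the weights of the features of one candidate."""
    return sum(feature_weights.get(f, 0.0) for f in features)


def max_score(feature_weights, candidates):
    """MAXSCORE(m_k): the largest SCORE over all candidates of one message."""
    return max(score(feature_weights, c) for c in candidates)


# Hypothetical weights w_1 ... w_n, keyed by feature name.
weights = {
    "ip:203.0.113.7": 2.0,
    "ip24:203.0.113": 1.0,
    "ip:192.0.2.1": -0.5,
    "ip24:192.0.2": -0.25,
}

# One message with two candidate IP addresses, each with two features
# (the full address and its first 24 bits).
message_candidates = [
    ["ip:192.0.2.1", "ip24:192.0.2"],
    ["ip:203.0.113.7", "ip24:203.0.113"],
]

candidate_scores = [score(weights, c) for c in message_candidates]
message_max = max_score(weights, message_candidates)
```

Here the two candidates score -0.75 and 3.0, so the MAXSCORE for the message is 3.0: the most spam-like candidate determines the value passed to the objective.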

31. A method that facilitates spam detection comprising:
providing a plurality of training data;
extracting a plurality of feature types from the training data, the feature types comprising at least one IP address, at least one URL and text-based features; and
training a plurality of feature-specific filters for the respective feature in an independent manner so that a first feature does not unduly influence a message score over a second feature type when determining whether a message is spam.
Dependent claims: 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44

45. A data packet adapted to be transmitted between two or more computer processes facilitating improved detection of spam, the data packet comprising:
information associated with training a plurality of feature-specific filters in an independent manner to mitigate undue influence between features and employing at least one feature specific filter comprising an IP address filter or a URL filter to determine whether a message is spam.

47. A spam detection system comprising a plurality of filters comprising at least one filter that is trained by using different smoothing for different spam features.

50. A method that facilitates spam detection comprising:
extracting data from a plurality of messages;
training at least one machine learning filter using at least a subset of the data, the training comprising employing a first smoothing for at least one of IP address or URL features and at least a second smoothing for other non-IP address or non-URL features.
Dependent claims: 51
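The idea of a first smoothing for IP address or URL features and a second smoothing for other features can be sketched as follows. This is an illustration under stated assumptions: "smoothing" is interpreted here as additive (pseudo-count) smoothing of per-feature log-odds, which the claim does not specify, and the `ip:`/`url:`/`word:` prefixes, the pseudo-count values, and the function names are hypothetical.

```python
import math
from collections import Counter


def smoothing_for(feature, ip_url_smoothing=0.1, other_smoothing=1.0):
    """Pick the pseudo-count by feature type (assumed prefixes 'ip:' and 'url:')."""
    if feature.startswith(("ip:", "url:")):
        return ip_url_smoothing
    return other_smoothing


def train(examples):
    """Learn per-feature log-odds of spam with type-dependent additive smoothing.

    examples: list of (feature_set, is_spam) pairs.
    """
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for features, is_spam in examples:
        if is_spam:
            n_spam += 1
            spam_counts.update(features)
        else:
            n_ham += 1
            ham_counts.update(features)
    weights = {}
    for feature in set(spam_counts) | set(ham_counts):
        a = smoothing_for(feature)
        p_spam = (spam_counts[feature] + a) / (n_spam + 2 * a)
        p_ham = (ham_counts[feature] + a) / (n_ham + 2 * a)
        weights[feature] = math.log(p_spam / p_ham)
    return weights


# Hypothetical training data: one spam and one non-spam message.
examples = [
    ({"ip:203.0.113.7", "word:free"}, True),
    ({"ip:198.51.100.2", "word:meeting"}, False),
]
weights = train(examples)
```

With identical evidence (each feature seen once), the lighter smoothing lets the IP feature reach a weight of about log 11 while the text feature stays at about log 2, so the filter leans more on the origination feature than on the text feature.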
Specification