Training filters for detecting spam based on IP addresses and text-related features
Abstract
The subject invention provides for an intelligent quarantining system and method that facilitates detecting and preventing spam. In particular, the invention employs a machine learning filter specifically trained using origination features such as an IP address as well as destination features such as a URL. Moreover, the system and method involve training a plurality of filters using specific feature data for each filter. The filters are trained independently of each other, so that one feature may not unduly influence another feature in determining whether a message is spam. Because multiple filters are trained and available to scan messages either individually or in combination (at least two filters), the filtering or spam detection process can be generalized to new messages having slightly modified features (e.g., IP address). The invention also involves locating the appropriate IP addresses or URLs in a message as well as guiding filters to weigh origination or destination features more than text-based features.
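The idea of independently trained, feature-specific filters can be illustrated with a minimal Python sketch. It is not the patented implementation: the log-odds filter, the `smoothing` parameter, the averaging rule in `combined_score`, and the sample addresses and hosts are all assumptions chosen only to show how one filter per feature type keeps any single feature type from dominating the result.

```python
import math
from collections import Counter

def train_filter(examples, smoothing=1.0):
    """Train one log-odds filter from (features, is_spam) pairs of a single feature type."""
    spam, ham = Counter(), Counter()
    n_spam = n_ham = 0
    for features, is_spam in examples:
        if is_spam:
            n_spam += 1
            spam.update(set(features))
        else:
            n_ham += 1
            ham.update(set(features))
    weights = {}
    for f in set(spam) | set(ham):
        p_spam = (spam[f] + smoothing) / (n_spam + 2 * smoothing)
        p_ham = (ham[f] + smoothing) / (n_ham + 2 * smoothing)
        weights[f] = math.log(p_spam / p_ham)
    return weights

def score(weights, features):
    """Spam score in (0, 1); features never seen in training contribute nothing."""
    z = sum(weights.get(f, 0.0) for f in set(features))
    return 1.0 / (1.0 + math.exp(-z))

def combined_score(filters, message):
    """Average the per-type scores (an assumed combination rule)."""
    scores = [score(filters[t], message.get(t, [])) for t in filters]
    return sum(scores) / len(scores)

# Hypothetical training data: one dict of feature lists per message.
training = [
    ({"ip": ["203.0.113.7"], "url": ["spam.example"], "text": ["free", "offer"]}, True),
    ({"ip": ["203.0.113.7"], "url": ["spam.example"], "text": ["win", "free"]}, True),
    ({"ip": ["198.51.100.2"], "url": ["docs.example"], "text": ["meeting", "notes"]}, False),
    ({"ip": ["198.51.100.2"], "url": ["docs.example"], "text": ["free", "lunch"]}, False),
]

# Each filter sees only its own feature type, so it is trained independently.
filters = {
    t: train_filter([(msg[t], label) for msg, label in training])
    for t in ("ip", "url", "text")
}

spam_like = {"ip": ["203.0.113.7"], "url": ["new.example"], "text": ["free"]}
ham_like = {"ip": ["198.51.100.2"], "url": ["docs.example"], "text": ["meeting"]}
print(combined_score(filters, spam_like), combined_score(filters, ham_like))
```

In this sketch the message with a previously unseen URL still scores above 0.5 because the IP filter and the text filter each contribute their own evidence, which mirrors the abstract's point about generalizing to messages with slightly modified features.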
260 Citations
44 Claims
1. A machine-implemented system that facilitates spam detection comprising a processor executing:
a feature extraction component that receives an item and extracts a set of features associated with an origination of a message or part thereof and/or information that enables an intended recipient to contact or respond to the message;
a feature analysis component that analyzes a subset of the extracted features in connection with building and employing a plurality of feature-specific filters that are independently trained to mitigate undue influence of at least one feature type over another in the message, the subset of extracted features comprising of at least one of a Uniform Resource Locator (URL) and an Internet Protocol (IP) address, and the plurality of feature-specific filters comprising at least a first feature-specific filter; and
a machine learning component that determines last IP address external to the recipient's system via a machine learning technique to facilitate spam detection, the machine learning component employs MX records to determine a true source of a message by way of tracing back through a received from list until an IP address is found that corresponds to a fully qualified domain which corresponds to an entry in the domain's MX record and determines whether the IP address is external or internal by verifying if the IP address is in a form characteristic to internal IP addresses and performing at least one of an IP address lookup and a reverse IP address lookup to ascertain whether the IP address correlates with a sender's domain name.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
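One part of claim 1, checking whether an address has "a form characteristic to internal IP addresses" while walking back through the received-from list, can be sketched as follows. This is a simplified illustration under stated assumptions: Received headers are given newest first, each carries the sending IPv4 address in square brackets, and "internal" means the RFC 1918 private ranges plus loopback. The MX-record check and any defense against forged headers are left out, and the header strings are invented.

```python
import ipaddress
import re

# Assumed definition of "internal": RFC 1918 private ranges plus loopback.
INTERNAL_NETWORKS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8")
]

# Assumed header shape: the sending IPv4 address appears in square brackets.
IP_PATTERN = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def is_internal(ip):
    """True if the address falls in one of the assumed internal ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETWORKS)

def last_external_ip(received_headers):
    """Walk Received headers, newest first, and return the first non-internal IP."""
    for header in received_headers:
        match = IP_PATTERN.search(header)
        if match is None:
            continue
        try:
            if not is_internal(match.group(1)):
                return match.group(1)
        except ValueError:
            continue  # malformed address such as 999.1.1.1
    return None

# Hypothetical header list, newest hop first.
headers = [
    "from mail.internal (mail.internal [10.0.0.5]) by store.recipient.example",
    "from gw.recipient.example (gw [192.168.1.2]) by mail.internal",
    "from sender.example (sender.example [203.0.113.7]) by gw.recipient.example",
    "from forged.example (forged.example [198.51.100.9]) by sender.example",
]
print(last_external_ip(headers))
```

Headers older than the first external hop are ignored in this sketch because they were written by machines outside the recipient's control.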
14. A machine-implemented system that facilitates spam detection comprising a processor executing:
a feature extraction component that receives an item and extracts a set of features associated with an origination of a message or part thereof and/or information that enables an intended recipient to contact or respond to the message;
at least one filter that is used when one of an Internet Protocol (IP) address of the message or at least some part of at least one of Uniform Resource Locator (URL) in the message is unknown; and
a machine learning component that determines last IP address external to the recipient's system via a machine learning technique to facilitate spam detection, the machine learning component employs MX records to determine a true source of a message by way of tracing back through a received from list until an IP address is found that corresponds to a fully qualified domain which corresponds to an entry in the domain's MX record and determines whether the IP address is external or internal by verifying if the IP address is in a form characteristic to internal IP addresses and performing at least one of an IP address lookup and a reverse IP address lookup to ascertain whether the IP address correlates with a sender's domain name.
Dependent claims: 15, 16, 17
18. A machine-implemented system that facilitates spam detection comprising a processor executing:
a feature extraction component that receives an item and extracts a set of features associated with an origination of a message or part thereof and/or information that enables an intended recipient to contact or respond to the message;
at least one filter that is used when one of the Internet Protocol (IP) address of the message or at least some part of at least one of the Uniform Resource Locators (URLs) in the message is known; and
a machine learning component that determines last IP address external to the recipient's system via a machine learning technique to facilitate spam detection, the machine learning component employs MX records to determine a true source of a message by way of tracing back through a received from list until an IP address is found that corresponds to a fully qualified domain which corresponds to an entry in the domain's MX record and determines whether the IP address is external or internal by verifying if the IP address is in a form characteristic to internal IP addresses and performing at least one of an IP address lookup and a reverse IP address lookup to ascertain whether the IP address correlates with a sender's domain name.
Dependent claims: 19, 20
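Claims 14 and 18 recite filters that apply depending on whether the message's IP address, or some part of a URL, is unknown or known. A minimal sketch of that dispatch is shown below. The function name, the string labels, and the idea that "known" means "seen in training data" are assumptions made for illustration; the claims do not specify them.

```python
def applicable_filters(ip, url_hosts, known_ips, known_hosts):
    """Return labels of the filters that apply to a message (assumed dispatch rule)."""
    filters = []
    ip_known = ip in known_ips
    # Claim 18 style: the IP or at least one URL part is known.
    if ip_known or any(h in known_hosts for h in url_hosts):
        filters.append("known")
    # Claim 14 style: the IP or at least one URL part is unknown.
    if not ip_known or any(h not in known_hosts for h in url_hosts):
        filters.append("unknown")
    return filters

# Hypothetical sets of values seen during training.
known_ips = {"203.0.113.7"}
known_hosts = {"spam.example"}

print(applicable_filters("203.0.113.7", ["new.example"], known_ips, known_hosts))
```

A message can qualify for both filters at once, for example when its IP address is known but one of its URLs is not.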
21. A machine learning method implemented on a machine that facilitates spam detection by optimizing an objective function of the form
$$\mathrm{OBJECTIVE}(\mathrm{MAXSCORE}(m_1), \mathrm{MAXSCORE}(m_2), \ldots, \mathrm{MAXSCORE}(m_k), w_1 \ldots w_n)$$
where
$$\mathrm{MAXSCORE}(m_k) = \mathrm{MAX}(\mathrm{SCORE}(IP_{k,1}), \mathrm{SCORE}(IP_{k,2}), \ldots, \mathrm{SCORE}(IP_{k,k_I}))$$
where $m_k$ = messages; $IP_{k,i}$ represents the IP addresses of $m_k$; $\mathrm{SCORE}(IP_{k,i})$ = the sum of the weights of the $IP_{k,i}$, and wherein the machine learning method optimizes the weights associated with one feature at any given time and maximizes accuracy on a training data to facilitate improved detection of spam, the machine learning method employs MX records to determine a true source of a message by way of tracing back through a received from list until an IP address is found that corresponds to a fully qualified domain which corresponds to an entry in the domain's MX record and determines whether the IP address is external or internal by verifying if the IP address is in a form characteristic to internal IP addresses and performing at least one of an IP address lookup and a reverse IP address lookup to ascertain whether the IP address correlates with a sender's domain name and further determines last IP address external to a recipient's system to facilitate spam detection.
Dependent claims: 22, 23, 24, 25, 26, 27, 28
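The MAXSCORE construction in claim 21 can be made concrete with a small sketch. Several details are assumptions not stated in the claim: each IP address is represented by a list of named features (here the full address and a hypothetical network prefix), OBJECTIVE is taken to be training accuracy at a threshold of zero, and the optimizer is a simple coordinate search that changes one weight at a time over a fixed candidate set.

```python
def score(ip_features, weights):
    """SCORE(IP): the sum of the weights of the features of one IP address."""
    return sum(weights.get(f, 0.0) for f in ip_features)

def max_score(message_ips, weights):
    """MAXSCORE(m): the highest SCORE over all IP addresses in the message."""
    return max((score(feats, weights) for feats in message_ips), default=0.0)

def accuracy(messages, labels, weights, threshold=0.0):
    """Assumed OBJECTIVE: fraction of messages classified correctly."""
    correct = sum(
        (max_score(m, weights) > threshold) == y for m, y in zip(messages, labels)
    )
    return correct / len(messages)

def coordinate_search(messages, labels, weights, candidates=(-1.0, 0.0, 1.0), passes=2):
    """Optimize one feature weight at a time, keeping changes that raise accuracy."""
    weights = dict(weights)
    for _ in range(passes):
        for feature in sorted(weights):
            best_value = weights[feature]
            best_acc = accuracy(messages, labels, weights)
            for value in candidates:
                acc = accuracy(messages, labels, {**weights, feature: value})
                if acc > best_acc:
                    best_value, best_acc = value, acc
            weights[feature] = best_value
    return weights

# Hypothetical data: each message is a list of IPs, each IP a list of feature names.
messages = [
    [["ip:203.0.113.7", "net:203.0.113"], ["ip:10.0.0.1", "net:10.0.0"]],
    [["ip:203.0.113.9", "net:203.0.113"]],
    [["ip:198.51.100.2", "net:198.51.100"]],
    [["ip:10.0.0.1", "net:10.0.0"]],
]
labels = [True, True, False, False]  # True means spam

initial = {f: 0.0 for m in messages for ip in m for f in ip}
learned = coordinate_search(messages, labels, initial)
print(accuracy(messages, labels, initial), accuracy(messages, labels, learned))
```

Taking the maximum rather than the sum means a single strongly spam-associated address is enough to flag a message, regardless of how many neutral hops it also passed through.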
29. A machine-implemented method that facilitates spam detection comprising:
providing a plurality of training data;
extracting a plurality of feature types from the training data, the feature types comprising at least one Internet Protocol (IP) address, at least one Uniform Resource Locator (URL) and text-based features;
training a plurality of feature-specific filters for the respective feature in an independent manner so that a first feature does not unduly influence a message score over a second feature type when determining whether a message is spam;
employing MX records to determine a true source of a message by way of tracing back through a received from list until an IP address is found that corresponds to a fully qualified domain which corresponds to an entry in the domain's MX record;
verifying if the IP address is in a form characteristic to internal IP addresses;
performing at least one of an IP address lookup and a reverse IP address lookup to ascertain whether the IP address correlates with a sender's domain name; and
determining last IP address external to the recipient's system to facilitate spam detection.
Dependent claims: 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42
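The MX-record and reverse-lookup steps of claim 29 can be sketched without touching the network by passing the lookup results in as data. The reading used here is an interpretation, not something the claim spells out: the trace stops at the first hop whose receiving host appears in the recipient domain's MX record, and the address that handed the message to that host is treated as the true source. The hop tuples, host names, and the `reverse_lookup` callable are hypothetical; in practice a function built on `socket.gethostbyaddr` could supply the reverse lookup.

```python
def normalize(host):
    """Lowercase a host name and drop any trailing dot."""
    return host.lower().rstrip(".")

def true_source_ip(hops, mx_hosts):
    """hops: (receiving_host, sending_ip) pairs, newest first.

    Returns the IP that delivered to the first hop listed in the MX record.
    """
    mx = {normalize(h) for h in mx_hosts}
    for receiving_host, sending_ip in hops:
        if normalize(receiving_host) in mx:
            return sending_ip
    return None

def ip_matches_domain(ip, sender_domain, reverse_lookup):
    """Check whether the reverse lookup of ip falls under the sender's domain."""
    host = reverse_lookup(ip)
    if host is None:
        return False
    host, domain = normalize(host), normalize(sender_domain)
    return host == domain or host.endswith("." + domain)

# Hypothetical trace, newest hop first, and hypothetical lookup tables.
hops = [
    ("store.recipient.example", "10.0.0.5"),
    ("mx1.recipient.example", "203.0.113.7"),
    ("relay.sender.example", "198.51.100.9"),
]
mx_hosts = ["mx1.recipient.example."]
reverse_table = {"203.0.113.7": "out.sender.example"}

source = true_source_ip(hops, mx_hosts)
print(source, ip_matches_domain(source, "sender.example", reverse_table.get))
```

Injecting the lookup as a callable keeps the sketch deterministic and makes it easy to substitute a cached or rate-limited resolver.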
43. A machine-implemented method that facilitates spam detection comprising:
extracting data from a plurality of messages;
training at least one machine learning filter using at least a subset of the data, the training comprising employing a first smoothing for at least one of Internet Protocol (IP) address or Uniform Resource Locator (URL) features and at least a second smoothing for other non-IP address or non-URL features;
employing MX records to determine a true source of a message by way of tracing back through a received from list until an IP address is found that corresponds to a fully qualified domain which corresponds to an entry in the domain's MX record;
verifying that the IP address is in a form characteristic to internal IP addresses;
performing at least one of an IP address lookup and a reverse IP address lookup to ascertain whether the IP address correlates with a sender's domain name; and
determining last IP address external to the recipient's system to facilitate spam detection.
Dependent claims: 44
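The use of one smoothing for IP address or URL features and a second smoothing for other features can be illustrated with additive smoothing of log-odds weights. This is only one possible reading: the claim does not say which smoothing technique is used, and the constants in `SMOOTHING`, the feature-type names, and the choice to smooth IP and URL features more lightly are assumptions made for this sketch.

```python
import math

# Assumed smoothing constants per feature type; lighter smoothing lets
# a small number of observations move the weight further from neutral.
SMOOTHING = {"ip": 0.1, "url": 0.1, "text": 2.0}

def smoothed_log_odds(spam_count, ham_count, n_spam, n_ham, alpha):
    """Additively smoothed log-odds that a feature indicates spam."""
    p_spam = (spam_count + alpha) / (n_spam + 2 * alpha)
    p_ham = (ham_count + alpha) / (n_ham + 2 * alpha)
    return math.log(p_spam / p_ham)

def feature_weight(feature_type, spam_count, ham_count, n_spam, n_ham):
    """Pick the smoothing by feature type; unlisted types use the text setting."""
    alpha = SMOOTHING.get(feature_type, SMOOTHING["text"])
    return smoothed_log_odds(spam_count, ham_count, n_spam, n_ham, alpha)

# Same evidence (3 spam sightings, 0 ham, 10 messages per class) for both types.
ip_weight = feature_weight("ip", 3, 0, 10, 10)
text_weight = feature_weight("text", 3, 0, 10, 10)
print(ip_weight, text_weight)
```

With identical counts, the lightly smoothed IP feature ends up with a larger weight than the heavily smoothed text feature, which is one way a filter could be guided to weigh origination features more than text-based ones.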
Specification