Advanced URL and IP features
First Claim
1. A computer readable storage medium having stored thereon computer executable components that facilitate spam detection, the components comprising:
- a component that receives an item and extracts a set of features associated with an origination of a message or part thereof and/or information that enables an intended recipient to contact, respond to, or act on the message, the features comprising at least one of IP address-based features and URL-based features, wherein the IP address-based features comprise at least one of presence of reverse DNS entry or domain name, hostname from the reverse DNS entry, and missing reverse DNS entry;
- an analysis component that analyzes at least a subset of the features; and
- at least one filter that is trained on at least a subset of the features to facilitate distinguishing spam messages from good messages, wherein the filter is trained by analyzing at least a portion of the IP address-based data at least in part by taking null reverse DNS information and using a null RDNS entry as input into a machine learning algorithm.
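Claim 1 treats a missing reverse DNS entry as information rather than as a lookup failure: the null RDNS result itself becomes an input to the learning algorithm. A minimal Python sketch of that idea follows, assuming features are encoded as string tokens for a bag-of-features learner; the function names `reverse_dns` and `rdns_features` and the token formats are hypothetical, not taken from the patent.

```python
import socket
from typing import Callable, List, Optional


def reverse_dns(ip: str) -> Optional[str]:
    """Return the PTR hostname for ip, or None when no reverse DNS entry exists."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        # No reverse DNS entry (or the lookup failed): report a null RDNS result.
        return None


def rdns_features(ip: str,
                  lookup: Callable[[str], Optional[str]] = reverse_dns) -> List[str]:
    """Build IP address-based features for a sender IP.

    A missing reverse DNS entry is not discarded; it becomes the explicit
    token "rdns:null" so the learning algorithm can assign it a weight.
    """
    features = [f"ip:{ip}"]
    hostname = lookup(ip)
    if hostname is None:
        features.append("rdns:null")
        return features
    hostname = hostname.lower().rstrip(".")
    features.append("rdns:present")
    features.append(f"rdns_host:{hostname}")
    # Trailing domain suffixes: mail.example.com -> example.com, com
    labels = hostname.split(".")
    for i in range(1, len(labels)):
        features.append(f"rdns_suffix:{'.'.join(labels[i:])}")
    return features
```

Passing a dictionary's `get` method as `lookup` (for example `{"192.0.2.10": "mail.example.com"}.get`) lets the feature logic be exercised without network access.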
2 Assignments
0 Petitions
Abstract
Disclosed are systems and methods that facilitate spam detection and prevention at least in part by building or training filters using advanced IP address and/or URL features in connection with machine learning techniques. A variety of advanced IP address related features can be generated from performing a reverse IP lookup. Similarly, many different advanced URL based features can be created from analyzing at least a portion of any one URL detected in a message.
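The abstract says many URL-based features can be built by analyzing portions of a URL. One way to read that, sketched below in Python under our own assumptions, is to emit the scheme, the host, every trailing suffix of the hostname, and every leading prefix of the path as separate features; the assumed rationale is that a filter can still learn from `example.com` when the subdomain varies. The function name and feature prefixes are hypothetical.

```python
import ipaddress
from typing import List
from urllib.parse import urlsplit


def url_portion_features(url: str) -> List[str]:
    """Split one URL into overlapping portions, each usable as a feature."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    features = [f"url_scheme:{parts.scheme.lower()}", f"url_host:{host}"]
    try:
        # A host written as a literal IP address (no domain name) is its own signal.
        ipaddress.ip_address(host)
        features.append("url_host_is_ip")
    except ValueError:
        # Every trailing suffix of the hostname: www.example.com -> example.com, com
        labels = host.split(".")
        for i in range(1, len(labels)):
            features.append(f"url_host_suffix:{'.'.join(labels[i:])}")
    # Every leading prefix of the path: /a/b/c.html -> /a, /a/b, /a/b/c.html
    segments = [s for s in parts.path.split("/") if s]
    for i in range(1, len(segments) + 1):
        features.append("url_path_prefix:/" + "/".join(segments[:i]))
    return features
```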
247 Citations
36 Claims
10. A computer implemented spam detection and filtering system comprising the following components executed on a processor:
- a component that uses traceroute to gather additional IP address or URL feature information about at least one message; and
- a filtering component that employs the traceroute information to facilitate distinguishing between spam and good messages, wherein the filter is trained by analyzing at least a portion of the IP address-based data at least in part by taking null reverse DNS information and using a null RDNS entry as input into a machine learning algorithm.
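Claim 10 does not say how traceroute output becomes features. A minimal Python sketch, assuming the hop list has already been obtained from a traceroute run (this code sends no probes) and that hops are IPv4, keys features on the last few hops nearest the traced host and on each hop's /24 network. `traceroute_features`, `last_n`, and the /24 granularity are illustrative choices, not the patent's.

```python
import ipaddress
from typing import List, Optional


def traceroute_features(hops: List[Optional[str]], last_n: int = 3) -> List[str]:
    """Turn a traceroute hop list (first hop first) into filter features.

    hops: IPv4 address strings as reported by traceroute; None marks a hop
    that did not answer (printed as '*' by most traceroute tools).
    Only the last `last_n` hops, closest to the traced host, are used;
    hop1 is the final hop, hop2 the one before it, and so on.
    """
    features: List[str] = []
    tail = hops[-last_n:]
    for distance, hop in enumerate(reversed(tail), start=1):
        if hop is None:
            features.append(f"tr_hop{distance}:no_reply")
            continue
        net = ipaddress.ip_network(f"{hop}/24", strict=False)
        features.append(f"tr_hop{distance}:{hop}")
        features.append(f"tr_hop{distance}_net24:{net}")
    return features
```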
16. A computer implemented spam detection and filtering system comprising the following components executed on a processor:
- a component that receives an incoming message; and
- a filter that employs any combination of at least two of absolute URL features, count-based URL features, and combination-based URL features detected in a message to facilitate determining whether the message is spam.
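Claim 16 names absolute, count-based, and combination-based URL features without defining them in this excerpt. The Python sketch below uses one plausible interpretation, which is an assumption: absolute features record which hosts occur, count-based features record how many URLs and distinct hosts occur (capped at 5), and combination-based features record pairs of hosts that occur together. The function name and the cap are hypothetical.

```python
from itertools import combinations
from typing import List
from urllib.parse import urlsplit


def url_feature_groups(urls: List[str]) -> List[str]:
    """Emit three illustrative kinds of URL features for one message."""
    hosts = sorted({(urlsplit(u).hostname or "") for u in urls} - {""})
    # Absolute: which hosts appear at all.
    features = [f"abs_host:{h}" for h in hosts]
    # Count-based: capped so "5 or more" shares a single feature.
    features.append(f"count_urls:{min(len(urls), 5)}")
    features.append(f"count_hosts:{min(len(hosts), 5)}")
    # Combination-based: each unordered pair of distinct hosts seen together.
    features += [f"combo_hosts:{a}+{b}" for a, b in combinations(hosts, 2)]
    return features
```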
21. A computer implemented spam detection and filtering system comprising the following components executed on a processor:
- a component that receives an incoming message;
- a component that detects URLs and redirected URLs; and
- a machine learning filter that employs at least a portion of one or more redirected URLs detected in a message as inputs to facilitate determining whether the message is spam.
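A redirected URL can hide its real destination behind a redirector host. Resolving HTTP redirects requires fetching the URL, which is out of scope here; the Python sketch below handles only the simpler case, assumed for illustration, of a target URL embedded in the query string (for example `?u=http%3A%2F%2F...`) and emits features for both the redirector host and the target host. Function names and feature prefixes are hypothetical.

```python
from typing import List
from urllib.parse import parse_qsl, urlsplit


def redirect_targets(url: str) -> List[str]:
    """Return http(s) URLs embedded in the query string of `url` (one level only)."""
    query = urlsplit(url).query
    # parse_qsl also percent-decodes values, so encoded targets are recovered.
    return [v for _k, v in parse_qsl(query)
            if v.lower().startswith(("http://", "https://"))]


def redirect_features(url: str) -> List[str]:
    """Features for the outer URL plus any redirect destination it carries."""
    features = [f"url_host:{urlsplit(url).hostname or ''}"]
    targets = redirect_targets(url)
    if targets:
        features.append("url_has_redirect")
    for target in targets:
        # The final destination, which the redirector would otherwise hide.
        features.append(f"redirect_host:{urlsplit(target).hostname or ''}")
    return features
```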
25. A computer implemented spam detection and filtering system comprising the following components executed on a processor:
- a component that detects URLs in a message;
- a contact process component comprising at least one of the following contact routes: URL detected in the message including at least one of an IP address of the URL, a DNS server of the URL, a traceroute of the IP address of the host of the URL, an IP address of the DNS server of the URL, version information of the DNS server, and the traceroute of the IP address of the DNS server; and
- a machine learning filter component that employs at least one of the contact routes to facilitate determining whether the message is spam, wherein the filter is trained by analyzing at least a portion of the IP address-based data at least in part by taking null reverse DNS information and using a null RDNS entry as input into a machine learning algorithm.
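The contact routes in claim 25 are values discovered while contacting a URL's host. Gathering them (DNS queries, traceroute) is not shown; the Python sketch assumes those results are already collected in a hypothetical `ContactRoute` record and turns each route element into a feature, encoding an undetermined element as `null` in the same spirit as the null RDNS input. All names here are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContactRoute:
    """Hypothetical record of what was learned while contacting a URL's host."""
    url_ip: Optional[str] = None              # IP address the URL's host resolved to
    dns_server: Optional[str] = None          # DNS server that answered for the host
    dns_server_ip: Optional[str] = None       # IP address of that DNS server
    dns_server_version: Optional[str] = None  # version string the server reported
    host_traceroute: List[str] = field(default_factory=list)  # hops toward url_ip


def contact_route_features(route: ContactRoute) -> List[str]:
    features = []
    for name in ("url_ip", "dns_server", "dns_server_ip", "dns_server_version"):
        value = getattr(route, name)
        # A route element that could not be determined is itself a feature.
        features.append(f"{name}:{value.lower() if value else 'null'}")
    if route.host_traceroute:
        # Last hop before the host, often shared by related hosts (assumption).
        features.append(f"host_tr_last:{route.host_traceroute[-1]}")
    return features
```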
28. A spam filtering method comprising:
- extracting at least one of IP address-based data and URL-based data from a message, wherein the IP address-based data comprising at least a portion of an IP address and the URL-based data comprising at least a portion of at least one URL;
- generating at least one of IP address-based features and the URL-based features from the respective data to be used as inputs to at least one filter; and
- employing at least one filter trained on at least a subset of the inputs to facilitate distinguishing spam messages from good messages, wherein the filter is trained by analyzing at least a portion of the IP address-based data at least in part by taking null reverse DNS information and using a null RDNS entry as input into a machine learning algorithm.
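Claim 28 combines three steps: extract IP address and URL data, generate features, and employ a trained filter. The end-to-end Python sketch below illustrates those steps with a small logistic-regression filter trained by stochastic gradient descent. The excerpt does not name a specific learning algorithm, so logistic regression is our substitution, and `message_features`, `train`, `spam_probability`, the first-three-octets `ip24` feature, and the URL pattern are all hypothetical.

```python
import math
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlsplit

# Loose pattern for http(s) URLs in a plain-text body (illustrative, not exhaustive).
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def message_features(sender_ip: str, rdns: Optional[str], body: str) -> List[str]:
    """IP address-based and URL-based features for one message.

    sender_ip: IPv4 address of the sending server (e.g. from a Received header)
    rdns:      its reverse DNS hostname, or None when no entry exists
    """
    feats = ["ip24:" + ".".join(sender_ip.split(".")[:3])]  # first three octets
    feats.append("rdns:null" if rdns is None else "rdns:" + rdns.lower())
    for url in URL_RE.findall(body):
        feats.append("url_host:" + (urlsplit(url).hostname or ""))
    return feats


def spam_probability(w: Dict[str, float], feats: List[str]) -> float:
    """Sigmoid of the bias plus the weights of the active (binary) features."""
    z = w.get("bias", 0.0) + sum(w.get(f, 0.0) for f in set(feats))
    z = max(-30.0, min(30.0, z))  # keep math.exp in a safe range
    return 1.0 / (1.0 + math.exp(-z))


def train(examples: Iterable[Tuple[List[str], int]],
          epochs: int = 20, lr: float = 0.5) -> Dict[str, float]:
    """Fit logistic-regression weights by stochastic gradient descent.

    Each example is (features, label) with label 1 = spam, 0 = good.
    """
    w: Dict[str, float] = defaultdict(float)
    data = list(examples)
    for _ in range(epochs):
        for feats, label in data:
            error = label - spam_probability(w, feats)
            # Log-loss gradient for a feature with value 1 is simply the error.
            for f in set(feats) | {"bias"}:
                w[f] += lr * error
    return dict(w)
```

Because `rdns:null` is an ordinary feature here, the learner assigns it a weight just like any network or host token.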
34. A spam detection and filtering method comprising:
- receiving incoming messages;
- examining a contact process of obtaining data from a URL to determine commonalities among a plurality of hostnames to facilitate generating features, wherein examining the contact process comprises at least one of: performing a DNS lookup for the URL, identifying identity of DNS server, obtaining traceroute of a path from the URL to the DNS server, identifying version information of DNS server, converting a hostname to an IP address using the DNS server, identifying at least a portion of the IP address, and performing a traceroute on the IP address to determine whether the IP addresses are connected in a similar way; and
- employing at least one filter trained at least in part on at least a subset of the features to facilitate determining whether messages are spam.
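Claim 34 looks for commonalities among hostnames through their DNS servers and IP addresses. A minimal Python sketch, assuming the lookups have already been done and stored as `hostname -> (DNS server, IPv4 address)`, counts how many other hostnames share each host's DNS server and /24 network. The function names, the /24 granularity, and the feature prefixes are assumptions.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def net24(ip: str) -> str:
    """The /24 network containing an IPv4 address, e.g. 203.0.113.0/24."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def shared_infrastructure(resolved: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Features describing which DNS server and /24 network each hostname shares.

    resolved maps hostname -> (DNS server that answered, resolved IPv4 address);
    the lookups themselves are assumed to have been performed elsewhere.
    """
    by_ns: Dict[str, Set[str]] = defaultdict(set)
    by_net: Dict[str, Set[str]] = defaultdict(set)
    for host, (ns, ip) in resolved.items():
        by_ns[ns.lower()].add(host)
        by_net[net24(ip)].add(host)

    features: Dict[str, List[str]] = {}
    for host, (ns, ip) in resolved.items():
        ns, net = ns.lower(), net24(ip)
        features[host] = [
            f"ns:{ns}",
            f"net24:{net}",
            # How many *other* hostnames share this host's infrastructure.
            f"ns_shared_with:{len(by_ns[ns]) - 1}",
            f"net24_shared_with:{len(by_net[net]) - 1}",
        ]
    return features
```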
35. A computer implemented spam filtering system comprising the following components executed on a processor:
- means for extracting at least one of IP address-based data and URL-based data from a message, wherein the IP address-based data comprising at least a portion of an IP address and the URL-based data comprising at least a portion of at least one URL;
- means for generating at least one of IP address-based features and the URL-based features from the respective data to be used as inputs to at least one filter; and
- means for employing at least one filter trained on at least a subset of the inputs to facilitate distinguishing spam messages from good messages, wherein the filter is trained by analyzing at least a portion of the IP address-based data at least in part by taking null reverse DNS information and using a null RDNS entry as input into a machine learning algorithm.
36. A computer-readable storage medium containing a data structure adapted to be transmitted between two or more computer processes facilitating improved detection of spam, the data structure comprising:
- information associated with generating at least one of IP address-based features and the URL-based features from respective data to be used as inputs to at least one filter; and
- employing at least one machine learning filter trained on at least a subset of the inputs to facilitate distinguishing spam messages from good messages.
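Claim 36 covers a data structure passed between computer processes. As an illustration only, the Python sketch below defines a hypothetical `FilterPayload` carrying IP address-based and URL-based features plus linear-filter weights, serialized as JSON so that a feature-extraction process and a filtering process could exchange it. The field names and the JSON encoding are assumptions, not the patent's format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class FilterPayload:
    """Hypothetical record sent from a feature-extraction process to a filter process."""
    message_id: str
    ip_features: List[str] = field(default_factory=list)
    url_features: List[str] = field(default_factory=list)
    model_weights: Dict[str, float] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize to a JSON string suitable for a pipe, queue, or socket."""
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "FilterPayload":
        return FilterPayload(**json.loads(text))

    def score(self) -> float:
        """Raw linear-filter score: sum of weights of the features present."""
        return sum(self.model_weights.get(f, 0.0)
                   for f in self.ip_features + self.url_features)
```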
Specification