Methods and systems of classifying spam URL
2 Assignments
0 Petitions
Abstract
A method of operation of a URL spam detection system includes: identifying a feature dimension of a user action on a social networking system to detect anomalies; extracting URL chunks from content associated with the user action; aggregating a non-content feature of the user action along the feature dimension into a URL distribution store to produce a feature distribution for each of the URL chunks; determining whether the feature distribution of a particular URL chunk within the URL chunks exceeds an expectation threshold for the feature dimension; and classifying the particular URL chunk as an illegitimate URL when the feature distribution exceeds the expectation threshold, to restrict access to the particular URL chunk on the social networking system.
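The abstract does not define what a URL chunk is. As a minimal sketch, assuming (hypothetically) that a chunk is a URL's hostname, one of its parent domains, or the hostname followed by one or two leading path segments, chunk extraction from the content of a user action might look like the following. The names `url_chunks` and `extract_url_chunks` and the `max_path_segments` parameter are illustrative and do not come from the patent.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately simple URL matcher; trailing punctuation is stripped below.
URL_PATTERN = re.compile(r"https?://\S+")
TRAILING_PUNCTUATION = ".,;:!?)"


def url_chunks(url: str, max_path_segments: int = 2) -> list[str]:
    """Split one URL into hypothetical chunks: the hostname, its parent
    domains, and the hostname followed by its leading path segments."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not host:
        return []
    labels = host.split(".")
    # e.g. a.b.example.com -> a.b.example.com, b.example.com, example.com
    chunks = [".".join(labels[i:]) for i in range(max(len(labels) - 1, 1))]
    segments = [s for s in parts.path.split("/") if s]
    # e.g. host/deal, host/deal/today (query string and fragment are ignored)
    for n in range(1, min(max_path_segments, len(segments)) + 1):
        chunks.append(host + "/" + "/".join(segments[:n]))
    return chunks


def extract_url_chunks(content: str) -> set[str]:
    """Return the unique chunks of every URL found in a user action's content."""
    found: set[str] = set()
    for raw in URL_PATTERN.findall(content):
        found.update(url_chunks(raw.rstrip(TRAILING_PUNCTUATION)))
    return found
```

Under this assumption a single link yields several chunks, so a campaign that rotates subdomains or paths still concentrates its activity on a shared parent-domain chunk.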
13 Citations
21 Claims
1. A method, comprising:
identifying a feature dimension on a social networking system to detect anomalies, the feature dimension being a non-content feature dimension;
extracting URL chunks from content associated with a user action, wherein the user action records an interaction between a user account and a content object and wherein the user action is captured by an action logger of the social networking system;
maintaining a plurality of feature distributions respectively corresponding to a plurality of unique URL chunks identified in content of a plurality of user actions occurring on the social networking system, wherein each of the feature distributions represents an aggregation of non-content features along the identified feature dimension across the plurality of user actions for a unique URL chunk of the plurality of unique URL chunks;
aggregating a non-content feature of the user action along the identified feature dimension into a subset of the plurality of feature distributions respectively corresponding to the extracted URL chunks;
determining whether a feature distribution of a particular URL chunk from the plurality of feature distributions of the URL chunks exceeds an expectation threshold for the feature dimension, wherein the expectation threshold corresponds to a characterization of an expected distribution along the identified feature dimension; and
classifying the particular URL chunk as an illegitimate URL when the feature distribution exceeds the expectation threshold to restrict access to the particular URL chunk on the social networking system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 21
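Claim 1 leaves open how a feature distribution is compared against the expectation threshold. One hedged reading, sketched below, keeps a histogram per URL chunk along a single non-content dimension (here a hypothetical sender account-age bucket), measures the total variation distance from an expected distribution, and classifies the chunk as illegitimate when that distance exceeds a threshold. The distance measure, the `threshold` and `min_count` defaults, and all names are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Iterable


class URLDistributionStore:
    """Hypothetical in-memory store holding one histogram of a non-content
    feature (for example, a sender account-age bucket) per unique URL chunk."""

    def __init__(self) -> None:
        self.distributions: defaultdict[str, Counter] = defaultdict(Counter)

    def aggregate(self, chunks: Iterable[str], feature_value: str) -> None:
        # Fold one user action's feature value into the histogram of each extracted chunk.
        for chunk in chunks:
            self.distributions[chunk][feature_value] += 1


def total_variation(observed: Counter, expected: dict[str, float]) -> float:
    """Total variation distance between the normalized histogram and the expected distribution."""
    n = sum(observed.values())
    keys = set(observed) | set(expected)
    return 0.5 * sum(abs(observed[k] / n - expected.get(k, 0.0)) for k in keys)


def classify_chunk(store: URLDistributionStore, chunk: str, expected: dict[str, float],
                   threshold: float = 0.5, min_count: int = 20) -> str:
    dist = store.distributions.get(chunk)  # .get() does not create an empty entry
    if not dist or sum(dist.values()) < min_count:
        return "unknown"  # too few observations to judge
    if total_variation(dist, expected) > threshold:
        return "illegitimate"  # a caller would then restrict access to the chunk
    return "legitimate"
```

In this sketch, a chunk posted only by brand-new accounts, when the expected share of new accounts is 10%, has a distance of 0.9 and is classified as illegitimate; the `min_count` guard keeps rarely seen chunks from being judged on a handful of actions.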
13. A method, comprising:
identifying a feature dimension on a social networking system to detect anomalies;
extracting URL chunks from content associated with a user action, wherein the user action is an interaction between a user account and a content object and wherein the user action is captured by an action logger of the social networking system;
aggregating a sender feature of the user action along the identified feature dimension into a plurality of feature distributions respectively corresponding to the extracted URL chunks;
detecting an anomaly in a feature distribution of a particular URL chunk, the feature distribution being one of the plurality of feature distributions of the extracted URL chunks, wherein said detecting includes comparing the feature distribution to an expected distribution along the feature dimension; and
raising a suspicion level of the particular URL chunk when the anomaly is detected.
Dependent claims: 14, 15, 16, 17, 18, 19
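Claim 13 raises a suspicion level rather than issuing a binary classification. A minimal sketch, assuming the sender feature is a hypothetical sender-country value and that the comparison to the expected distribution uses smoothed KL divergence (a stand-in chosen here, not one the claim names), could increment a per-chunk counter each time a comparison crosses a cutoff. The `alpha`, `floor`, and `cutoff` values are illustrative.

```python
import math
from collections import Counter


def kl_divergence(observed: Counter, expected: dict[str, float],
                  alpha: float = 1.0, floor: float = 1e-9) -> float:
    """KL(observed || expected), with add-alpha smoothing on the observed counts
    and a small floor for feature values absent from the expected distribution."""
    keys = set(observed) | set(expected)
    total = sum(observed.values()) + alpha * len(keys)
    divergence = 0.0
    for k in keys:
        p = (observed[k] + alpha) / total
        q = max(expected.get(k, 0.0), floor)
        divergence += p * math.log(p / q)
    return divergence


def update_suspicion(suspicion: dict[str, int], chunk: str, observed: Counter,
                     expected: dict[str, float], cutoff: float = 0.5) -> bool:
    """Raise the chunk's suspicion level by one when its sender-feature
    histogram diverges from the expected distribution by more than `cutoff`."""
    if kl_divergence(observed, expected) > cutoff:
        suspicion[chunk] = suspicion.get(chunk, 0) + 1
        return True
    return False
```

Incrementing a counter, instead of flipping a flag, lets repeated anomalies across successive checks accumulate before any restriction is applied; how the suspicion level is consumed downstream is not specified in the claim.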
20. A processor-based system, comprising:
a feature collector module stored on a non-transitory memory that, when executed by a processor, is configured to:
identify a feature dimension on a social networking system, the feature dimension being a non-content feature dimension;
extract URL chunks from content associated with a user action, wherein the user action is an interaction between a user account and a content object and wherein the user action is captured by an action logger of the social networking system;
aggregate a sender feature of the user action along the feature dimension into a plurality of feature distributions respectively corresponding to the extracted URL chunks, the plurality of feature distributions stored in a URL distribution store; and
a URL classifier module stored on a non-transitory memory that, when executed by a processor, is coupled to the feature collector module via the URL distribution store and configured to:
detect an anomaly in a feature distribution of a particular URL chunk, the feature distribution being one of the plurality of feature distributions of the extracted URL chunks, by comparing the feature distribution to an expected distribution; and
raise a suspicion level of the particular URL chunk when the anomaly is detected.
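Claim 20 couples the two modules only through the URL distribution store. The sketch below illustrates that separation under stated assumptions: an in-memory dictionary stands in for the store, the sender feature is a string label, and the anomaly test is injected as a callable (`max_share_gap` is a hypothetical comparison against an expected distribution). None of these class or function names come from the patent.

```python
from collections import Counter, defaultdict
from typing import Callable, Iterable


class URLDistributionStore:
    """Hypothetical shared store: URL chunk -> histogram of a sender feature."""

    def __init__(self) -> None:
        self.histograms: defaultdict[str, Counter] = defaultdict(Counter)


class FeatureCollector:
    """Only writes sender features into the store."""

    def __init__(self, store: URLDistributionStore) -> None:
        self.store = store

    def collect(self, chunks: Iterable[str], sender_feature: str) -> None:
        for chunk in chunks:
            self.store.histograms[chunk][sender_feature] += 1


class URLClassifier:
    """Only reads histograms from the same store, and raises suspicion levels."""

    def __init__(self, store: URLDistributionStore,
                 is_anomalous: Callable[[Counter], bool]) -> None:
        self.store = store
        self.is_anomalous = is_anomalous
        self.suspicion: dict[str, int] = {}

    def scan(self) -> list[str]:
        flagged = []
        for chunk, histogram in self.store.histograms.items():
            if self.is_anomalous(histogram):
                self.suspicion[chunk] = self.suspicion.get(chunk, 0) + 1
                flagged.append(chunk)
        return flagged


def max_share_gap(expected: dict[str, float], gap: float = 0.4) -> Callable[[Counter], bool]:
    """Hypothetical anomaly test: some feature value's observed share differs
    from its expected share by more than `gap`."""
    def test(histogram: Counter) -> bool:
        n = sum(histogram.values())
        keys = set(histogram) | set(expected)
        return max(abs(histogram[k] / n - expected.get(k, 0.0)) for k in keys) > gap
    return test
```

Because the collector only writes and the classifier only reads, either side could in principle be replaced (for example, with a persistent store or a different comparison) without changing the other.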
Specification