Methods and systems of classifying spam URL

US 9,378,465 B2
Filed: 04/29/2013
Issued: 06/28/2016
Est. Priority Date: 04/29/2013
Status: Active Grant

Abstract

A method of operation of a URL spam detection system includes: identifying a feature dimension of a user action on a social networking system to detect anomalies; extracting URL chunks from a content associated with the user action; aggregating a non-content feature of the user action along the feature dimension into a URL distribution store to produce a feature distribution for each of the URL chunks; determining whether the feature distribution of a particular URL chunk within the URL chunks exceeds an expectation threshold for the feature dimension; and classifying the particular URL chunk as an illegitimate URL when the feature distribution exceeds the expectation threshold to restrict access to a particular URL chunk on a social networking system.
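The aggregation and threshold steps in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a binary non-content feature (here, whether the sender account is newly created), and the names `distribution_store`, `aggregate`, `is_illegitimate`, `expected_max_rate`, and `min_actions` are hypothetical, as are the example chunks and threshold values.

```python
from collections import defaultdict

# Hypothetical URL distribution store: URL chunk -> [actions with the feature, total actions].
distribution_store = defaultdict(lambda: [0, 0])


def aggregate(chunks, has_feature):
    """Fold one user action's non-content feature into each extracted chunk's counts."""
    for chunk in chunks:
        counts = distribution_store[chunk]
        counts[0] += int(has_feature)
        counts[1] += 1


def is_illegitimate(chunk, expected_max_rate=0.2, min_actions=5):
    """Flag a chunk whose observed feature rate exceeds the assumed expectation threshold."""
    hits, total = distribution_store.get(chunk, (0, 0))
    if total < min_actions:  # too few actions to judge either way
        return False
    return hits / total > expected_max_rate


# Assumed feature: "the sender account is less than a day old".
for is_new_account in [True, True, True, False, True, True]:
    aggregate(["spam.example", "spam.example/win"], is_new_account)
for is_new_account in [False, False, True] + [False] * 7:
    aggregate(["news.example"], is_new_account)

print(is_illegitimate("spam.example"))  # rate 5/6 is above the 0.2 ceiling
print(is_illegitimate("news.example"))  # rate 1/10 is below the 0.2 ceiling
```

A single ceiling on a rate is only one way to characterize an expected distribution; the claims also mention ranges, means, medians, modes, and variances.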

21 Claims

1. A method, comprising:
- identifying a feature dimension on a social networking system to detect anomalies, the feature dimension being a non-content feature dimension;
  
  extracting URL chunks from content associated with a user action, wherein the user action records an interaction between a user account and a content object and wherein the user action is captured by an action logger of the social networking system;
  
  maintaining a plurality of feature distributions respectively corresponding to a plurality of unique URL chunks identified in content of a plurality of user actions occurring on the social networking system, wherein each of the feature distributions represents an aggregation of non-content features along the identified feature dimension across the plurality of user actions for a unique URL chunk of the plurality of unique URL chunks;
  
  aggregating a non-content feature of the user action along the identified feature dimension into a subset of the plurality of feature distributions respectively corresponding to the extracted URL chunks;
  
  determining whether a feature distribution of a particular URL chunk from the plurality of feature distributions of the URL chunks exceeds an expectation threshold for the feature dimension, wherein the expectation threshold corresponds to a characterization of an expected distribution along the identified feature dimension; and
  
  classifying the particular URL chunk as an illegitimate URL when the feature distribution exceeds the expectation threshold to restrict access to the particular URL chunk on a social networking system.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 21):
  - 2. The method of claim 1, wherein identifying the feature dimension includes identifying the feature dimension of one or more content sharing actions to disseminate content in the social networking system.
  - 3. The method of claim 1, wherein identifying the feature dimension includes identifying the feature dimension of one or more association actions of one or more user accounts to associate with content in the social networking system.
  - 4. The method of claim 1, wherein identifying the feature dimension includes identifying the feature dimension of one or more indirect association actions of one or more user accounts to associate with a social object affiliated with content in the social networking system.
  - 5. The method of claim 1, wherein aggregating the non-content feature includes aggregating within a time window wherein the feature distribution is a moving distribution along the feature dimension.
  - 6. The method of claim 1, further comprising determining the expectation threshold by machine learning against known reliable URL chunks and known spam URL chunks.
  - 7. The method of claim 1, further comprising determining the expectation threshold by machine learning against known spammer user accounts and known reliable user accounts.
  - 8. The method of claim 1, wherein the feature distribution is a binomial distribution of whether the non-content feature exists for the user action.
  - 9. The method of claim 1, wherein the feature distribution is a discrete distribution of enumerated states along the feature dimension.
  - 10. The method of claim 1, wherein the feature distribution is a continuous distribution along the feature dimension.
  - 11. The method of claim 1, wherein extracting the URL chunks includes extracting the URL chunks from an embedded URL and one or more redirects of the embedded URL, the URL chunks being one or more subsets of the embedded URL delimited by one or more punctuations.
  - 12. The method of claim 11, wherein classifying the particular URL chunk is based on classification of a related URL chunk in a sibling family tree of the particular URL chunk, the sibling family tree and the particular URL chunk sharing a parent domain URL chunk.
  - 21. The method of claim 1, wherein the expectation threshold corresponds to an expected range, expected mean, expected median, an expected mode, an expected variance, or any combination thereof, of the feature distribution.
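Claims 11 and 12 describe URL chunks as punctuation-delimited subsets of an embedded URL, with related chunks sharing a parent domain chunk. One possible reading is sketched below. The functions `extract_url_chunks` and `parent_domain` are hypothetical names; the sketch splits only on `.` and `/`, does not follow redirects, and treats the last two host labels as the parent domain, which is a simplification that ignores multi-label public suffixes such as `co.uk`.

```python
from urllib.parse import urlsplit


def extract_url_chunks(url):
    """Split a URL into nested chunks at '.' and '/' boundaries."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not host:
        return []
    labels = host.split(".")
    # Domain chunks, from the two-label domain down to the full host name.
    start = max(len(labels) - 2, 0)
    chunks = [".".join(labels[i:]) for i in range(start, -1, -1)]
    # Path chunks: the host followed by progressively longer path prefixes.
    prefix = host
    for segment in filter(None, parts.path.split("/")):
        prefix = f"{prefix}/{segment}"
        chunks.append(prefix)
    return chunks


def parent_domain(chunk):
    """Return the last two host labels as a naive stand-in for the parent domain chunk."""
    host = chunk.split("/", 1)[0]
    return ".".join(host.split(".")[-2:])


chunks = extract_url_chunks("http://promo.spam.example/win/now?x=1")
print(chunks)
# Two chunks are siblings in this sketch when they share a parent domain chunk.
print(parent_domain("promo.spam.example/win") == parent_domain("other.spam.example"))
```

Under this reading, a classification reached for one chunk could inform its siblings by looking them up through the shared parent domain.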

13. A method, comprising:
- identifying a feature dimension on a social networking system to detect anomalies;
  
  extracting URL chunks from content associated with a user action, wherein the user action is an interaction between a user account and a content object and wherein the user action is captured by an action logger of the social networking system;
  
  aggregating a sender feature of the user action along the identified feature dimension into a plurality of feature distributions respectively corresponding the extracted URL chunks;
  
  detecting an anomaly in a feature distribution of a particular URL chunk, the feature distribution from the plurality of feature distributions of the extracted URL chunks, wherein said detecting includes comparing the feature distribution to an expected distribution along the feature dimension; and
  
  raising a suspicion level of the particular URL chunk when the anomaly is detected.
- Dependent claims (14, 15, 16, 17, 18, 19):
  - 14. The method of claim 13, wherein the expected distribution is a superset feature distribution of a superset URL chunk containing the particular URL chunk.
  - 15. The method of claim 13, wherein the expected distribution is a white list feature distribution of known reliable URL chunks.
  - 16. The method of claim 13, wherein raising the suspicion level includes raising the suspicion level when a pre-defined number of anomalies are detected along multiple feature dimensions.
  - 17. The method of claim 13, wherein raising the suspicion level includes classifying the particular URL chunk under a specific type of illegitimate sharing channel.
  - 18. The method of claim 13, wherein raising the suspicion level includes storing the suspicion level associated with the particular URL chunk in a classification table for a filter module restricting execution of the user action.
  - 19. The method of claim 13, further comprising:
    - tracking the feature distribution to determine whether the anomaly of the feature distribution subsides within an acceptable threshold range of the expected distribution; and
      
      lowering the suspicion level when the anomaly subsides.
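Claims 16 and 19 together suggest a suspicion level that rises once anomalies appear along a pre-defined number of feature dimensions and falls again when the anomaly subsides. The sketch below is one way that bookkeeping might look. The class `SuspicionTracker`, its parameters, the integer suspicion scale, and the example dimensions and tolerances are all assumptions made for illustration; the claims do not specify them.

```python
class SuspicionTracker:
    """Tracks which feature dimensions are anomalous per URL chunk (hypothetical design)."""

    def __init__(self, min_anomalous_dimensions=2):
        self.min_anomalous_dimensions = min_anomalous_dimensions
        self.anomalous = {}  # chunk -> set of dimensions currently anomalous
        self.level = {}      # chunk -> integer suspicion level

    def observe(self, chunk, dimension, observed, expected, tolerance):
        """Compare one summary statistic to its expected value and update the level."""
        dims = self.anomalous.setdefault(chunk, set())
        was_suspicious = len(dims) >= self.min_anomalous_dimensions
        if abs(observed - expected) > tolerance:
            dims.add(dimension)      # anomaly detected along this dimension
        else:
            dims.discard(dimension)  # back within the acceptable range
        is_suspicious = len(dims) >= self.min_anomalous_dimensions
        level = self.level.get(chunk, 0)
        if is_suspicious and not was_suspicious:
            level += 1                 # raise when enough dimensions are anomalous
        elif was_suspicious and not is_suspicious:
            level = max(level - 1, 0)  # lower when the anomaly subsides
        self.level[chunk] = level
        return level


tracker = SuspicionTracker()
print(tracker.observe("spam.example", "new_account_rate", 0.9, 0.1, 0.2))
print(tracker.observe("spam.example", "friend_count_mean", 3.0, 150.0, 100.0))
print(tracker.observe("spam.example", "new_account_rate", 0.15, 0.1, 0.2))
```

The stored level could then be read by a filter that restricts the user action, as in claim 18.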

20. A processor-based system, comprising:
- a feature collector module stored on a non-transitory memory, when executed by a processor is configured to:
  - identify a feature dimension on a social networking system, the feature dimension being a non-content feature dimension;
  - extract URL chunks from content associated with a user action, wherein the user action is an interaction between a user account and a content object and wherein the user action is captured by an action logger of the social networking system;
  - aggregate a sender feature of the user action along the feature dimension into a plurality of feature distributions respectively corresponding to the extracted URL chunks, the plurality of feature distributions stored in a URL distribution store; and
- a URL classifier module stored on a non-transitory memory, when executed by a processor is coupled to the feature collection module via the URL distribution store and configured to:
  - detect an anomaly in a feature distribution of a particular URL chunk, the feature distribution from the plurality of feature distributions of the extracted URL chunks, by comparing the feature distribution to an expected distribution; and
  - raise a suspicion level of the particular URL chunk when the anomaly is detected.
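The two-module arrangement in claim 20 can be illustrated with a collector and a classifier that communicate only through a shared store. This is a sketch under stated assumptions: the class names `FeatureCollector` and `UrlClassifier`, the choice of sender country as the sender feature, the use of total variation distance to compare distributions, and the `max_distance` value are all hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict


class FeatureCollector:
    """Aggregates a discrete sender feature per URL chunk into the shared store."""

    def __init__(self, store):
        self.store = store

    def record(self, chunks, sender_feature):
        for chunk in chunks:
            self.store[chunk][sender_feature] += 1


class UrlClassifier:
    """Reads the shared store and compares each chunk to an expected distribution."""

    def __init__(self, store, expected, max_distance=0.3):
        self.store = store
        self.expected = expected  # state -> expected probability
        self.max_distance = max_distance
        self.suspicion = Counter()

    def check(self, chunk):
        counts = self.store.get(chunk)
        if not counts:
            return self.suspicion[chunk]
        total = sum(counts.values())
        states = set(counts) | set(self.expected)
        # Total variation distance between observed and expected distributions.
        distance = 0.5 * sum(
            abs(counts[s] / total - self.expected.get(s, 0.0)) for s in states
        )
        if distance > self.max_distance:
            self.suspicion[chunk] += 1  # anomaly detected: raise the suspicion level
        return self.suspicion[chunk]


store = defaultdict(Counter)  # stands in for the URL distribution store
collector = FeatureCollector(store)
classifier = UrlClassifier(store, expected={"US": 0.5, "GB": 0.3, "IN": 0.2})

for country in ["XX"] * 8 + ["US"] * 2:
    collector.record(["spam.example"], country)
for country in ["US"] * 5 + ["GB"] * 3 + ["IN"] * 2:
    collector.record(["news.example"], country)

print(classifier.check("spam.example"))
print(classifier.check("news.example"))
```

Keeping the store as the only coupling point means the collector and classifier could run on different schedules, which matches the claim's description of the modules being coupled via the URL distribution store.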

Current Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors: Stewart, Allan; Zarakhovsky, Eugene; Palow, Christopher; Gowda, Chetan; Dorman, Brent
Primary Examiner(s): Chaki, Kakali
Assistant Examiner(s): Figueroa, Kevin W
Application Number: US13/872,811
Publication Number: US 20140324741A1
Time in Patent Office: 1,156 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes: G06N 20/00 Machine learning
