Method and Apparatus for Detecting Spam User Created Content

US 20090089279A1
Filed: 12/28/2007
Published: 04/02/2009
Est. Priority Date: 09/27/2007
Status: Active Grant

9 Assignments

0 Petitions

Abstract

The present invention provides methods, apparatuses and systems directed to automatically detecting spam user created content. In a particular implementation, there is provided a method for processing spam contents, which comprises: maintaining a plurality of key information databases; receiving user-created content and at least one of a service ID and a content category ID of the user-created content from one or more users of a user-created content hosting site; selecting one of the plurality of key information databases based on at least one of the service ID and the content category ID; extracting second key information from the received user-created content; searching the selected key information database for first key information related to the second key information; classifying the received user-created content as spam content based on the extracted second key information and/or the first key information related to the second key information; and conditionally storing the user-created content in a network accessible data store available to users of the user-created content hosting site based on classifying the user-created content as spam or non-spam content. Said first and second key information may comprise at least one of predetermined type(s) of data, word(s) and phrase(s) in said contents, wherein said data comprises a user ID, a universal resource locator, a site address, an account number and/or a telephone number. In addition, said method may further comprise: determining whether the extracted second key information corresponds to predefined restricted information; and if the extracted second key information corresponds to the predefined restricted information, removing the extracted second key information and/or replacing the extracted second key information with predefined different information.
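
The flow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `SpamFilter`, `extract_key_info`, `URL_RE` and `PHONE_RE` are hypothetical, the regular expressions are placeholder extraction rules, an in-memory list stands in for the network accessible data store, and "classify as spam on any match" is only one of the options the text allows.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical extraction rules. The source names URLs and telephone
# numbers as key information types but does not say how to extract them.
URL_RE = re.compile(r"https?://\S+")
PHONE_RE = re.compile(r"\b\d{2,4}-\d{3,4}-\d{4}\b")


def extract_key_info(text: str) -> set[str]:
    """Extract 'second key information' (here: URLs and phone numbers)."""
    return set(URL_RE.findall(text)) | set(PHONE_RE.findall(text))


@dataclass
class SpamFilter:
    # One key information database per (service ID, content category ID).
    # Keying on the pair is an assumption; the text allows either ID alone.
    databases: dict[tuple[str, str], set[str]]
    # Stand-in for the network accessible data store.
    store: list[str] = field(default_factory=list)

    def process(self, content: str, service_id: str, category_id: str) -> bool:
        """Return True if the content was classified as spam."""
        # Select the database for this service and category.
        db = self.databases.get((service_id, category_id), set())
        # Extract second key information from the content.
        extracted = extract_key_info(content)
        # Search the selected database for matching first key information.
        matches = extracted & db
        is_spam = bool(matches)
        # Conditionally store: only non-spam content is kept.
        if not is_spam:
            self.store.append(content)
        return is_spam
```

Keeping a separate database per service and category means a URL registered as spam for one board does not automatically block the same URL on another, which matches the selection step described above.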

46 Citations

25 Claims

1. A method for processing spam contents, comprising the steps of:
- maintaining a plurality of key information databases;
  
  receiving user-created content and at least one of a service identifier (ID) and a content category ID of said user-created content from one or more users of a user-created content hosting site;
  
  selecting at least one of the plurality of key information databases based on at least one of the received service ID and the received content category ID;
  
  extracting second key information from the received user-created content;
  
  searching the selected key information database to retrieve first key information related to the second key information;
  
  classifying the user-created content as spam content based on the extracted second key information and/or the retrieved first key information related to the second key information; and
  
  conditionally storing the user-created content in a network accessible data store available to users of the user-created content hosting site based on classifying the user-created content as spam or non-spam content.
- - 2. The method of claim 1, wherein said first and second key information comprise at least one of predetermined type(s) of data, word(s) and phrase(s) in said contents, and wherein said data comprises at least one of a user ID, a universal resource locator, a site address, an account number or a telephone number.
  - 3. The method of claim 2, wherein said maintaining comprises maintaining a spam information database, and wherein said classifying comprises classifying said user-created content as spam content based on whether the spam information database includes the extracted second key information.
  - 4. The method of claim 3, wherein said maintaining a spam information database comprises:
    - preparing a series of user-created contents, each of said contents classified as spam or non-spam content;
      
      obtaining, from the series of user-created contents, third key information and the frequency the third key information is found in the contents classified as spam content; and
      
      recording the third key information in the spam information database based on the obtained frequency.
  - 5. The method of claim 1, wherein said classifying comprises classifying said user-created content as spam content using a document classification algorithm.
  - 6. The method of claim 5, wherein said classifying comprises:
    - obtaining spam probability data for the extracted second key information from the key information database; and
      
      classifying the user-created content as spam content based on the obtained spam probability data.
  - 7. The method of claim 1, further comprising:
    - updating the key information database based on the extracted second key information.
  - 8. The method of claim 7, wherein said updating comprises registering the extracted second key information as the first key information in the key information database.
  - 9. The method of claim 7, wherein said updating comprises updating the frequency the extracted second key information is found in spam contents in accordance with the extracted second key information.
  - 10. The method of claim 6, wherein said updating comprises updating the spam probability data for the extracted second key information in accordance with the extracted second key information.
  - 11. The method of claim 1, further comprising:
    - determining whether the extracted second key information corresponds to predefined restricted information; and
      
      if the extracted second key information corresponds to the predefined restricted information, removing the extracted second key information or replacing the extracted second key information with predefined different information.
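
Claims 4 and 6 describe building the spam information database from labeled content and classifying by spam probability data. A minimal sketch of one way to read them is given below. The function names, the `min_spam_count` cutoff, the probability estimate (spam occurrences divided by total occurrences) and the "highest probability above a threshold" decision rule are all assumptions made for illustration; the claims do not fix any of them.

```python
from collections import Counter


def build_spam_db(labeled, min_spam_count=2):
    """Build {key: spam probability} from labeled content.

    `labeled` is an iterable of (set_of_key_info, is_spam) pairs.
    A key is recorded only if it occurs in at least `min_spam_count`
    spam items (the cutoff value is an assumption).
    """
    spam_counts, total_counts = Counter(), Counter()
    for keys, is_spam in labeled:
        for key in keys:
            total_counts[key] += 1
            if is_spam:
                spam_counts[key] += 1
    return {
        key: spam_counts[key] / total_counts[key]
        for key in spam_counts
        if spam_counts[key] >= min_spam_count
    }


def classify(keys, spam_db, threshold=0.5):
    """Return True if any extracted key has a recorded spam probability
    above `threshold` (an assumed decision rule)."""
    probs = [spam_db[k] for k in keys if k in spam_db]
    return bool(probs) and max(probs) > threshold
```

The updating steps of claims 7 to 10 would correspond to incrementing the same counters as newly classified content arrives and recomputing the stored probabilities.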

12. Logic encoded in one or more tangible media for execution and when executed operable to cause the one or more processors to:
- maintain a plurality of key information databases;
  
  receive user-created content and at least one of a service identifier (ID) and a content category ID of said user-created content from one or more users of a user-created content hosting site;
  
  select at least one of the plurality of key information databases based on at least one of the received service ID and the received content category ID;
  
  extract second key information from the received user-created content;
  
  search the selected key information database to retrieve first key information related to the second key information;
  
  classify the user-created content as spam content based on the extracted second key information and/or the retrieved first key information related to the second key information; and
  
  conditionally store the user-created content in a network accessible data store available to users of the user-created content hosting site based on classifying the user-created content as spam or non-spam content.

13. An apparatus for processing spam contents, said apparatus comprising:
- a storage part configured to include a plurality of key information databases;
  
  a communication part configured to receive user-created content and at least one of a service ID and a content category ID from one or more users of a user-created content hosting site; and
  
  a control part configured to select one of the plurality of key information databases based on at least one of the service ID and the content category ID, extract second key information from the received user-created content, search the selected key information database to retrieve first key information related to the extracted second key information, classify the received user-created content as spam content based on the extracted second key information and/or the retrieved first key information related to the second key information, and conditionally store the user-created content in a network accessible data store available to users of the user-created content hosting site based on classifying the user-created content as spam or non-spam content.
- - 14. The apparatus of claim 13, wherein said first and second key information comprise at least one of predetermined types of data, a word and a phrase of said contents, said data comprising at least one of a user ID, a URL (Universal Resource Locator), a site address, an account number and a telephone number.
  - 15. The apparatus of claim 13, wherein said control part comprises:
    - a key information management part configured to prepare a spam information database by classifying at least some of key information of the first information as spam key information; and
      
      a spam information matching part configured to classify said contents as spam contents based on whether the spam information database includes the extracted second key information.
  - 16. The apparatus of claim 15, wherein said key information management part prepares a series of user-created contents, each of said user-created content classified as spam or non-spam content;
    - obtains, from the series of user-created contents, third key information and the frequency the third key information is found in the user-created content classified as spam content; and
      
      records the third key information in the spam information database based on the obtained frequency.
  - 17. The apparatus of claim 13, wherein said control part comprises a document classifying part configured to classify said user-created content as spam content using a document classification algorithm.
  - 18. The apparatus of claim 17, wherein said document classifying part obtains spam probability data for the extracted second key information from the key information database and classifies the user-created content as spam content based on the obtained spam probability data.
  - 19. The apparatus of claim 13, wherein said control part comprises a key information management part configured to update the key information database based on the extracted second key information.
  - 20. The apparatus of claim 19, wherein said key information management part registers the extracted second key information as the first key information in the key information database.
  - 21. The apparatus of claim 19, wherein said key information management part updates the frequency the extracted second key information is found in spam content in accordance with the extracted second key information.
  - 22. The apparatus of claim 19, wherein said key information management part updates the spam probability data for the extracted second key information in accordance with the extracted second key information.
  - 23. The apparatus of claim 13, wherein said control part further comprises:
    - a contents management part configured to determine whether the extracted second key information corresponds to predefined restricted information and, if the extracted second key information corresponds to the predefined restricted information, removes the extracted second key information or replaces the extracted second key information with predefined different information.
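
The restricted-information handling in claims 11 and 23 (remove the extracted item, or replace it with predefined different information) might look like the sketch below. It assumes the restricted information is a plain set of strings and that extraction uses the two hypothetical patterns shown; the name `redact` and the default placeholder `[removed]` are illustrative only.

```python
import re

# Hypothetical extraction patterns (not specified in the source).
URL_RE = re.compile(r"https?://\S+")
PHONE_RE = re.compile(r"\b\d{2,4}-\d{3,4}-\d{4}\b")


def redact(content, restricted, replacement="[removed]"):
    """Replace each extracted item found in `restricted`.

    Passing replacement="" removes the item instead of replacing it.
    Items that are extracted but not restricted are left untouched.
    """
    def _sub(match):
        item = match.group(0)
        return replacement if item in restricted else item

    content = URL_RE.sub(_sub, content)
    return PHONE_RE.sub(_sub, content)
```

For example, `redact("Call 010-1234-5678", {"010-1234-5678"})` yields `"Call [removed]"`, while the same call with `replacement=""` drops the number entirely.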

24. A method for processing spam contents, comprising the steps of:
- maintaining a plurality of key information databases;
  
  receiving user-created content and at least one of a service ID and a content category ID of the user-created content from one or more users of a user-created content hosting site;
  
  selecting one of the plurality of key information databases based on at least one of the service ID and the content category ID;
  
  extracting second key information from the received user-created content;
  
  searching the selected key information database to retrieve first key information matching the second key information;
  
  if a match is found, classifying the user-created content as spam content; and
  
  conditionally storing the user-created content in a network accessible data store available to users of the user-created content hosting site based on classifying the user-created content as spam or non-spam content.

25. An apparatus for processing spam contents, said apparatus comprising:
- a storage part configured to include a key information database;
  
  a communication part configured to receive user-created content and at least one of a service ID and a content category ID of the user-created content from one or more users of a user-created content hosting site; and
  
  a control part configured to select one of the plurality of key information databases based on at least one of the service ID and the content category ID, extract second key information from the received user-created content, search the selected key information database to retrieve first key information matching the second key information, if a match is found, classify the user-created content as spam content and conditionally store the user-created content in a network accessible data store available to users of the user-created content hosting site based on classifying the user-created content as spam or non-spam content.

Current Assignee: R2 Solutions LLC (Acacia Research Corporation)
Original Assignee: Yahoo! Inc. (Apollo Global Management, Inc.)
Inventors: Kim, Hyung Deok; Jeong, Ho Wook; Kang, Jae Ho

Granted Patent: US 8,095,547 B2
US Class Current: 1/1
CPC Class Codes

G06F 16/353   into predefined classes

G06F 16/958   Organisation or management ...

G06Q 30/0601   Electronic shopping [e-shop...

H04L 51/212   using filtering or selectiv...
