Spam detection for user-generated multimedia items based on keyword stuffing
First Claim
1. A computer-implemented method for spam detection in a collection of multimedia items, comprising:
storing the collection of multimedia items in a memory of a computer system, each multimedia item including a description of the item, the description including a plurality of tokens; and
for at least one multimedia item;
selecting a plurality of portions of the description of the item and for each selected portion counting a total number of unique tokens in the selected portion, wherein unique tokens appearing more than once in the selected portion are counted only once;
determining a distribution of unique tokens for the multimedia item using the total number of unique tokens in each selected portion of the multimedia item; and
responsive to the distribution of unique tokens exceeding a distribution threshold, marking the multimedia item for a future spam filtering action.
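The steps recited in the claim can be sketched in Python as follows. This is only an illustrative reading, not the patented implementation: the claim does not say how portions are selected, how the "distribution of unique tokens" is computed, or what the threshold value is. The sketch assumes whitespace tokenization, fixed-size non-overlapping portions, and a score equal to the mean fraction of unique tokens per portion. The names `MultimediaItem`, `select_portions`, `unique_token_counts`, `unique_token_score`, and `mark_spam_candidates`, and the default values `portion_size=10` and `threshold=0.95`, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MultimediaItem:
    # Hypothetical record type; the claim only requires a description made of tokens.
    item_id: str
    description: str
    marked_for_spam_review: bool = False


def select_portions(tokens, portion_size):
    """Split a token list into consecutive, non-overlapping portions (an assumption)."""
    return [tokens[i:i + portion_size] for i in range(0, len(tokens), portion_size)]


def unique_token_counts(portions):
    """Count unique tokens per portion; a token repeated within a portion counts once."""
    return [len(set(portion)) for portion in portions]


def unique_token_score(description, portion_size=10):
    """Assumed summary of the distribution: mean fraction of unique tokens per portion."""
    tokens = description.lower().split()
    portions = select_portions(tokens, portion_size)
    if not portions:
        return 0.0
    counts = unique_token_counts(portions)
    ratios = [count / len(portion) for count, portion in zip(counts, portions)]
    return sum(ratios) / len(ratios)


def mark_spam_candidates(items, threshold=0.95, portion_size=10):
    """Mark items whose score exceeds the (hypothetical) threshold for later filtering."""
    for item in items:
        if unique_token_score(item.description, portion_size) > threshold:
            item.marked_for_spam_review = True
    return [item.item_id for item in items if item.marked_for_spam_review]


if __name__ == "__main__":
    collection = [
        MultimediaItem("v1", "free music movies games cars phones laptops shoes "
                             "watches travel hotels flights insurance loans poker "
                             "casino diet pills coupons deals"),
        MultimediaItem("v2", "the cat sat on the mat and the dog sat on the rug"),
    ]
    print(mark_spam_candidates(collection))
```

Under these assumptions, a description that is a run of unrelated keywords scores close to 1.0, while ordinary prose scores lower because common words repeat within a portion. A short trailing portion can inflate the score, so a practical version would likely need a minimum description length or a different summary statistic.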
2 Assignments
0 Petitions
Abstract
A system, a method, and various software tools enable a video hosting website to automatically identify posted video items that contain spam in the metadata associated with a respective video item. A spam detection tool for user-generated video items based on keyword stuffing is provided that facilitates the detection of spam in the metadata associated with a video item.
34 Citations
14 Claims
1. A computer-implemented method for spam detection in a collection of multimedia items, comprising:
storing the collection of multimedia items in a memory of a computer system, each multimedia item including a description of the item, the description including a plurality of tokens; and
for at least one multimedia item;
selecting a plurality of portions of the description of the item and for each selected portion counting a total number of unique tokens in the selected portion, wherein unique tokens appearing more than once in the selected portion are counted only once;
determining a distribution of unique tokens for the multimedia item using the total number of unique tokens in each selected portion of the multimedia item; and
responsive to the distribution of unique tokens exceeding a distribution threshold, marking the multimedia item for a future spam filtering action.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13
12. A non-transitory computer readable storage medium containing program code for spam detection in a collection of multimedia items, the program code comprising code for:
storing the collection of multimedia items, each multimedia item including a description of the item, the description including a plurality of tokens; and
for at least one multimedia item;
selecting a plurality of portions of the description of the item and for each selected portion counting a total number of unique tokens in the selected portion, wherein unique tokens appearing more than once in the selected portion are counted only once;
determining a distribution of unique tokens for the multimedia item using the total number of unique tokens in each selected portion of the multimedia item; and
responsive to the distribution of unique tokens exceeding a distribution threshold, marking the multimedia item for a future spam filtering action.
Dependent claims: 14
Specification