Multimedia content filtering

US 8,626,930 B2
Filed: 03/15/2007
Issued: 01/07/2014
Est. Priority Date: 03/15/2007
Status: Active Grant

Abstract

Methods and apparatuses to filter multimedia content are described. The multimedia content in one embodiment is analyzed for one or more parameters. The multimedia content in one embodiment is filtered based on the one or more parameters using a latent semantic mapping (“LSM”) filter. In one embodiment, the one or more parameters include information about a structure of the multimedia content. A tag that encapsulates the one or more parameters may be generated. Then, the tag is input into the latent semantic mapping filter. In one embodiment, the LSM filter is trained to recognize the multimedia content based on the one or more parameters. In one embodiment, more than two categories are provided for a multimedia content. The multimedia content is classified in more than two categories using the LSM filter. The multimedia content may be blocked based on the classifying.

53 Claims

1. A method, comprising:
- analyzing, by one or more processing devices, a web page content for predetermined parameters, wherein at least one of the predetermined parameters is based on an image media content;
  
  generating a tag that encapsulates the at least one predetermined parameter;
  
  processing the web page content to provide text representing the web page content;
  
  inserting the tag into the text to provide tokens;
  
  inputting the tokens into a latent semantic mapping (LSM) filter;
  
  mapping the tokens into a vector space of the latent semantic mapping filter;
  
  analyzing, by the one or more processing devices, the web page content using the latent semantic mapping filter wherein the vector space of the latent semantic mapping filter includes a first plurality of vectors at a first location and a second plurality of vectors at a second location, wherein the first location comprises materials related to predefined legitimate multimedia content, the second location comprises materials related to predefined explicit multimedia content, and at least one input into the latent semantic mapping filter comprises one or more representations of the web page content that are mapped to a third location in the vector space;
  
  determining, by the one or more processing devices, distances between the third location and the first location, and the third location and the second location; and
  
  filtering, by the one or more processing devices, the web page content based on the determined distances.
  - 2. The method of claim 1, wherein the predetermined parameters include information about a structure of the web page content.
  - 3. The method of claim 1, wherein the predetermined parameters include information relating to a number of references.
  - 4. The method of claim 1, wherein the predetermined parameters include information relating to a number of images.
  - 5. The method of claim 1, wherein the predetermined parameters include information relating to a length of the web page content.
  - 6. The method of claim 1, wherein the predetermined parameters includes a textual pattern.
  - 7. The method of claim 1, further comprising rating the web page content based on the predetermined parameters.
  - 8. The method of claim 1, wherein the web page content includes an executable script from which text is extracted in processing the web page content.
  - 9. The method of claim 1, further comprising training the LSM filter to recognize the web page content based on the predetermined parameters.
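
As a rough illustration of the "analyze, generate a tag, insert the tag into the text" steps in claims 1 to 5, the sketch below counts images and references in a page, buckets the counts, and appends the resulting tag strings to the extracted words. The tag names, the bucket thresholds, and the helper names (`PageStats`, `bucket`, `tokens_with_tags`) are assumptions made for this example; the claims do not specify them.

```python
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Collects visible words plus simple structural counts from HTML."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.links = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag == "a":
            self.links += 1

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def bucket(count, limits=(0, 5, 20)):
    """Map a raw count to a coarse label (thresholds are arbitrary)."""
    labels = ("none", "few", "some", "many")
    for label, limit in zip(labels, limits):
        if count <= limit:
            return label
    return labels[-1]


def tokens_with_tags(html):
    """Return the page's words followed by tags encoding its structure."""
    stats = PageStats()
    stats.feed(html)
    stats.close()
    tags = [
        "__images_" + bucket(stats.images),
        "__links_" + bucket(stats.links),
        "__length_" + bucket(len(stats.words), limits=(0, 50, 500)),
    ]
    return stats.words + tags


page = '<p>Holiday photos</p><img src="a.jpg"><img src="b.jpg"><a href="/x">more</a>'
print(tokens_with_tags(page))
```

Encoding each parameter as an ordinary string lets a text-based filter treat it like any other token, which is one plausible reading of "inserting the tag into the text to provide tokens".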

10. A method, comprising:
- analyzing a multimedia content for one or more parameters;
  
  generating one or more tags associated with the one or more parameters;
  
  providing the one or more tags to a latent semantic (LSM) filter;
  
  providing, by one or more processing devices, a vector space having at least two categories for the multimedia content, the vector space comprising a first plurality of vectors at a first location, and a second plurality of vectors at a second location, wherein the first location comprises materials related to predefined legitimate multimedia content, the second location comprises materials related to predefined explicit multimedia content, and wherein one or more representations of a new multimedia content are mapped to a third location in the vector space;
  
  determining by the one or more processing devices distances between the third location and the first location, and the third location and the second location; and
  
  classifying, by the one or more processing devices, a new multimedia content based on the distances determined.
  - 11. The method of claim 10, wherein the at least two categories is any combination of an explicit content category and a first plurality of legitimate content categories, a legitimate content category and a second plurality of explicit content categories, or a third plurality of explicit content categories and a fourth plurality of legitimate content categories.
  - 12. The method of claim 10, further comprising blocking the multimedia content based on the classifying.
  - 13. The method of claim 10, further comprising storing a reference to the multimedia content.
  - 14. The method of claim 10, further comprising accepting the multimedia content; and adding the accepted multimedia content to a list to train the latent semantic filter.
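
The distance-based classification of claims 10 to 12 can be sketched as a nearest-anchor comparison. This example is a simplification: it works in a plain token-count space with cosine distance rather than in a true latent semantic mapping space (no dimensionality reduction is performed), and the category names, training tokens, and function names are invented for illustration.

```python
import math
from collections import Counter


def centroid(docs):
    """Average the token counts of one category's training documents."""
    total = Counter()
    for tokens in docs:
        total.update(tokens)
    return {term: n / len(docs) for term, n in total.items()}


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def classify(tokens, anchors):
    """Return (closest category, distance to every category)."""
    vec = dict(Counter(tokens))
    distances = {name: cosine_distance(vec, c) for name, c in anchors.items()}
    return min(distances, key=distances.get), distances


# More than two categories: two legitimate ones and one explicit one.
anchors = {
    "legitimate_news": centroid([["election", "report", "__images_few"]]),
    "legitimate_shop": centroid([["price", "cart", "__images_some"]]),
    "explicit": centroid([["adult", "gallery", "__images_many"]]),
}
label, distances = classify(["adult", "gallery", "photos", "__images_many"], anchors)
blocked = label.startswith("explicit")  # blocking decision based on the classifying
print(label, blocked)
```

In this sketch the new content plays the role of the "third location", and each anchor stands in for the location of a category's training vectors.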

15. A method to classify a multimedia content, comprising:
- processing, by one or more processing devices, the multimedia content to provide text for an analysis of the multimedia content;
  
  analyzing, by the one or more processing devices, the multimedia content for predetermined parameters, wherein at least one of the predetermined parameters is based on image media content;
  
  generating, by the one or more processing devices, a tag that encapsulates at least one of the predetermined parameters;
  
  associating, by the one or more processing devices, the tag with the text to provide one or more tokens; and
  
  mapping, by the one or more processing devices, the one or more tokens into a vector space containing a first plurality of vectors at a first location, and a second plurality of vectors at a second location, wherein the first location comprises materials related to predefined legitimate multimedia content, the second location comprises materials related to predefined explicit multimedia content, and wherein the one or more tokens are mapped into a third location in the vector space;
  
  determining by the one or more processing devices distances between the third location and the first location, and the third location and the second location; and
  
  determining, by the one or more processing devices, whether to filter the multimedia content based on the distances.
  - 16. The method of claim 15, wherein the tag includes a string of characters.
  - 17. The method of claim 15, wherein the processing includes extracting strings within an executable script.
  - 18. The method of claim 15, further including removing one or more stop words from the text.
  - 19. The method of claim 15, further comprising filtering the multimedia content based on the mapping.
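
Claims 17 and 18 mention extracting strings from an executable script and removing stop words. A minimal sketch of both steps follows, assuming JavaScript-style quoted literals and a tiny illustrative stop-word list; neither the regular expression nor the word list comes from the patent.

```python
import re

# Matches a single- or double-quoted literal, allowing backslash escapes.
# A rough approximation: it does not understand comments or template strings.
STRING_LITERAL = re.compile(r"""(["'])((?:\\.|(?!\1).)*)\1""")

# Illustrative stop-word list only.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def script_strings(script):
    """Return the contents of the quoted string literals found in a script."""
    return [m.group(2) for m in STRING_LITERAL.finditer(script)]


def clean_tokens(text):
    """Lowercase, split into word tokens, and drop stop words."""
    words = re.findall(r"[a-z0-9_]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


script = "document.write('Welcome to the gallery'); var t = \"Photos of the day\";"
text = " ".join(script_strings(script))
print(clean_tokens(text))
```

Pulling literals out of scripts matters because a page may emit its visible text through script calls rather than through static markup.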

20. An article of manufacture comprising:
- a non-transitory machine-accessible storage medium storing data that, when accessed by a machine, cause the machine to perform operations comprising;
  
  analyzing a web page content for predetermined parameters, wherein at least one of the predetermined parameters is based on an image media content;
  
  generating a tag that encapsulates the at least one predetermined parameter;
  
  processing the web page content to provide text representing the web page content;
  
  inserting the tag into the text to provide tokens;
  
  inputting the tokens into a latent semantic mapping (LSM) filter;
  
  mapping the tokens into a vector space of the latent semantic mapping filter;
  
  analyzing the web page content using a latent semantic mapping filter, wherein the latent semantic mapping filter comprises a vector space containing a first plurality of vectors at a first location, and a second plurality of vectors at a second location, wherein the first location comprises materials related to predefined legitimate multimedia content, the second location comprises materials related to predefined explicit multimedia content, and wherein at least one input into the latent semantic mapping filter comprises one or more representations of the web page content that are mapped to a third location in a vector space;
  
  determining distances between the third location and the first location, and the third location and the second location; and
  
  determining whether to filter the web page content based on the distances.
  - 21. The article of manufacture of claim 20, wherein the predetermined parameters include information about a structure of the web page content.
  - 22. The article of manufacture of claim 20, wherein the predetermined parameters include information relating to a number of references.
  - 23. The article of manufacture of claim 20, wherein the predetermined parameters include information relating to a number of images.
  - 24. The article of manufacture of claim 20, wherein the predetermined parameters include information relating to a length of the web page content.
  - 25. The article of manufacture of claim 20, wherein the predetermined parameters includes a textual pattern.
  - 26. The article of manufacture of claim 20, wherein the machine-accessible medium further includes data that cause the machine to perform operations comprising, rating the web page content based on the predetermined parameters.
  - 27. The article of manufacture of claim 20, wherein the web page content includes an executable script from which text is extracted in processing the web page content.
  - 28. The article of manufacture of claim 20, wherein the machine-accessible medium further includes data that cause the machine to perform operations comprising, training the LSM filter to recognize the web page content based on the predetermined parameters.

29. An article of manufacture comprising:
- a non-transitory machine-accessible storage medium storing data that, when accessed by a machine, cause the machine to perform operations comprising;
  
  analyzing a multimedia content for one or more parameters;
  
  generating one or more tags associated with the one or more parameters;
  
  providing the one or more tags to a latent semantic (LSM) filter;
  
  providing a vector space having at least two categories for the multimedia content, the vector space comprising a first plurality of vectors at a first location, and a second plurality of vectors at a second location, wherein the first location comprises materials related to predefined legitimate multimedia content, the second location comprises materials related to predefined explicit multimedia content, and wherein one or more representations of a new multimedia content are mapped to a third location in the vector space;
  
  determining distances between the third location and the first location, and the third location and the second location; and
  
  classifying a new multimedia content based on the distances determined.
  - 30. The article of manufacture of claim 29, wherein the at least two categories is any combination of an explicit content category and a first plurality of legitimate content categories, a legitimate content category and a second plurality of explicit content categories, or a third plurality of explicit content categories and a fourth plurality of legitimate content categories.
  - 31. The article of manufacture of claim 29, wherein the machine-accessible medium further includes data that cause the machine to perform operations comprising, blocking the multimedia content based on the classifying.
  - 32. The article of manufacture of claim 29, further comprising storing a reference to the multimedia content.
  - 33. The article of manufacture of claim 29, wherein the machine-accessible medium further includes data that cause the machine to perform operations comprising, accepting the multimedia content; and adding the accepted multimedia content to a list to train the latent semantic filter.

34. An article of manufacture comprising:
- a non-transitory machine-accessible storage medium storing data that, when accessed by a machine, cause the machine to perform operations to classify a multimedia content, comprising;
  
  processing the multimedia content to provide text for an analysis of the multimedia content;
  
  analyzing the multimedia content for predetermined parameters, wherein at least one parameter is based on image media content;
  
  generating a tag that encapsulates at least one of the predetermined parameters;
  
  associating the tag with the text to provide one or more tokens; and
  
  mapping the one or more tokens into a vector space containing a first plurality of vectors at a first location, and a second plurality of vectors at a second location, wherein the first location comprises materials related to predefined legitimate multimedia content, the second location comprises materials related to predefined explicit multimedia content, and wherein the one or more tokens are mapped into a third location in the vector space;
  
  determining distances between the third location and the first location, and the third location and the second location; and
  
  determining whether to filter the multimedia content based on the distances.
  - 35. The article of manufacture of claim 34, wherein the tag includes a string of characters.
  - 36. The article of manufacture of claim 34, wherein the processing includes extracting strings within an executable script.
  - 37. The article of manufacture of claim 34, wherein the machine-accessible medium further includes data that cause the machine to perform operations comprising, removing one or more stop words from the text.
  - 38. The article of manufacture of claim 34, wherein the machine-accessible medium further includes data that cause the machine to perform operations comprising, filtering the multimedia content based on the mapping.

39. A computer system, comprising:
- a bus;
  
  a data storage device coupled to the bus;
  
  one or more processing devices coupled to the data storage device, wherein the data storage device stores instructions executed by the one or more processing devices to perform operations, comprising;
  
  analyzing a web page content for predetermined parameters, wherein at least one parameter is based on an image media content;
  
  generating a tag that encapsulates the at least one parameter;
  
  processing the web page content to provide text for an analysis of the web page content;
  
  inserting the tag into the text to provide tokens;
  
  mapping the tokens into a vector space of a latent semantic mapping filter;
  
  analyzing the web page content using the latent semantic mapping filter, wherein the latent semantic mapping filter comprises a vector space containing a first plurality of vectors at a first location, and a second plurality of vectors at a second location, wherein the first location comprises materials related to predefined legitimate multimedia content, the second location comprises materials related to predefined explicit multimedia content, and wherein at least one input into the latent semantic mapping filter comprises one or more representations of the image media content that are mapped to a third location in the vector space;
  
  determining distances between the third location and the first location, and the third location and the second location; and
  
  determining whether to filter the web page content based the distances.
  - 40. The system of claim 39, wherein the web page content includes an executable script from which text is extracted in processing the web page content.
  - 41. The system of claim 39, wherein the one or more processing devices further performing operations, comprising:
    - training the LSM filter to recognize the web page content based on the predetermined parameters.

42. A computer system, comprising:
- a bus;
  
  a data storage device coupled to the bus;
  
  one or more processing devices coupled to the storage device, wherein the storage device stores instructions executed by the one or more processing devices to perform operations, comprising;
  
  analyzing a multimedia content for one or more parameters;
  
  generating one or more tags associated with the one or more parameters;
  
  providing the one or more tags to a latent semantic (LSM) filter;
  
  providing a vector space having at least two categories for the multimedia content, the vector space comprising a first plurality of vectors at a first location, and a second plurality of vectors at a second location, wherein the first location comprises materials related to predefined legitimate multimedia content, the second location comprises materials related to predefined explicit multimedia content, and wherein one or more representations of a new multimedia content are mapped to a third location in the vector space;
  
  determining distances between the third location and the first location, and the third location and the second location; and
  
  classifying a new multimedia content based on distances determined.
  - 43. The system of claim 42, wherein the at least two categories is any combination of an explicit content category and a first plurality of legitimate content categories, a legitimate content category and a second plurality of explicit content categories, or a third plurality of explicit content categories and a fourth plurality of legitimate content categories.
  - 44. The system of claim 43, wherein the one or more processing devices further performing operations, comprising:
    - blocking the multimedia content based on the classifying.

45. A computer system to classify a multimedia content, comprising:
- a bus;
  
  a data storage device coupled to the bus;
  
  one or more processing devices coupled to the data storage device, wherein the data storage device stores instructions executed by the one or more processing devices to perform operations, comprising;
  
  processing the multimedia content to provide text for analysis of the multimedia content;
  
  analyzing the multimedia content for predetermined parameters, wherein at least one of the predetermined parameters is based on image media content;
  
  generating a tag that encapsulates the at least one of the predetermined parameters;
  
  associating the tag with the text to provide one or more tokens; and
  
  mapping the one or more tokens into a vector space containing a first plurality of vectors at a first location, and a second plurality of vectors at a second location, wherein the first location comprises materials related to predefined legitimate multimedia content, the second location comprises materials related to predefined explicit multimedia content, and wherein the one or more tokens are mapped into a third location in the vector space;
  
  determining distances between the third location and the first location, and the third location and the second location; and
  
  determining whether to filter the multimedia content based on the distances.
  - 46. The system of claim 45, wherein the processing includes extracting strings within an executable script.
  - 47. The system of claim 45, wherein the processing includes removing one or more stop words from the text.

48. A method, comprising:
- processing, by one or more processing devices, multimedia content to provide processed content for use by a latent semantic mapping filter;
  
  analyzing the multimedia content for predetermined parameters;
  
  generating a tag that encapsulates at least one of the predetermined parameters;
  
  processing the multimedia content to provide text representing the multimedia content;
  
  inserting the tag into the text to provide tokens;
  
  inputting the tokens into the latent semantic mapping filter;
  
  mapping the tokens into a vector space of the latent semantic mapping filter;
  
  classifying, by the one or more processing devices, the processed content with the latent semantic mapping filter including the vector space, the vector space having at least two categories for the multimedia content, the vector space comprising a first plurality of vectors at a first location, and a second plurality of vectors at a second location, wherein the first location comprises materials related to predefined legitimate multimedia content, the second location comprises materials related to predefined explicit multimedia content, and wherein one or more representations of the multimedia content are mapped to a third location in the vector space and wherein distances between the third location and the first location, and the third location and the second location; and
  
  determining, by the one or more processing devices, whether to filter the multimedia content based upon the classifying.

49. An article of manufacture, comprising:
- a non-transitory machine-readable storage medium storing executable program instructions which when executed by a data processing system cause the system to perform operations comprising;
  
  processing multimedia content to provide processed content for use by a latent semantic mapping filter;
  
  analyzing the multimedia content for predetermined parameters;
  
  generating a tag that encapsulates at least one of the predetermined parameters;
  
  processing the multimedia content to provide text representing the multimedia content;
  
  inserting the tag into the text to provide tokens;
  
  inputting the tokens into the latent semantic mapping filter;
  
  mapping the tokens into a vector space of the latent semantic mapping filter;
  
  classifying the processed content with the latent semantic mapping filter including the vector space, the vector space having at least two categories for the multimedia content, the vector space comprising a first plurality of vectors at a first location, and a second plurality of vectors at a second location, wherein the first location comprises materials related to predefined legitimate multimedia content, the second location comprises materials related to predefined explicit multimedia content, and wherein one or more representations of the multimedia content are mapped to a third location in the vector space and wherein distances between the third location and the first location, and the third location and the second location are determined; and
  
  determining whether to filter the multimedia content based upon the classifying.

50. A method, comprising:
- presenting a user interface on a display device;
  
  receiving, by one or more processing devices, input from the user interface;
  
  placing, by the one or more processing devices, a latent semantic mapping filter of a web page content to operate in a training mode to recognize which kind of multimedia content to filter out in response to the input from the user interface;
  
  analyzing, by the one or more processing devices, the web page content for predetermined parameters;
  
  generating a tag that encapsulates at least one predetermined parameter;
  
  processing the web page content to provide text representing the web page content;
  
  inserting the tag into the text to provide tokens;
  
  inputting the tokens into the latent semantic mapping filter; and
  
  mapping the tokens into a vector space of the latent semantic mapping filter;
  
  wherein the latent semantic mapping filter includes a first plurality of vectors at a first location in a vector space, and a second plurality of vectors at a second location in the vector space, wherein the first location comprises materials related to predefined legitimate multimedia content, the second location comprises materials related to predefined explicit multimedia content, and wherein one or more representations of the web page content are mapped to a third location in the vector space, and wherein distances between the third location and the first location, and the third location and the second location are determined to recognize the multimedia content.
  - 51. The method as in claim 50, wherein the presenting is in response to receiving data relating to a web page.
  - 52. The method as in claim 51, wherein the training mode comprises modifying parameters of the latent semantic mapping filter.
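
The training mode of claims 50 to 52 can be pictured as a filter that only updates its stored examples while a user-controlled flag is set. The class below is a hypothetical stand-in, not the patented filter: it keeps per-category example lists and recomputes a mean token-count anchor on demand, and the names `TrainableFilter`, `set_training`, `feedback`, and `anchor` are all invented for this sketch.

```python
from collections import Counter


class TrainableFilter:
    """Hypothetical filter whose category anchors are rebuilt from user feedback."""

    def __init__(self):
        self.examples = {"legitimate": [], "explicit": []}
        self.training = False

    def set_training(self, enabled):
        """Switch training mode on or off, e.g. in response to UI input."""
        self.training = enabled

    def feedback(self, tokens, category):
        """Record a user decision; ignored unless training mode is on."""
        if not self.training:
            return False
        self.examples[category].append(list(tokens))
        return True

    def anchor(self, category):
        """Mean token counts of everything stored for the category."""
        docs = self.examples[category]
        if not docs:
            return {}
        total = Counter()
        for tokens in docs:
            total.update(tokens)
        return {term: n / len(docs) for term, n in total.items()}


f = TrainableFilter()
ignored = f.feedback(["adult"], "explicit")       # training mode is off
f.set_training(True)
accepted = f.feedback(["adult", "__images_many"], "explicit")
f.feedback(["adult", "video"], "explicit")
print(f.anchor("explicit"))
```

Here "modifying parameters of the latent semantic mapping filter" is approximated by the anchors shifting as new examples arrive; a real LSM filter would presumably also need to recompute its vector space.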

53. An article of manufacture, comprising:
- a non-transitory machine-readable storage medium storing executable program instructions which when executed by a data processing system cause the system to perform operations comprising;
  
  presenting a user interface;
  
  receiving input from the user interface; and
  
  placing a latent semantic mapping filter of a web page content to operate in a training mode to recognize which kind of multimedia content to filter out in response to the input from the user interface;
  
  analyzing the web page content for predetermined parameters;
  
  generating a tag that encapsulates at least one predetermined parameter;
  
  processing the web page content to provide text representing the web page content;
  
  inserting the tag into the text to provide tokens;
  
  inputting the tokens into the latent semantic mapping filter; and
  
  mapping the tokens into a vector space of the latent semantic mapping filter;
  
  wherein the latent semantic mapping filter includes a first plurality of vectors at a first location in a vector space, and a second plurality of vectors at a second location in the vector space, wherein the first location comprises materials related to predefined legitimate multimedia content, the second location comprises materials related to predefined explicit multimedia content, and wherein one or more representations of the web page content are mapped to a third location in the vector space, and wherein distances between the third location and the first location, and the third location and the second location are determined to recognize the multimedia content.

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Bellegarda, Jerome; Ko, Steve; Scalo, John; Donelli, Giovanni
Primary Examiner(s): Shiferaw, Eleni
Assistant Examiner(s): VU, PHY ANH TRAN
Application Number: US11/724,880
Publication Number: US 20080228928A1
Time in Patent Office: 2,490 Days
Field of Search: 709/224-225, 709/228-229, 726/23-30, 713/153-154
US Class Current: 709/228
CPC Class Codes:
G06F 16/435   Filtering based on addition...
H04L 65/75   Media network packet handling
H04L 67/56   Provisioning of proxy servi...
H04L 67/564   Enhancement of application ...
H04L 67/5651   Reducing the amount or size...
