Automatic rating and filtering of data files for objectionable content
Abstract
An automatic method for rating data files for objectionable content in a distributed computer system includes preprocessing the file to create semantic units, comparing the semantic units with a rating repository containing entries and associated ratings, assigning content rating vectors to the semantic units, and creating a modified data file incorporating rating information derived from the content rating vectors. For text files, the semantic units are words or phrases, and the rating repository also contains words or phrases with corresponding content rating vectors. For audio files, the file is first converted to a text file using voice recognition software. For image files, image processing software is used to recognize individual objects and compare them to basic images and ratings stored in the rating repository. In one embodiment, a composite content rating vector is derived for the file from the individual content rating vectors, and the composite content rating vector is incorporated into the modified file. In an alternate embodiment, semantic units with content rating vectors exceeding preset user limit values of objectionable content are blocked out by display blocks or, for audio, audio blanking signals, for example, beeps. The user can then view or hear the remaining portions of the file. The invention can be used with any type of data file that can be divided into semantic units, and can be implemented in a server, client, search engine, or proxy server.
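For the text-file case described in the abstract, the preprocessing, comparison, and assignment steps might be sketched as follows. This is only an illustrative sketch: the category names, the repository entries, the neutral vector, and the function names `preprocess` and `assign_vectors` are assumptions made for the example and do not come from the patent.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rating categories and repository entries, invented for illustration.
CATEGORIES = ("violence", "language")
REPOSITORY: Dict[str, Tuple[int, ...]] = {
    "fight": (2, 0),
    "damn": (0, 3),
}
NEUTRAL: Tuple[int, ...] = (0, 0)  # assumed rating for units not in the repository


def preprocess(text: str) -> List[str]:
    """Split raw text into lowercase word units (the semantic units)."""
    return re.findall(r"[a-z']+", text.lower())


def assign_vectors(units: List[str]) -> List[Tuple[str, Tuple[int, ...]]]:
    """Compare each unit with the repository and attach its content rating vector."""
    return [(unit, REPOSITORY.get(unit, NEUTRAL)) for unit in units]


rated = assign_vectors(preprocess("They fight. Damn!"))
```

Here each word is treated as one semantic unit; matching multi-word phrases would need a longer lookup than this sketch performs.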
44 Claims
-
1. In a distributed computer system, a computer-implemented method for automatic rating a raw data file for objectionable content, wherein said raw data file is a hypermedia file, a text file, an audio file, or an image file, said method comprising the steps of:
-
preprocessing said raw data file to create semantic units representative of semantic contents of said raw data file;
comparing said semantic units with a content rating repository comprising semantic entries and corresponding content ratings;
assigning content rating vectors to said semantic units based on said comparing step; and
creating a modified data file incorporating rating information derived from said content rating vectors, wherein when said raw data file is an audio file and said modified data file is a modified audio file, said preprocessing step further comprising the steps of:
using a voice recognition system to create text data from said audio file;
creating an audio-to-text correlation between a location in said text data and a corresponding location in said audio file; and
parsing said text data into said semantic units.
-
-
2. The computer-implemented method of claim 1 wherein said step of creating a modified data file comprises the steps of:
-
deriving a composite content rating vector for said raw data file from said content rating vectors; and
combining said composite content rating vector with said raw data file to produce said modified data file.
-
-
3. The computer-implemented method of claim 2 wherein said composite content rating vector comprises a set of components, wherein each component in said set of components is derived from corresponding components of said content rating vectors.
-
4. The computer-implemented method of claim 3 wherein each component of said composite content rating vector is a weighted average of said corresponding components of said content rating vectors, said weighted average including weighting factors related to values of said corresponding components of said content rating vectors.
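One possible reading of the weighted average in claim 4 is sketched below. The claim only says the weighting factors are related to the component values; the specific choice of weight = 1 + value, the assumption that ratings are non-negative, and the function name `weighted_composite` are all assumptions made for this example.

```python
from typing import List, Sequence


def weighted_composite(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise weighted average of content rating vectors.

    Assumed weighting: each value v gets weight 1 + v, so higher ratings
    pull the composite upward. Values are assumed to be non-negative.
    """
    if not vectors:
        return []
    composite = []
    for values in zip(*vectors):  # iterate over one component at a time
        weights = [1.0 + v for v in values]
        total = sum(w * v for w, v in zip(weights, values))
        composite.append(total / sum(weights))
    return composite


composite = weighted_composite([(0, 0), (2, 0), (0, 3)])
```

With this weighting, a single high rating among many neutral units raises the composite more than a plain mean would.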
-
5. The computer-implemented method of claim 3 wherein each component of said composite content rating vector is equal to a selected value of said corresponding components of said content rating vectors, wherein said selected value is a highest of said corresponding components of said content rating vectors and said selected value has at least a predetermined minimum number of occurrences.
-
6. The computer-implemented method of claim 2 wherein said method occurs in a server.
-
7. The computer-implemented method of claim 2 wherein said raw data file is stored in a server and said method occurs in a proxy server.
-
8. The computer-implemented method of claim 2 wherein said raw data file is stored in a server and said method occurs in a client.
-
9. The computer-implemented method of claim 1 wherein said step of creating a modified data file comprises the steps of:
-
comparing said content rating vectors with preset user limit values to identify objectionable semantic units, wherein said preset user limit values define objectionable content rating vectors; and
replacing objectionable content corresponding to the identified objectionable semantic units in a copy of said raw data file with display blocks to produce said modified data file.
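For a text file, the two steps of claim 9 might look like the sketch below. The repository contents, the rule that a unit is objectionable when any component exceeds its limit, the `#` block character, and the names `exceeds` and `block_text` are assumptions chosen for the example.

```python
import re
from typing import Dict, Sequence, Tuple

# Hypothetical repository entries, invented for illustration.
REPOSITORY: Dict[str, Tuple[int, ...]] = {"fight": (2, 0), "damn": (0, 3)}


def exceeds(vector: Sequence[int], limits: Sequence[int]) -> bool:
    """Assumed rule: a unit is objectionable if any component exceeds its limit."""
    return any(v > limit for v, limit in zip(vector, limits))


def block_text(raw: str, limits: Sequence[int], block_char: str = "#") -> str:
    """Return a copy of raw with objectionable words replaced by display blocks."""

    def replace(match: "re.Match[str]") -> str:
        word = match.group(0)
        vector = REPOSITORY.get(word.lower())
        if vector is not None and exceeds(vector, limits):
            return block_char * len(word)  # keep the original word length
        return word

    return re.sub(r"[A-Za-z']+", replace, raw)


blocked = block_text("They fight. Damn!", limits=(2, 1))
```

Because `re.sub` builds a new string, the raw text is left untouched and only the copy carries the display blocks.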
-
-
10. The computer-implemented method of claim 9 wherein said raw data file is a file chosen from the group consisting of text, audio, and image.
-
11. The computer-implemented method of claim 9 wherein said raw data file is stored in a server and said method occurs in a client.
-
12. The computer-implemented method of claim 9 wherein said preset user limit values are stored in a client and said method occurs in a server.
-
13. The computer-implemented method of claim 9 wherein said preset user limit values are stored in a client, said raw data file is stored in a server, and said method occurs in a proxy server.
-
14. The computer-implemented method of claim 9, wherein said step of creating a modified data file further comprises the steps of:
-
deriving a modified composite content rating vector for said modified data file from a modified set of content rating vectors, wherein said modified set of content rating vectors does not contain content rating vectors corresponding to said objectionable semantic units; and
storing said modified composite content rating vector in said modified data file.
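The derivation in claim 14 can be illustrated with a short sketch. It assumes a unit is objectionable when any component exceeds its limit, uses a component-wise maximum as a stand-in composite rule, and returns an all-zero vector when nothing remains; these choices and the name `modified_composite` are assumptions, not details from the claims.

```python
from typing import List, Sequence


def modified_composite(
    vectors: Sequence[Sequence[int]], limits: Sequence[int]
) -> List[int]:
    """Composite rating computed only from vectors that stay within the limits."""
    kept = [
        vector
        for vector in vectors
        if not any(v > limit for v, limit in zip(vector, limits))
    ]
    if not kept:
        return [0] * len(limits)  # assumed result when every unit was blocked
    # Stand-in composite rule: component-wise maximum of the remaining vectors.
    return [max(column) for column in zip(*kept)]


result = modified_composite([(0, 0), (2, 0), (0, 3)], limits=(2, 1))
```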
-
-
15. The computer-implemented method of claim 14 wherein said preset user limit values are stored in a client and said method occurs in a server.
-
16. The computer-implemented method of claim 14 wherein said preset user limit values are stored in a client, said raw data file is stored in a server, and said method occurs in a proxy server.
-
17. The computer-implemented method of claim 1 wherein said step of creating a modified audio file comprises the steps of:
-
comparing said content rating vectors with preset user limit values to identify objectionable semantic units, wherein said preset user limit values define objectionable content rating vectors;
using said audio-to-text correlation to locate objectionable portions of said audio file corresponding to the identified objectionable semantic units; and
replacing said objectionable portions in a copy of said audio file with audio blanking signals to produce said modified audio file.
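The last two steps of claim 17 could be sketched as below, assuming the audio is a list of float samples, the audio-to-text correlation is a list of (word, start sample, end sample) entries, and the objectionable words have already been identified. The data layout, the sine-wave beep, and the name `blank_audio` are assumptions made for this example.

```python
import math
from typing import List, Sequence, Set, Tuple


def blank_audio(
    samples: Sequence[float],
    correlation: Sequence[Tuple[str, int, int]],
    objectionable: Set[str],
    sample_rate: int = 8000,
    beep_hz: float = 1000.0,
    amplitude: float = 0.5,
) -> List[float]:
    """Return a copy of samples with objectionable word spans replaced by a beep."""
    out = list(samples)  # work on a copy; the original audio is untouched
    for word, start, end in correlation:
        if word in objectionable:
            for i in range(start, min(end, len(out))):
                phase = 2 * math.pi * beep_hz * (i - start) / sample_rate
                out[i] = amplitude * math.sin(phase)
    return out


audio = [0.1] * 10
mapping = [("they", 0, 3), ("damn", 3, 7)]
blanked = blank_audio(audio, mapping, objectionable={"damn"})
```

Writing zeros instead of the sine values would give silence rather than a beep; either can serve as the blanking signal in this sketch.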
-
-
18. The computer-implemented method of claim 17 wherein said audio file is stored in a server and said method occurs in a client.
-
19. The computer-implemented method of claim 17 wherein said preset user limit values are stored in a client and said method occurs in a server.
-
20. The computer-implemented method of claim 17 wherein said preset user limit values are stored in a client, said audio file is stored in a server, and said method occurs in a proxy server.
-
21. The computer-implemented method of claim 1 wherein said raw data file is an image file, said modified data file is a modified image file, said semantic units are discrete objects in regions within said image file, and said preprocessing step is performed by an image processing system.
-
22. The computer-implemented method of claim 21 wherein said step of creating a modified image file comprises the steps of:
-
comparing said content rating vectors with preset user limit values to identify objectionable discrete objects, wherein said preset user limit values define objectionable content rating vectors; and
replacing objectionable content corresponding to the identified objectionable discrete objects in a copy of said image file with image blocks to produce said modified image file.
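For the image case of claim 22, a minimal sketch follows. It assumes the image is a grid of pixel values, each recognized object arrives as a bounding box (top, left, bottom, right, with exclusive bottom and right) plus its content rating vector, and a solid fill value serves as the image block. The object recognition itself is outside the sketch, and the name `block_regions` is hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def block_regions(
    image: Sequence[Sequence[int]],
    objects: Sequence[Tuple[Box, Sequence[int]]],
    limits: Sequence[int],
    fill: int = 0,
) -> List[List[int]]:
    """Return a copy of image with objectionable object regions filled in."""
    out = [list(row) for row in image]  # copy so the original image is preserved
    for (top, left, bottom, right), vector in objects:
        if any(v > limit for v, limit in zip(vector, limits)):
            for r in range(max(top, 0), min(bottom, len(out))):
                for c in range(max(left, 0), min(right, len(out[r]))):
                    out[r][c] = fill
    return out


picture = [[9, 9, 9, 9] for _ in range(3)]
detected = [((0, 1, 2, 3), (3, 0)), ((2, 0, 3, 1), (0, 0))]
blocked = block_regions(picture, detected, limits=(2, 2))
```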
-
-
23. The computer-implemented method of claim 22 wherein said image file is stored in a server and said method occurs in a client.
-
24. The computer-implemented method of claim 22 wherein said preset user limit values are stored in a client and said method occurs in a server.
-
25. The computer-implemented method of claim 22 wherein said preset user limit values are stored in a client, said image file is stored in a server, and said method occurs in a proxy server.
-
26. A method for automatic rating and filtering in a network environment a raw data file for objectionable content, wherein said raw data file is a hypermedia file, a text file, an audio file, or an image file, said method comprising the steps of:
-
preprocessing said raw data file to create semantic units representative of semantic contents of said raw data file, wherein if said raw data file is an audio file said preprocessing step further comprises the steps of:
using a voice recognition system to create text data from said audio file;
creating an audio-to-text correlation between a location in said text data and a corresponding location in said audio file; and
parsing said text data into said semantic units; and
wherein if said raw data file is an image file said semantic units are discrete objects in regions within said image file and said preprocessing step is performed by an image processing system;
comparing said semantic units with a content rating repository comprising semantic entries and corresponding content ratings;
assigning content rating vectors to said semantic units based on said comparing step; and
creating a modified data file incorporating rating information derived from said content rating vectors.
-
-
27. The method of claim 26, wherein said step of creating a modified data file comprises the steps of:
-
deriving a composite content rating vector for said raw data file from said content rating vectors, wherein said composite content rating vector comprises a set of components each of which is derived from corresponding components of said content rating vectors; and
combining said composite content rating vector with said raw data file to produce said modified data file.
-
-
28. The method of claim 27, wherein each component of said composite content rating vector is a weighted average of said corresponding components of said content rating vectors, said weighted average including weighting factors related to values of said corresponding components of said content rating vectors.
-
29. The method of claim 27, wherein each component of said composite content rating vector is equal to a selected value of said corresponding components of said content rating vectors, and wherein said selected value is the highest of said corresponding components of said content rating vectors and said selected value has at least a predetermined minimum number of occurrences.
-
30. The method of claim 26, wherein said method occurs in a server.
-
31. The method of claim 26, wherein said raw data file is stored in a server and said method occurs in a proxy server.
-
32. The method of claim 26, wherein said raw data file is stored in a server and said method occurs in a client.
-
33. The method of claim 26, wherein said step of creating a modified data file comprises the steps of:
-
comparing said content rating vectors with preset user limit values to identify objectionable semantic units, wherein said preset user limit values define objectionable content rating vectors; and
replacing objectionable content corresponding to the identified objectionable semantic units in a copy of said raw data file with display blocks to produce said modified data file.
-
-
34. The method of claim 33, wherein said preset user limit values are stored in a client and said method occurs in a server.
-
35. The method of claim 33, wherein said preset user limit values are stored in a client, said raw data file is stored in a server, and said method occurs in a proxy server.
-
36. The method of claim 33, wherein said step of creating a modified data file further comprises the steps of:
-
deriving a modified composite content rating vector for said modified data file from a modified set of content rating vectors, wherein said modified set of content rating vectors does not contain content rating vectors corresponding to said objectionable semantic units; and
storing said modified composite content rating vector in said modified data file.
-
-
37. The method of claim 26, wherein said raw data file is an audio file and said modified data file is a modified audio file, said step of creating a modified data file further comprises the steps of:
-
comparing said content rating vectors with preset user limit values to identify objectionable semantic units, wherein said preset user limit values define objectionable content rating vectors;
using said audio-to-text correlation to locate objectionable portions of said audio file corresponding to the identified objectionable semantic units; and
replacing said objectionable portions in a copy of said audio file with audio blanking signals to produce said modified audio file.
-
-
38. The method of claim 37, wherein said audio file is stored in a server and said method occurs in a client.
-
39. The method of claim 37, wherein said preset user limit values are stored in a client and said method occurs in a server.
-
40. The method of claim 37, wherein said preset user limit values are stored in a client, said audio file is stored in a server, and said method occurs in a proxy server.
-
41. The method of claim 26, wherein said raw data file is an image file and said modified data file is a modified image file, said step of creating a modified data file further comprises the steps of:
-
comparing said content rating vectors with preset user limit values to identify objectionable discrete objects, wherein said preset user limit values define objectionable content rating vectors; and
replacing objectionable content corresponding to the identified objectionable discrete objects in a copy of said image file with image blocks to produce said modified image file.
-
-
42. The method of claim 41, wherein said image file is stored in a server and said method occurs in a client.
-
43. The method of claim 41, wherein said preset user limit values are stored in a client and said method occurs in a server.
-
44. The method of claim 41, wherein said preset user limit values are stored in a client, said image file is stored in a server, and said method occurs in a proxy server.
Specification