Automatic rating and filtering of data files for objectionable content

US 6,493,744 B1
Filed: 08/16/1999
Issued: 12/10/2002
Est. Priority Date: 08/16/1999
Status: Expired due to Fees


1 Assignment


0 Petitions


Abstract

An automatic method for rating data files for objectionable content in a distributed computer system includes preprocessing the file to create semantic units, comparing the semantic units with a rating repository containing entries and associated ratings, assigning content rating vectors to the semantic units, and creating a modified data file incorporating rating information derived from the content rating vectors. For text files, the semantic units are words or phrases, and the rating repository also contains words or phrases with corresponding content rating vectors. For audio files, the file is first converted to a text file using voice recognition software. For image files, image processing software is used to recognize individual objects and compare them to basic images and ratings stored in the rating repository. In one embodiment, a composite content rating vector is derived for the file from the individual content rating vectors, and the composite content rating vector is incorporated into the modified file. In an alternate embodiment, semantic units with content rating vectors exceeding preset user limit values of objectionable content are blocked out by display blocks or, for audio, audio blanking signals, for example, beeps. The user can then view or hear the remaining portions of the file. The invention can be used with any type of data file that can be divided into semantic units, and can be implemented in a server, client, search engine, or proxy server.
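The text-file path in the abstract (preprocess, compare with the repository, assign content rating vectors, build a modified file carrying the rating) can be sketched in Python. This is an illustration under stated assumptions, not the patented implementation: the three rating dimensions (violence, language, sexual content), the 0 to 4 scale, the repository entries, the two-word phrase window, the component-wise maximum used as the composite vector, and the `X-Content-Rating` header line are all hypothetical choices made for the example.

```python
import re

# Hypothetical content rating repository: each semantic entry (a lowercase
# word or two-word phrase) maps to a content rating vector whose components
# are (violence, language, sexual content), each on an assumed 0-4 scale.
REPOSITORY = {
    "kill": (3, 0, 0),
    "shoot": (2, 0, 0),
    "shoot him": (4, 0, 0),
    "damn": (0, 2, 0),
}


def preprocess(text):
    """Split raw text into semantic units: single words, then two-word phrases."""
    words = re.findall(r"[a-z']+", text.lower())
    phrases = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + phrases


def assign_vectors(units, repository):
    """Compare each unit with the repository; keep (unit, vector) for matches."""
    return [(unit, repository[unit]) for unit in units if unit in repository]


def composite(vectors, dims=3):
    """Composite vector: component-wise maximum (all zeros if nothing matched)."""
    if not vectors:
        return (0,) * dims
    return tuple(max(component) for component in zip(*vectors))


def create_modified_file(text, repository):
    """Return a copy of the text with the composite rating in a header line."""
    rated = assign_vectors(preprocess(text), repository)
    vector = composite([v for _, v in rated])
    header = "X-Content-Rating: " + ",".join(str(c) for c in vector)
    return header + "\n" + text
```

For the input "Damn, I will shoot him." the units "damn", "shoot" and "shoot him" match the repository, so the modified file starts with the header `X-Content-Rating: 4,2,0`. The maximum is only one possible composite; claims 4 and 5 describe a weighted average and a "highest value with a minimum number of occurrences" instead.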

212 Citations

44 Claims

1. In a distributed computer system, a computer-implemented method for automatic rating a raw data file for objectionable content, wherein said raw data file is a hypermedia file, a text file, an audio file, or an image file, said method comprising the steps of:
- preprocessing said raw data file to create semantic units representative of semantic contents of said raw data file;
  
  comparing said semantic units with a content rating repository comprising semantic entries and corresponding content ratings;
  
  assigning content rating vectors to said semantic units based on said comparing step; and
  
  creating a modified data file incorporating rating information derived from said content rating vectors, wherein when said raw data file is an audio file and said modified data file is a modified audio file, said preprocessing step further comprising the steps of:
  
  using a voice recognition system to create text data from said audio file;
  
  creating an audio-to-text correlation between a location in said text data and a corresponding location in said audio file; and
  
  parsing said text data into said semantic units.
2. The computer-implemented method of claim 1 wherein said step of creating a modified data file comprises the steps of:
3. The computer-implemented method of claim 2 wherein said composite content rating vector comprises a set of components, wherein each component in said set of components is derived from corresponding components of said content rating vectors.
4. The computer-implemented method of claim 3 wherein each component of said composite content rating vector is a weighted average of said corresponding components of said content rating vectors, said weighted average including weighting factors related to values of said corresponding components of said content rating vectors.
5. The computer-implemented method of claim 3 wherein each component of said composite content rating vector is equal to a selected value of said corresponding components of said content rating vectors, wherein said selected value is a highest of said corresponding components of said content rating vectors and said selected value has at least a predetermined minimum number of occurrences.
6. The computer-implemented method of claim 2 wherein said method occurs in a server.
7. The computer-implemented method of claim 2 wherein said raw data file is stored in a server and said method occurs in a proxy server.
8. The computer-implemented method of claim 2 wherein said raw data file is stored in a server and said method occurs in a client.
9. The computer-implemented method of claim 1 wherein said step of creating a modified data file comprises the steps of:
- comparing said content rating vectors with preset user limit values to identify objectionable semantic units, wherein said preset user limit values define objectionable content rating vectors; and
  
  replacing objectionable content corresponding to the identified objectionable semantic units in a copy of said raw data file with display blocks to produce said modified data file.
10. The computer-implemented method of claim 9 wherein said raw data file is a file chosen from the group consisting of text, audio, and image.
11. The computer-implemented method of claim 9 wherein said raw data file is stored in a server and said method occurs in a client.
12. The computer-implemented method of claim 9 wherein said preset user limit values are stored in a client and said method occurs in a server.
13. The computer-implemented method of claim 9 wherein said preset user limit values are stored in a client, said raw data file is stored in a server, and said method occurs in a proxy server.
14. The computer-implemented method of claim 9, wherein said step of creating a modified data file further comprises the steps of:
- deriving a modified composite content rating vector for said modified data file from a modified set of content rating vectors, wherein said modified set of content rating vectors does not contain content rating vectors corresponding to said objectionable semantic units; and
  
  storing said modified composite content rating vector in said modified data file.
15. The computer-implemented method of claim 14 wherein said preset user limit values are stored in a client and said method occurs in a server.
16. The computer-implemented method of claim 14 wherein said preset user limit values are stored in a client, said raw data file is stored in a server, and said method occurs in a proxy server.
17. The computer-implemented method of claim 1 wherein said step of creating a modified audio file comprises the steps of:
- comparing said content rating vectors with preset user limit values to identify objectionable semantic units, wherein said preset user limit values define objectionable content rating vectors;
  
  using said audio-to-text correlation to locate objectionable portions of said audio file corresponding to the identified objectionable semantic units; and
  
  replacing said objectionable portions in a copy of said audio file with audio blanking signals to produce said modified audio file.
18. The computer-implemented method of claim 17 wherein said audio file is stored in a server and said method occurs in a client.
19. The computer-implemented method of claim 17 wherein said preset user limit values are stored in a client and said method occurs in a server.
20. The computer-implemented method of claim 17 wherein said preset user limit values are stored in a client, said audio file is stored in a server, and said method occurs in a proxy server.
21. The computer-implemented method of claim 1 wherein said raw data file is an image file, said modified data file is a modified image file, said semantic units are discrete objects in regions within said image file, and said preprocessing step is performed by an image processing system.
22. The computer-implemented method of claim 21 wherein said step of creating a modified image file comprises the steps of:
- comparing said content rating vectors with preset user limit values to identify objectionable discrete objects, wherein said preset user limit values define objectionable content rating vectors; and
  
  replacing objectionable content corresponding to the identified objectionable discrete objects in a copy of said image file with image blocks to produce said modified image file.
23. The computer-implemented method of claim 22 wherein said image file is stored in a server and said method occurs in a client.
24. The computer-implemented method of claim 22 wherein said preset user limit values are stored in a client and said method occurs in a server.
25. The computer-implemented method of claim 22 wherein said preset user limit values are stored in a client, said image file is stored in a server, and said method occurs in a proxy server.
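Claims 1 and 17 connect the audio path through the audio-to-text correlation: words recognized in the text data are mapped back to locations in the audio, and objectionable locations in a copy of the audio are replaced with a blanking signal. A minimal sketch follows, assuming audio held as a list of float samples at an assumed 8 kHz rate, a hypothetical recognizer transcript of `(word, start_seconds, end_seconds)` tuples standing in for the voice recognition system, single-word repository lookups only, and a 1 kHz sine tone as the "beep" mentioned in the abstract. None of these names or parameters come from the patent.

```python
import math

SAMPLE_RATE = 8000  # assumed samples per second


def correlate(transcript, sample_rate=SAMPLE_RATE):
    """Audio-to-text correlation: map each recognized word to a sample range.

    `transcript` stands in for hypothetical voice-recognition output:
    a list of (word, start_seconds, end_seconds) tuples.
    """
    return [(word.lower(), round(start * sample_rate), round(end * sample_rate))
            for word, start, end in transcript]


def exceeds(vector, limits):
    """Objectionable if any component is above the user's preset limit."""
    return any(v > limit for v, limit in zip(vector, limits))


def beep(length, sample_rate=SAMPLE_RATE, freq=1000.0, amp=0.5):
    """Audio blanking signal: `length` samples of a sine tone."""
    return [amp * math.sin(2 * math.pi * freq * n / sample_rate)
            for n in range(length)]


def blank_audio(samples, transcript, repository, limits):
    """Return a copy of `samples` with objectionable words replaced by a beep."""
    modified = list(samples)  # copy: the raw audio stays untouched
    for word, start, end in correlate(transcript):
        vector = repository.get(word)
        if vector is not None and exceeds(vector, limits):
            end = min(end, len(modified))  # never grow the clip
            if start < end:
                modified[start:end] = beep(end - start)
    return modified
```

Working on a copy mirrors the claim language ("replacing said objectionable portions in a copy of said audio file"), so the raw file remains available for users whose limits are less restrictive.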

26. A method for automatic rating and filtering in a network environment a raw data file for objectionable content, wherein said raw data file is a hypermedia file, a text file, an audio file, or an image file, said method comprising the steps of:
- preprocessing said raw data file to create semantic units representative of semantic contents of said raw data file, wherein if said raw data file is an audio file said preprocessing step further comprises the steps of:
  
  using a voice recognition system to create text data from said audio file;
  
  creating an audio-to-text correlation between a location in said text data and a corresponding location in said audio file; and
  
  parsing said text data into said semantic units; and
  
  wherein if said raw data file is an image file said semantic units are discrete objects in regions within said image file and said preprocessing step is performed by an image processing system;
  
  comparing said semantic units with a content rating repository comprising semantic entries and corresponding content ratings;
  
  assigning content rating vectors to said semantic units based on said comparing step; and
  
  creating a modified data file incorporating rating information derived from said content rating vectors.
27. The method of claim 26, wherein said step of creating a modified data file further comprises the steps of:
28. The method of claim 27, wherein each component of said composite content rating vector is a weighted average of said corresponding components of said content rating vectors, said weighted average including weighting factors related to values of said corresponding components of said content rating vectors.
29. The method of claim 27, wherein each component of said composite content rating vector is equal to a selected value of said corresponding components of said content rating vectors, and wherein said selected value is the highest of said corresponding components of said content rating vectors and said selected value has at least a predetermined minimum number of occurrences.
30. The method of claim 26, wherein said method occurs in a server.
31. The method of claim 26, wherein said raw data file is stored in a server and said method occurs in a proxy server.
32. The method of claim 26, wherein said raw data file is stored in a server and said method occurs in a client.
33. The method of claim 26, wherein said step of creating a modified data file comprises the steps of:
- comparing said content rating vectors with preset user limit values to identify objectionable semantic units, wherein said preset user limit values define objectionable content rating vectors; and
  
  replacing objectionable content corresponding to the identified objectionable semantic units in a copy of said raw data file with display blocks to produce said modified data file.
34. The method of claim 33, wherein said preset user limit values are stored in a client and said method occurs in a server.
35. The method of claim 33, wherein said preset user limit values are stored in a client, said raw data file is stored in a server, and said method occurs in a proxy server.
36. The method of claim 33, wherein said step of creating a modified data file further comprises the steps of:
- deriving a modified composite content rating vector for said modified data file from a modified set of content rating vectors, wherein said modified set of content rating vectors does not contain content rating vectors corresponding to said objectionable semantic units; and
  
  storing said modified composite content rating vector in said modified data file.
37. The method of claim 26, wherein said raw data file is an audio file and said modified data file is a modified audio file, said step of creating a modified data file further comprises the steps of:
- comparing said content rating vectors with preset user limit values to identify objectionable semantic units, wherein said preset user limit values define objectionable content rating vectors;
  
  using said audio-to-text correlation to locate objectionable portions of said audio file corresponding to the identified objectionable semantic units; and
  
  replacing said objectionable portions in a copy of said audio file with audio blanking signals to produce said modified audio file.
38. The method of claim 37, wherein said audio file is stored in a server and said method occurs in a client.
39. The method of claim 37, wherein said preset user limit values are stored in a client and said method occurs in a server.
40. The method of claim 37, wherein said preset user limit values are stored in a client, said audio file is stored in a server, and said method occurs in a proxy server.
41. The method of claim 26, wherein said raw data file is an image file and said modified data file is a modified image file, said step of creating a modified data file further comprises the steps of:
- comparing said content rating vectors with preset user limit values to identify objectionable discrete objects, wherein said preset user limit values define objectionable content rating vectors; and
  
  replacing objectionable content corresponding to the identified objectionable discrete objects in a copy of said image file with image blocks to produce said modified image file.
42. The method of claim 41, wherein said image file is stored in a server and said method occurs in a client.
43. The method of claim 41, wherein said preset user limit values are stored in a client and said method occurs in a server.
44. The method of claim 41, wherein said preset user limit values are stored in a client, said image file is stored in a server, and said method occurs in a proxy server.
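Claims 28 and 29 leave the exact composite formula open, and claims 33 and 36 combine user-limit filtering with a modified composite computed only from the units that survive. The Python sketch below shows one reading of each, under explicit assumptions: in `weighted_average` the weighting factor for a value is the value itself (one way for weights to be "related to values" of the components); `highest_frequent` reads claim 29 as "the highest value that occurs at least `min_count` times"; and `filter_units` uses `#` characters of equal width as display blocks. All function names are hypothetical.

```python
from collections import Counter


def weighted_average(vectors):
    """Per component, weight each value by itself (an assumed weighting),
    i.e. sum(v*v) / sum(v), so higher ratings pull the composite upward."""
    result = []
    for comps in zip(*vectors):
        total = sum(comps)
        result.append(sum(v * v for v in comps) / total if total else 0.0)
    return tuple(result)


def highest_frequent(vectors, min_count=2):
    """Per component, the highest value occurring at least `min_count` times
    (0 if no value occurs that often)."""
    result = []
    for comps in zip(*vectors):
        counts = Counter(comps)
        qualifying = [v for v, n in counts.items() if n >= min_count]
        result.append(max(qualifying, default=0))
    return tuple(result)


def filter_units(rated_units, limits, combine=highest_frequent):
    """Replace objectionable units with display blocks and derive a modified
    composite vector from the units that remain.

    `rated_units` is a list of (unit_text, rating_vector) pairs in document
    order; unmatched units carry an all-zero vector.
    """
    shown, kept_vectors = [], []
    for text, vector in rated_units:
        if any(v > limit for v, limit in zip(vector, limits)):
            shown.append("#" * len(text))  # display block, same width
        else:
            shown.append(text)
            kept_vectors.append(vector)
    modified_composite = combine(kept_vectors or [(0,) * len(limits)])
    return " ".join(shown), modified_composite
```

With `min_count=2`, the vectors (4, 1, 0), (2, 1, 0) and (2, 3, 0) give a `highest_frequent` composite of (2, 1, 0): the single 4 and the single 3 are treated as outliers. That minimum-occurrence rule is what keeps one stray match from dominating the file's rating.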


Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Sundaresan, Neelakantan; Kraft, Reiner; Emens, Michael Lawrence
Primary Examiner(s)
Dinh, Dung C.
Assistant Examiner(s)
Le, Hieu C.

Application Number
US09/374,644
Time in Patent Office
1,212 Days
Field of Search
709/203, 709/228, 709/229, 707/1, 707/9, 707/10, 725/28, 386/69
US Class Current
709/203
CPC Class Codes

H04L 67/561   Adding application-function...

H04L 67/564   Enhancement of application ...

H04L 67/568   Storing data temporarily at...

H04L 69/329   in the application layer [O...

Y10S 707/99931   Database or file accessing

Y10S 707/99939   Privileged access
