Automatic censorship of audio data for broadcast

US 7,437,290 B2
Filed: 10/28/2004
Issued: 10/14/2008
Est. Priority Date: 10/28/2004
Status: Active Grant

Abstract

An input audio data stream comprising speech is processed by an automatic censoring filter in either a real-time mode or a batch mode, producing censored speech that has been altered so that undesired words or phrases are either unintelligible or inaudible. The automatic censoring filter employs a lattice comprising phonemes and/or words derived from phonemes for comparison against corresponding phonemes or words included in undesired speech data. If the probability that a phoneme or word in the input audio data stream matches a corresponding phoneme or word in the undesired speech data is greater than a probability threshold, the input audio data stream is altered so that the undesired word, or a phrase comprising a plurality of such words, is unintelligible or inaudible. The censored speech can either be stored or made available to an audience in real time.
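
The thresholding step described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Hypothesis` record, the `find_censor_spans` function, the placeholder word list, and the 0.6 threshold do not come from the patent, which does not prescribe an implementation.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One recognizer guess (hypothetical record): word, time span in seconds, match probability."""
    word: str
    start: float
    end: float
    probability: float


def find_censor_spans(hypotheses, undesired, threshold):
    """Return (start, end) spans whose word is in the undesired set
    and whose match probability is greater than the threshold."""
    return [
        (h.start, h.end)
        for h in hypotheses
        if h.word.lower() in undesired and h.probability > threshold
    ]


# Placeholder data: "darn" stands in for an undesired word.
hyps = [
    Hypothesis("hello", 0.0, 0.4, 0.95),
    Hypothesis("darn", 0.5, 0.8, 0.72),
    Hypothesis("darn", 1.2, 1.5, 0.41),
]
spans = find_censor_spans(hyps, {"darn"}, threshold=0.6)
```

Only the second hypothesis is flagged: the first word is not in the undesired set, and the third falls below the threshold. The returned time spans are what a later alteration step would act on.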

19 Claims

1. A method for automatically censoring audio data, comprising the steps of:
- (a) automatically processing the audio data to detect any undesired speech that may be included therein, by comparison to undesired speech data, by performing the following steps:
  
  comparing words in the audio data against words comprising the undesired speech, to identify potential matches;
  
  dynamically varying a probability threshold dependent upon at least one criterion; and
  
  based upon a probability of a potential match and the probability threshold, determining whether any undesired speech is included in the audio data;
  
  (b) for each occurrence of undesired speech that is automatically detected, altering the undesired speech detected in the audio data, producing censored audio data in which the undesired speech is substantially no longer perceivable by a listening audience; and
  
  (c) dynamically adjusting the probability threshold based upon a frequency with which undesired speech by a specific speaker is detected in the audio data, so that as the occurrences of undesired speech that are detected increase, the probability threshold is reduced.
  - 2. The method of claim 1, wherein the at least one criterion includes at least one of:
    - (a) an expected audience for the audio data;
      
      (b) an identity of a speaker that uttered the audio data;
      
      (c) a time at which the audio data will be heard by an audience;
      
      (d) a type of event resulting in the audio data;
      
      (e) an application of the audio data; and
      
      (f) a frequency with which undesired speech by a specific speaker has previously been detected in the audio data.
  - 3. The method of claim 1, wherein the step of altering comprises one of the steps of:
    - (a) substantially reducing a volume of any portions of the audio data that match the undesired speech so that the portions are substantially inaudible;
      
      (b) deleting any portions of the audio data that match the undesired speech;
      
      (c) overwriting any portions of the audio data that match the undesired speech with an obscuring audio signal that prevents the undesired speech from being intelligible; and
      
      (d) replacing any portions of the audio data that match the undesired speech with related speech that is acceptable and is not included in the undesired speech, the related speech being produced using phonemes that were previously uttered by a speaker whose speech is being replaced.
  - 4. The method of claim 1, further comprising the step of setting different probability thresholds for specific words or phrases included in the undesired speech, so that more objectionable words or phrases have a lower probability threshold than less objectionable words or phrases.
  - 5. A memory medium on which are stored machine executable instructions for carrying out the steps of claim 1.
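
Claims 1(c) and 4 together describe a threshold that drops as a given speaker accumulates detections and that is lower for more objectionable words. One way this might be sketched in Python follows. The `ThresholdPolicy` class, its linear `step` reduction, the `floor`, and the per-word offsets are all hypothetical choices; the claims only require that the threshold be reduced as detections increase, not any particular formula.

```python
from collections import defaultdict


class ThresholdPolicy:
    """Hypothetical policy: per-speaker, per-word probability threshold."""

    def __init__(self, base=0.8, step=0.05, floor=0.3, word_offsets=None):
        self.base = base                        # starting threshold (assumed value)
        self.step = step                        # reduction per prior detection (assumed)
        self.floor = floor                      # lowest threshold allowed (assumed)
        self.word_offsets = word_offsets or {}  # larger offset = more objectionable word
        self.detections = defaultdict(int)      # detections counted per speaker

    def threshold(self, speaker, word):
        """Threshold falls with the speaker's detection count and the word's offset."""
        t = (self.base
             - self.step * self.detections[speaker]
             - self.word_offsets.get(word, 0.0))
        return max(self.floor, t)

    def check(self, speaker, word, probability):
        """Return True and record a detection if the match clears the threshold."""
        hit = probability > self.threshold(speaker, word)
        if hit:
            self.detections[speaker] += 1
        return hit


policy = ThresholdPolicy(word_offsets={"worse": 0.2})
results = [
    policy.check("A", "darn", 0.78),  # below the base threshold, not flagged
    policy.check("A", "darn", 0.85),  # flagged; speaker A now has one detection
    policy.check("A", "darn", 0.78),  # same probability now clears the lowered threshold
    policy.check("B", "darn", 0.78),  # speaker B has no history, not flagged
]
```

The third call shows the effect named in claim 1(c): a probability that was too low on the first call is enough once the speaker has a prior detection.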

6. A method for automatically censoring audio data to prevent undesired speech included therein from being understandable by an audience who may be listening to the audio data, comprising the steps of:
- (a) accessing grammar data and undesired speech data that are in a desired format selected to be usable for comparison to the audio data;
  
  (b) processing the audio data to produce processed audio data that is in the desired format by performing the following steps:
  
  generating a lattice of phonemes comprising word fragments that are likely included in the audio data;
  
  comparing the word fragments against corresponding word fragments that are included in the undesired speech data, to identify potential matches;
  
  dynamically varying a probability threshold dependent upon at least one criterion; and
  
  based upon a probability of a potential match and the probability threshold, determining whether any undesired speech is included in the audio data;
  
  (c) if portions of the processed audio data are found to match any undesired speech, altering the audio data to produce censored audio data in which each occurrence of undesired speech is made incapable of being understood by the audience, else if none of the audio data is found to match any undesired speech, the audio data are not so altered; and
  
  (d) dynamically adjusting the probability threshold based upon a frequency with which undesired speech by a specific speaker is detected in the audio data, so that as the occurrences of undesired speech that are detected increase, the probability threshold is reduced.
  - 7. The method of claim 6, wherein the desired format comprises phonemes and wherein the lattice of phonemes comprises one or more nodes, each node having associated therewith a time interval for the node and a probability that the phoneme is included in the audio data, and wherein the step of processing comprises the step of using the probability associated with node and the time interval to determine an overall probability indicating the likelihood that the phonemes are actually included in the audio data.
  - 8. The method of claim 7, wherein the step of comparing comprises the step of applying a probability threshold to determine if phonemes included in the lattice of phonemes likely match corresponding phonemes in the undesired speech data so as to indicate that the audio data includes undesired speech corresponding to the phonemes.
  - 9. The method of claim 6, wherein the step of altering the audio data comprises one of the steps of:
    - (a) substantially reducing a volume of any portions of the audio data that match the undesired speech so that the portions are not audible;
      
      (b) deleting any portions of the audio data that match the undesired speech;
      
      (c) overwriting any portions of the audio data that match the undesired speech with an obscuring audio signal that prevents the undesired speech from being intelligible; and
      
      (d) replacing any portions of the audio data that match the undesired speech with related speech that is acceptable and is not included in the undesired speech, using phonemes that were previously uttered by a speaker whose speech is being replaced.
  - 10. The method of claim 6, wherein the step of automatically comparing applies a dynamically variable probabilistic determination, wherein a probability threshold for determining that a portion of the audio data matches undesired speech is determined based upon at least one criterion relating to the audio data.
  - 11. The method of claim 10, wherein the at least one criterion includes at least one of:
    - (a) an expected audience for the audio data;
      
      (b) an identity of a speaker that uttered the audio data;
      
      (c) a time at which the audio data will be heard by an audience;
      
      (d) a type of event resulting in the audio data;
      
      (e) an application of the audio data; and
      
      (f) a frequency with which undesired speech by a specific speaker has previously been detected in the audio data.
  - 12. The method of claim 10, further comprising the step of setting different probability thresholds for specific words or phrases included in the undesired speech data, so that more objectionable words or phrases have a lower probability threshold than less objectionable words or phrases.
  - 13. The method of claim 6, wherein the step of processing the audio data is carried out using a speech recognition engine.
  - 14. The method of claim 6, wherein the audio data are processed in one of:
    - (a) a batch mode wherein the audio data are processed offline; and
      
      (b) a real-time mode wherein the audio data are processed as produced and just before being heard by an audience.
  - 15. A memory medium on which machine readable instructions are stored for carrying out the steps of claim 6.
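
Claims 7 and 8 refer to lattice nodes that each carry a time interval and a probability, combined into an overall probability that is then compared with a threshold. The claims do not state how the combination is done, so the Python sketch below assumes a duration-weighted geometric mean over a single path through the lattice. The `Node` record, both function names, and the sample values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical lattice node: phoneme label, time interval in seconds, probability."""
    phoneme: str
    start: float
    end: float
    probability: float  # must be greater than 0 for the logarithm below


def overall_probability(path):
    """Duration-weighted geometric mean of node probabilities (an assumed formula)."""
    total = sum(n.end - n.start for n in path)
    weighted_log = sum((n.end - n.start) * math.log(n.probability) for n in path)
    return math.exp(weighted_log / total)


def matches(path, target_phonemes, threshold):
    """True if the path spells the target phonemes and its overall probability
    is greater than the threshold."""
    same = [n.phoneme for n in path] == list(target_phonemes)
    return same and overall_probability(path) > threshold


# One candidate path through a lattice, with placeholder phoneme labels.
path = [
    Node("d", 0.0, 0.1, 0.9),
    Node("aa", 0.1, 0.3, 0.8),
    Node("r", 0.3, 0.4, 0.9),
    Node("n", 0.4, 0.5, 0.7),
]
p = overall_probability(path)
flagged = matches(path, ["d", "aa", "r", "n"], threshold=0.75)
```

Weighting by duration lets a long, confidently recognized phoneme count for more than a brief one. A full lattice would hold several competing paths per time span; this sketch scores only one of them.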

16. A system for automatically censoring audio data to prevent undesired speech included therein from being understandable by an audience who may be listening to the audio data, comprising:
- (a) a memory in which the undesired speech data and machine instructions are stored and which at least temporarily stores the audio data;
  
  (b) a processor that is coupled to the memory and able to access the audio data at least temporarily stored therein, the processor executing the machine instructions, causing the processor to carry out a plurality of functions, including:
  
  (i) automatically processing the audio data to detect any undesired speech that may be included therein by performing the following steps:
  
  generating a lattice of phonemes comprising word fragments that are likely included in the audio data;
  
  comparing the word fragments against corresponding word fragments that are included in the undesired speech data, to identify potential matches;
  
  dynamically varying a probability threshold dependent upon at least one criterion; and
  
  based upon a probability of a potential match and the probability threshold, determining whether any undesired speech is included in the audio data;
  
  (ii) for each occurrence of undesired speech that is automatically detected, altering the occurrence in the audio data, producing censored audio data in which the undesired speech is substantially no longer perceivable by a listening audience; and
  
  (iii) dynamically adjusting the probability threshold based upon a frequency with which undesired speech by a specific speaker is detected in the audio data, so that as the occurrences of undesired speech that are detected increase, the probability threshold is reduced.
  - 17. The system of claim 16, wherein the at least one criterion includes at least one of:
    - (a) an expected audience for the audio data;
      
      (b) an identity of a speaker that uttered the audio data;
      
      (c) a time at which the audio data will be heard by an audience;
      
      (d) a type of event resulting in the audio data;
      
      (e) an application of the audio data; and
      
      (f) a frequency with which undesired speech by a specific speaker has previously been detected in the audio data.
  - 18. The system of claim 16, wherein the machine instructions cause the processor to alter the audio by doing at least one of:
    - (a) substantially reducing a volume of any portions of the audio data that match the undesired speech so that the portions are substantially inaudible;
      
      (b) deleting any portions of the audio data that match the undesired speech;
      
      (c) overwriting any portions of the audio data that match the undesired speech with an obscuring audio signal that prevents the undesired speech from being intelligible; and
      
      (d) replacing any portions of the audio data that match the undesired speech with related speech that is acceptable and is not included in the undesired speech, the related speech being produced using phonemes that were previously uttered by a speaker whose speech is being replaced.
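
The first three alteration options listed in claims 3, 9, and 18 (volume reduction, deletion, and overwriting with an obscuring signal) can be sketched on a plain list of audio samples. The Python below is an illustration under stated assumptions: the function names, the mono float sample format, and the 1 kHz sine tone are hypothetical, and option (d), replacement with related speech built from the speaker's own phonemes, is not attempted here.

```python
import math


def _bounds(rate, start, end):
    """Convert a time span in seconds to sample indices."""
    return round(start * rate), round(end * rate)


def attenuate(samples, rate, start, end, gain=0.0):
    """Option (a): scale the span down so it is substantially inaudible."""
    lo, hi = _bounds(rate, start, end)
    return samples[:lo] + [s * gain for s in samples[lo:hi]] + samples[hi:]


def delete(samples, rate, start, end):
    """Option (b): remove the span, shortening the audio."""
    lo, hi = _bounds(rate, start, end)
    return samples[:lo] + samples[hi:]


def bleep(samples, rate, start, end, freq=1000.0, amplitude=0.5):
    """Option (c): overwrite the span with a sine tone (an assumed obscuring signal)."""
    lo, hi = _bounds(rate, start, end)
    hi = min(hi, len(samples))  # do not extend the audio past its original end
    tone = [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(hi - lo)]
    return samples[:lo] + tone + samples[hi:]


# Tiny placeholder signal: 8 samples at 8 samples per second.
audio = [1.0] * 8
muted = attenuate(audio, 8, 0.25, 0.5)
cut = delete(audio, 8, 0.25, 0.5)
toned = bleep(audio, 8, 0.25, 0.5)
```

Attenuation and overwriting keep the audio the same length, which matters when it must stay aligned with video or a broadcast schedule; deletion shortens it.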

19. A system for automatically censoring audio data to prevent undesired speech included therein from being understandable by an audience who may be listening to the audio data, comprising:
- (a) a memory in which the undesired speech data and machine instructions are stored and which at least temporarily stores the audio data;
  
  (b) a processor that is coupled to the memory and able to access the audio data at least temporarily stored therein, the processor executing the machine instructions, causing the processor to carry out a plurality of functions, including:
  
  (i) automatically processing the audio data to detect any undesired speech that may be included therein by performing the following steps:
  
  comparing words in the audio data against words comprising the undesired speech, to identify potential matches;
  
  dynamically varying a probability threshold dependent upon at least one criterion; and
  
  based upon a probability of a potential match and the probability threshold, determining whether any undesired speech is included in the audio data;
  
  (ii) for each occurrence of undesired speech that is automatically detected, altering the undesired speech detected in the audio data, producing censored audio data in which the undesired speech is substantially no longer perceivable by a listening audience; and
  
  (iii) dynamically adjusting the probability threshold based upon a frequency with which undesired speech by a specific speaker is detected in the audio data, so that as the occurrences of undesired speech that are detected increase, the probability threshold is reduced.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Danieli, Damon V.
Primary Examiner(s): Hudspeth; David R.
Assistant Examiner(s): Neway; Samuel G
Application Number: US10/976,116
Publication Number: US 20060095262A1
Time in Patent Office: 1,447 Days
Field of Search: None
US Class Current: 704/251
CPC Class Codes:
G10L 15/08   Speech classification or se...
G10L 2015/088   Word spotting
G10L 21/00   Speech or voice signal proc...
