IDENTIFYING KEYWORD OCCURRENCES IN AUDIO DATA

US 20100179811A1
Filed: 01/13/2010
Published: 07/15/2010
Est. Priority Date: 01/13/2009
Status: Active Grant



Abstract

Occurrences of one or more keywords in audio data are identified using a speech recognizer employing a language model to derive a transcript of the keywords. The transcript is converted into a phoneme sequence. The phonemes of the phoneme sequence are mapped to the audio data to derive a time-aligned phoneme sequence that is searched for occurrences of keyword phoneme sequences corresponding to the phonemes of the keywords. Searching includes computing a confusion matrix. The language model used by the speech recognizer is adapted to keywords by increasing the likelihoods of the keywords in the language model. For each potential occurrence of a keyword detected, a corresponding subset of the audio data may be played back to an operator to confirm whether the potential occurrence corresponds to an actual occurrence of the keyword.
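
The search step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how the confusion matrix enters the search, so the sketch assumes it supplies substitution costs for a fixed-length sliding-window match. The names `TimedPhoneme`, `substitution_cost`, `find_keyword`, the `max_cost` threshold, and the sample confusion values are all hypothetical.

```python
from typing import NamedTuple


class TimedPhoneme(NamedTuple):
    phoneme: str
    start: float  # seconds
    end: float    # seconds


def substitution_cost(confusion, recognized, expected):
    """Cost of seeing `recognized` where the keyword expects `expected`."""
    if recognized == expected:
        return 0.0
    # Assumed layout: confusion[(expected, recognized)] is a probability in [0, 1].
    # Pairs that are never confused get the maximum cost of 1.0.
    return 1.0 - confusion.get((expected, recognized), 0.0)


def find_keyword(aligned, keyword_phonemes, confusion, max_cost=0.5):
    """Slide the keyword over the time-aligned sequence and return
    (start_time, end_time, cost) for every window within max_cost."""
    n = len(keyword_phonemes)
    if n == 0:
        return []
    hits = []
    for i in range(len(aligned) - n + 1):
        window = aligned[i:i + n]
        cost = sum(
            substitution_cost(confusion, tp.phoneme, kp)
            for tp, kp in zip(window, keyword_phonemes)
        )
        if cost <= max_cost:
            hits.append((window[0].start, window[-1].end, cost))
    return hits


# Hypothetical data: the recognizer heard "g ae t" where "k ae t" was spoken.
aligned = [
    TimedPhoneme("dh", 0.0, 0.1),
    TimedPhoneme("ah", 0.1, 0.2),
    TimedPhoneme("g", 0.2, 0.3),
    TimedPhoneme("ae", 0.3, 0.45),
    TimedPhoneme("t", 0.45, 0.5),
]
confusion = {("k", "g"): 0.6}  # assumed: "k" is recognized as "g" 60% of the time
hits = find_keyword(aligned, ["k", "ae", "t"], confusion)
```

Each hit carries start and end times, which is the kind of location data a later playback step would need. A fuller version would also allow phoneme insertions and deletions, for example with an edit-distance alignment.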


31 Claims

1. A method for processing audio data conveying speech information, said method comprising:
- a) providing a computer based processing entity having an input, the processing entity being programmed with software to perform speech recognition on the audio data;
  
  b) providing at the input a signal indicative of at least one keyword;
  
  c) performing speech recognition on the audio data with the processing entity to determine if the audio data contains one or more potential occurrences of the keyword;
  
  d) when the performing identifies a potential occurrence of a keyword in the audio data, generating location data indicative of a location of a spoken utterance in the audio data corresponding to the potential occurrence;
  
  e) processing the location data with the processing entity to select a subset of audio data from the audio data for playing to an operator, the subset containing at least a portion of the spoken utterance corresponding to the potential occurrence;
  
  f) playing the selected subset of audio data to the operator;
  
  g) receiving at the input verification data from the operator confirming that the selected subset of audio data contains the keyword or indicating that the selected subset of audio data does not contain the keyword;
  
  h) processing the verification data with the processing entity to generate a label indicating whether or not the audio data contains the keyword;
  
  i) storing the label in a machine readable storage medium.
- - 2. The method of claim 1, further comprising providing at the input a signal indicative of the audio data on which to perform speech recognition.
  - 3. The method of claim 2, wherein the audio data comprises at least one recording, wherein the signal indicative of the audio data identifies the at least one recording from among a collection of recordings.
  - 4. The method of claim 3, further comprising accessing a memory to retrieve therefrom the at least one recording.
  - 5. The method of claim 2, wherein the signal indicative of the audio data comprises the audio data.
  - 6. The method of claim 1, wherein the audio data comprises recordings of any one of a telephony session, a recorded conversation, and the audio component of an audiovisual signal.
  - 7. The method of claim 1, wherein the label comprises an entry in a list.
  - 8. The method of claim 1, wherein the label comprises a memory pointer.
  - 9. The method of claim 1, wherein the label indicating whether or not the audio data contains the keyword is generated only if the audio data satisfies at least one criterion, the at least one criterion including whether the verification data confirms that the selected subset of audio data contains the keyword or indicates that the selected subset of audio data does not contain the keyword.
  - 10. The method of claim 1, wherein speech recognition is performed continuously.
  - 11. The method of claim 1, wherein the method is executed in real-time.
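
Steps d) through i) of claim 1 can be pictured with a small Python sketch, offered only as an illustration under stated assumptions. The claim does not specify how the subset is chosen or how the label is stored, so the padding window, the JSON record, and the names `Location`, `select_subset`, `review`, `store_label`, and `operator` are all hypothetical. Audio playback is replaced by a stand-in callback.

```python
import io
import json
from dataclasses import dataclass


@dataclass
class Location:
    recording_id: str
    start: float  # seconds
    end: float    # seconds


def select_subset(loc, duration, padding=0.5):
    """Step e): pick a clip around the utterance, clamped to the recording."""
    return (max(0.0, loc.start - padding), min(duration, loc.end + padding))


def review(locations, duration, ask_operator, padding=0.5):
    """Steps f) to h): present each clip; the label is True as soon as the
    operator confirms one candidate, False if all are rejected."""
    for loc in locations:
        clip = select_subset(loc, duration, padding)
        if ask_operator(loc.recording_id, clip):
            return True
    return False


def store_label(fh, recording_id, keyword, contains_keyword):
    """Step i): write the label to a machine readable medium (JSON here)."""
    json.dump(
        {
            "recording_id": recording_id,
            "keyword": keyword,
            "contains_keyword": contains_keyword,
        },
        fh,
    )


def operator(recording_id, clip):
    # Stand-in for playback plus a human decision: this hypothetical
    # operator confirms only the clip that starts at 0.0 seconds.
    return clip[0] == 0.0


candidates = [Location("rec1", 12.0, 12.5), Location("rec1", 0.25, 0.75)]
label = review(candidates, 30.0, operator)
storage = io.StringIO()
store_label(storage, "rec1", "refund", label)
```

Stopping at the first confirmation is a design choice of the sketch, not something the claim requires; a system could equally collect a verdict for every candidate.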

12. A method of identifying occurrences of a keyword within audio data, the method comprising:
- a) providing a computer based processing entity programmed with software, the software implementing a language model to perform speech recognition;
  
  b) inputting in the processing entity data conveying the keyword;
  
  c) processing the data conveying the keyword with the software to adapt the language model to the keyword and generate an adapted language model;
  
  d) processing the audio data with the adapted language model to determine if the audio data contains the keyword;
  
  e) releasing result data at an output of the processing entity conveying results of the processing of the audio data with the adapted language model.
- - 13. The method of claim 12, wherein processing the data conveying the keyword with the software to adapt the language model to the keyword comprises increasing the likelihood of the keyword in the language model.
  - 14. The method of claim 13, wherein increasing the likelihood of the keyword in the language model comprises boosting the log-likelihood of every n-gram that contains the keyword.
  - 15. The method of claim 13, further comprising prior to increasing the likelihood of the keyword in the language model, adding the keyword to the language model.
  - 16. The method of claim 12, wherein processing the audio data with the adapted language model comprises performing speech recognition on the audio data using the adapted language model to derive a transcript of the audio data.
  - 17. The method of claim 16, wherein the speech recognition is continuous speech recognition.
  - 18. The method of claim 16, wherein processing the audio data with the adapted language model further comprises performing a text-to-phoneme conversion on the transcript to derive a first phoneme sequence.
  - 19. The method of claim 18, wherein the text-to-phoneme conversion is performed continuously.
  - 20. The method of claim 18, wherein processing the audio data with the adapted language model further comprises mapping the phonemes in the first phoneme sequence to the audio data to derive a time-aligned phoneme sequence.
  - 21. The method of claim 20, wherein mapping the phonemes in the first phoneme sequence to the audio data is performed continuously.
  - 22. The method of claim 20, wherein processing the audio data with the adapted language model further comprises searching the time-aligned phoneme sequence for occurrences of a keyword phoneme sequence corresponding to the keyword.
  - 23. The method of claim 22, wherein searching the time-aligned phoneme sequence for occurrences of a keyword phoneme sequence comprises computing a confusion matrix.
  - 24. The method of claim 22, wherein the speech recognition is performed using a speaker-independent acoustic model.
  - 25. The method of claim 22, wherein speech recognition is performed using an adapted acoustic model derived from an initial time-aligned phoneme sequence.
  - 26. The method of claim 25, wherein the transcript of the recording is a second transcript, the method further comprising:
    - a) performing speech recognition on the audio data using a speaker-independent acoustic model to derive an initial transcript of the audio data;
      
      b) performing a text-to-phoneme conversion on the initial transcript of the recording to derive an initial phoneme sequence;
      
      c) mapping the phonemes in the initial phoneme sequence to the audio data to derive an initial time-aligned phoneme sequence; and
      
      d) generating the adapted acoustic model on the basis of the initial time-aligned phoneme sequence and the speaker-independent acoustic model.
  - 27. The method of claim 26, wherein the adapted acoustic model is generated using feature-based maximum likelihood linear regression (fMLLR).
  - 28. The method of claim 22, further comprising performing speaker diarization on the audio data to identify segments of the audio data corresponding to individual speakers;
    - wherein the transcript of the audio data is a per-speaker transcript of the audio data derived by performing speech recognition using a per-speaker adapted acoustic model and an indication of the segments of the recording corresponding to individual speakers.
  - 29. The method of claim 12, further comprising inputting in the processing entity a signal indicative of the audio data.
  - 30. The method of claim 29, wherein the audio data comprises at least one recording, wherein the signal indicative of the audio data identifies the at least one recording from among a collection of recordings.
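
The language model adaptation of claims 13 to 15 can be sketched in Python as follows. This is a toy illustration, not the method of the patent: it assumes the model is a plain dictionary from n-gram tuples to log-probabilities, that the keyword is a single token, and that boosting is an additive constant capped at 0. The function name `adapt_language_model` and the `boost` and `floor_logprob` parameters are hypothetical, and the boosted scores are not renormalized.

```python
def adapt_language_model(ngram_logprobs, keyword, boost=1.5, floor_logprob=-5.0):
    """Return a copy of the model with every n-gram containing `keyword` boosted.

    ngram_logprobs: dict mapping tuples of tokens to log-probabilities (assumed layout).
    """
    adapted = dict(ngram_logprobs)
    # Claim 15: add the keyword to the model first if it is missing.
    if (keyword,) not in adapted:
        adapted[(keyword,)] = floor_logprob
    # Claim 14: boost the log-likelihood of every n-gram that contains the keyword.
    # The cap at 0.0 keeps each score a valid log-probability.
    return {
        ngram: (min(0.0, logprob + boost) if keyword in ngram else logprob)
        for ngram, logprob in adapted.items()
    }


# Hypothetical toy model with unigram and bigram entries.
lm = {
    ("the",): -1.0,
    ("refund",): -4.0,
    ("a", "refund"): -3.0,
    ("the", "cat"): -2.0,
}
adapted = adapt_language_model(lm, "refund")
with_new_word = adapt_language_model(lm, "chargeback")
```

Returning a new dictionary leaves the original model untouched, which suits the scenario of claim 31 where the same base model is adapted separately to different keywords.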

31. A method of identifying occurrences of keywords within audio recordings containing speech information, the method comprising:
- a) providing a computer based processing entity programmed with software, the software implementing a language model to perform speech recognition;
  
  b) inputting in the processing entity first data conveying a first keyword;
  
  c) processing the first data with the software to adapt the language model to the first keyword and generate a language model adapted to the first keyword;
  
  d) processing a first set of recordings with the language model adapted to the first keyword to determine if the first set of recordings contains the first keyword;
  
  e) inputting in the processing entity second data conveying a second keyword;
  
  f) processing the second data with the software to adapt the language model to the second keyword and generate a language model adapted to the second keyword;
  
  g) processing a second set of recordings with the language model adapted to the second keyword to determine if the second set of recordings contains the second keyword;
  
  h) releasing data at the output of the processing entity conveying results of the processing of the first and second sets of recordings with the language models adapted to the first and second keywords, respectively.


Current Assignee
Centre de Recherche Informatique de Montréal
Original Assignee
Centre de Recherche Informatique de Montréal
Inventors
BOULIANNE, Gilles; GUPTA, Vishwa Nath

Granted Patent

US 8,423,363 B2
US Class Current

704/235
CPC Class Codes

G10L 15/22 Procedures used during a sp...

G10L 2015/088 Word spotting
