METHOD AND SYSTEM FOR ACOUSTIC DATA SELECTION FOR TRAINING THE PARAMETERS OF AN ACOUSTIC MODEL

US 20140046662A1
Filed: 08/05/2013
Published: 02/13/2014
Est. Priority Date: 08/07/2012
Status: Active Grant


Abstract

A system and method are presented for acoustic data selection of a particular quality for training the parameters of an acoustic model, such as a Hidden Markov Model and Gaussian Mixture Model, for example, in automatic speech recognition systems in the speech analytics field. A raw acoustic model may be trained using a given speech corpus and maximum likelihood criteria. A series of operations are performed, such as a forced Viterbi-alignment, calculations of likelihood scores, and phoneme recognition, for example, to form a subset corpus of training data. During the process, audio files of a quality that does not meet a criterion, such as poor quality audio files, may be automatically rejected from the corpus. The subset may then be used to train a new acoustic model.

32 Claims

1. A method for training models in speech recognition systems through the selection of acoustic data comprising the steps of:
- a. training an acoustic model;
  
  b. performing a forced Viterbi alignment;
  
  c. calculating a total likelihood score;
  
  d. performing a phoneme recognition;
  
  e. retaining selected audio files; and
  
  f. training a new acoustic model.
- - 2. The method of claim 1, wherein step (a) further comprises the steps of:
    - a. analyzing a speech corpus comprised of audio files;
      
      b. calculating a maximum likelihood criterion; and
      
      c. estimating the parameters of said acoustic model of the probability distribution.
  - 3. The acoustic model of claim 1, wherein said model comprises a Hidden Markov Model and a Gaussian Mixture Model.
  - 4. The method of claim 1, wherein step (b) further comprises the steps of:
    - a. obtaining a total likelihood score for each audio file; and
      
      b. determining an average frame likelihood score.
  - 5. The method of claim 4, wherein step (a) further comprises the step of using the mathematical equation $\alpha_r = p(x_1 \mid q_1) \prod_{i=2}^{N} P(q_i \mid q_{i-1})\, p(x_i \mid q_i)$ to obtain a total likelihood score of an audio file.
  - 6. The method of claim 5, wherein audio file $r \in \{1, R\}$.
  - 7. The method of claim 4, wherein step (b) further comprises the step of using the mathematical equation
  - 8. The method of claim 1, wherein step (c) further comprises the step of rejecting audio files and the transcription from the training data corpus that do not meet a specified criteria.
  - 9. The method of claim 8 further comprising the step of using the mathematical equation $\delta =$ …
  - 10. The method of claim 1 wherein step (d) further comprises the step of obtaining a correct phoneme recognition accuracy of each audio file with the available manual transcription of each file.
  - 11. The method of claim 10 further comprising the step of using the mathematical equation
  - 12. The method of claim 1, wherein step (e) further comprises the steps of:
    - a. examining an audio file's average frame likelihood score and average phoneme recognition accuracy value;
      
      b. determining if said at least one of scores and said values meet a specified threshold using the following equation; and
      
      c. retaining an audio file if at least one of said scores and said values meet said threshold; and
      
      d. forming a subset of a corpus with retained audio data.
  - 13. The method of claim 12, wherein the retaining of an audio file is performed using the criteria represented by the mathematical equation: $\beta_g \geq \delta + \Delta$
  - 14. The method of claim 13, wherein the value of $\Delta$ is $-0.1\delta$.
  - 15. The method of claim 12, wherein the value of the average phoneme recognition accuracy is greater than or equal to $0.9\theta$.
  - 16. The method of claim 1, wherein step (f) further comprises the step of examining the subset of a corpus formed from retaining selected audio files.
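
The total likelihood score in claim 5 is a product of one emission term per frame and one transition term per frame pair, so it underflows quickly for real audio. The following is a minimal sketch of the same quantity computed in the log domain. It assumes the forced Viterbi alignment has already produced, for one audio file, the per-frame log emission scores and log transition scores along the aligned state path. The function names, the argument layout, and the definition of the average frame score as the log likelihood divided by the frame count are illustrative assumptions; the source does not reproduce the average-frame equation of claim 7.

```python
import math


def total_log_likelihood(log_emissions, log_transitions):
    """Return log(alpha_r) for one audio file.

    Claim 5 gives alpha_r = p(x_1|q_1) * prod_{i=2..N} P(q_i|q_{i-1}) p(x_i|q_i).
    Taking logs turns the product into a sum:
        log alpha_r = sum_i log p(x_i|q_i) + sum_{i>=2} log P(q_i|q_{i-1})

    log_emissions   : N values, log p(x_i|q_i) along the aligned path
    log_transitions : N-1 values, log P(q_i|q_{i-1}) for i = 2..N
    (Both argument names are hypothetical, chosen for this sketch.)
    """
    n_frames = len(log_emissions)
    if n_frames == 0:
        raise ValueError("an audio file must contain at least one frame")
    if len(log_transitions) != n_frames - 1:
        raise ValueError("expected one transition score per consecutive frame pair")
    # fsum keeps rounding error low when thousands of frame scores are added.
    return math.fsum(log_emissions) + math.fsum(log_transitions)


def average_frame_log_likelihood(log_emissions, log_transitions):
    """Assumed definition: log(alpha_r) divided by the number of frames N."""
    total = total_log_likelihood(log_emissions, log_transitions)
    return total / len(log_emissions)


# Toy two-frame file: emissions 0.5 and 0.25, one transition of 0.5.
log_alpha = total_log_likelihood(
    [math.log(0.5), math.log(0.25)],
    [math.log(0.5)],
)
print(math.exp(log_alpha))  # product of the three probabilities
```

Working in the log domain is a numerical convenience only; exponentiating the result recovers the product form written in claim 5.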

17. A method for training an acoustic model in an automatic speech recognition system comprising the steps of:
- a. training a set of raw data using a given speech corpus and the maximum likelihood criteria;
  
  b. performing a forced Viterbi-alignment;
  
  c. calculating a total likelihood score;
  
  d. performing phoneme recognition on audio files in said corpus;
  
  e. retaining selected audio files;
  
  f. forming a subset corpus of training data; and
  
  g. training a new acoustic model with said subset.
- - 18. The method of claim 17, wherein the maximum likelihood criteria of step (a) is determined by estimating parameters of the raw data of the probability distribution that maximizes the likelihood of the training data given the phonetic transcription.
  - 19. The method of claim 17, wherein the forced Viterbi-alignment is performed using the raw data.
  - 20. The method of claim 17, wherein step (b) further comprises the step of obtaining the total likelihood score for each audio file.
  - 21. The method of claim 20, wherein the total likelihood score is obtained using the mathematical equation:
  - 22. The method of claim 17, wherein step (b) further comprises the step of obtaining an average frame likelihood score.
  - 23. The method of claim 22, wherein the average frame likelihood score is obtained using the mathematical equation:
  - 24. The method of claim 17, where step (c) further comprises the step of averaging the frame likelihood average to obtain an average over the entire corpus.
  - 25. The method of claim 24, wherein the corpus contains varying quality audio files.
  - 26. The method of claim 24 further comprising the step of automatically rejecting bad quality files and transcription from the corpus.
  - 27. The method of claim 26, wherein the rejection is determined with the mathematical equation:
  - 28. The method of claim 17, wherein step (d) is performed using the Viterbi search and raw acoustic model data.
  - 29. The method of claim 17, wherein step (d) further comprises the step of estimating the correct phoneme recognition accuracy of each audio file using available manual transcription of each file.
  - 30. The method of claim 29, wherein the estimation is performed using the mathematical equation:
  - 31. The method of claim 17, wherein step (e) further comprises the step of determining whether one or more of: average frame likelihood data meets a threshold and phoneme recognition accuracy meets a threshold.
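
The retention test described in claims 12 to 15 and claim 31 can be sketched as a simple filter over per-file scores. This is an illustration under stated assumptions, not the patented implementation: the corpus average δ is taken here as the plain mean of the per-file average frame likelihood scores (the source does not reproduce that equation), θ is treated as a caller-supplied reference accuracy because the source does not define it, and the scores are assumed to be positive so that δ + Δ with Δ = −0.1δ is a bound slightly below the corpus average. The function name `select_files` and the input layout are hypothetical.

```python
from statistics import mean


def select_files(scores, theta, likelihood_margin=0.1, accuracy_factor=0.9):
    """Return the ids of audio files that pass at least one criterion.

    scores : dict mapping file id -> (beta, accuracy), where beta is the
             file's average frame likelihood score and accuracy is its
             phoneme recognition accuracy (layout assumed for this sketch)
    theta  : reference accuracy; its definition is an assumption here
    """
    if not scores:
        return []
    # Assumed corpus average: unweighted mean of the per-file scores.
    delta = mean(beta for beta, _ in scores.values())
    # Claim 14: Delta = -0.1 * delta.
    big_delta = -likelihood_margin * delta
    retained = []
    for file_id, (beta, accuracy) in scores.items():
        meets_likelihood = beta >= delta + big_delta          # claim 13
        meets_accuracy = accuracy >= accuracy_factor * theta  # claim 15
        # Claims 12 and 31: a file is kept if one or more criteria hold.
        if meets_likelihood or meets_accuracy:
            retained.append(file_id)
    return retained


# Toy corpus with made-up scores, for illustration only.
corpus_scores = {
    "a.wav": (1.0, 0.80),
    "b.wav": (0.9, 0.50),
    "c.wav": (0.2, 0.75),
    "d.wav": (0.3, 0.40),
}
subset = select_files(corpus_scores, theta=0.8)
print(subset)  # ['a.wav', 'b.wav', 'c.wav']
```

In this toy corpus the mean score is 0.6, so the likelihood bound is 0.54; `c.wav` falls below it but is kept on accuracy, while `d.wav` fails both tests and is rejected. The retained ids form the subset corpus used to train the new acoustic model.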

32. A system for training models in speech recognition systems through the selection of acoustic data comprising:
- a. means for training an acoustic model;
  
  b. means for performing a forced Viterbi alignment;
  
  c. means for calculating a total likelihood score;
  
  d. means for performing a phoneme recognition;
  
  e. means for retaining selected audio files; and
  
  f. means for training a new acoustic model.

Current Assignee: Genesys Cloud Services Incorporated
Original Assignee: Interactive Intelligence Incorporated (Genesys Cloud Services Incorporated)
Inventors: Tyagi, Vivek; Ganapathiraju, Aravind; Wyss, Felix Immanuel
Granted Patent: US 9,972,306 B2
US Class Current: 704/243
CPC Class Codes:
G10L 15/063   Training
G10L 15/144   Training of HMMs
G10L 2015/025   Phonemes, fenemes or fenone...
