Speech recognition method

US 4,829,577 A
Filed: 03/12/1987
Issued: 05/09/1989
Est. Priority Date: 03/25/1986
Status: Expired due to Fees

First Claim

1. A speech recognition method based on recognizing words, comprising the steps of:

defining, for each word, a probabilistic model including (i) a plurality of states, (ii) at least one transition, each transition extending from a state to a state, (iii) a plurality of generated labels indicative of time between states, and (iv) probabilities of outputting each label in each of said transitions;

generating a first label string of said labels for each of said words from initial data thereof;

for each of said words, iteratively updating the probabilities of the corresponding probabilistic model, comprising the steps of;

(a) inputting a first label string into a corresponding probabilistic model;

(b) obtaining a first frequency of each of said labels being output at each of said transitions over the time in which the corresponding first label string is input into the corresponding probabilistic model;

(c) obtaining a second frequency of each of said states occurring over the time in which the corresponding first label string is inputted into the corresponding probabilistic model; and

(d) obtaining each of a plurality of new probabilities of said corresponding probabilistic model by dividing the corresponding first frequency by the corresponding second frequency;

storing the first and second frequencies obtained in the last step of said iterative updating;

determining which of said words require adaptation to recognize different speakers or the same speaker at different times;

generating, for each of said words requiring adaptation, a second label string from adaptation data comprising the probabilistic model of the word to be adapted;

obtaining, for each of said words requiring adaptation, a third frequency of each of said labels being outputted at each of said transitions over the time in which the corresponding second label string is inputted into the corresponding probabilistic model;

obtaining, for each of said words requiring adaptation, a fourth frequency of each of said states occurring over the time in which the corresponding second label string is outputted into the corresponding probabilistic model;

obtaining fifth frequencies by interpolation of the corresponding first and third frequencies;

obtaining sixth frequencies by interpolation of the corresponding second and third frequencies; and

obtaining adapted probabilities for said adaptation data by dividing the corresponding fifth frequency by the corresponding sixth frequency.
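
The final three steps of the claim reduce to arithmetic on stored count tables, which can be sketched as below. This is a minimal illustration, not the patented implementation: the dictionary layout, the function names, and the single interpolation weight `lam` are assumptions introduced here. The sketch also assumes the sixth frequency combines the second and fourth (state) frequencies, so that the resulting ratios stay normalized per state; the claim text as reproduced above names the second and third.

```python
def interpolate(base, adapt, lam):
    """Weighted sum of two count tables; a key missing from one table counts as 0."""
    keys = set(base) | set(adapt)
    return {k: (1.0 - lam) * base.get(k, 0.0) + lam * adapt.get(k, 0.0) for k in keys}


def adapted_probabilities(c1, s2, c3, s4, lam=0.5):
    """Return adapted probabilities keyed by (from_state, to_state, label).

    c1, c3: label-output frequencies from training and from adaptation data.
    s2, s4: state-occurrence frequencies from training and from adaptation data.
    lam:    hypothetical interpolation weight given to the adaptation data.
    """
    c5 = interpolate(c1, c3, lam)  # "fifth" frequencies
    s6 = interpolate(s2, s4, lam)  # "sixth" frequencies (assumed: second with fourth)
    return {
        (i, j, k): count / s6[i]
        for (i, j, k), count in c5.items()
        if s6.get(i, 0.0) > 0.0  # skip states that were never visited
    }


# Toy numbers, invented for illustration only.
trained_label_counts = {(0, 1, "a"): 3.0, (0, 1, "b"): 1.0}
trained_state_counts = {0: 4.0}
adapt_label_counts = {(0, 1, "a"): 1.0, (0, 1, "b"): 3.0}
adapt_state_counts = {0: 4.0}

adapted = adapted_probabilities(
    trained_label_counts, trained_state_counts,
    adapt_label_counts, adapt_state_counts,
    lam=0.25,
)
print(adapted[(0, 1, "a")], adapted[(0, 1, "b")])  # 0.625 0.375
```

With `lam=0.0` the function returns the probabilities from initial training unchanged, and with `lam=1.0` it returns probabilities estimated from the adaptation data alone.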

1 Assignment

0 Petitions

Abstract

Speaker adaptation which enables a person to use a Hidden Markov model type recognizer previously trained by another person or persons. During initial training, parameters of Markov models are calculated iteratively by, for example, using the Forward-Backward algorithm. Adapting the recognizer to a new speaker involves (a) storing and utilizing intermediate results or probabilistic frequencies of a last iteration of training parameters, and (b) calculating new parameters by computing a weighted sum of the probabilistic frequencies stored during training and frequencies obtained from adaptation data derived from known utterances of words made by the new speaker.
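
The "probabilistic frequencies" mentioned in the abstract are the expected counts that one Forward-Backward pass produces. The sketch below shows how such counts could be computed for a small model whose labels are emitted on transitions. It is an illustration under stated assumptions: state 0 is taken as the start state and the last state as the final state, and `probs[k][label]` is taken to be the joint probability of following transition `k` and emitting `label`. The function name and data layout are hypothetical and do not come from the patent.

```python
def expected_counts(transitions, probs, labels, n_states):
    """One Forward-Backward pass over a label string.

    transitions: list of (from_state, to_state) pairs.
    probs[k]:    dict mapping label -> P(take transition k and emit label | from_state).
    Returns (label_counts, state_counts): expected counts keyed by
    (from_state, to_state, label) and by from_state respectively.
    """
    T = len(labels)
    final = n_states - 1

    # Forward: alpha[t][s] = P(first t labels emitted and model is in state s).
    alpha = [[0.0] * n_states for _ in range(T + 1)]
    alpha[0][0] = 1.0
    for t, y in enumerate(labels):
        for k, (i, j) in enumerate(transitions):
            alpha[t + 1][j] += alpha[t][i] * probs[k].get(y, 0.0)

    # Backward: beta[t][s] = P(remaining labels emitted, ending in final | state s at t).
    beta = [[0.0] * n_states for _ in range(T + 1)]
    beta[T][final] = 1.0
    for t in range(T - 1, -1, -1):
        y = labels[t]
        for k, (i, j) in enumerate(transitions):
            beta[t][i] += probs[k].get(y, 0.0) * beta[t + 1][j]

    total = alpha[T][final]
    if total == 0.0:
        raise ValueError("label string has zero probability under this model")

    # Accumulate the posterior probability of each transition at each time step.
    label_counts, state_counts = {}, {}
    for t, y in enumerate(labels):
        for k, (i, j) in enumerate(transitions):
            g = alpha[t][i] * probs[k].get(y, 0.0) * beta[t + 1][j] / total
            if g > 0.0:
                label_counts[(i, j, y)] = label_counts.get((i, j, y), 0.0) + g
                state_counts[i] = state_counts.get(i, 0.0) + g
    return label_counts, state_counts


# Toy two-state model with a self-loop on state 0; numbers invented for illustration.
toy_transitions = [(0, 0), (0, 1)]
toy_probs = [{"a": 0.3, "b": 0.2}, {"a": 0.1, "b": 0.4}]
toy_label_counts, toy_state_counts = expected_counts(toy_transitions, toy_probs, ["a", "b"], 2)
```

Dividing each entry of `label_counts` by the matching entry of `state_counts` gives the re-estimated probabilities for the next training iteration, and the two tables from the last iteration are the quantities the abstract describes as being stored for later adaptation.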

74 Citations

4 Claims

2. The method in accordance with claim 1 wherein each of said first frequencies is stored indirectly as a product of the corresponding probability and the corresponding second frequency for a given word.

3. The method in accordance with claim 2 wherein each of the probabilities of the said probabilistic model into which adaptation data is to be inputted have been subjected to smoothing operation.

4. The method in accordance with claim 1 wherein each of probabilities of the said probabilistic model into which adaptation data is to be inputted have been subjected to a smoothing operation.
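
Claim 2 relies on the fact that a probability obtained as a ratio of two frequencies lets the numerator be recovered from the ratio and the denominator. A minimal sketch, assuming probabilities are keyed by (from_state, to_state, label) and state frequencies by from_state; the function name and layout are hypothetical:

```python
def recover_label_counts(probabilities, state_counts):
    """Rebuild first frequencies as probability * second frequency.

    Only the trained probabilities and the per-state frequencies need to be
    kept; the per-label frequencies are recomputed when adaptation starts.
    """
    return {
        (i, j, k): p * state_counts.get(i, 0.0)
        for (i, j, k), p in probabilities.items()
    }


# Toy numbers, invented for illustration only.
stored_probs = {(0, 1, "a"): 0.75, (0, 1, "b"): 0.25}
stored_state_counts = {0: 4.0}
print(recover_label_counts(stored_probs, stored_state_counts)[(0, 1, "a")])  # 3.0
```

Because the recovery starts from whatever probabilities are stored, applying it to smoothed probabilities (claims 3 and 4) would yield correspondingly smoothed frequencies; the smoothing operation itself is not specified in the claims and is not sketched here.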

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Masafumi Nishimura; Akihiro Kuroda; Kazuhide Sugawara
Primary Examiner(s): Not defined
Assistant Examiner(s): Not defined
Application Number: US07/025,257
Time in Patent Office: 789 Days
Field of Search: 381/45, 381/43, 381/42, 364/513.5
US Class Current: 704/244
CPC Class Codes: G10L 15/07 (Adaptation to the speaker); G10L 15/144 (Training of HMMs)
