Speech coding apparatus having acoustic prototype vectors generated by tying to elementary models and clustering around reference vectors

US 5,497,447 A
Filed: 03/08/1993
Issued: 03/05/1996
Est. Priority Date: 03/08/1993
Status: Expired due to Fees

Abstract

A speech coding apparatus in which measured acoustic feature vectors are each represented by the best matched prototype vector. The prototype vectors are generated by storing a model of a training script comprising a series of elementary models. The value of at least one feature of a training utterance of the training script is measured over each of a series of successive time intervals to produce a series of training feature vectors. A first set of training feature vectors corresponding to a first elementary model in the training script is identified. The feature value of each training feature vector signal in the first set is compared to the parameter value of a first reference vector signal to obtain a first closeness score, and is compared to the parameter value of a second reference vector to obtain a second closeness score for each training feature vector. For each training feature vector in the first set, the first closeness score is compared with the second closeness score to obtain a reference match score. A first subset contains those training feature vectors in the first set having reference match scores better than a threshold Q, and a second subset contains those having reference match scores less than the threshold Q. One or more partition values are generated for a first prototype vector from the first subset of training feature vectors, and one or more additional partition values are generated for the first prototype vector from the second subset of training feature vectors.
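
The splitting step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the closeness score or the reference match score is computed, so this sketch assumes a Euclidean distance for closeness and the ratio of the two distances for the match score. The function names `mean_vector` and `split_by_reference_match`, and the choice to place ties in the second subset, are likewise illustrative assumptions and not taken from the patent.

```python
from math import dist
from statistics import fmean


def mean_vector(vectors):
    """Component-wise arithmetic mean of equal-length feature vectors."""
    return [fmean(component) for component in zip(*vectors)]


def split_by_reference_match(first_set, ref1, ref2, q=1.0):
    """Split feature vectors into two subsets by a reference match score.

    Assumptions (not specified in the source text):
      * closeness score = Euclidean distance to a reference vector
      * reference match score = first closeness / second closeness
      * vectors whose score equals q go to the second subset
    """
    first_subset, second_subset = [], []
    for vector in first_set:
        closeness1 = dist(vector, ref1)
        closeness2 = dist(vector, ref2)
        # A zero distance to the second reference makes the ratio unbounded.
        score = closeness1 / closeness2 if closeness2 > 0 else float("inf")
        if score > q:
            first_subset.append(vector)
        else:
            second_subset.append(vector)
    return first_subset, second_subset


if __name__ == "__main__":
    # Hypothetical feature vectors for one elementary model in two contexts.
    context_a = [[0.0, 0.0], [0.2, 0.0]]
    context_b = [[1.0, 1.0], [1.2, 1.0]]
    ref1 = mean_vector(context_a)  # first reference vector
    ref2 = mean_vector(context_b)  # second reference vector
    first, second = split_by_reference_match(context_a + context_b, ref1, ref2)
    print("first subset:", first)
    print("second subset:", second)
```

Under these assumptions a score above Q means the vector lies farther from the first reference vector than from the second, so with Q equal to one each vector is simply grouped with its nearer reference.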

17 Claims

1. A speech coding apparatus comprising:
- means for storing a model of a training script, said training script model comprising a series of elementary models from a finite set of elementary models, each elementary model in the training script having a phonetic context comprising one or more preceding or following models in the training script;
  
  means for measuring the value of at least one feature of a training utterance of the training script over each of a series of successive time intervals for producing a series of training feature vector signals representing feature values;
  
  means for identifying a first set of training feature vector signals corresponding to a first elementary model in the training script model;
  
  means for storing at least a first reference vector signal and a second reference vector signal, each reference vector signal having at least one parameter value, the first reference vector signal comprising the arithmetic mean of the training feature vector signals corresponding to the first elementary model in a first phonetic context of preceding and following phonetic models, the second reference vector signal comprising the arithmetic mean of the training feature vector signals corresponding to the first elementary model in a second phonetic context of preceding and following phonetic models, different from the first context;
  
  means for comparing the feature values of each training feature vector signal in the first set to said at least one parameter value of the first reference vector signal to obtain a first closeness score for each training feature vector signal and the first reference vector signal;
  
  means for comparing the feature values of each training feature vector signal in the first set to said at least one parameter value of the second reference vector signal to obtain a second closeness score for each training feature vector signal and the second reference vector signal;
  
  means for comparing, for each training feature vector signal in the first set, the first closeness score for the training feature vector signal with the second closeness score for the training feature vector signal to obtain a reference match score for each training feature vector signal and the first and second reference vector signals;
  
  means for storing a first subset of the training feature vector signals in the first set having reference match scores greater than a threshold Q, and for storing a second subset of the training feature vector signals in the first set having reference match scores less than the threshold Q; and
  
  means for generating one or more partition values for a first prototype vector signal from the first subset of training feature vector signals, and for generating one or more additional partition values for the first prototype vector signal from the second subset of training feature vector signals.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. A speech coding apparatus as claimed in claim 1, wherein:
    - the first set of training feature vector signals corresponds to the first elementary model in the training script model in a first phonetic context.
  - 3. A speech coding apparatus as claimed in claim 2, further comprising:
    - means for identifying a second set of training feature vector signals corresponding to a second elementary model in the training script model and means for identifying a third set of training feature vector signals corresponding to a third elementary model in the training script model;
      
      said at least one parameter value of the first reference vector signal comprises an arithmetic mean of the feature values of the second set of training feature vector signals corresponding to the second elementary model in the training script in a second context; and
      
      said at least one parameter value of the second reference vector signal comprises an arithmetic mean of the feature values of the third set of training feature vector signals corresponding to the third elementary model in the training script in a third context different from the first and second contexts.
  - 4. A speech coding apparatus as claimed in claim 3, wherein the means for generating one or more partition values comprises means for grouping each subset of training feature vector signals into one or more different clusters.
  - 5. A speech coding apparatus as claimed in claim 4, wherein:
    - the means for grouping each subset of training feature vector signals into one or more different clusters further comprises;
      
      means for storing at least a third reference vector signal and a fourth reference vector signal, each reference vector signal having at least one parameter value;
      
      means for comparing the feature values of each training feature vector signal in the first subset to said at least one parameter value of the third reference vector signal to obtain a third closeness score for the training feature vector signal and the third reference vector signal;
      
      means for comparing the feature values of each training feature vector signal in the first subset to said at least one parameter value of the fourth reference vector signal to obtain a fourth closeness score for the training feature vector signal and the fourth reference vector signal;
      
      means for comparing, for each training feature vector signal in the first subset, the third closeness score for the training feature vector signal with the fourth closeness score for the training feature vector signal to obtain a sub-reference match score for each training feature vector signal and the third and fourth reference vector signals; and
      
      means for storing a first sub-subset of the training feature vector signals in the first subset having sub-reference match scores greater than a threshold Q', and for storing a second sub-subset of the training feature vector signals in the first subset having sub-reference match scores less than the threshold Q'; and
      
      the means for generating one or more partition values generates one or more partition values for the first prototype vector signal from the first sub-subset of training feature vector signals, and generates one or more additional partition values for the first prototype vector signal from the second sub-subset of training feature vector signals.
  - 6. A speech coding apparatus as claimed in claim 5, wherein each partition value comprises the arithmetic mean of the feature values of the training feature vector signals in one of the clusters.
  - 7. A speech coding apparatus as claimed in claim 6, wherein each partition value further comprises a variance of the feature values of the training feature vector signals in one of the clusters.
  - 8. A speech coding apparatus as claimed in claim 7, wherein the threshold Q is equal to one.
  - 9. A speech coding apparatus as claimed in claim 1, wherein the means for collecting comprises a microphone.
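
Claims 4 to 7 describe splitting a subset again around a third and a fourth reference vector and deriving a mean and a variance from each resulting cluster. The following Python sketch shows one way this could look. It assumes a Euclidean closeness score, a distance-ratio match score, and a per-dimension population variance, none of which the claims specify. The names `split`, `partition_value`, and `prototype_partitions` are hypothetical.

```python
from math import dist
from statistics import fmean, pvariance


def split(vectors, ref_a, ref_b, threshold=1.0):
    """Split vectors by the ratio of their distances to two references.

    Assumption: vectors whose score equals the threshold go to `below`.
    """
    above, below = [], []
    for vector in vectors:
        d_a, d_b = dist(vector, ref_a), dist(vector, ref_b)
        score = d_a / d_b if d_b > 0 else float("inf")
        (above if score > threshold else below).append(vector)
    return above, below


def partition_value(cluster):
    """Return (mean, variance) per feature dimension for one cluster."""
    columns = list(zip(*cluster))
    means = [fmean(column) for column in columns]
    variances = [pvariance(column) for column in columns]
    return means, variances


def prototype_partitions(first_subset, ref3, ref4, q_prime=1.0):
    """Sub-split a subset and build one partition value per sub-subset."""
    sub_subsets = split(first_subset, ref3, ref4, q_prime)
    # Empty sub-subsets contribute no partition value.
    return [partition_value(s) for s in sub_subsets if s]


if __name__ == "__main__":
    subset = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
    for mean, variance in prototype_partitions(subset, [0.0, 1.0], [4.0, 1.0]):
        print("mean:", mean, "variance:", variance)
```

In this sketch a prototype vector is represented as a list of partition values, one per cluster, which matches the claim language of "one or more partition values" and "additional partition values" for the same prototype.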

10. A speech coding method comprising:
- storing a model of a training script, said training script model comprising a series of elementary models from a finite set of elementary models, each elementary model in the training script having a phonetic context comprising one or more preceding or following models in the training script;
  
  measuring the value of at least one feature of a training utterance of the training script over each of a series of successive time intervals for producing a series of training feature vector signals representing the feature values;
  
  identifying a first set of training feature vector signals corresponding to a first elementary model in the training script model;
  
  storing at least a first reference vector signal and a second reference vector signal, each reference vector signal having at least one parameter value, the first reference vector signal comprising the arithmetic mean of the training feature vector signals corresponding to the first elementary model in a first phonetic context of preceding and following phonetic models, the second reference vector signal comprising the arithmetic mean of the training feature vector signals corresponding to the first elementary model in a second phonetic context of preceding and following phonetic models, different from the first context;
  
  comparing the feature values of each training feature vector signal in the first set to said at least one parameter value of the first reference vector signal to obtain a first closeness score for each training feature vector signal and the first reference vector signal;
  
  comparing the feature values of each training feature vector signal in the first set to said at least one parameter value of the second reference vector signal to obtain a second closeness score for each training feature vector signal and the second reference vector signal;
  
  comparing, for each training feature vector signal in the first set, the first closeness score for the training feature vector signal with the second closeness score for the training feature vector signal to obtain a reference match score for each training feature vector signal and the first and second reference vector signals;
  
  storing a first subset of the training feature vector signals in the first set having reference match scores greater than a threshold Q, and storing a second subset of the training feature vector signals in the first set having reference match scores less than the threshold Q; and
  
  generating one or more partition values for a first prototype vector signal from the first subset of training feature vector signals, and for generating one or more additional partition values for the first prototype vector signal from the second subset of training feature vector signals.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17)
- - 11. A speech coding method as claimed in claim 10, wherein:
    - the first set of training feature vector signals corresponds to the first elementary model in the training script model in a first phonetic context.
  - 12. A speech coding method as claimed in claim 11, further comprising steps of:
    - identifying a second set of training feature vector signals corresponding to a second elementary model in the training script model; and
      
      identifying a third set of training feature vector signals corresponding to a third elementary model in the training script model;
      
      said at least one parameter value of the first reference vector signal comprises an arithmetic mean of the feature values of the second set of training feature vector signals corresponding to the second elementary model in the training script in a second context; and
      
      said at least one parameter value of the second reference vector signal comprises an arithmetic mean of the feature values of the third set of training feature vector signals corresponding to the third elementary model in the training script in a third context different from the first and second contexts.
  - 13. A speech coding method as claimed in claim 12, wherein the step of generating one or more partition values comprises the step of grouping each subset of training feature vector signals into one or more different clusters.
  - 14. A speech coding method as claimed in claim 13, wherein:
    - the step of grouping each subset of training feature vector signals into one or more different clusters further comprises;
      
      storing at least a third reference vector signal and a fourth reference vector signal, each reference vector signal having at least one parameter value;
      
      comparing the feature values of each training feature vector signal in the first subset to said at least one parameter value of the third reference vector signal to obtain a third closeness score for the training feature vector signal and the third reference vector signal;
      
      comparing the feature values of each training feature vector signal in the first subset to said at least one parameter value of the fourth reference vector signal to obtain a fourth closeness score for the training feature vector signal and the fourth reference vector signal;
      
      comparing, for each training feature vector signal in the first subset, the third closeness score for the training feature vector signal with the fourth closeness score for the training feature vector signal to obtain a sub-reference match score for each training feature vector signal and the third and fourth reference vector signals; and
      
      storing a first sub-subset of the training feature vector signals in the first subset having sub-reference match scores greater than a threshold Q', and storing a second sub-subset of the training feature vector signals in the first subset having sub-reference match scores less than the threshold Q'; and
      
      wherein the step of generating one or more partition values generates one or more partition values for the first prototype vector signal from the first sub-subset of training feature vector signals, and generates one or more additional partition values for the first prototype vector signal from the second sub-subset of training feature vector signals.
  - 15. A speech coding method as claimed in claim 14, wherein each partition value comprises the arithmetic mean of the feature values of the training feature vector signals in one of the clusters.
  - 16. A speech coding method as claimed in claim 15, wherein each partition value further comprises a variance of the feature values of the training feature vector signals in one of the clusters.
  - 17. A speech coding method as claimed in claim 16, wherein the threshold Q is equal to one.
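
Once each prototype vector holds partition values consisting of a mean and a variance (claims 15 and 16), the abstract's statement that each measured feature vector is "represented by the best matched prototype vector" can be sketched as below. The matching rule is not given in the text reproduced here, so this example assumes a diagonal Gaussian log-likelihood per partition and scores a prototype by its best partition. The variance floor, the prototype labels, and all function names are illustrative assumptions.

```python
from math import log, pi


def partition_log_likelihood(x, mean, variance, floor=1e-6):
    """Diagonal Gaussian log-likelihood of x under one partition.

    Assumption: variances are floored to avoid division by zero.
    """
    total = 0.0
    for xi, m, v in zip(x, mean, variance):
        v = max(v, floor)
        total += -0.5 * (log(2 * pi * v) + (xi - m) ** 2 / v)
    return total


def prototype_score(x, partitions):
    """Score a prototype by its best-matching partition (an assumption)."""
    return max(partition_log_likelihood(x, m, v) for m, v in partitions)


def code_vector(x, prototypes):
    """Return the label of the prototype that best matches x."""
    return max(prototypes, key=lambda label: prototype_score(x, prototypes[label]))


if __name__ == "__main__":
    # Hypothetical prototypes: label -> list of (mean, variance) partitions.
    prototypes = {
        "P1": [([0.0, 0.0], [1.0, 1.0]), ([4.0, 0.0], [1.0, 1.0])],
        "P2": [([0.0, 5.0], [1.0, 1.0])],
    }
    for x in ([3.8, 0.3], [0.2, 4.5]):
        print(x, "->", code_vector(x, prototypes))
```

Taking the maximum over partitions is only one possible choice; a weighted sum of partition likelihoods would be an equally plausible reading and would change only `prototype_score`.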

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Bahl, Lalit R.; De Souza, Peter D.; Picheny, Michael A.; Gopalakrishnan, Ponani S.
Primary Examiner(s): Knepper, David D.
Assistant Examiner(s): Sartori, Michael A.
Application Number: US08/028,028
Time in Patent Office: 1,093 Days
Field of Search: 395/2.52-2.54, 395/2.6, 395/2.65, 395/2.64, 395/2.45, 395/2.66, 381/41-43
US Class Current: 704/245
CPC Class Codes: G10L 15/063 Training
