Context-dependent acoustic models for medium and large vocabulary speech recognition with eigenvoice training
Abstract
A reduced-dimensionality eigenvoice analytical technique is used during training to develop context-dependent acoustic models for allophones, and again at run time on the speech of a new speaker. The technique removes individual speaker idiosyncrasies to produce more universally applicable and robust allophone models. In one embodiment, the eigenvoice technique identifies the centroid of each speaker, which may then be “subtracted out” of the recognition equation. In another embodiment, maximum-likelihood estimation techniques are used to develop common decision-tree frameworks that may be shared across all speakers when constructing the eigenvoice representation of speaker space.
10 Claims
1. A method for developing context-dependent models for automatic speech recognition, comprising:
generating an eigenspace to represent a training speaker population;
providing a set of acoustic data for at least one training speaker and representing said acoustic data in said eigenspace to determine at least one allophone centroid for said training speaker;
subtracting said centroid from said acoustic data to generate speaker-adjusted acoustic data for said training speaker;
using said speaker-adjusted acoustic data to grow at least one decision tree having leaf nodes containing context-dependent models for different allophones. - View Dependent Claims (2, 3, 4, 5, 6)
providing speech data from a new speaker;
using said eigenspace to determine at least one new speaker centroid of a new speaker and subtracting said new speaker centroid from said speech data from said new speaker to generate speaker-adjusted data; and
applying said speaker-adjusted data to a speech recognizer employing said context-dependent models.
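The eigenspace construction, centroid lookup, and centroid subtraction of claim 1 (and the recognition-time route of the dependent claims) can be sketched as follows. This is a minimal illustration, not the patented implementation: PCA via SVD stands in for the unspecified eigenspace construction, and the function names (`build_eigenspace`, `speaker_centroid`, `speaker_adjust`) are assumptions.

```python
import numpy as np

def build_eigenspace(speaker_supervectors, k=2):
    """PCA over per-speaker supervectors (one concrete choice of
    dimensionality reduction) -> mean voice plus k 'eigenvoice' axes."""
    mu = speaker_supervectors.mean(axis=0)
    # Rows of vt are orthonormal principal directions of speaker space.
    _, _, vt = np.linalg.svd(speaker_supervectors - mu, full_matrices=False)
    return mu, vt[:k]

def speaker_centroid(frames, mu, basis):
    """Represent a speaker's acoustic data in the eigenspace and read
    off that speaker's centroid."""
    w = basis @ (frames.mean(axis=0) - mu)   # eigenspace coordinates
    return mu + basis.T @ w                  # centroid in feature space

def speaker_adjust(frames, mu, basis):
    """Subtract the centroid to obtain speaker-adjusted acoustic data,
    the input used for growing the context-dependent decision trees."""
    return frames - speaker_centroid(frames, mu, basis)
```

At recognition time the same projection is applied to the new speaker's speech before it is passed to the recognizer, mirroring the training-side adjustment.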
5. A method of performing speech recognition using said context-dependent models developed as recited in claim 1, comprising:
providing speech data from a new speaker;
using said eigenspace to determine at least one new speaker centroid of a new speaker and adding said new speaker centroid to said context-dependent models to generate new speaker-adjusted context-dependent models; and
applying said speech data to a speech recognizer employing said new speaker-adjusted context-dependent models.
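Claim 5 is the mirror image of the data-adjustment route: rather than subtracting the centroid from the incoming speech, the new speaker's centroid is added to the centroid-free models. A minimal sketch, assuming models are stored as mean vectors keyed by allophone name (an illustrative representation):

```python
import numpy as np

def adapt_models(leaf_means, centroid):
    """Add the new speaker's centroid to each speaker-adjusted
    context-dependent model, so raw (unadjusted) speech can be scored
    directly against the adapted models."""
    return {name: mean + centroid for name, mean in leaf_means.items()}
```

Under a shared covariance, scoring `frames - centroid` against `mean` and scoring `frames` against `mean + centroid` give the same result, which is why the two adaptation embodiments are interchangeable.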
6. The method of claim 1, wherein the decision tree has at least one non-leaf node containing an eigen dimension question.
7. A method of training context-dependent models for automatic speech recognition, comprising:
constructing a decision tree framework of yes-no questions having leaf nodes for storing context-dependent allophone models;
training a set of speaker-dependent acoustic models for a plurality of training speakers and using said decision tree framework to construct a plurality of decision trees for said training speakers, storing the speaker-dependent acoustic models for each training speaker in the leaf nodes of the respective decision tree;
constructing an eigenspace by using said set of decision trees to generate supervectors that are subsequently transformed through dimensionality reduction. - View Dependent Claims (8)
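The supervector step of claim 7 can be sketched as follows, assuming every speaker's tree uses the same framework (the same leaves, visited in a fixed order) and using PCA via SVD as one concrete stand-in for the unspecified dimensionality-reduction step; the function names are illustrative:

```python
import numpy as np

def tree_to_supervector(leaf_models):
    """Concatenate one speaker's leaf-node model means in a fixed leaf
    order shared by all speakers (the common decision-tree framework)."""
    return np.concatenate([leaf_models[leaf] for leaf in sorted(leaf_models)])

def eigenspace_from_trees(per_speaker_trees, k=2):
    """Stack the supervectors and reduce dimensionality to obtain the
    eigenspace (here: PCA via SVD, one concrete choice)."""
    sv = np.stack([tree_to_supervector(t) for t in per_speaker_trees])
    mu = sv.mean(axis=0)
    _, _, vt = np.linalg.svd(sv - mu, full_matrices=False)
    return mu, vt[:k]
```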
9. A method of constructing a decision tree for storing context-dependent models for automatic speech recognition, comprising:
providing a pool of yes-no questions to identify different contexts of sound units;
providing a corpus of test speaker data;
for a plurality of test speakers represented by said corpus and for a plurality of questions in said pool, iteratively performing the following steps (a) through (f) inclusive:
(a) selecting a question from said pool;
(b) constructing a first yes model and a first no model for said selected question using speaker data from a first one of said test speakers;
(c) computing a first product of the probability scores for said first yes model and said first no model;
(d) constructing a second yes model and a second no model for said selected question using speaker data from a second one of said test speakers;
(e) computing a second product of the probability scores for said second yes model and said second no model;
(f) computing an overall score for said selected question by computing an overall product that includes the product of said first and second products;
growing a decision tree having nodes populated with different questions selected from the pool such that at each node the question with the highest overall score is used.
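The per-question scoring loop of claim 9 can be sketched like this, assuming a caller-supplied `score_split(question, speaker_data)` (a hypothetical name) that fits the yes and no models for one speaker and returns their probability scores. Log-probabilities are summed rather than multiplied only to avoid numerical underflow; this is equivalent to the overall product in the claim.

```python
import math

def question_score(question, speakers_data, score_split):
    """For each speaker, multiply the yes-model and no-model
    probability scores; then multiply those products across all
    speakers.  Returned as a log for numerical stability."""
    log_overall = 0.0
    for data in speakers_data:
        p_yes, p_no = score_split(question, data)
        log_overall += math.log(p_yes) + math.log(p_no)
    return log_overall

def best_question(pool, speakers_data, score_split):
    """At each node of the growing tree, keep the question from the
    pool with the highest overall score."""
    return max(pool, key=lambda q: question_score(q, speakers_data, score_split))
```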
10. A memory for storing data for access by an application program being executed on a data processing system, wherein a decision tree for storing speech models is stored, the decision tree comprising:
a root node containing a question about a context of a phoneme;
a plurality of non-leaf child nodes containing additional questions, wherein the additional questions include at least one eigen dimension question; and
a plurality of leaf child nodes containing speech models.
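The claim-10 structure amounts to a binary tree whose internal nodes ask yes/no questions (phonetic-context questions, plus at least one eigen-dimension question, i.e. a threshold on a speaker's coordinate along an eigenvoice axis) and whose leaves hold speech models. A minimal sketch; the field names, example questions, and threshold are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    question: Optional[Callable] = None  # non-leaf: yes/no question
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    model: Optional[str] = None          # leaf: context-dependent model

def lookup(node, ctx):
    """Descend from the root, answering each question, until a leaf
    holding a context-dependent speech model is reached."""
    while node.model is None:
        node = node.yes if node.question(ctx) else node.no
    return node.model

# Root asks a phonetic-context question; one child asks an
# eigen-dimension question (a threshold on eigenvoice coordinate 0).
tree = Node(
    question=lambda ctx: ctx["left"] == "s",
    yes=Node(question=lambda ctx: ctx["eigen"][0] > 0.1,
             yes=Node(model="model_A"), no=Node(model="model_B")),
    no=Node(model="model_C"),
)
```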
Specification