Speech recognition method

US 20020120444A1
Filed: 09/24/2001
Published: 08/29/2002
Est. Priority Date: 09/27/2000
Status: Active Grant

1 Assignment

0 Petitions

Abstract

A speech recognition method is described in which a basic set of models is adapted to a current speaker on the basis of that speaker's already observed speech data. The basic set of models comprises models for different acoustic units, each of which is described by a plurality of model parameters. The basic set of models is represented by a supervector in a high-dimensional vector space (model space), the supervector being formed by concatenating the model parameters of the models of the basic set of models. The adaptation of this basic set of models to the speaker is carried out in the model space by means of a MAP method in which an asymmetric distribution in the model space is selected as the a priori distribution.
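The adaptation step can be illustrated with a minimal sketch in Python/NumPy. It makes several assumptions that the patent does not specify:

- Each model is reduced to a single Gaussian mean vector.
- The observed speech data are summarized as a noisy estimate `x_obs` of the speaker's supervector with a known Gaussian covariance.
- The MAP estimate is the posterior mean of a Gaussian prior.

The helper names `make_supervector` and `map_adapt` are hypothetical.

```python
import numpy as np


def make_supervector(model_means):
    """Concatenate the mean vectors of all models (one per acoustic unit) in a fixed order."""
    return np.concatenate([np.asarray(m, dtype=float).ravel() for m in model_means])


def map_adapt(mu0, prior_cov, x_obs, obs_cov):
    """MAP (posterior-mean) estimate of the speaker supervector.

    Prior:       speaker ~ N(mu0, prior_cov)   (asymmetric if prior_cov is not a multiple of I)
    Observation: x_obs ~ N(speaker, obs_cov)   (assumed summary of the observed speech data)
    Result:      mu0 + prior_cov (prior_cov + obs_cov)^-1 (x_obs - mu0)
    """
    return mu0 + prior_cov @ np.linalg.solve(prior_cov + obs_cov, x_obs - mu0)


# Basic set of models: two acoustic units with 2-D means -> 4-D supervector (model space).
mu0 = make_supervector([[0.0, 0.0], [0.0, 0.0]])

# Asymmetric prior: large variance along a preferred direction e, small variance perpendicular to it.
e = np.array([0.5, 0.5, 0.5, 0.5])      # unit vector, preferred direction
p = np.array([0.5, -0.5, 0.5, -0.5])    # unit vector perpendicular to e
prior_cov = 0.1 * np.eye(4) + (4.0 - 0.1) * np.outer(e, e)

# Hypothetical observation: one unit away from mu0 along e and one unit along p.
x_obs = mu0 + 1.0 * e + 1.0 * p
adapted = map_adapt(mu0, prior_cov, x_obs, obs_cov=np.eye(4))

print(np.dot(adapted - mu0, e))   # about 0.8   = 4.0 / (4.0 + 1.0)
print(np.dot(adapted - mu0, p))   # about 0.091 = 0.1 / (0.1 + 1.0)
```

The prior variance along `e` is forty times larger than perpendicular to it. As a result, the same amount of evidence moves the supervector roughly nine times further along `e` than along `p`. This is the behaviour the asymmetric a priori distribution is meant to produce.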

17 Citations

13 Claims

1. A speech recognition method in which a basic set of models, which comprises models for various acoustic units, while the models are described by a plurality of model parameters, is adapted to a current speaker based on already observed speech data of this speaker, characterized in that the basic set of models is represented by a supervector in a high-dimensional vector space (model space), where the supervector is formed by concatenation of the plurality of model parameters of the models of the basic set of models, and this basic set of models in the model space is adapted to the speaker by means of a MAP method, while an asymmetrical distribution in the model space is chosen as the a priori distribution for the MAP method.
  - 2. A method as claimed in claim 1, characterized in that the a priori distribution is an asymmetrical Gaussian distribution.
  - 3. A method as claimed in claim 1 or 2, characterized in that the a priori distribution is chosen so that an adaptation along certain preferred directions in the model space takes place faster than perpendicular to these preferred directions.
  - 4. A method as claimed in claim 3, characterized in that the a priori distribution has different variances along the preferred directions and perpendicular to them.
  - 5. A method as claimed in one of the preceding claims, characterized in that the preferred directions are chosen such that they represent the main directions within the model space along which the different speakers can be distinguished from each other.
  - 6. A method as claimed in claim 5, characterized in that the preferred directions run along certain eigenspace basis vectors (E_e) of an eigenspace, which was determined based on training speech data of a plurality of training speakers.
  - 7. A method as claimed in claim 6, characterized in that the eigenspace is determined in the following steps:
    - developing a common speaker-independent set of models for the training speakers, using the training speech data of the training speakers,
    - adapting the speaker-independent set of models to the individual training speakers to develop the speaker-dependent sets of models, using the respective training speech data of each individual training speaker,
    - establishing the assignment of the model parameters of the models (SI) of the speaker-independent set of models to the model parameters of the models (SD) of the speaker-dependent sets of models when the speaker-independent set of models is adapted to the individual training speakers,
    - representing a combined model for each speaker in a high-dimensional vector space by concatenating a plurality of the model parameters of the models of the sets of models of the individual training speakers into a respective coherent supervector, where the concatenation is effected so that the model parameters of the models (SD) of the speaker-dependent sets of models which are assigned to the same model parameters of the same model (SI) of the speaker-independent set of models are arranged at corresponding positions in the respective supervectors,
    - performing a change of basis to reduce the model space to a speaker sub-space in which all the training speakers are represented,
    - performing a transformation of the vectors representing the training speakers in the speaker sub-space to obtain eigenspace basis vectors (E_e), the transformation using a reduction criterion based on the variability of the vectors to be transformed.
  - 8. A method as claimed in claim 7, characterized in that the basis of this speaker sub-space is spanned by orthogonalized difference vectors between the supervectors of the individual training speakers and a mean supervector.
  - 9. A method as claimed in one of the claims 1 to 8, characterized in that a set of mean models of the speaker-dependent sets of models of the training speakers is used as the basic set of models.
  - 10. A method as claimed in one of the claims 6 to 9, characterized in that associated order attributes are determined for the eigenspace basis vectors (E_e).
  - 11. A method as claimed in claim 9 or 10, characterized in that the eigenspace basis vectors are the eigenvectors of a correlation matrix determined by the supervectors and the order attributes are the eigenvalues belonging to the eigenvectors.
  - 12. A computer program with program code means for carrying out all the steps of a method as claimed in one of the preceding claims when the program is executed on a computer.
  - 13. A computer program with program code means as claimed in claim 12 which are stored on a computer-readable data carrier.
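Claims 6 to 8 and 11 describe how the preferred directions are derived from training speakers. Claims 3 and 4 describe how the a priori distribution uses those directions.

What follows is a minimal Python/NumPy sketch under these assumptions:

- The training speakers' supervectors are already available as rows of an array.
- A QR decomposition serves as the orthogonalization step. This choice is not specified in the claims.
- The "correlation matrix" is read as the scatter matrix of the mean-centred supervectors.
- Purely for illustration, the prior variance along each kept eigenspace basis vector equals its eigenvalue, and all perpendicular directions share one small variance `perp_var`.

The function names `eigenspace` and `asymmetric_prior_cov` and the synthetic data are hypothetical.

```python
import numpy as np


def eigenspace(train_supervectors, n_keep):
    """Eigenspace basis vectors E_e (rows) and their eigenvalues from training-speaker supervectors."""
    X = np.asarray(train_supervectors, dtype=float)   # shape (n_speakers, dim)
    mean_sv = X.mean(axis=0)                          # mean supervector
    D = X - mean_sv                                   # difference vectors to the mean supervector
    # Change of basis: orthonormal basis of the speaker sub-space spanned by the difference vectors.
    Q, _ = np.linalg.qr(D.T)                          # shape (dim, min(dim, n_speakers))
    coords = D @ Q                                    # training speakers in sub-space coordinates
    # Scatter ("correlation") matrix in the sub-space; eigenvalues serve as order attributes.
    C = coords.T @ coords / X.shape[0]
    evals, evecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1][:n_keep]          # largest variability first
    E = (Q @ evecs[:, order]).T                       # back to model space, one basis vector per row
    return mean_sv, E, evals[order]


def asymmetric_prior_cov(E, variances, perp_var):
    """Covariance with the given variances along the rows of E and perp_var in all other directions.

    Assumption: E has orthonormal rows, as returned by eigenspace().
    """
    dim = E.shape[1]
    cov = perp_var * np.eye(dim)
    for e, v in zip(E, variances):
        cov += (v - perp_var) * np.outer(e, e)
    return cov


# Synthetic training data (hypothetical): speakers differ mainly along one direction u.
rng = np.random.default_rng(0)
dim, n_speakers = 12, 6
u = np.zeros(dim)
u[0] = u[5] = np.sqrt(0.5)                            # unit vector, dominant speaker direction
a = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])       # speaker positions along u
X = 1.0 + np.outer(a, u) + 0.01 * rng.normal(size=(n_speakers, dim))

mean_sv, E, evals = eigenspace(X, n_keep=1)
# Assumption: prior variance along E_e is its eigenvalue; perp_var elsewhere.
prior_cov = asymmetric_prior_cov(E, evals, perp_var=1e-3)

print(abs(E[0] @ u))   # close to 1: the leading eigenvector recovers the dominant direction
print(evals)           # leading order attribute (eigenvalue)
```

The resulting `prior_cov` could serve as the a priori covariance of the MAP method in claim 1. Adaptation would then proceed quickly along `E[0]`, the direction in which the training speakers differ most, and only slowly perpendicular to it.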

Current Assignee
Koninklijke Philips Electronics N.V. (Koninklijke Philips N.V.)
Original Assignee
Koninklijke Philips Electronics N.V. (Koninklijke Philips N.V.)
Inventors
Botterweck, Henrik

Granted Patent

US 6,917,919 B2
US Class Current

704/236
CPC Class Codes

G06F 18/2135   based on approximation crit...

G06F 18/24155   Bayesian classification

G10L 15/07   to the speaker
