Speech recognition method using speaker cluster models

US 6,567,776 B1
Filed: 04/04/2000
Issued: 05/20/2003
Est. Priority Date: 08/11/1999
Status: Expired due to Term

First Claim

1. A speech recognition method comprising:

receiving a speech signal;

recognizing the speech signal using a speaker cluster model obtained in a training phase wherein the speaker cluster model is a collection of a plurality of cluster-dependent models, and a score of each candidate is calculated according to a score function which is defined by taking the dependency among the cluster-dependent models into account; and

obtaining a final recognition result according to a decision rule based on the score of each candidate, wherein the training phase comprises building an initialization model, and adjusting parameters of at least two cluster-dependent models of the initialization model by using a discriminative training method to obtain the speaker cluster model, wherein the discriminative training method is implemented by using a minimum classification error as a training criterion, a discriminant function of the discriminative training method being defined in the same manner as the score function, and the score function is defined as:

$g_{i}(X; \Gamma) = \log\left[\frac{1}{N} \sum_{n=1}^{N} w_{n}(X) \exp\left[h_{i}(X; \Lambda_{n})\,\xi\right]\right]^{\frac{1}{\xi}}, \quad i = 1, 2, \dots, M$

wherein $g_{i}(X; \Gamma)$ is the score function, $X$ is a feature vector sequence of the speech signal, $\Gamma$ represents an entire parameter set of the speaker cluster model, $N$ is the number of cluster-dependent models, parameter sets corresponding to the $N$ cluster-dependent models are $\Lambda_{1}, \Lambda_{2}, \dots, \Lambda_{N}$, $M$ is the number of candidates to be classified, $h_{i}(X; \Lambda_{n})$ is a log-likelihood function defined only on a parameter set $\Lambda_{n}$, $\xi$ is a positive weighting number, and $w_{n}(X)$ is a cluster weighting function that indicates the degree to which the nth cluster-dependent model is used for recognition.
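
The score function in claim 1 can be evaluated numerically as sketched below. This is only an illustration, not the patented implementation: it reads the exponent $1/\xi$ as applying to the bracketed average, so that the score equals $(1/\xi)$ times the log of that average, and it assumes the per-cluster log-likelihoods and cluster weights have already been computed. The function name `cluster_score` and its argument names are hypothetical.

```python
import math

def cluster_score(log_likelihoods, weights, xi):
    """Sketch of g_i = (1/xi) * log((1/N) * sum_n w_n * exp(xi * h_n)).

    log_likelihoods: h_i(X; Lambda_n) for n = 1..N (assumed precomputed)
    weights:         w_n(X) for n = 1..N, each >= 0
    xi:              positive weighting number
    """
    if xi <= 0:
        raise ValueError("xi must be positive")
    n_clusters = len(log_likelihoods)
    # Work in the log domain; clusters with zero weight contribute nothing.
    terms = [math.log(w) + xi * h
             for h, w in zip(log_likelihoods, weights) if w > 0]
    if not terms:
        return float("-inf")
    # Log-sum-exp shift keeps exp() from underflowing for very negative h.
    peak = max(terms)
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in terms))
    return (log_sum - math.log(n_clusters)) / xi
```

With a single cluster and unit weight the score reduces to that cluster's log-likelihood. As `xi` grows, the score approaches the largest weighted log-likelihood among the clusters, so `xi` controls how strongly the best-matching cluster dominates.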

Abstract

In speaker-independent speech recognition, between-speaker variability is one of the major sources of recognition errors. A speaker cluster model is used to manage recognition problems caused by between-speaker variability. In the training phase, the score function is used as the discriminant function. The parameters of at least two cluster-dependent models are adjusted through a discriminative training method to improve the performance of speech recognition.

4 Claims

2. The speech recognition method of claim 1 wherein the discriminative training method is implemented by a generalized probabilistic descent method.

3. The speech recognition method of claim 1 wherein the decision rule is to select a candidate with a highest score.

4. The speech recognition method of claim 1 wherein the cluster weighting function $w_{n}(X)$ is a zero-one function used to indicate whether the nth cluster-dependent model will be used for recognition.
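
The highest-score decision rule of claim 3 and the zero-one weighting of claim 4 can be combined in a short sketch. The claims do not say how the zero-one weights are chosen, so the mask is simply passed in here; the helper names `score` and `decide` and the log-likelihood values are hypothetical, and the exponent $1/\xi$ is again read as a $(1/\xi)$ factor on the log.

```python
import math

def score(h, mask, xi):
    """Score with zero-one weights: only clusters with mask == 1 are summed,
    but the average is still taken over all N cluster-dependent models."""
    active = [xi * h_n for h_n, m in zip(h, mask) if m == 1]
    if not active:
        return float("-inf")
    peak = max(active)  # log-sum-exp shift for numerical stability
    log_sum = peak + math.log(sum(math.exp(a - peak) for a in active))
    return (log_sum - math.log(len(h))) / xi

def decide(h_by_candidate, mask, xi):
    """Decision rule of claim 3: return the index of the highest score."""
    scores = [score(h, mask, xi) for h in h_by_candidate]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores

# Hypothetical log-likelihoods: 2 candidates x 3 cluster-dependent models.
h_by_candidate = [[-12.0, -3.0, -11.0],
                  [-9.0, -8.0, -9.5]]
best_all, _ = decide(h_by_candidate, [1, 1, 1], xi=1.0)
best_masked, _ = decide(h_by_candidate, [1, 0, 1], xi=1.0)
```

In this made-up example, candidate 0 wins when all three clusters are used, while excluding the second cluster makes candidate 1 the winner, which shows how the weighting function can change the recognition result.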

Current Assignee: Industrial Technology Research Institute
Original Assignee: Industrial Technology Research Institute
Inventors: Chien, Shih-Chieh; Chang, Sen-Chia; Penwu, Chung-Mou
Primary Examiner(s): Banks-Harold, Marsha D.
Assistant Examiner(s): Lerner, Martin
Application Number: US09/542,844
Time in Patent Office: 1,141 Days
Field of Search: 704/236, 704/240, 704/243, 704/244, 704/245, 704/246, 704/250
US Class Current: 704/236
CPC Class Codes: G10L 15/063 Training; G10L 2015/0631 Creating reference template...
