Method for recognizing speech

US 20020046024A1
Filed: 09/05/2001
Published: 04/18/2002
Est. Priority Date: 09/06/2000
Status: Active Grant

Abstract

A method for recognizing speech is proposed in which the process of recognition is started using a starting acoustic model (SAM) and the current acoustic model (CAM) is modified by removing, or cancelling, model function mixture components (MFM_jk) that are negligible for describing the speaking behaviour and quality of the current speaker. The size of the acoustic model (SAM, CAM) is thereby reduced through adaptation to the current speaker, enabling fast performance and increased recognition efficiency.
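
The size reduction described in the abstract can be illustrated with a minimal Python sketch. It assumes one-dimensional Gaussian components and reads "negligible" as "absolute weight below a threshold"; the function names, the example weights, and the threshold value are hypothetical and do not come from the patent.

```python
import math

def gaussian_pdf(x, mean, var):
    """One-dimensional Gaussian density, standing in for one mixture component."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_value(x, weights, components):
    """Evaluate a model function mixture as the weighted sum of its components."""
    return sum(a * gaussian_pdf(x, mean, var)
               for a, (mean, var) in zip(weights, components))

def prune_mixture(weights, components, threshold):
    """Cancel components whose absolute weight is below `threshold`.

    Treating 'negligible' as 'below threshold' is an assumption of this sketch.
    """
    kept = [(a, c) for a, c in zip(weights, components) if abs(a) >= threshold]
    return [a for a, _ in kept], [c for _, c in kept]

# Hypothetical mixture: three (mean, variance) components, one barely used.
weights = [0.6, 0.39, 0.01]
components = [(0.0, 1.0), (2.0, 0.5), (10.0, 1.0)]

pruned_weights, pruned_components = prune_mixture(weights, components, threshold=0.05)
```

In this toy case the pruned mixture has two components instead of three, and its value near the retained means is almost unchanged, which is the trade the abstract describes: a smaller model at little cost for the current speaker.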

10 Claims

1. Method for recognizing speech, wherein for the process of recognition a current acoustic model (CAM) based on a set of model function mixtures (MFM1, ..., MFMn) is used and wherein said current acoustic model (CAM) is adapted during the recognition process by changing at least in part the contributions of model function mixture components (MFM_jk) of model function mixtures (MFMj) based on at least one recognition result already obtained, characterized in that the process of recognition is started using a starting acoustic model (SAM) as said current acoustic model (CAM), after given numbers of performed recognition steps and/or obtained recognition results a modified acoustic model (MAM) is generated based on said current acoustic model (CAM) by cancelling model function mixture components (MFM_jk) having negligible contributions with respect to at least given numbers of recognition results already obtained, and the process of recognition is continued using said modified acoustic model (MAM) as said current acoustic model (CAM) in each case.
  - 2. Method according to claim 1, wherein a modified acoustic model (MAM) is generated repeatedly after each fixed and/or predetermined number of performed recognition steps and/or obtained recognition results, in particular after each single performed recognition step and/or obtained recognition result.
  - 3. Method according to any one of the preceding claims, wherein the number of recognition steps and/or recognition results after which a modified acoustic model (MAM) is generated is determined and/or changed within the current process of recognition and/or adaptation.
  - 4. Method according to any one of the preceding claims, wherein an acoustic model is used, in particular as said starting acoustic model (SAM) and/or as said current acoustic model (CAM) in each case, the model function mixtures (MFMj) of which at least contain distribution functions or the like, in particular functions of the Gaussian type or the like and/or in particular as said model function mixture components (MFM_jk).
  - 5. Method according to any one of the preceding claims, wherein each of said model function mixtures (MFMj) is based on a function vector (f_j) and a weight factor vector (a_j), each of which has a finite and/or equal number (n_j) of components.
  - 6. Method according to claim 5, wherein each of said model function mixtures (MFMj) is a linear combination or superposition of its vector function components (f_j,k) weighted by its weight factor vector components (a_j,k), particularly represented by a scalar product of the weight factor vector (a_j) and the function vector (f_j):
    - $MFMj = \sum_{k = 1}^{n_{j}} a_{j, k} f_{j, k} = a_{j}^{T} f_{j} = a_{j} \cdot f_{j}$, where MFMj denotes the j-th model function mixture, $a_{j}$ denotes the j-th weight factor vector with $a_{j, k}$ being its k-th component, $f_{j}$ denotes the j-th function vector with $f_{j, k}$ being its k-th component, $a_{j}^{T}$ denotes the transposed form of $a_{j}$, and $\cdot$ denotes the scalar or inner product of the vectors.
  - 7. Method according to claim 5 or 6, wherein each of said model function mixture components (MFM_jk) is classified as being negligible if the absolute value (|a_j,k|) of its weight factor vector component (a_j,k) is beyond a given threshold value (c_j,k), in particular for a given number (m_j,k) of times of recognition steps already performed and/or recognition results already obtained.
  - 8. Method according to claim 7, wherein each of said threshold values (c_j,k) is predetermined and/or fixed, in particular for each of the model function mixture components (MFM_jk) independently and in particular before starting the recognition process.
  - 9. Method according to claim 7 or 8, wherein each of said threshold values (c_j,k) is determined and/or modified during the recognition process, in particular based on signal quality information of the speech input and/or in particular with respect to statistical and/or noise data.
  - 10. Method according to any of the preceding claims, wherein weight factor vector components (a_j,k) are modified among other components of the modified acoustic model (MAM) for speaker adaptation, in particular to reduce certain weight factor vector components (a_j,k) below certain thresholds.
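
The interplay of claims 1, 2, and 7 can be sketched as a small loop: after each recognition step the weights are adapted, each component's weight is compared with a threshold, and a component is cancelled once it has stayed under the threshold for a given number of steps. This is only an illustration under stated assumptions: "beyond a given threshold" is read as "below the threshold", the m_j,k steps are counted consecutively, a single threshold is shared by all components, and `Mixture`, `toy_adapt`, and all numeric values are hypothetical stand-ins rather than anything defined in the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Mixture:
    weights: list                                    # weight factors a_j,k
    components: list                                 # opaque stand-ins for f_j,k
    low_counts: list = field(default_factory=list)   # consecutive steps under threshold

    def __post_init__(self):
        if not self.low_counts:
            self.low_counts = [0] * len(self.weights)

def update_counts(mixture, threshold):
    """Track how long each |a_j,k| has stayed under the threshold (assumed: consecutively)."""
    for k, a in enumerate(mixture.weights):
        mixture.low_counts[k] = mixture.low_counts[k] + 1 if abs(a) < threshold else 0

def cancel_negligible(mixture, min_steps):
    """Remove components that were under the threshold for at least `min_steps` steps."""
    keep = [k for k, n in enumerate(mixture.low_counts) if n < min_steps]
    mixture.weights = [mixture.weights[k] for k in keep]
    mixture.components = [mixture.components[k] for k in keep]
    mixture.low_counts = [mixture.low_counts[k] for k in keep]

def recognize_with_pruning(mixture, adapt, n_steps, threshold, min_steps, prune_every=1):
    """Run `n_steps` recognition steps, generating a modified model every `prune_every` steps."""
    for step in range(1, n_steps + 1):
        adapt(mixture)                      # stand-in for recognition plus speaker adaptation
        update_counts(mixture, threshold)
        if step % prune_every == 0:
            cancel_negligible(mixture, min_steps)

def toy_adapt(mixture):
    """Hypothetical adaptation: the current speaker never uses 'rare', so its weight halves."""
    mixture.weights = [a * 0.5 if c == "rare" else a
                       for a, c in zip(mixture.weights, mixture.components)]

mixture = Mixture(weights=[0.5, 0.4, 0.1], components=["g1", "g2", "rare"])
recognize_with_pruning(mixture, toy_adapt, n_steps=4, threshold=0.02, min_steps=2)
```

With these toy numbers the weight of "rare" falls under the threshold at the third step and is cancelled at the fourth, leaving a two-component mixture. Setting `prune_every=1` corresponds to the variant in claim 2 where a modified model is generated after every single recognition step; a larger value prunes less often.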

Current Assignee: Sony Deutschland GmbH (Sony Group Corp.)
Original Assignee: Sony Deutschland GmbH (Sony Group Corp.)
Inventors: Kompe, Ralf; Goronzy, Silke

Granted Patent: US 6,999,929 B2
US Class Current: 704/236
CPC Class Codes:
- G10L 15/063   Training
- G10L 15/07   to the speaker
- G10L 15/142   Hidden Markov Models [HMMs]
