Block-diagonal covariance joint subspace tying and model compensation for noise robust automatic speech recognition

US 7,729,909 B2
Filed: 03/06/2006
Issued: 06/01/2010
Est. Priority Date: 03/04/2005
Status: Expired due to Fees

7 Assignments

0 Petitions

Abstract

Model compression is combined with model compensation. Model compression is needed in embedded ASR to reduce the size and the computational complexity of compressed models. Model-compensation is used to adapt in real-time to changing noise environments. The present invention allows for the design of smaller ASR engines (memory consumption reduced to up to one-sixth) with reduced impact on recognition accuracy and/or robustness to noises.
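To make the memory argument concrete, the following is a back-of-the-envelope sketch of how subspace tying shrinks a Gaussian acoustic model. All figures (10,000 Gaussians, 39 dimensions, 13 subspaces of 3 dimensions, 256 codewords per subspace, 4-byte floats, 1-byte indices) are illustrative assumptions and are not taken from the patent; the function names are hypothetical.

```python
def untied_bytes(n_gaussians, dim, float_bytes=4):
    """Storage for one mean and one diagonal variance per Gaussian."""
    return n_gaussians * 2 * dim * float_bytes


def tied_bytes(n_gaussians, subspace_dims, codebook_size,
               float_bytes=4, index_bytes=1):
    """Storage when each subspace shares a codebook of (mean, variance) prototypes.

    Every Gaussian keeps only one codeword index per subspace.
    """
    codebooks = sum(codebook_size * 2 * d * float_bytes for d in subspace_dims)
    indices = n_gaussians * len(subspace_dims) * index_bytes
    return codebooks + indices


# Hypothetical model sizes, chosen only for illustration.
full = untied_bytes(10_000, 39)
tied = tied_bytes(10_000, [3] * 13, 256)
ratio = tied / full
print(f"untied: {full} bytes, tied: {tied} bytes, ratio: {ratio:.3f}")
```

The ratio depends strongly on the assumed codebook size and subspace layout, so the result should be read as an order-of-magnitude illustration rather than a reproduction of the figure quoted in the abstract.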

45 Citations

29 Claims

1. A noise robust automatic speech recognition system, comprising:
- a front end analysis module isolating a set of independent subspaces, wherein said front end analysis module employs one or more block diagonal front-end whitening matrices to isolate the set of independent subspaces;
  
  a model-compensation module employing a model-compensation distortion function that operates on each of the subspaces isolated by said front-end analysis module; and
  
  a subspace model compression module employing subspace tying to perform model compression.
  - 2. The system of claim 1, wherein the independent subspaces span over different time frames, and the whitening matrices include decorrelation across a 2-dimensional time-frequency axis.
  - 3. The system of claim 2, wherein such 2-D decorrelation matrices are decomposable as discrete cosine transform in frequency domain and time derivative in time domain.
  - 4. The system of claim 1, wherein subspaces corresponding to the block-diagonal whitening matrices in said front end analysis module are constrained to be large enough to allow sufficiently good coverage of a speech signal correlation structure in said front-end analysis module and in said model-compensation module, but small enough to allow a sufficiently low distortion error from subspace tying performed by said subspace model compression module.
  - 5. The system of claim 4, wherein a subspace definition according to size constraints of the subspaces is accomplished by use of an interactive converging algorithm seeking one or more subspace definition solutions that approach optimal combinations of good coverage and low distortion.
  - 6. The system of claim 1, wherein said front end analysis module employs an interactive converging algorithm to determine size constraints of subspaces.
  - 7. The system of claim 6, wherein subspaces in the front-end analysis module are constrained to be large enough to allow coverage of a speech signal correlation structure in said front-end analysis module and in said model-compensation module.
  - 8. The system of claim 6, wherein subspaces in the front-end analysis module are constrained to be small enough to allow a low distortion error from subspace tying.
  - 9. The system of claim 1, wherein all components in said front-end analysis module, model-compensation module, and subspace model compression module are split and aligned to follow a subspace definition structure.
  - 10. The system of claim 9, wherein decorrelation matrices of said front end analysis module operate independently on blocks of log filter-bank energies, thereby allowing for the model-compensation to work effectively on each subspace without affecting the subspace tying structure.
  - 11. The system of claim 1, wherein subspaces in the front-end analysis module are constrained to be large enough to allow coverage of a speech signal correlation structure in said front-end analysis module and in said model-compensation module.
  - 12. The system of claim 1, wherein subspaces in the front-end analysis module are constrained to be small enough to allow a low distortion error from subspace tying.
  - 13. The system of claim 1, wherein subspaces used for the tying are aligned with the independent subspaces isolated by said front end analysis module.
  - 14. The system of claim 1 wherein the model-compensation distortion function is alpha-Jacobian model compensation.
  - 15. The system of claim 1 further comprising a plurality of model-compensation distortion functions, wherein each of the subspaces isolated by said front-end analysis module is operated on by one of the plurality of model compensation distortion functions.
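
Claims 1, 10 and 13 combine a block-diagonal whitening front end with subspace tying aligned to the same blocks. The sketch below illustrates that idea with NumPy under several assumptions that are not in the claims: PCA-style whitening estimated per block, plain k-means as the tying procedure, random data standing in for log filter-bank energies, and hypothetical function names `block_whitening` and `tie_subspace`. It is not the patented implementation.

```python
import numpy as np


def block_whitening(features, blocks):
    """Build a block-diagonal whitening matrix, one block per subspace.

    features: (n_frames, dim) array; blocks: list of index lists partitioning the dims.
    """
    dim = features.shape[1]
    W = np.zeros((dim, dim))
    for idx in blocks:
        cov = np.atleast_2d(np.cov(features[:, idx], rowvar=False))
        vals, vecs = np.linalg.eigh(cov)
        # Rows of the block are eigenvectors scaled by 1/sqrt(eigenvalue).
        W[np.ix_(idx, idx)] = (vecs / np.sqrt(vals + 1e-8)).T
    return W


def tie_subspace(vectors, n_codewords, n_iter=10, seed=0):
    """Tie the per-Gaussian vectors of one subspace to a small shared codebook (k-means)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(vectors), n_codewords, replace=False)
    codebook = vectors[start].copy()
    for _ in range(n_iter):
        dist = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        for k in range(n_codewords):
            members = vectors[assign == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    dist = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return codebook, dist.argmin(axis=1)


# Synthetic stand-in data: 6 feature dimensions split into two 3-dimensional subspaces.
rng = np.random.default_rng(0)
blocks = [[0, 1, 2], [3, 4, 5]]
feats = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))
W = block_whitening(feats, blocks)
white = (feats - feats.mean(axis=0)) @ W.T

# Tie hypothetical Gaussian means independently inside each whitened subspace.
gauss_means = rng.normal(size=(200, 6)) @ W.T
tied = [tie_subspace(gauss_means[:, idx], n_codewords=16) for idx in blocks]
```

Because the whitening matrix has no entries outside its blocks, each codebook only ever sees dimensions from a single block, which is the alignment that claim 13 describes.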

16. A method of operation for use with a noise robust automatic speech recognition system, comprising:
- isolating a set of independent subspaces using a block diagonal front-end whitening matrix;
  
  using a model compensation module of the speech recognition system that implements a model-compensation distortion function that operates on each of the isolated subspaces; and
  
  employing subspace tying to perform model compression.
  - 17. The method of claim 16, further comprising constraining subspaces corresponding to the block-diagonal whitening matrix to be large enough to allow sufficiently good coverage of a speech signal correlation structure, but small enough to allow a sufficiently low distortion error from subspace tying.
  - 18. The method of claim 17, further comprising using an interactive converging algorithm to seek one or more subspace definition solutions that approach optimal combinations of good coverage and low distortion in order to accomplish subspace definition according to size constraints of the subspaces.
  - 19. The method of claim 16, further comprising employing an interactive converging algorithm to determine size constraints of subspaces.
  - 20. The method of claim 19, further comprising constraining the subspaces to be large enough to allow coverage of a speech signal correlation structure.
  - 21. The method of claim 19, further comprising constraining the subspaces to be small enough to allow a low distortion error from subspace tying.
  - 22. The method of claim 16, employing front end analysis processes, model-compensation processes, and subspace model compression processes that are split and aligned to follow a subspace definition structure.
  - 23. The method of claim 22, further comprising employing decorrelation matrices that operate independently on blocks of log filter-bank energies, thereby allowing for the model-compensation to work effectively on each subspace without affecting the subspace tying structure.
  - 24. The method of claim 16, further comprising constraining subspaces to be large enough to allow coverage of a speech signal correlation structure.
  - 25. The method of claim 16, further comprising constraining subspaces in the front-end analysis module to be small enough to allow a low distortion error from subspace tying.
  - 26. The method of claim 16, further comprising aligning subspaces used for the tying with the independent subspaces.
  - 27. The method of claim 16, further comprising employing an additional subspace tying regarding compensated acoustic models to perform speaker adaptation.
  - 28. The system of claim 16, wherein the model-compensation distortion function is alpha-Jacobian model compensation.
  - 29. The method of claim 16 further employing a plurality of model-compensation distortion functions, wherein each of the isolated subspaces is operated on by one of the plurality of model compensation distortion functions.
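
Claims 22 and 23 state that model compensation can work on each subspace without disturbing the tying structure when the decorrelation matrices operate on independent blocks of log filter-bank energies. The sketch below illustrates why, using a generic first-order (Jacobian-style) update of a model mean under additive noise. This listing does not define alpha-Jacobian compensation, so the `alpha` parameter here is only a hypothetical noise-scaling factor, and the function name, block layout and data are assumptions made for illustration.

```python
import numpy as np


def compensate_block(mean_c, W, log_s, log_n_ref, log_n_new, alpha=1.0):
    """First-order update of a mean in the transformed domain W @ log-energy.

    The noisy log-energy is modeled as log(S + alpha * N); its derivative with
    respect to log N is alpha * N / (S + alpha * N), used here as a per-channel gain.
    """
    S = np.exp(log_s)
    N = alpha * np.exp(log_n_ref)
    gain = N / (S + N)
    return mean_c + W @ (gain * (log_n_new - log_n_ref))


# Synthetic stand-in data: two independent 3-dimensional subspaces.
rng = np.random.default_rng(1)
blocks = [[0, 1, 2], [3, 4, 5]]
W_blocks = [rng.normal(size=(3, 3)) for _ in blocks]
log_s = rng.normal(size=6)
log_n_ref = rng.normal(size=6) - 1.0
log_n_new = log_n_ref + 0.01  # small change in the noise level

# Noisy log-energies under the reference noise.
y_ref = np.log(np.exp(log_s) + np.exp(log_n_ref))

# Compensate each subspace on its own, using only that block's transform.
per_block = np.concatenate([
    compensate_block(Wb @ y_ref[idx], Wb, log_s[idx], log_n_ref[idx], log_n_new[idx])
    for Wb, idx in zip(W_blocks, blocks)
])

# Compensate once with the assembled block-diagonal transform for comparison.
W_full = np.zeros((6, 6))
for Wb, idx in zip(W_blocks, blocks):
    W_full[np.ix_(idx, idx)] = Wb
full = compensate_block(W_full @ y_ref, W_full, log_s, log_n_ref, log_n_new)
```

In this sketch the per-subspace result matches the full block-diagonal result, so a codebook tied within one subspace can be compensated without reference to any other subspace.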

Current Assignee
Sovereign Peak Ventures, LLC (Dominion Harbor Enterprises, LLC)
Original Assignee
Panasonic Corporation (Panasonic Holdings Corporation)
Inventors
Morii, Keiko; Kunieda, Nobuyuki; Rigazio, Luca; Junqua, Jean-Claude; Kryze, David
Primary Examiner(s)
Lerner; Martin

Application Number
US11/369,938
Publication Number
US 20070208560A1
Time in Patent Office
1,548 Days
Field of Search
704/233, 704/243, 704/244, 704/245, 704/256.2, 704/256.3
US Class Current
704/233
CPC Class Codes
G10L 15/065 Adaptation
G10L 15/20 Speech recognition techniqu...
