GENERATING ACOUSTIC MODELS

US 20120278061A1
Filed: 07/10/2012
Published: 11/01/2012
Est. Priority Date: 11/08/2010
Status: Active Grant

2 Assignments

0 Petitions

Abstract

This document describes methods, systems, techniques, and computer program products for generating and/or modifying acoustic models. Acoustic models and/or transformations for a target language/dialect can be generated and/or modified using acoustic models and/or transformations from a source language/dialect.

29 Citations

20 Claims

1. A computer-implemented method comprising:
- receiving, at a computer system, a request to generate or modify a target acoustic model for a target language;
  
  accessing, by the computer system, a source acoustic model for a source language, wherein the source acoustic model includes information that maps acoustic features of the source language to phonemes in a transformed feature space;
  
  aligning, using the source acoustic model in the transformed feature space, untransformed voice data in the target language with phonemes in a corresponding textual transcript to obtain aligned voice data, wherein the untransformed voice data is in an untransformed feature space;
  
  transforming the aligned voice data according to a particular transform operation using the source acoustic model to obtain transformed voice data;
  
  adapting the source acoustic model to the target language using the untransformed voice data in the target language to obtain an adapted acoustic model; and
  
  training, by the computer system, a target acoustic model for the target language using the transformed voice data and the adapted acoustic model; and
  
  providing the target acoustic model in association with the target language.
  - 2. The computer-implemented method of claim 1, wherein the transformed feature space of the source acoustic model is a Constrained Maximum Likelihood Linear Regression (CMLLR) feature space that is generated from a CMLLR transform operation.
  - 3. The computer-implemented method of claim 1, wherein the source acoustic model is generated from performance of a Linear Discriminant Analysis (LDA) transform operation, Vocal Tract Length Normalization (VTLN) transform operation, and a CMLLR transform operation on training data in the source language, wherein the training data comprises voice data in the source language and corresponding textual transcripts.
  - 4. The computer-implemented method of claim 1, wherein the particular transform operation comprises a VTLN transform operation that is performed on the aligned voice data using the source acoustic model.
  - 5. The computer-implemented method of claim 1, wherein the source acoustic model is adapted to the target language by performing a maximum a posteriori (MAP) adaptation operation on the source acoustic model using the untransformed voice data in the target language.
  - 6. The computer-implemented method of claim 1, wherein training the target acoustic model comprises performing a CMLLR transform operation and a maximum mutual information (MMI) transform operation using the transformed voice data and the adapted acoustic model.
  - 7. The computer-implemented method of claim 1, wherein the target language and the source language comprise different dialects of a common language.
  - 8. The computer-implemented method of claim 1, wherein the target language and the source language comprise different languages.
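
The data flow of claim 1 can be sketched as follows. This is a minimal illustration only, not the patented implementation: the `Steps` container, the function `build_target_model`, and the type aliases are hypothetical names introduced here, and each step is passed in as a callable because the claims do not specify how alignment, transformation, adaptation, or training are carried out.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = List[float]                 # one acoustic feature vector (assumed representation)
Aligned = List[Tuple[Frame, str]]   # (frame, phoneme label) pairs (assumed representation)


@dataclass
class Steps:
    """Hypothetical callables, one per processing step recited in claim 1."""
    align: Callable[[object, Sequence[Frame], Sequence[str]], Aligned]
    transform: Callable[[object, Aligned], Aligned]
    adapt: Callable[[object, Sequence[Frame]], object]
    train: Callable[[Aligned, object], object]


def build_target_model(source_model, voice_data, transcript_phonemes, steps):
    """Chain the steps of claim 1 in the order the claim lists them."""
    # Align untransformed target-language voice data with transcript phonemes,
    # using the source acoustic model.
    aligned = steps.align(source_model, voice_data, transcript_phonemes)
    # Transform the aligned voice data using the source acoustic model.
    transformed = steps.transform(source_model, aligned)
    # Adapt the source model using the untransformed target-language voice data.
    adapted = steps.adapt(source_model, voice_data)
    # Train the target model from the transformed data and the adapted model.
    return steps.train(transformed, adapted)
```

Note that the adaptation step consumes the untransformed voice data, while training consumes the transformed data together with the adapted model, which is why the two branches are kept separate until the final call.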

9. A system comprising:
- a computer system;
  
  an interface of the computer system to receive a request to generate or modify a target acoustic model for a target language;
  
  an acoustic model repository of the computer system to provide access to a source acoustic model for a source language, wherein the source acoustic model includes information that maps acoustic features of the source language to phonemes in a transformed feature space;
  
  an alignment component of the computer system to use the source acoustic model in the transformed feature space to align untransformed voice data in the target language with phonemes in a corresponding textual transcript to obtain aligned voice data, wherein the untransformed voice data is in an untransformed feature space; and
  
  a target model generator of the computer system to i) transform the aligned voice data according to a particular transform operation using the source acoustic model to obtain transformed voice data, ii) adapt the source acoustic model to the target language using the untransformed voice data in the target language to obtain an adapted acoustic model; and iii) train a target acoustic model for the target language using the transformed voice data and the adapted acoustic model;
  
  wherein the interface is further configured to provide access to the target acoustic model.
  - 10. The system of claim 9, wherein the transformed feature space of the source acoustic model is a Constrained Maximum Likelihood Linear Regression (CMLLR) feature space that is generated from a CMLLR transform operation.
  - 11. The system of claim 9, further comprising a source model generator of the computer system to generate the source acoustic model from performance of a Linear Discriminant Analysis (LDA) transform operation, Vocal Tract Length Normalization (VTLN) transform operation, and a CMLLR transform operation on training data in the source language, wherein the training data comprises voice data in the source language and corresponding textual transcripts.
  - 12. The system of claim 9, wherein the particular transform operation comprises a VTLN transform operation that is performed on the aligned voice data using the source acoustic model.
  - 13. The system of claim 9, wherein the source acoustic model is adapted to the target language by performing a maximum a posteriori (MAP) adaptation operation on the source acoustic model using the untransformed voice data in the target language.
  - 14. The system of claim 9, wherein training the target acoustic model comprises performing a CMLLR transform operation and a maximum mutual information (MMI) transform operation using the transformed voice data and the adapted acoustic model.
  - 15. The system of claim 9, wherein the target language and the source language comprise different dialects of a common language.
  - 16. The system of claim 9, wherein the target language and the source language comprise different languages.
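
Claims 5 and 13 name maximum a posteriori (MAP) adaptation without giving a formula. As a rough illustration, the sketch below applies the commonly used MAP update for a single Gaussian mean, which interpolates between the prior (source-model) mean and the occupancy-weighted average of the target-language frames. The function name `map_adapt_mean`, the prior weight `tau`, and the per-frame `posteriors` input are assumptions made for this example; the claims do not state which model parameters are adapted or how.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    posteriors: Sequence[float],
    tau: float = 10.0,
) -> List[float]:
    """MAP update of one Gaussian mean.

    new_mean = (tau * prior_mean + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)

    tau (> 0) controls how strongly the prior mean resists the new data;
    gamma_t is the occupation probability of this Gaussian for frame t.
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]
```

With no adaptation frames the function returns the prior mean unchanged, and as the occupancy grows the result approaches the data average, which is the behavior that makes MAP suitable when target-language data is limited.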

17. A system comprising:
- a computer system;
  
  an interface of the computer system to receive a request to generate or modify a target acoustic model for a target language;
  
  an acoustic model repository of the computer system to provide access to a source acoustic model for a source language, wherein the source acoustic model includes information that maps acoustic features of the source language to phonemes in a transformed feature space;
  
  an alignment component of the computer system to use the source acoustic model in the transformed feature space to align untransformed voice data in the target language with phonemes in a corresponding textual transcript to obtain aligned voice data, wherein the untransformed voice data is in an untransformed feature space; and
  
  means for generating a target acoustic model for a target language from using the source acoustic model in the transformed feature space and the aligned voice data in the untransformed feature space;
  
  wherein the interface is further configured to provide access to the target acoustic model.
  - 18. The system of claim 17, wherein the transformed feature space of the source acoustic model is a Constrained Maximum Likelihood Linear Regression (CMLLR) feature space that is generated from a CMLLR transform operation.
  - 19. The system of claim 17, further comprising a source model generator of the computer system to generate the source acoustic model from performance of a Linear Discriminant Analysis (LDA) transform operation, Vocal Tract Length Normalization (VTLN) transform operation, and a CMLLR transform operation on training data in the source language, wherein the training data comprises voice data in the source language and corresponding textual transcripts.
  - 20. The system of claim 17, wherein the means for generating the target acoustic model uses a VTLN transform operation, a posteriori (MAP) adaptation operation, a CMLLR transform operation, and a maximum mutual information (MMI) transform operation using the source acoustic model and the aligned voice data.
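
Several claims refer to a CMLLR feature space. In the speech recognition literature, CMLLR is usually applied as an affine transform of each feature vector, x' = A x + b, with A and b estimated by maximum likelihood. The sketch below shows only the application of such a transform, not its estimation; the function name `apply_affine_transform` and the example values of `A` and `b` are hypothetical and do not come from the patent.

```python
from typing import List, Sequence


def apply_affine_transform(
    A: Sequence[Sequence[float]],
    b: Sequence[float],
    frames: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Map each frame x to A x + b (the feature-space form of a CMLLR transform)."""
    return [
        [
            sum(a_ij * x_j for a_ij, x_j in zip(row, frame)) + b_i
            for row, b_i in zip(A, b)
        ]
        for frame in frames
    ]


# Illustrative values only: scale the first dimension and shift both.
A = [[2.0, 0.0], [0.0, 1.0]]
b = [1.0, -1.0]
transformed = apply_affine_transform(A, b, [[1.0, 2.0]])
```

Voice data that has passed through such a transform is in the "transformed feature space" of the claims, while the raw features are in the "untransformed feature space".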

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Weinstein, Eugene; Moreno Mengibar, Pedro J.

Granted Patent: US 8,374,866 B2
US Class Current: 704/2
CPC Class Codes:
- G10L 15/005   Language recognition
- G10L 15/06   Creation of reference templ...
- G10L 15/063   Training
- G10L 15/065   Adaptation
