Cross-lingual speaker adaptation for multi-lingual speech synthesis

US 9,922,641 B1
Filed: 10/31/2012
Issued: 03/20/2018
Est. Priority Date: 10/01/2012
Status: Active Grant

2 Assignments

0 Petitions

Abstract

The subject matter of the disclosure is embodied in a method that includes receiving input speech data from a speaker in a first language, and estimating, based on a universal speech model, a speaker transform representing speaker characteristics associated with the input speech data. The method also includes accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language. The method further includes modifying the speaker-independent speech model using the speaker transform to obtain a speaker-specific speech model, and generating speech data in the second language using the speaker-specific speech model.

50 Citations

26 Claims

1. A method comprising:
- receiving input speech data from a speaker in a first language;
  
  estimating, by a processor, based on a universal speech model, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data;
  
  accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language;
  
  modifying, by a processor, cepstral coefficients of the speaker-independent speech model using the estimated speaker transform coefficients to obtain cepstral coefficients of a speaker-specific speech model; and
  
  generating speech data in the second language using the speaker-specific speech model.
  - 2. The method of claim 1, wherein the universal speech model includes a Gaussian mixture model that represents a plurality of speakers speaking one or more languages.
  - 3. The method of claim 2, wherein the universal speech model includes a plurality of speech parameters estimated based on speech from the plurality of speakers.
  - 4. The method of claim 1, wherein the speaker-independent speech model includes a plurality of hidden Markov models (HMMs).
  - 5. The method of claim 4 further comprising training the plurality of HMMs by normalizing speech data from a second speaker speaking the second language, and by using a second speaker transform that represents speaker characteristics of the second speaker.
  - 6. The method of claim 5, further comprising estimating the second speaker transform from the speech data of the second speaker.
  - 7. The method of claim 1, wherein generating the speech in the second language comprises:
    - generating transcription data from the input speech data;
      
      translating the transcription data from the first language to the second language; and
      
      generating the speech based on the translated data.
  - 8. The method of claim 1, wherein generating the speech in the second language comprises:
    - accessing text data in the second language; and
      
      generating the speech based on the accessed text data.
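
The estimation step of claims 1 to 3 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the universal speech model is a diagonal-covariance Gaussian mixture model and that the speaker transform is a single global bias vector; the claims do not limit the transform to this form. The function names (`log_gauss_diag`, `posteriors`, `estimate_bias`) are hypothetical and do not come from the patent. The sketch performs one EM-style update: it computes component posteriors for each input frame, then solves for the bias that maximizes the expected log-likelihood under those posteriors.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Responsibility of each universal-model component for one frame."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_bias(frames, weights, means, variances):
    """One EM-style update of a global bias b, modelling x as (mean + b)."""
    dim = len(frames[0])
    num = [0.0] * dim
    den = [0.0] * dim
    for x in frames:
        gamma = posteriors(x, weights, means, variances)
        for g, m, v in zip(gamma, means, variances):
            for d in range(dim):
                # Posterior- and precision-weighted residual per dimension.
                num[d] += g * (x[d] - m[d]) / v[d]
                den[d] += g / v[d]
    return [n / d for n, d in zip(num, den)]


# Toy universal model: two components in a 2-dimensional cepstral space.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [4.0, 4.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]

# Frames from a hypothetical speaker whose features are offset from the model.
speaker_frames = [[0.5, -0.5], [4.5, 3.5]]
bias = estimate_bias(speaker_frames, ubm_weights, ubm_means, ubm_vars)
```

Because the bias is estimated against a model that is not tied to the second language, the same coefficients could in principle be applied to a speaker-independent model of any language, which is the property the claims rely on.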

9. A system comprising:
- a speech synthesis engine including a processor, the speech synthesis engine configured to:
  
  receive input speech data from a speaker in a first language,
  
  estimate, based on a universal speech model, a speaker transform representing speaker characteristics associated with the input speech data,
  
  access a speaker-independent speech model for generating speech data in a second language that is different from the first language,
  
  modify the speaker-independent speech model using the speaker transform to obtain a speaker-specific speech model, and
  
  generate speech data in the second language using the speaker-specific speech model.
  - 10. The system of claim 9, wherein the universal speech model includes a Gaussian mixture model that represents a plurality of speakers speaking one or more languages.
  - 11. The system of claim 10, comprising a training engine configured to estimate a plurality of speech parameters of the universal speech model, based on speech from the plurality of speakers.
  - 12. The system of claim 10, wherein the speaker-independent speech model includes a plurality of hidden Markov models (HMMs).
  - 13. The system of claim 12 comprising a training engine configured to train the plurality of HMMs by normalizing speech data from a second speaker speaking the second language, and by using a second speaker transform that represents speaker characteristics of the second speaker.
  - 14. The system of claim 13, wherein the training engine is configured to estimate the second speaker transform from the speech data of the second speaker.
  - 15. The system of claim 9 comprising:
    - a speech recognition engine configured to generate transcription data from the input speech data; and
      
      a translation engine configured to translate the transcription data from the first language to the second language, and provide the translated data to the speech synthesis engine for generating the speech data in the second language.
  - 16. The system of claim 9 wherein the speech synthesis engine is configured to access text data in second language, and generate the speech based on the accessed speech data.
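
The arrangement of engines in claim 15 can be sketched as a simple composition of three stages. This is only an illustrative outline: the function `speech_to_speech` and its parameter names are hypothetical, and the three callables are stand-ins for whatever recognition, translation, and synthesis engines a real system would provide.

```python
from typing import Callable, List, Sequence


def speech_to_speech(
    input_speech: Sequence[float],
    recognize: Callable[[Sequence[float]], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], List[float]],
) -> List[float]:
    """Chain recognition, translation, and synthesis as in claim 15."""
    # Speech recognition engine: first-language audio to transcription data.
    transcription = recognize(input_speech)
    # Translation engine: first-language text to second-language text.
    translated = translate(transcription)
    # Speech synthesis engine: second-language text to speech data.
    return synthesize(translated)


# Stub engines, for illustration only.
def fake_recognize(samples: Sequence[float]) -> str:
    return "hello"


def fake_translate(text: str) -> str:
    return {"hello": "hola"}.get(text, text)


def fake_synthesize(text: str) -> List[float]:
    # One placeholder sample per character of the text.
    return [0.0] * len(text)


output = speech_to_speech([0.1, 0.2], fake_recognize, fake_translate, fake_synthesize)
```

In the alternative of claim 16, the recognition and translation stages would be skipped and second-language text would be passed to the synthesis stage directly.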

17. A computer program product comprising computer readable instructions encoded on a storage device, the instructions configured to cause one or more processors to:
- receive input speech data from a speaker in a first language,
  
  estimate, based on a universal speech model, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data,
  
  access a speaker-independent speech model for generating speech data in a second language that is different from the first language,
  
  modify cepstral coefficients of the speaker-independent speech model using the estimated speaker transform coefficients to obtain cepstral coefficients of a speaker-specific speech model, and
  
  generate speech data in the second language using the speaker-specific speech model.
  - 18. The computer program product of claim 17, wherein the universal speech model includes a Gaussian mixture model that represents a plurality of speakers speaking one or more languages.
  - 19. The computer program product of claim 18, wherein the universal speech model includes a plurality of speech parameters estimated based on speech from the plurality of speakers.
  - 20. The computer program product of claim 17, wherein the speaker-independent speech model includes a plurality of hidden Markov models (HMMs).
  - 21. The computer program product of claim 20, wherein the computer readable instructions include instructions for training the plurality of HMMs by normalizing speech data from a second speaker speaking the second language, and by using a second speaker transform that represents speaker characteristics of the second speaker.
  - 22. The computer program product of claim 21, wherein the computer readable instructions include instructions for estimating the second speaker transform from the speech data of the second speaker.
  - 23. The computer program product of claim 17, wherein the computer readable instructions include instructions for:
    - generating transcription data from the input speech data;
      
      translating the transcription data from the first language to the second language; and
      
      generating the speech based on the translated data.
  - 24. The computer program product of claim 17, wherein the computer readable instructions includes instructions for:
    - accessing text data in the second language; and
      
      generating the speech based on the accessed text data.

25. A method comprising:
- receiving input speech data from a speaker in a first language;
  
  estimating, by a processor, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data, wherein the speaker transform is one of a linear transform and a non-linear transform;
  
  accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language;
  
  modifying the speaker-independent speech model using the estimated speaker transform coefficients to obtain a speaker-specific speech model; and
  
  generating speech data in the second language using the speaker-specific speech model.
  - 26. The method of claim 25, wherein the speaker specific speech model includes a set of adapted coefficients obtained by applying the speaker transform to a set of unadapted coefficients from the speaker-independent speech model.
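
The adaptation described in claim 26 can be sketched for the linear case of claim 25. The example assumes an affine transform (a matrix `A` and an offset `b`) applied to the mean cepstral vector of each model state; the state names, the dictionary layout, and the helper names `apply_affine` and `adapt_model` are hypothetical choices for illustration, not details taken from the patent.

```python
def apply_affine(A, b, coeffs):
    """Return A @ coeffs + b for one vector of unadapted coefficients."""
    return [
        sum(a * c for a, c in zip(row, coeffs)) + offset
        for row, offset in zip(A, b)
    ]


def adapt_model(A, b, state_means):
    """Map every state's unadapted mean vector to its adapted counterpart."""
    return {
        state: apply_affine(A, b, mean)
        for state, mean in state_means.items()
    }


# Hypothetical speaker-independent model: two states, 2-dimensional means.
speaker_independent = {
    "state_0": [1.0, 2.0],
    "state_1": [0.0, -1.0],
}

# Hypothetical speaker transform coefficients.
A = [[1.0, 0.0], [0.0, 2.0]]
b = [0.5, -1.0]

speaker_specific = adapt_model(A, b, speaker_independent)
# speaker_specific["state_0"] is [1.5, 3.0]
```

A non-linear transform, the other option named in claim 25, would replace `apply_affine` with some other function of the unadapted coefficients while leaving the per-state mapping unchanged.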

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Chun, Byung Ha
Primary Examiner(s): Ortiz Sanchez, Michael
Application Number: US13/665,390
Time in Patent Office: 1,966 Days
Field of Search: 704/2, 704/3, 704/4, 704/258, 704/261, 704/266
US Class Current
CPC Class Codes

G10L 13/033   Voice editing, e.g. manipul...

G10L 15/02   Feature extraction for spee...

G10L 15/063   Training

G10L 15/07   to the speaker

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 2021/0135   Voice conversion or morphing
