Synthesis by generation and concatenation of multi-form segments

US 8,321,222 B2
Filed: 08/14/2007
Issued: 11/27/2012
Est. Priority Date: 08/14/2007
Status: Active Grant

Abstract

A speech synthesis system and method are described. A speech segment database references speech segments having various speech representational structures. A speech segment selector selects from the speech segment database a sequence of speech segment candidates corresponding to a target text. A speech segment sequencer generates from the speech segment candidates sequenced speech segments corresponding to the target text. A speech segment synthesizer combines the selected sequenced speech segments to produce a synthesized speech signal output corresponding to the target text.
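The four components named in the abstract can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions, not the patented implementation: the `Segment` type, the toy `DATABASE`, the numeric costs, and the greedy choice in `sequence_segments` are all hypothetical, and the target text is assumed to be already converted into unit labels.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    unit: str                   # unit label this segment realizes (hypothetical)
    form: str                   # "model" (statistical state model) or "template"
    samples: Tuple[float, ...]  # toy stand-in for signal data or parameters
    cost: float                 # toy target cost; lower means a better match


# Hypothetical database referencing segments of both representational forms.
DATABASE: Dict[str, List[Segment]] = {
    "h": [Segment("h", "model", (0.1, 0.1), 0.4)],
    "e": [Segment("e", "template", (0.5, 0.6), 0.2),
          Segment("e", "model", (0.5, 0.5), 0.3)],
    "l": [Segment("l", "template", (0.3, 0.2), 0.1)],
    "o": [Segment("o", "model", (0.7, 0.7), 0.5),
          Segment("o", "template", (0.8, 0.7), 0.6)],
}


def select_candidates(units: List[str]) -> List[List[Segment]]:
    """Selector: look up mixed-form candidates for each target unit."""
    return [DATABASE[u] for u in units]


def sequence_segments(candidates: List[List[Segment]]) -> List[Segment]:
    """Sequencer: pick one segment per unit (greedy on cost, an assumption)."""
    return [min(options, key=lambda s: s.cost) for options in candidates]


def synthesize(segments: List[Segment]) -> List[float]:
    """Synthesizer: concatenate the chosen segments into one output signal."""
    output: List[float] = []
    for segment in segments:
        output.extend(segment.samples)
    return output


units = ["h", "e", "l", "o"]
sequence = sequence_segments(select_candidates(units))
signal = synthesize(sequence)
```

The resulting sequence mixes model-based and template-based segments, which is the point of the multi-form approach; a real system would replace the greedy choice with a search that also scores the joins between neighbouring segments.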

38 Claims

1. A speech synthesis system implemented using at least one hardware implemented processor, the system comprising:
- a speech segment database referencing speech segments having a plurality of different types of speech representational structures including;
  - i. statistical state model based speech signals, and
  - ii. template based speech signals;
- a speech segment selector for selecting from the speech segment database a sequence of statistical state model based and template based speech segment candidates corresponding to a target text;
- a speech segment sequencer for generating from the speech segment candidates sequenced statistical state model based and template based speech segments corresponding to the target text; and
- a speech segment synthesizer for combining the sequenced statistical state model based and template based speech segments to produce a synthesized speech signal output corresponding to the target text.
- Dependent claims (2 to 19):
  - 2. A speech synthesis system according to claim 1, wherein the different types of speech representational structures include statistical state model-based speech signals augmented with template information.
  - 3. A speech synthesis system according to claim 1, wherein the speech segment selector uses a statistical state model for selecting the speech segment candidates.
  - 4. A speech synthesis system according to claim 3, wherein the speech segment selector uses template information to augment the statistical state model.
  - 5. A speech synthesis system according to claim 1, wherein the different types of speech representational structures share at least one parameter component.
  - 6. A speech synthesis system according to claim 5, wherein the shared parameter component is encoded differently in different speech representational structures.
  - 7. A speech synthesis system according to claim 1, wherein the speech segment sequencer uses static observations in generating the sequenced speech segments.
  - 8. A speech synthesis system according to claim 1, wherein the speech segment sequencer uses observations from the speech segment selector in generating the sequenced speech segments.
  - 9. A speech synthesis system according to claim 1, wherein the speech segment sequencer uses static observations and observations from the speech segment selector in generating the sequenced speech segments.
  - 10. A speech synthesis system according to claim 1, wherein the speech segment selector uses statistically derived cost-functions for selecting the speech segment candidates.
  - 11. A speech synthesis system according to claim 1, wherein the speech segment sequencer uses statistically derived cost-functions for generating the sequenced speech segments.
  - 12. A speech synthesis system according to claim 1, wherein the speech segment selector uses empirical rules for selecting the speech segment candidates.
  - 13. A speech synthesis system according to claim 1, wherein the speech segment sequencer uses empirical rules for generating the sequenced speech segments.
  - 14. A speech synthesis system according to claim 1, wherein the speech segment selector uses psycho-acoustic rules for selecting the speech segment candidates.
  - 15. A speech synthesis system according to claim 1, wherein the speech segment sequencer uses psycho-acoustic rules for generating the sequenced speech segments.
  - 16. A speech synthesis system according to claim 10, wherein the statistically derived cost-functions are based on sequences of speech segment observations.
  - 17. A speech synthesis system according to claim 16, wherein the sequences of speech segment observations are described by a Markov process.
  - 18. A speech synthesis system according to claim 11, wherein the statistically derived cost-functions are based on sequences of speech segment observations.
  - 19. A speech synthesis system according to claim 18, wherein the sequences of speech segment observations are described by a Markov process.

20. A method of speech synthesis comprising:
- with a system implemented using at least one hardware implemented processor;
  - referencing in a speech segment database speech segments having a plurality of different types of speech representational structures including;
    - i. statistical state model based speech signals, and
    - ii. template based speech signals;
  - selecting from the speech segment database a sequence of statistical state model based and template based speech segment candidates corresponding to a target text;
  - generating from the speech segment candidates sequenced statistical state model based and template based speech segments corresponding to the target text; and
  - combining the sequenced statistical state model based and template based speech segments to produce a synthesized speech signal output corresponding to the target text.
- Dependent claims (21 to 38):
  - 21. A method according to claim 20, wherein the different types of speech representational structures include statistical state model-based speech signals augmented with template information.
  - 22. A method according to claim 20, wherein a statistical state speech model is used in selecting the sequence of speech segment candidates.
  - 23. A method according to claim 22, wherein the statistical state speech model is augmented by template information.
  - 24. A method according to claim 20, wherein the different types of speech representational structures share at least one parameter component.
  - 25. A method according to claim 24, wherein the shared parameter component is encoded differently in different speech representational structures.
  - 26. A method according to claim 20, wherein static observations are used in generating the sequenced speech segments.
  - 27. A method according to claim 20, wherein observations from the selecting the sequence of speech segment candidates are used in the generating the sequenced speech segments.
  - 28. A method according to claim 20, wherein static observations and observations from the selecting the sequence of speech segment candidates are used in the generating the sequenced speech segments.
  - 29. A method according to claim 20, wherein statistically derived cost-functions are used for selecting the speech segment candidates.
  - 30. A method according to claim 20, wherein statistically derived cost-functions are used for generating the sequenced speech segments.
  - 31. A method according to claim 20, wherein empirical rules are used for selecting the speech segment candidates.
  - 32. A method according to claim 20, wherein empirical rules are used for generating the sequenced speech segments.
  - 33. A method according to claim 20, wherein psycho-acoustic rules are used for selecting the speech segment candidates.
  - 34. A method according to claim 20, wherein psycho-acoustic rules are used for generating the sequenced speech segments.
  - 35. A method according to claim 29, wherein the statistically derived cost-functions are based on sequences of speech segment observations.
  - 36. A method according to claim 35, wherein the sequences of speech segment observations are described by a Markov process.
  - 37. A method according to claim 30, wherein the statistically derived cost-functions are based on sequences of speech segment observations.
  - 38. A method according to claim 37, wherein the sequences of speech segment observations are described by a Markov process.
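Claims 16 to 19 and 35 to 38 refer to statistically derived cost-functions based on sequences of speech segment observations described by a Markov process. The claims do not say how such costs are computed, so the following is only an illustrative sketch: it assumes a first-order Markov model over segment forms, turns each transition probability into a cost via the negative logarithm, and uses a Viterbi search to find the cheapest sequence. The `TRANSITION_PROB` table, the `(form, target_cost)` candidate layout, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]  # (form, target_cost); hypothetical layout

# Hypothetical first-order Markov transition probabilities between forms.
TRANSITION_PROB: Dict[Tuple[str, str], float] = {
    ("model", "model"): 0.6,
    ("model", "template"): 0.4,
    ("template", "model"): 0.3,
    ("template", "template"): 0.7,
}


def transition_cost(prev_form: str, next_form: str) -> float:
    """Cost of a transition: negative log of its Markov probability."""
    return -math.log(TRANSITION_PROB[(prev_form, next_form)])


def best_sequence(candidates: List[List[Candidate]]) -> List[Candidate]:
    """Viterbi search for the candidate sequence with minimum total cost."""
    if not candidates:
        return []
    # best[t][i] = (cumulative cost, index of best predecessor at t - 1)
    best = [[(target, -1) for _, target in candidates[0]]]
    for t in range(1, len(candidates)):
        row = []
        for form, target in candidates[t]:
            options = [
                (best[t - 1][j][0]
                 + transition_cost(candidates[t - 1][j][0], form)
                 + target, j)
                for j in range(len(candidates[t - 1]))
            ]
            row.append(min(options))
        best.append(row)
    # Trace back from the cheapest final state.
    i = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path: List[Candidate] = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][i])
        i = best[t][i][1]
    path.reverse()
    return path


lattice = [
    [("model", 0.5), ("template", 0.5)],
    [("model", 0.5), ("template", 0.5)],
]
chosen = best_sequence(lattice)
```

With equal target costs, the search prefers the transition with the highest assumed probability, so the chosen path stays in template form for both positions.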

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Pollet, Vincent; Breen, Andrew
Primary Examiner(s): Colucci, Michael
Application Number: US11/838,609
Publication Number: US 20090048841A1
Time in Patent Office: 1,932 Days
Field of Search: 704/260, 704/256.2, 704/233, 704/208, 704/258, 704/268, 704/269, 704/277, 704/9, 715/264, 379/88.03, 382/187
US Class Current: 704/260
CPC Class Codes: G10L 13/07 Concatenation rules; G10L 15/142 Hidden Markov Models [HMMs]
