SYSTEM AND METHOD FOR UNIT SELECTION TEXT-TO-SPEECH USING A MODIFIED VITERBI APPROACH

US 20110313772A1
Filed: 06/18/2010
Published: 12/22/2011
Est. Priority Date: 06/18/2010
Status: Active Grant

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
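
The flow described in the abstract can be sketched as a Viterbi-style search in which each unit only transitions to its sublist of suitable units from the next list. This is a minimal sketch under stated assumptions, not the patented implementation: the `Unit` type, the Hz pitch representation, the absolute-pitch-difference suitability test, and the `target_cost` and `join_cost` callables are all hypothetical, since the record above does not define them.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    name: str     # hypothetical identifier for a recorded speech unit
    pitch: float  # assumed: pitch in Hz at the unit boundary


def build_sublist(unit: Unit, next_list: Sequence[Unit], threshold: float) -> List[int]:
    """Indices of units in the next list deemed suitable for concatenation.

    Assumption: "suitable" means the pitch difference is below `threshold`.
    """
    return [j for j, nxt in enumerate(next_list)
            if abs(nxt.pitch - unit.pitch) < threshold]


def lowest_cost_path(
    lists: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
    threshold: float,
) -> Tuple[List[Unit], float]:
    """Dynamic-programming search restricted to each unit's sublist."""
    if not lists:
        return [], 0.0
    inf = float("inf")
    # best[t][i]: lowest cumulative cost of any path ending at lists[t][i]
    best = [[inf] * len(col) for col in lists]
    back = [[-1] * len(col) for col in lists]
    best[0] = [target_cost(0, u) for u in lists[0]]

    for t in range(len(lists) - 1):
        for i, u in enumerate(lists[t]):
            if best[t][i] == inf:
                continue  # unreachable unit, nothing to extend
            for j in build_sublist(u, lists[t + 1], threshold):
                v = lists[t + 1][j]
                cost = best[t][i] + join_cost(u, v) + target_cost(t + 1, v)
                if cost < best[t + 1][j]:
                    best[t + 1][j] = cost
                    back[t + 1][j] = i

    last = len(lists) - 1
    end = min(range(len(lists[last])), key=lambda j: best[last][j])
    if best[last][end] == inf:
        raise ValueError("no path satisfies the concatenation threshold")

    # Follow back-pointers from the cheapest final unit to the first list.
    indices = [end]
    for t in range(last, 0, -1):
        indices.append(back[t][indices[-1]])
    indices.reverse()
    return [lists[t][i] for t, i in enumerate(indices)], best[last][end]
```

Because each unit expands only into its sublist rather than the full next list, the inner loop does less work than an unrestricted search whenever the threshold excludes candidates. The synthesized output would then be formed by concatenating the units on the returned path.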

18 Claims

1. A system for speech synthesis, the system comprising:
- a processor;
- a first module controlling the processor to receive a set of ordered lists of speech units;
- a second module controlling the processor, for each respective speech unit in each ordered list in the set of ordered lists, to construct a sublist of speech units from a next ordered list which are suitable for concatenation;
- a third module controlling the processor to perform a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit; and
- a fourth module controlling the processor to synthesize speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis.

Dependent claims (2, 3, 4, 5, 6):
- 2. The system of claim 1, wherein the set of ordered lists of speech units are ordered by speech unit pitch.
- 3. The system of claim 2, wherein speech unit pitch is a dominant one of multiple factors by which the lists of speech units are ordered.
- 4. The system of claim 1, further comprising assigning a pitch to units which do not have an assigned pitch.
- 5. The system of claim 1, further comprising dynamically adjusting a threshold value which determines suitability for concatenation.
- 6. The system of claim 1, wherein speech is synthesized by concatenating speech units associated with the lowest cost path.
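
Claim 5 recites dynamically adjusting the threshold that determines suitability for concatenation, but the claims do not say how. One possible policy, shown here purely as an illustration, is to widen the threshold when a unit's sublist would otherwise be empty. The function names, the `growth` factor, and the `max_threshold` cap are hypothetical choices, not taken from the patent.

```python
from typing import List, Sequence, Tuple


def suitable_indices(pitch: float, next_pitches: Sequence[float],
                     threshold: float) -> List[int]:
    """Indices whose pitch differs from `pitch` by less than `threshold`."""
    return [j for j, p in enumerate(next_pitches) if abs(p - pitch) < threshold]


def adaptive_sublist(pitch: float, next_pitches: Sequence[float],
                     threshold: float, growth: float = 1.5,
                     max_threshold: float = 200.0) -> Tuple[List[int], float]:
    """Widen the threshold until a candidate is found or the cap is reached.

    Returns the sublist indices and the threshold that was finally used.
    """
    if threshold <= 0 or growth <= 1:
        raise ValueError("threshold must be positive and growth must exceed 1")
    current = threshold
    while True:
        found = suitable_indices(pitch, next_pitches, current)
        if found or current >= max_threshold:
            return found, current
        current = min(current * growth, max_threshold)
```

A policy like this trades join quality for coverage: a wider threshold admits less well-matched pairs, so the cost analysis still has to decide whether such a path is worth taking.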

7. A method of speech synthesis, the method comprising:
- in a text-to-speech synthesis system that uses unit selection, imposing ordering constraints on speech units, the ordering constraints indicating speech unit pairs which are suitable for concatenation based on a respective pitch of each speech unit; and
- when performing unit selection to synthesize speech, considering speech unit pairs in which a difference in pitch is below a threshold value based on the imposed ordering constraints.

Dependent claims (8, 9, 10, 11, 12):
- 8. The method of claim 7, the method further comprising generating two ordered lists of speech units based on the respective pitch of each speech unit.
- 9. The method of claim 8, wherein the respective pitch is a dominant one of multiple factors by which the lists of speech units are ordered.
- 10. The method of claim 7, further comprising assigning a pitch to units which do not have an assigned pitch.
- 11. The method of claim 7, further comprising dynamically adjusting the threshold value.
- 12. The method of claim 7, wherein the speech unit pairs correspond to a first position and a second position which are concatenated together with other speech units to form synthesized speech.
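
Claims 7 and 8 tie the pitch ordering to the threshold test: once two lists are sorted by pitch, the units within the threshold of a given unit form a contiguous window that can be located by binary search instead of a full scan. The sketch below assumes pitches are plain floats sorted in ascending order and uses a strict "below threshold" comparison; the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple


def pairs_below_threshold(first_pitches: Sequence[float],
                          second_pitches: Sequence[float],
                          threshold: float) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs with |first[i] - second[j]| < threshold.

    Both inputs are assumed to be sorted in ascending pitch order.
    """
    pairs: List[Tuple[int, int]] = []
    for i, p in enumerate(first_pitches):
        # First index whose pitch is strictly greater than p - threshold.
        lo = bisect_right(second_pitches, p - threshold)
        # First index whose pitch is at least p + threshold (excluded).
        hi = bisect_left(second_pitches, p + threshold)
        pairs.extend((i, j) for j in range(lo, hi))
    return pairs
```

For example, with first-position pitches `[100.0, 180.0]`, second-position pitches `[90.0, 110.0, 175.0, 200.0]`, and a threshold of `20.0`, only the pairs `(0, 0)`, `(0, 1)`, and `(1, 2)` are considered; the unit at 200.0 is excluded because its difference from 180.0 is not below the threshold.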

13. A non-transitory computer-readable storage medium storing instructions which, when executed by a computing device, cause the computing device to perform speech synthesis, the instructions comprising:
- receiving a set of ordered lists of speech units;
- for each respective speech unit in each ordered list in the set of ordered lists, constructing a sublist of speech units from a next ordered list which are suitable for concatenation;
- performing a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit; and
- synthesizing speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis.

Dependent claims (14, 15, 16, 17, 18):
- 14. The non-transitory computer-readable storage medium of claim 13, wherein the set of ordered lists of speech units are ordered by speech unit pitch.
- 15. The non-transitory computer-readable storage medium of claim 14, wherein speech unit pitch is a dominant one of multiple factors by which the lists of speech units are ordered.
- 16. The non-transitory computer-readable storage medium of claim 13, further comprising assigning a pitch to units which do not have an assigned pitch.
- 17. The non-transitory computer-readable storage medium of claim 13, further comprising dynamically adjusting a threshold value which determines suitability for concatenation.
- 18. The non-transitory computer-readable storage medium of claim 13, wherein speech is synthesized by concatenating speech units associated with the lowest cost path.
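
Claims 4, 10, and 16 recite assigning a pitch to units that lack one, without specifying the method. As one hedged illustration, a unit with no measured pitch could borrow a value interpolated from its nearest neighbors that do have one. The assumptions here are loud ones: the units are taken to be in their original recording order, `None` marks a missing pitch, and the `default` fallback of 120.0 Hz is an arbitrary placeholder rather than a value from the patent.

```python
from typing import List, Optional, Sequence


def assign_missing_pitch(pitches: Sequence[Optional[float]],
                         default: float = 120.0) -> List[float]:
    """Fill missing pitch values so that every unit can be ordered by pitch.

    Interior gaps are linearly interpolated between the nearest known
    neighbors; gaps at either end copy the nearest known value; if no
    unit has a pitch at all, every unit receives `default`.
    """
    known = [i for i, p in enumerate(pitches) if p is not None]
    if not known:
        return [default] * len(pitches)

    filled: List[float] = []
    for i, p in enumerate(pitches):
        if p is not None:
            filled.append(p)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(pitches[right])
        elif right is None:
            filled.append(pitches[left])
        else:
            weight = (i - left) / (right - left)
            filled.append(pitches[left] + weight * (pitches[right] - pitches[left]))
    return filled
```

Whatever rule is used, the point of the step is that every unit ends up with a pitch value, so it can take part in pitch-ordered lists and threshold comparisons.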

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: CONKIE, Alistair D.
Granted Patent: US 8,731,931 B2

Field of Search
US Class Current: 704/260
CPC Class Codes:
G10L 13/02   Methods for producing synth...
G10L 13/04   Details of speech synthesis...
G10L 13/06   Elementary speech units use...
G10L 13/07   Concatenation rules
