System and method for unit selection text-to-speech using a modified Viterbi approach

US 10,636,412 B2
Filed: 09/17/2018
Issued: 04/28/2020
Est. Priority Date: 06/18/2010
Status: Active Grant

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units; for each respective speech unit in each ordered list in the set, constructs a sublist of speech units from the next ordered list that are suitable for concatenation; performs a cost analysis of paths through the set of ordered lists based on the sublist for each respective speech unit; and synthesizes speech using the lowest-cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units that do not have an assigned pitch can be assigned a pitch.
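
The search described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Unit` fields, the use of the absolute pitch difference as a join cost, and the names `sublist_bounds` and `lowest_cost_path` are all assumptions made for illustration. Each candidate list is sorted by leading-edge pitch, so the sublist of units suitable for concatenation can be found by binary search rather than by scanning every pair.

```python
import math
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    lead_pitch: float   # pitch (Hz) at the leading edge (assumed field)
    trail_pitch: float  # pitch (Hz) at the trailing edge (assumed field)
    target_cost: float  # hypothetical per-unit cost


def sublist_bounds(trail_pitch, next_keys, threshold):
    """Index range [lo, hi) of units in the next pitch-sorted list whose
    leading pitch differs from trail_pitch by strictly less than threshold."""
    lo = bisect_right(next_keys, trail_pitch - threshold)
    hi = bisect_left(next_keys, trail_pitch + threshold)
    return lo, hi


def lowest_cost_path(candidate_lists, threshold):
    """Viterbi-style search restricted to pitch-compatible sublists.
    Returns the cheapest sequence of units, or None if no path exists."""
    lists = [sorted(units, key=lambda u: u.lead_pitch) for units in candidate_lists]
    if not lists or any(not units for units in lists):
        return None
    cost = [u.target_cost for u in lists[0]]
    back = []  # back[k][j]: best predecessor (index in list k) of unit j in list k+1
    for cur, nxt in zip(lists, lists[1:]):
        keys = [u.lead_pitch for u in nxt]
        nxt_cost = [math.inf] * len(nxt)
        nxt_back = [None] * len(nxt)
        for i, u in enumerate(cur):
            if cost[i] == math.inf:
                continue  # unit i is unreachable, so it cannot extend a path
            lo, hi = sublist_bounds(u.trail_pitch, keys, threshold)
            for j in range(lo, hi):
                # Assumed join cost: absolute pitch mismatch at the boundary.
                join = abs(u.trail_pitch - nxt[j].lead_pitch)
                total = cost[i] + join + nxt[j].target_cost
                if total < nxt_cost[j]:
                    nxt_cost[j], nxt_back[j] = total, i
        cost = nxt_cost
        back.append(nxt_back)
    end = min(range(len(cost)), key=cost.__getitem__)
    if cost[end] == math.inf:
        return None
    indices = [end]
    for pointers in reversed(back):
        indices.append(pointers[indices[-1]])
    indices.reverse()
    return [lists[k][i] for k, i in enumerate(indices)]
```

Because only the units inside each sublist are scored, the inner loop touches a small slice of the next list instead of all of it, which is the saving a pitch ordering makes possible in this sketch.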

30 Citations

20 Claims

1. A method comprising:
- in a text-to-speech synthesis system that uses unit selection, imposing ordering constraints on speech units stored in the text-to-speech synthesis system, the ordering constraints indicating speech unit pairs, each respective speech unit pair of the speech units pairs having a respective first speech unit with a respective first pitch and a respective second speech unit having a respective second pitch, the speech unit pairs being suitable for concatenation based on the respective first pitch and the respective second pitch;
  
  selecting, from the speech units and based at least in part on a difference in pitch between the respective first pitch and the respective second pitch being below a threshold value according to the ordering constraints, units for speech synthesis to yield selected speech units; and
  
  synthesizing speech using the selected speech units.
  - 2. The method of claim 1, wherein the respective first pitch and the respective second pitch comprise a respective leading edge frequency of the respective first speech unit and the respective second speech unit.
  - 3. The method of claim 1, wherein the respective first pitch and the respective second pitch comprise a trailing edge frequency of the respective first speech unit and the respective second speech unit that is within the threshold value.
  - 4. The method of claim 1, further comprising adjusting the threshold value based on a number of the selected speech units.
  - 5. The method of claim 4, wherein the threshold value is decreased when more units are selected and increases when fewer units are selected.
  - 6. The method of claim 1, further comprising assigning a pitch to speech units in the text-to-speech synthesis system which do not have an assigned pitch.
  - 7. The method of claim 1, wherein the respective first pitch and the respective second pitch are each a dominant one of multiple factors by which the speech units are ordered according to the ordering constraints.
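
The threshold adjustment recited in claims 4 and 5 can be sketched as follows. This is an illustration only: the claims do not state how the adjustment is computed, so the function name `adjust_threshold`, the target count, the multiplicative factor, and the floor and ceiling values are all hypothetical.

```python
def adjust_threshold(threshold, n_selected, target=20, factor=1.25,
                     floor=1.0, ceiling=100.0):
    """Tighten the pitch threshold when many units pass it and loosen it
    when few do. All default parameter values are assumptions."""
    if n_selected > target:
        return max(floor, threshold / factor)    # more units: decrease
    if n_selected < target:
        return min(ceiling, threshold * factor)  # fewer units: increase
    return threshold
```

For example, starting from a threshold of 10.0 Hz, a count of 50 selected units lowers the threshold to 8.0, while a count of 5 raises it to 12.5.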

8. A text-to-speech system comprising:
- a processor; and
  
  a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
  
  imposing ordering constraints on speech units stored in the text-to-speech system, the ordering constraints indicating speech unit pairs, each respective speech unit pair of the speech units pairs having a respective first speech unit with a respective first pitch and a respective second speech unit having a respective second pitch, the speech unit pairs being suitable for concatenation based on the respective first pitch and the respective second pitch;
  
  selecting, from the speech units and based at least in part on a difference in pitch between the respective first pitch and the respective second pitch being below a threshold value according to the ordering constraints, units for speech synthesis to yield selected speech units; and
  
  synthesizing speech using the selected speech units.
  - 9. The text-to-speech system of claim 8, wherein the respective first pitch and the respective second pitch comprise a respective leading edge frequency of the respective first speech unit and the respective second speech unit.
  - 10. The text-to-speech system of claim 8, wherein the respective first pitch and the respective second pitch comprise a trailing edge frequency of the respective first speech unit and the respective second speech unit that is within the threshold value.
  - 11. The text-to-speech system of claim 8, wherein the computer-readable storage medium stores additional instructions which, when executed by the processor, cause the processor to perform operations further comprising:
    - adjusting the threshold value based on a number of the selected speech units.
  - 12. The text-to-speech system of claim 11, wherein the threshold value is decreased when more units are selected and increases when fewer units are selected.
  - 13. The text-to-speech system of claim 8, wherein the computer-readable storage medium stores additional instructions which, when executed by the processor, cause the processor to perform operations further comprising:
    - assigning a pitch to speech units in the text-to-speech system which do not have an assigned pitch.
  - 14. The text-to-speech system of claim 8, wherein the respective first pitch and the respective second pitch are each a dominant one of multiple factors by which the speech units are ordered according to the ordering constraints.

15. A computer-readable storage device having instructions stored which, when executed by a text-to-speech synthesis system, cause the text-to-speech synthesis system to perform operations comprising:
- imposing ordering constraints on speech units, the ordering constraints indicating speech unit pairs, each respective speech unit pair of the speech units pairs having a respective first speech unit with a respective first pitch and a respective second speech unit having a respective second pitch, the speech unit pairs being suitable for concatenation based on the respective first pitch and the respective second pitch;
  
  selecting, from the speech units and based at least in part on a difference in pitch between the respective first pitch and the respective second pitch being below a threshold value according to the ordering constraints, units for speech synthesis to yield selected speech units; and
  
  synthesizing speech using the selected speech units.
  - 16. The computer-readable storage device of claim 15, wherein the respective first pitch and the respective second pitch comprise a respective leading edge frequency of the respective first speech unit and the respective second speech unit.
  - 17. The computer-readable storage device of claim 15, wherein the respective first pitch and the respective second pitch comprise a trailing edge frequency of the respective first speech unit and the respective second speech unit that is within the threshold value.
  - 18. The computer-readable storage device of claim 15, wherein the computer-readable storage device stores further instructions which, when executed by the text-to-speech synthesis system, cause the text-to-speech synthesis system to perform further operations comprising:
    - adjusting the threshold value based on a number of the selected speech units.
  - 19. The computer-readable storage device of claim 18, wherein the threshold value is decreased when more units are selected and increases when fewer units are selected.
  - 20. The computer-readable storage device of claim 15, further comprising assigning a pitch to speech units in the text-to-speech synthesis system which do not have an assigned pitch.
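
Claims 6, 13, and 20 recite assigning a pitch to speech units that lack one, but the claims do not say how that pitch is chosen. The sketch below shows one arbitrary policy, filling gaps with the mean of the known pitches, so that every unit has a key for pitch ordering. The function name `assign_missing_pitch` and the `default` fallback value are assumptions.

```python
from statistics import mean


def assign_missing_pitch(pitches, default=100.0):
    """Replace None entries with the mean of the known pitches (Hz).
    If no unit has a pitch, fall back to `default` (an assumed value)."""
    known = [p for p in pitches if p is not None]
    fill = mean(known) if known else default
    return [fill if p is None else p for p in pitches]
```

Any rule that yields a usable value would serve the same purpose here, for example copying the pitch of a neighbouring unit.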

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Cerence Operating Company (Cerence Inc.)
Inventors: Conkie, Alistair D.
Primary Examiner(s): Wozniak, James S
Application Number: US16/133,156
Publication Number: US 20190019496A1
Time in Patent Office: 589 Days
Field of Search: 704258, 704E13009, 704E1301
US Class Current:
CPC Class Codes:
G10L 13/02   Methods for producing synth...
G10L 13/04   Details of speech synthesis...
G10L 13/06   Elementary speech units use...
G10L 13/07   Concatenation rules
