System and method for unit selection text-to-speech using a modified Viterbi approach

US 10,079,011 B2
Filed: 05/20/2014
Issued: 09/18/2018
Est. Priority Date: 06/18/2010
Status: Active Grant

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units; for each respective speech unit in each ordered list, constructs a sublist of speech units from the next ordered list that are suitable for concatenation; performs a cost analysis of paths through the set of ordered lists based on the sublist for each respective speech unit; and synthesizes speech using the lowest-cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units that do not have an assigned pitch can be assigned a pitch.
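
The path search described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Unit` fields, the `join_cost` function (absolute pitch mismatch at the boundary), and the `pruned_viterbi` name are all assumptions made for illustration. The sketch assumes each column of candidates is already sorted by leading-edge pitch, so the successors "suitable for concatenation" can be located with a binary search instead of scanning the whole next column.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class Unit:
    # Hypothetical record; field names are not taken from the patent.
    name: str
    lead_f0: float   # pitch (Hz) at the unit's leading edge
    trail_f0: float  # pitch (Hz) at the unit's trailing edge
    target_cost: float = 0.0


def join_cost(prev: Unit, nxt: Unit) -> float:
    # Assumed join cost: absolute pitch mismatch at the boundary.
    return abs(prev.trail_f0 - nxt.lead_f0)


def pruned_viterbi(columns, threshold):
    """Return (cost, path) for the cheapest path through the columns.

    Each column must be sorted by lead_f0. For every unit, only the
    successors whose lead_f0 lies within `threshold` of the unit's
    trail_f0 (its sublist) are examined.
    """
    # history[t][j] = (cheapest cost of a path ending at unit j, backpointer)
    history = [[(u.target_cost, None) for u in columns[0]]]
    for prev_col, col in zip(columns, columns[1:]):
        keys = [u.lead_f0 for u in col]
        nxt = [(INF, None)] * len(col)
        for i, p in enumerate(prev_col):
            cost_p = history[-1][i][0]
            if cost_p == INF:
                continue  # unit i is unreachable, nothing to extend
            lo = bisect_left(keys, p.trail_f0 - threshold)
            hi = bisect_right(keys, p.trail_f0 + threshold)
            for j in range(lo, hi):  # the sublist for unit p
                c = cost_p + join_cost(p, col[j]) + col[j].target_cost
                if c < nxt[j][0]:
                    nxt[j] = (c, i)
        history.append(nxt)

    last = history[-1]
    end = min(range(len(last)), key=lambda j: last[j][0])
    total = last[end][0]
    if total == INF:
        return total, []  # no path survives the pruning
    path, j = [], end
    for t in range(len(columns) - 1, -1, -1):
        path.append(columns[t][j])
        j = history[t][j][1]
    path.reverse()
    return total, path
```

Compared with a full Viterbi search, which evaluates every pair of units in adjacent columns, this sketch evaluates only the pairs inside each sublist. The trade-off is that a threshold that is too tight can leave no complete path, which the sketch reports as an infinite cost and an empty path.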

20 Claims

1. A method comprising:
- selecting candidate speech units for converting text to speech;
  
  ordering the candidate speech units according to a respective fundamental frequency of each candidate speech unit in the candidate speech units relative to all other fundamental frequencies in the candidate speech units, to yield a linear list of ordered candidate speech units;
  
  constructing a sublist of the ordered candidate speech units, wherein a respective fundamental frequency of each candidate speech unit in the sublist is within a threshold distance of a respective proximate fundamental frequency associated with at least one candidate speech unit in a next linear list of ordered candidate speech units;
  
  concatenating a proposed speech unit in the sublist with a chosen speech unit outside of the candidate speech units, to yield a concatenated speech unit; and
  
  synthesizing the speech using the concatenated speech unit.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein the respective fundamental frequency of each candidate speech unit comprises a leading edge frequency of the each candidate speech unit that is within the threshold distance of a trailing edge frequency of the proximate speech unit.
  - 3. The method of claim 1, wherein the respective fundamental frequency of each candidate speech unit comprises a trailing edge frequency of the each candidate speech unit that is within the threshold distance of a leading edge frequency of the proximate speech unit.
  - 4. The method of claim 1, further comprising adjusting the threshold distance based on a number of candidate speech units selected.
  - 5. The method of claim 4, wherein the threshold distance is decreased when more candidate speech units are selected and increases when fewer candidate speech units are selected.
  - 6. The method of claim 1, further comprising assigning a pitch to units which do not have an assigned pitch.
  - 7. The method of claim 1, wherein respective fundamental frequency is a dominant one of multiple factors by which the ordered candidate speech units are ordered.
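
As a rough illustration of the ordering and sublist steps of claim 1, and of the threshold adjustment in claims 4 and 5, the following Python sketch may help. Everything in it is an assumption made for illustration: units are modelled as `(name, f0_hz)` pairs, and the inverse-proportional rule in `adaptive_threshold` (with its `base_hz` and `reference_n` parameters) is just one of many rules that shrink the threshold as the candidate count grows. The claims do not specify a formula.

```python
from bisect import bisect_left


def order_by_f0(units):
    """Sort (name, f0_hz) pairs into a linear list ordered by pitch."""
    return sorted(units, key=lambda u: u[1])


def adaptive_threshold(n_candidates, base_hz=20.0, reference_n=50):
    """Assumed rule: the threshold shrinks as the candidate count grows."""
    return base_hz * reference_n / max(n_candidates, 1)


def build_sublist(ordered, next_ordered, threshold):
    """Keep units whose f0 is within `threshold` of some unit in next_ordered.

    Both lists must already be ordered by f0, so the closest pitch in
    next_ordered is always one of the two neighbours of the insertion point.
    """
    next_f0 = [f0 for _, f0 in next_ordered]
    kept = []
    for name, f0 in ordered:
        i = bisect_left(next_f0, f0)
        near = [next_f0[k] for k in (i - 1, i) if 0 <= k < len(next_f0)]
        if any(abs(f0 - g) <= threshold for g in near):
            kept.append((name, f0))
    return kept
```

Because both lists are sorted, each membership check costs a binary search rather than a scan of the next list, which is the practical payoff of ordering the candidates by fundamental frequency first.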

8. A system comprising:
- a processor; and
  
  a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
  
  selecting candidate speech units for converting text to speech;
  
  ordering the candidate speech units according to a respective fundamental frequency of each candidate speech unit in the candidate speech units relative to all other fundamental frequencies in the candidate speech units, to yield a linear list of ordered candidate speech units;
  
  constructing a sublist of the ordered candidate speech units, wherein a respective fundamental frequency of each candidate speech unit in the sublist is within a threshold distance of a respective proximate fundamental frequency associated with at least one candidate speech unit in a next linear list of ordered candidate speech units;
  
  concatenating a proposed speech unit in the sublist with a chosen speech unit outside of the candidate speech units, to yield a concatenated speech unit; and
  
  synthesizing the speech using the concatenated speech unit.
- Dependent Claims (9, 10, 11, 12, 13, 14)
  - 9. The system of claim 8, wherein the respective fundamental frequency of each candidate speech unit comprises a leading edge frequency of the each candidate speech unit that is within the threshold distance of a trailing edge frequency of the proximate speech unit.
  - 10. The system of claim 8, wherein the respective fundamental frequency of each candidate speech unit comprises a trailing edge frequency of the each candidate speech unit that is within the threshold distance of a leading edge frequency of the proximate speech unit.
  - 11. The system of claim 8, the computer-readable storage medium having additional instructions stored which result in operations comprising adjusting the threshold distance based on a number of candidate speech units selected.
  - 12. The system of claim 11, wherein the threshold distance is decreased when more candidate speech units are selected and increases when fewer candidate speech units are selected.
  - 13. The system of claim 8, the computer-readable storage medium having additional instructions stored which result in operations comprising assigning a pitch to units which do not have an assigned pitch.
  - 14. The system of claim 8, wherein respective fundamental frequency is a dominant one of multiple factors by which the ordered candidate speech units are ordered.

15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- selecting candidate speech units for converting text to speech;
  
  ordering the candidate speech units according to a respective fundamental frequency of each candidate speech unit in the candidate speech units relative to all other fundamental frequencies in the candidate speech units, to yield a linear list of ordered candidate speech units;
  
  constructing a sublist of the ordered candidate speech units, wherein a respective fundamental frequency of each candidate speech unit in the sublist is within a threshold distance of a respective proximate fundamental frequency associated with at least one candidate speech unit in a next linear list of ordered candidate speech units;
  
  concatenating a proposed speech unit in the sublist with a chosen speech unit outside of the candidate speech units, to yield a concatenated speech unit; and
  
  synthesizing the speech using the concatenated speech unit.
- Dependent Claims (16, 17, 18, 19, 20)
  - 16. The computer-readable storage device of claim 15, wherein the respective fundamental frequency of each candidate speech unit comprises a leading edge frequency of the each candidate speech unit that is within the threshold distance of a trailing edge frequency of the proximate speech unit.
  - 17. The computer-readable storage device of claim 15, wherein the respective fundamental frequency of each candidate speech unit comprises a trailing edge frequency of the each candidate speech unit that is within the threshold distance of a leading edge frequency of the proximate speech unit.
  - 18. The computer-readable storage device of claim 15, having additional instructions stored which result in operations comprising adjusting the threshold distance based on a number of candidate speech units selected.
  - 19. The computer-readable storage device of claim 18, wherein the threshold distance is decreased when more candidate speech units are selected and increases when fewer candidate speech units are selected.
  - 20. The computer-readable storage device of claim 15, having additional instructions stored which result in operations comprising assigning a pitch to units which do not have an assigned pitch.
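
Claims 6, 13, and 20 recite assigning a pitch to units that lack one, but the claims do not say how the pitch is chosen. The Python sketch below shows one possible policy purely as an assumption: a unit whose pitch is `None` (for example, an unvoiced segment) receives the median pitch of the units that do have one, with a hypothetical `default_hz` fallback when none do. The function name and the `(name, f0)` pair representation are likewise illustrative.

```python
from statistics import median


def assign_missing_pitch(units, default_hz=100.0):
    """Return (name, f0) pairs with every None f0 replaced by a stand-in.

    Assumed policy: use the median of the known pitches, or default_hz
    when no unit in the list has a pitch.
    """
    known = [f0 for _, f0 in units if f0 is not None]
    fill = median(known) if known else default_hz
    return [(name, fill if f0 is None else f0) for name, f0 in units]
```

Giving every unit a pitch value lets all candidates be placed in the pitch-ordered list, so units without a measured pitch are not silently excluded from the search.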

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Conkie, Alistair D.
Primary Examiner(s): Wozniak, James
Application Number: US14/282,040
Publication Number: US 20140257818A1
Time in Patent Office: 1,582 Days
Field of Search: 704258, 704260, 704261, 704266, 704E13009, 704E1301

CPC Class Codes

G10L 13/02   Methods for producing synth...
G10L 13/04   Details of speech synthesis...
G10L 13/06   Elementary speech units use...
G10L 13/07   Concatenation rules
