METHODS AND APPARATUS FOR RAPID ACOUSTIC UNIT SELECTION FROM A LARGE SPEECH CORPUS

US 20120136663A1
Filed: 11/29/2011
Published: 05/31/2012
Est. Priority Date: 04/30/1999
Status: Active Grant

Abstract

A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. Because concatenation costs, which measure the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching them. The number of possible sequential pairs of acoustic units, however, makes caching all of them prohibitive. Statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. The system therefore synthesizes a large body of speech, identifies the acoustic unit sequential pairs generated and their respective concatenation costs, and stores those concatenation costs likely to occur.
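
The selection step described in the abstract, choosing units that minimize a combination of target and concatenation costs, is commonly implemented as a dynamic-programming (Viterbi) search. The following is a minimal sketch of that general idea, not the patent's own implementation. The names `select_units`, `target_cost`, and `concat_cost` are hypothetical, units are represented as plain strings, and every position is assumed to have at least one candidate.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Unit = str  # hypothetical: an acoustic unit is identified by a string


def select_units(
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    concat_cost: Callable[[Unit, Unit], float],
) -> List[Unit]:
    """Pick one unit per position so that the summed cost is minimal.

    candidates[i] lists the units that could fill position i.
    target_cost(i, u) scores how well unit u fits position i.
    concat_cost(p, u) scores the join between consecutive units p and u.
    """
    # best[u] = (cost of the cheapest path ending in u, that path)
    best: Dict[Unit, Tuple[float, List[Unit]]] = {
        u: (target_cost(0, u), [u]) for u in candidates[0]
    }
    for i in range(1, len(candidates)):
        nxt: Dict[Unit, Tuple[float, List[Unit]]] = {}
        for u in candidates[i]:
            # Cheapest way to reach u from any unit at the previous position.
            cost, path = min(
                ((c + concat_cost(p, u), prev_path)
                 for p, (c, prev_path) in best.items()),
                key=lambda t: t[0],
            )
            nxt[u] = (cost + target_cost(i, u), path + [u])
        best = nxt
    # Return the path with the lowest total cost at the final position.
    return min(best.values(), key=lambda t: t[0])[1]
```

Every inner-loop step calls `concat_cost` once per candidate pair, which is why the abstract treats that function as the part worth caching.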

20 Claims

1. A method comprising:
- generating a concatenation cost database by synthesizing, via a processor, a body of speech and identifying acoustic unit sequential pairs in the body of speech and respective concatenation costs;
  
  determining whether an acoustic unit sequential pair to be used for synthesizing speech has a concatenation cost in the concatenation cost database; and
  
  if the concatenation cost database does not contain the concatenation cost for the acoustic unit sequential pair, calculating an actual concatenation cost for the acoustic unit sequential pair.
  - 2. The method of claim 1, wherein if the concatenation cost database contains the concatenation cost for the acoustic unit sequential pair, then synthesizing the speech using a respective concatenation cost for the speech from the concatenation cost database.
  - 3. The method of claim 1, further comprising synthesizing the speech using the actual concatenation cost calculated for the acoustic unit sequential pair.
  - 4. The method of claim 1, wherein the concatenation cost database is generated by assigning costs to the acoustic unit sequential pairs.
  - 5. The method of claim 1, wherein the concatenation cost database contains a portion of all possible concatenation costs.
  - 6. The method of claim 1, wherein the concatenation cost database is generated using statistical techniques which predict which of the acoustic unit sequential pairs are most likely to occur in common speech.
  - 7. The method of claim 1, wherein the actual concatenation cost comprises a weighted sum of subcosts across phones.
  - 8. The method of claim 1, wherein the actual concatenation cost provides an estimate of an acoustic mismatch between units in the acoustic unit sequential pair.
  - 10. The system of claim 8, further comprising if the concatenation cost database contains the concatenation cost for the acoustic unit sequential pair, synthesizing the speech using a respective concatenation cost for the speech from the concatenation cost database.
  - 11. The system of claim 8, further comprising synthesizing the speech using the actual concatenation cost calculated for the acoustic unit sequential pair.
  - 12. The system of claim 8, wherein the concatenation cost database contains a portion of all possible concatenation costs.
  - 13. The system of claim 8, wherein the concatenation cost database is generated using statistical techniques which predict which of the acoustic unit sequential pairs are most likely to occur in common speech.
  - 14. The system of claim 8, wherein the actual concatenation cost comprises a weighted sum of subcosts across phones.
  - 15. The system of claim 8, wherein the actual concatenation cost provides an estimate of an acoustic mismatch between units in the acoustic unit sequential pair.
  - 17. The non-transitory computer-readable storage media of claim 15, further comprising if the concatenation cost database contains the concatenation cost for the acoustic unit sequential pair, synthesizing the speech using a respective concatenation cost for the speech from the concatenation cost database.
  - 18. The non-transitory computer-readable storage media of claim 15, further comprising synthesizing the speech using the actual concatenation cost for the acoustic unit sequential pair.
  - 19. The non-transitory computer-readable storage media of claim 15, wherein the concatenation cost database is generated using statistical techniques which predict which of the acoustic unit sequential pairs are most likely to occur in common speech.
  - 20. The non-transitory computer-readable storage media of claim 15, wherein the actual concatenation cost provides an estimate of an acoustic mismatch between units in the acoustic unit sequential pair.
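
The dependent claims above describe the actual concatenation cost as a weighted sum of subcosts that estimates the acoustic mismatch between the two units of a pair. A minimal sketch of such a weighted sum follows. The function name `weighted_concat_cost` and the subcost names used in the usage note (`spectral`, `pitch`) are illustrative assumptions; the claims do not name specific subcosts or weights.

```python
from typing import Mapping


def weighted_concat_cost(
    subcosts: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Combine named mismatch measures into a single concatenation cost.

    subcosts maps a (hypothetical) subcost name to its measured value for
    one unit pair; weights maps the same names to their relative weight.
    """
    missing = set(subcosts) - set(weights)
    if missing:
        raise KeyError(f"no weight for subcosts: {sorted(missing)}")
    return sum(weights[name] * value for name, value in subcosts.items())
```

For example, `weighted_concat_cost({"spectral": 2.0, "pitch": 1.0}, {"spectral": 0.5, "pitch": 0.25})` evaluates to `1.25`.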

9. A system comprising:
- a processor; and
  
  a computer readable storage medium storing instructions for controlling the processor to perform steps comprising:
  
  generating a concatenation cost database by synthesizing, via a processor, a body of speech and identifying acoustic unit sequential pairs in the body of speech and respective concatenation costs;
  
  determining whether an acoustic unit sequential pair to be used for synthesizing speech has a concatenation cost in the concatenation cost database; and
  
  if the concatenation cost database does not contain the concatenation cost for the acoustic unit sequential pair, calculating an actual concatenation cost for the acoustic unit sequential pair.

16. A non-transitory computer-readable storage media storing instructions which, when executed by a computing device, cause the computing device to perform steps comprising:
- generating a concatenation cost database by synthesizing, via a processor, a body of speech and identifying acoustic unit sequential pairs in the body of speech and respective concatenation costs;
  
  determining whether an acoustic unit sequential pair to be used for synthesizing speech has a concatenation cost in the concatenation cost database; and
  
  if the concatenation cost database does not contain the concatenation cost for the acoustic unit sequential pair, calculating an actual concatenation cost for the acoustic unit sequential pair.
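
The three steps shared by the independent claims (generate a cost database from synthesized speech, check whether a pair is in it, and compute the actual cost on a miss) can be sketched as a cache with a fallback. This is an illustration under stated assumptions rather than the patented implementation: `build_cost_database`, `concatenation_cost`, and `actual_cost` are hypothetical names, units are plain strings, and the input is assumed to be unit sequences already produced by synthesizing a body of speech.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple

Unit = str  # hypothetical: an acoustic unit is identified by a string
Pair = Tuple[Unit, Unit]


def build_cost_database(
    synthesized_sequences: Iterable[Sequence[Unit]],
    actual_cost: Callable[[Unit, Unit], float],
) -> Dict[Pair, float]:
    """Store the cost of every sequential pair seen in the synthesized speech."""
    database: Dict[Pair, float] = {}
    for units in synthesized_sequences:
        # zip(units, units[1:]) yields each adjacent (left, right) pair.
        for pair in zip(units, units[1:]):
            if pair not in database:
                database[pair] = actual_cost(*pair)
    return database


def concatenation_cost(
    pair: Pair,
    database: Dict[Pair, float],
    actual_cost: Callable[[Unit, Unit], float],
) -> Tuple[float, bool]:
    """Return (cost, was_cached) for a pair needed during synthesis."""
    cached = database.get(pair)
    if cached is not None:
        return cached, True  # database hit: reuse the stored cost
    # Database miss: compute the actual cost on demand.
    return actual_cost(*pair), False
```

Because only pairs that actually occur are stored, the database holds a small portion of all possible concatenation costs, and the rare miss is handled by the on-demand computation.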

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: AT&T Intellectual Property II LP (AT&T, Inc.)
Inventors: Beutnagel, Mark Charles; Mohri, Mehryar; Riley, Michael Dennis

Granted Patent: US 8,315,872 B2
US Class Current: 704/258
CPC Class Codes:

G10L 13/00   Speech synthesis; Text to s...

G10L 13/027   Concept to speech synthesis...

G10L 13/07   Concatenation rules

G10L 13/08   Text analysis or generation...
