Methods and apparatus for rapid acoustic unit selection from a large speech corpus

US 7,369,994 B1
Filed: 05/04/2006
Issued: 05/06/2008
Est. Priority Date: 04/30/1999
Status: Expired due to Term

First Claim

1. A computer-implemented method of synthesizing speech, the method comprising:

selecting a pair of acoustic units from an acoustic unit database;

identifying a concatenation cost between the pair of acoustic units based on communication with a concatenation cost database; and

synthesizing speech using the concatenation cost for the selected pair of acoustic units.

10 Assignments

0 Petitions

Abstract

A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Accordingly, a method is disclosed for constructing an efficient concatenation cost database by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur.
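
The database construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the unit sequences chosen while synthesizing a large body of text are already available as lists of integer unit ids, and the names `build_cost_cache`, `selected_sequences`, `concat_cost`, and `min_count` are hypothetical. "Likely to occur" is approximated here by a simple occurrence-count threshold, which is an assumption.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[int, int]  # (left unit id, right unit id); integer ids are an assumption


def build_cost_cache(
    selected_sequences: Iterable[List[int]],
    concat_cost: Callable[[int, int], float],
    min_count: int = 1,
) -> Dict[Pair, float]:
    """Cache costs for sequential unit pairs seen at least min_count times."""
    counts: Counter = Counter()
    for seq in selected_sequences:
        # zip(seq, seq[1:]) yields each pair of adjacent units in the sequence.
        counts.update(zip(seq, seq[1:]))
    # The expensive cost function runs once per retained pair, offline.
    return {
        pair: concat_cost(*pair)
        for pair, n in counts.items()
        if n >= min_count
    }


# Toy usage: two "synthesized sentences" and a stand-in cost function.
sequences = [[1, 2, 3], [1, 2, 4]]
toy_cost = lambda a, b: float(abs(a - b))
cache_all = build_cost_cache(sequences, toy_cost)
cache_frequent = build_cost_cache(sequences, toy_cost, min_count=2)
```

With the toy data, `cache_all` holds the three distinct pairs observed, while `cache_frequent` keeps only the pair `(1, 2)`, which occurs in both sequences.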

33 Citations

25 Claims

1. A computer-implemented method of synthesizing speech, the method comprising:
- selecting a pair of acoustic units from an acoustic unit database;
  
  identifying a concatenation cost between the pair of acoustic units based on communication with a concatenation cost database; and
  
  synthesizing speech using the concatenation cost for the selected pair of acoustic units.
  - 2. The method of claim 1, wherein the concatenation cost is a measure of the mismatch between the pair of acoustic units.
  - 3. The method of claim 1, wherein the concatenation cost database contains a subset of all possible acoustic unit sequential pairs.
  - 4. The method of claim 1, wherein the communication with the concatenation cost database comprises:
    - extracting a concatenation cost of the pair of acoustic units from the concatenation cost database if the concatenation cost database contains the concatenation cost of the pair of acoustic units; and
      
      determining a value of the concatenation cost of the pair of acoustic units if the concatenation cost database does not contain the concatenation cost of the pair of acoustic units.
  - 5. The method of claim 1, wherein the concatenation cost database is derived at least in part using statistical techniques which predict acoustic unit sequential pairs likely to occur in speech.
  - 6. The method of claim 1, wherein the concatenation cost database is derived at least in part by assigning costs to acoustic unit sequential pairs.
  - 7. The method of claim 1, wherein selecting at least one acoustic unit from the acoustic unit database further uses at least one target cost of an acoustic unit, the target cost being a measure of the mismatch between an acoustic unit and a phoneme.
  - 8. The method of claim 4, wherein determining a value of the concatenation cost of the pair of acoustic units comprises computing the concatenation cost of the pair of acoustic units.
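
As an informal illustration of claims 1, 4, 7, and 8, the sketch below looks up a pair's concatenation cost in a cache, computes it only on a miss, and uses the result in a search that minimizes combined target and concatenation cost. The dynamic-programming (Viterbi-style) search is an assumption, since this excerpt does not name a search algorithm, and `lookup_concat_cost`, `select_units`, `candidates`, `target_cost`, and `compute` are hypothetical names. Every position is assumed to have at least one candidate unit.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[int, int]  # (left unit id, right unit id); integer ids are an assumption


def lookup_concat_cost(
    pair: Pair,
    cache: Dict[Pair, float],
    compute: Callable[[int, int], float],
) -> float:
    """Return the cached cost if present; otherwise compute it (claims 4 and 8)."""
    cost = cache.get(pair)
    if cost is None:
        cost = compute(*pair)  # fallback for pairs the database omits
    return cost


def select_units(
    candidates: Sequence[Sequence[int]],
    target_cost: Callable[[int, int], float],
    cache: Dict[Pair, float],
    compute: Callable[[int, int], float],
) -> List[int]:
    """Pick one unit per position, minimizing target plus concatenation cost."""
    # best[u] = (cost of the cheapest path ending in unit u, that path)
    best = {u: (target_cost(0, u), [u]) for u in candidates[0]}
    for i in range(1, len(candidates)):
        nxt = {}
        for u in candidates[i]:
            # Choose the predecessor giving the cheapest path into u.
            prev, (cost, path) = min(
                best.items(),
                key=lambda kv: kv[1][0] + lookup_concat_cost((kv[0], u), cache, compute),
            )
            join = lookup_concat_cost((prev, u), cache, compute)
            nxt[u] = (cost + join + target_cost(i, u), path + [u])
        best = nxt
    return min(best.values(), key=lambda v: v[0])[1]


# Toy usage: two positions, two candidate units each, flat target cost.
toy_cache = {(1, 3): 5.0, (2, 4): 0.5}
expensive = lambda a, b: 10.0  # stand-in for the costly on-the-fly computation
chosen = select_units([[1, 2], [3, 4]], lambda i, u: 0.0, toy_cache, expensive)
```

In the toy example the path through units 2 and 4 is selected, because its cached join cost (0.5) is lower than every alternative.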

9. A concatenation cost database stored in a computer-readable medium, the concatenation cost database generated according to a method comprising:
- identifying at least some acoustic units to prune an acoustic unit database; and
  
  storing in a concatenation cost database, concatenation costs for sequential acoustic units associated with the pruned acoustic unit database.
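
Claim 9 does not say how units are chosen for pruning, so the following is only one possible reading. It assumes units are kept when they were selected at least `min_uses` times during a trial synthesis run, and that costs are stored only for sequential pairs whose units both survive. The names `prune_units`, `costs_for_pruned`, `min_uses`, and `concat_cost` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Set, Tuple


def prune_units(selected_sequences: Iterable[List[int]], min_uses: int = 1) -> Set[int]:
    """Keep unit ids that were selected at least min_uses times (assumed criterion)."""
    uses = Counter(u for seq in selected_sequences for u in seq)
    return {u for u, n in uses.items() if n >= min_uses}


def costs_for_pruned(
    selected_sequences: Iterable[List[int]],
    kept: Set[int],
    concat_cost: Callable[[int, int], float],
) -> Dict[Tuple[int, int], float]:
    """Store costs for observed sequential pairs whose units both survive pruning."""
    costs: Dict[Tuple[int, int], float] = {}
    for seq in selected_sequences:
        for a, b in zip(seq, seq[1:]):
            if a in kept and b in kept and (a, b) not in costs:
                costs[(a, b)] = concat_cost(a, b)
    return costs


# Toy usage with a stand-in cost function.
runs = [[1, 2, 3], [1, 2, 4]]
kept_units = prune_units(runs, min_uses=2)
pruned_costs = costs_for_pruned(runs, kept_units, lambda a, b: float(abs(a - b)))
```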

10. A computer-readable medium storing instructions for controlling a computing device, the instructions comprising:
- selecting a pair of acoustic units from an acoustic unit database;
  
  identifying a concatenation cost between the pair of acoustic units based on communication with a concatenation cost database; and
  
  synthesizing speech using the concatenation cost for the selected pair of acoustic units.
  - 11. The computer-readable medium of claim 10, wherein the concatenation cost is a measure of the mismatch between the pair of acoustic units.
  - 12. The computer-readable medium of claim 10, wherein the concatenation cost database contains a subset of all possible acoustic unit sequential pairs.
  - 13. The computer-readable medium of claim 10, wherein the communication with the concatenation cost database comprises:
    - extracting a concatenation cost of the pair of acoustic units from the concatenation cost database if the concatenation cost database contains the concatenation cost of the pair of acoustic units; and
      
      determining a value of the concatenation cost of the pair of acoustic units if the concatenation cost database does not contain the concatenation cost of the pair of acoustic units.
  - 14. The computer-readable medium of claim 13, wherein determining a value of the concatenation cost of the pair of acoustic units comprises computing the concatenation cost of the pair of acoustic units.
  - 15. The computer-readable medium of claim 10, wherein the concatenation cost database is derived at least in part using statistical techniques which predict acoustic unit sequential pairs likely to occur in speech.
  - 16. The computer-readable medium of claim 10, wherein the concatenation cost database is derived at least in part by assigning costs to acoustic unit sequential pairs.
  - 17. The computer-readable medium of claim 10, wherein selecting at least one acoustic unit from the acoustic unit database further uses at least one target cost of an acoustic unit, the target cost being a measure of the mismatch between an acoustic unit and a phoneme.

18. A system for synthesizing speech, the system comprising:
- a module configured to select a pair of acoustic units from an acoustic unit database;
  
  a module configured to identify a concatenation cost between the pair of acoustic units based on communication with a concatenation cost database; and
  
  a module configured to synthesize speech using the concatenation cost for the selected pair of acoustic units.
  - 19. The system of claim 18, wherein the concatenation cost is a measure of the mismatch between the pair of acoustic units.
  - 20. The system of claim 18, wherein the concatenation cost database contains a subset of all possible acoustic unit sequential pairs.
  - 21. The system of claim 18, wherein the communication with the concatenation cost database comprises:
    - extracting a concatenation cost of the pair of acoustic units from the concatenation cost database if the concatenation cost database contains the concatenation cost of the pair of acoustic units; and
      
      determining a value of the concatenation cost of the pair of acoustic units if the concatenation cost database does not contain the concatenation cost of the pair of acoustic units.
  - 22. The system of claim 18, wherein the concatenation cost database is derived at least in part using statistical techniques which predict acoustic unit sequential pairs likely to occur in speech.
  - 23. The system of claim 18, wherein the concatenation cost database is derived at least in part by assigning costs to acoustic unit sequential pairs.
  - 24. The system of claim 18, wherein the module configured to select at least one acoustic unit from the acoustic unit database further uses at least one target cost of an acoustic unit, the target cost being a measure of the mismatch between an acoustic unit and a phoneme.
  - 25. The system of claim 21, wherein the module configured to determine a value of the concatenation cost of the pair of acoustic units comprises computing the concatenation cost of the pair of acoustic units.

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Beutnagel, Mark C.; Mohri, Mehryar; Riley, Michael D.
Primary Examiner(s): Lerner, Martin
Application Number: US11/381,544
Time in Patent Office: 733 Days
Field of Search: 704/258, 704/259, 704/260, 704/263, 704/266
US Class Current: 704/258
CPC Class Codes:
G10L 13/00   Speech synthesis; Text to s...
G10L 13/027   Concept to speech synthesis...
G10L 13/07   Concatenation rules
G10L 13/08   Text analysis or generation...
