METHOD AND SYSTEM FOR PRESELECTION OF SUITABLE UNITS FOR CONCATENATIVE SPEECH

US 20090094035A1
Filed: 12/01/2008
Published: 04/09/2009
Est. Priority Date: 06/30/2000
Status: Active Grant

Abstract

A system and method for improving the response time of text-to-speech synthesis uses “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method includes generating a triphone preselection cost database for use in speech synthesis by 1) selecting a triphone sequence u₁-u₂-u₃, 2) calculating a preselection cost for each 5-phoneme sequence u_a-u₁-u₂-u₃-u_b, where u₂ is allowed to match any identically labeled phoneme in a database and the units u_a and u_b vary over the entire phoneme universe, and 3) storing a group of the selected triphone sequences exhibiting the lowest costs in a triphone preselection cost database.
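The three steps of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Unit` record, the `context_cost` weighting, the `n_best` cutoff, and the choice to rank each unit by its lowest cost over all (u_a, u_b) pairs are assumptions made only for illustration.

```python
from dataclasses import dataclass
from itertools import product
import heapq


@dataclass(frozen=True)
class Unit:
    """Hypothetical record for one recorded phoneme in the unit database."""
    uid: int
    label: str
    context: tuple  # labels recorded around it: (left2, left1, right1, right2)


def context_cost(unit, target):
    """Assumed cost: weighted count of context mismatches (near neighbors weigh more)."""
    weights = (1.0, 2.0, 2.0, 1.0)
    return sum(w for w, have, want in zip(weights, unit.context, target) if have != want)


def preselect(triphone, units, phonemes, n_best):
    """Steps 1-3 for one triphone u1-u2-u3: score units over every ua/ub, keep the cheapest."""
    u1, u2, u3 = triphone
    best = {}
    for ua, ub in product(phonemes, repeat=2):  # ua and ub range over the phoneme universe
        target = (ua, u1, u3, ub)
        for unit in units:
            if unit.label != u2:  # u2 may match any identically labeled phoneme
                continue
            cost = context_cost(unit, target)
            if cost < best.get(unit.uid, float("inf")):
                best[unit.uid] = cost
    # Keep the n_best lowest-cost units; ties are broken by unit id.
    return heapq.nsmallest(n_best, best.items(), key=lambda kv: (kv[1], kv[0]))


# Toy data, invented for this example.
units = [
    Unit(0, "a", ("s", "k", "t", "s")),
    Unit(1, "a", ("s", "k", "p", "s")),
    Unit(2, "a", ("s", "b", "p", "s")),
    Unit(3, "t", ("k", "a", "s", "s")),
]
phonemes = ["a", "b", "k", "p", "s", "t"]
table = {("k", "a", "t"): preselect(("k", "a", "t"), units, phonemes, n_best=2)}
print(table)
```

Because this table is computed offline, the run-time search only has to consider the stored units for each triphone, which is where the claimed response-time improvement would come from.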

17 Claims

1. A method of speech synthesis, the method comprising:
- receiving input text;
  
  selecting a plurality of N phoneme units from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text;
  
  if candidate phonemes are available in the triphone unit selection database, applying a cost process to select a set of phonemes from the candidate phonemes;
  
  if no candidate phonemes are available in the triphone unit selection database, applying a single phoneme approach to select phonemes for synthesis; and
  
  synthesizing speech using the set of phonemes from the candidate phonemes and/or the selected phonemes for synthesis from the single phoneme approach.
- - 2. The method of claim 1, wherein the triphone unit selection database is generated by precalculating a list of all phonemes in a phoneme database that can be used in each of a plurality of triphone contexts.
  - 3. The method of claim 1, wherein applying the single phoneme approach to select phonemes for synthesis is performed using a complete set of phonemes of a given type.
  - 4. The method of claim 1, wherein a Viterbi search is applied as the cost process.
  - 5. The method of claim 1, wherein subsequent to the step of receiving input text, the method comprises parsing the received input text to recognizable units.
  - 6. The method of claim 5, wherein parsing the received text into recognizable units further comprises:
    - applying a text normalization process to parse the received text into known words and convert abbreviations into known words; and
      
      applying a syntactic process to perform a grammatical analysis of the known words and identify their associated parts of speech.
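The branch in claim 1 (use the triphone database when it has candidates, otherwise fall back to the single phoneme approach of claim 3) can be sketched as below. The dictionary layouts, the unit identifiers, and the `#` boundary symbol are hypothetical; the claims do not specify them.

```python
def triphones(phonemes, pad="#"):
    """Turn a phoneme sequence into one (left, center, right) triple per phoneme.

    The pad symbol marking utterance boundaries is an assumption.
    """
    padded = [pad, *phonemes, pad]
    return [tuple(padded[i:i + 3]) for i in range(len(phonemes))]


def candidates_for(triphone, triphone_db, phoneme_db, n):
    """Return (candidate units, which approach supplied them) for one triphone."""
    preselected = triphone_db.get(triphone, [])
    if preselected:
        # Candidates are available: take up to N preselected units.
        return preselected[:n], "triphone"
    # No candidates: fall back to the complete set of units of the center phoneme type.
    _, center, _ = triphone
    return list(phoneme_db.get(center, [])), "single-phoneme"


# Toy databases, invented for this example.
triphone_db = {("k", "ae", "t"): ["ae_017", "ae_203", "ae_051"]}
phoneme_db = {"k": ["k_001", "k_002"], "ae": ["ae_017", "ae_051", "ae_203"], "t": ["t_001"]}

for tri in triphones(["k", "ae", "t"]):
    print(tri, candidates_for(tri, triphone_db, phoneme_db, n=2))
```

Either branch yields a candidate list per position, so the same cost process can run over the result regardless of which approach supplied the candidates.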

7. A computing device for synthesizing speech from text using a triphone unit selection database, the computing device comprising:
- a processor;
  
  a module configured to control the processor to receive input text;
  
  a module configured to control the processor to select a plurality of N phoneme units from the triphone unit selection database as candidate phonemes for synthesized speech based on the input text;
  
  a module configured to control the processor to apply a cost process to select a set of phonemes from the candidate phonemes; and
  
  a module configured to control the processor to synthesize speech using a selected set of phonemes.
- - 8. The computing device of claim 7, wherein a Viterbi search is applied as the cost process.
  - 9. The computing device of claim 7, further comprising a module configured to control the processor to parse received text into recognizable units.
  - 10. The computing device of claim 9, wherein the module configured to control the processor to parse the received text into recognizable units further:
    - applies a text normalization process to parse the received text into known words and convert abbreviations into known words; and
      
      applies a syntactic process to perform a grammatical analysis of the known words and identify their associated parts of speech.
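Claims 4 and 8 name a Viterbi search as the cost process. The following is a generic dynamic-programming sketch of such a search over per-position candidate lists. The `target_cost` and `join_cost` callables, and the use of plain numbers as stand-ins for units, are assumptions for illustration rather than details taken from the claims.

```python
def viterbi_select(candidates, target_cost, join_cost):
    """Pick one unit per position so that target plus join costs are minimal.

    candidates  -- list of non-empty candidate lists, one per position
    target_cost -- target_cost(position, unit) -> number   (assumed signature)
    join_cost   -- join_cost(previous_unit, unit) -> number (assumed signature)
    """
    if not candidates or any(not c for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # costs[j]: cheapest total cost of any path ending in candidate j of the current position.
    costs = [target_cost(0, u) for u in candidates[0]]
    backptrs = []
    for i in range(1, len(candidates)):
        new_costs, ptrs = [], []
        for u in candidates[i]:
            k = min(range(len(costs)),
                    key=lambda j: costs[j] + join_cost(candidates[i - 1][j], u))
            new_costs.append(costs[k] + join_cost(candidates[i - 1][k], u) + target_cost(i, u))
            ptrs.append(k)
        costs = new_costs
        backptrs.append(ptrs)

    # Trace back from the cheapest final candidate.
    j = min(range(len(costs)), key=costs.__getitem__)
    total = costs[j]
    path = [candidates[-1][j]]
    for i in range(len(candidates) - 1, 0, -1):
        j = backptrs[i - 1][j]
        path.append(candidates[i - 1][j])
    path.reverse()
    return path, total


# Toy example: each "unit" is just a pitch value, and joins prefer similar pitch.
lattice = [[100, 120], [118, 90], [95, 121]]
path, total = viterbi_select(lattice,
                             target_cost=lambda i, u: 0,
                             join_cost=lambda a, b: abs(a - b))
print(path, total)
```

The search cost grows with the square of the number of candidates per position, which is why limiting each position to N preselected candidates shortens the search.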

11. A tangible computer-readable medium storing instructions for controlling a computing device to synthesize speech from text using a triphone unit selection database, the instructions comprising:
- receiving input text;
  
  selecting a plurality of N phoneme units from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text;
  
  applying a cost process to select a set of phonemes from the candidate phonemes; and
  
  synthesizing speech using the set of phonemes.

13. The tangible computer-readable medium of claim 12, wherein subsequent to the step of receiving the input text the following step is performed:
- parsing the received text into recognizable units.
- - 14. The tangible computer-readable medium of claim 13, wherein the parsing comprises the steps of:
    - applying a text normalization process to parse the received text into known words;
      
      converting abbreviations into known words; and
      
      applying a syntactic process to perform a grammatical analysis of the known words and identify their associated parts of speech.
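The two parsing steps recited in claims 6, 10, and 14 (text normalization, then a syntactic process that assigns parts of speech) can be pictured with a toy sketch. The abbreviation table, the lexicon, and the tag names are invented for this example. A real front end would use far richer resources, and nothing here is taken from the specification.

```python
import re

# Hypothetical lookup tables, for illustration only.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister", "st.": "street"}
LEXICON = {
    "doctor": "NOUN", "smith": "NOUN", "lives": "VERB",
    "on": "ADP", "main": "ADJ", "street": "NOUN",
}


def normalize(text):
    """Text normalization: split into words and expand known abbreviations."""
    words = []
    for token in text.lower().split():
        token = ABBREVIATIONS.get(token, token)
        token = re.sub(r"[^a-z']", "", token)  # drop punctuation and digits
        if token:
            words.append(token)
    return words


def tag(words):
    """Stand-in for the syntactic process: look up each word's part of speech."""
    return [(w, LEXICON.get(w, "UNK")) for w in words]


words = normalize("Dr. Smith lives on Main St.")
print(tag(words))
```

A dictionary lookup cannot resolve words whose part of speech depends on context, so this stands in for the grammatical analysis only at the level of data flow.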

15. A computing device for synthesizing speech, the computing device comprising:
- a processor;
  
  a module configured to control the processor to receive input text;
  
  a module configured to control the processor to select a plurality of N phonemes from a triphone unit selection database as candidate phonemes for a synthesized speech based on the input text;
  
  a module configured to control the processor, if candidate phonemes are available in the triphone unit selection database, to apply a cost process to select a set of phonemes from the candidate phonemes;
  
  a module configured to control the processor, if no candidate phonemes are available in the triphone unit selection database, to apply a single phoneme approach to select phonemes for synthesis; and
  
  a module configured to control the processor to synthesize speech using the set of phonemes from the candidate phonemes and/or the selected phonemes for synthesis from the single phoneme approach.
- - 16. The computing device of claim 15, wherein the triphone unit selection database is generated by precalculating a list of all phonemes in a phoneme database that can be used in each of a plurality of triphone contexts.
  - 17. The computing device of claim 15, wherein the module configured to apply a single phoneme approach further applies the single phoneme approach by using a complete set of phonemes of a given type.

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Conkie, Alistair D.
Granted Patent: US 8,224,645 B2
US Class Current: 704/260
CPC Class Codes:
G10L 13/07 Concatenation rules
G10L 2015/022 Demisyllables, biphones or ...
