Method and system for preselection of suitable units for concatenative speech

US 8,224,645 B2
Filed: 12/01/2008
Issued: 07/17/2012
Est. Priority Date: 06/30/2000
Status: Expired due to Term

Abstract

A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes receiving input text and selecting a plurality of N phoneme units from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text, wherein the triphone unit selection database comprises triphone units each comprising three phones. If the candidate phonemes are available in the triphone unit selection database, the method includes applying a cost process to select a set of phonemes from the candidate phonemes. If no candidate phonemes are available in the triphone unit selection database, the method includes applying a single phoneme approach to select single phonemes for synthesis, which single phonemes are used in synthesis independent of a triphone structure. The method also includes synthesizing speech using at least one of the set of phonemes from the candidate phonemes and the selected single phonemes for synthesis from the single phoneme approach.
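
The lookup-with-fallback flow described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `triphone_contexts`, `candidates_for`, `top_n_table`, and `units_by_phoneme`, the integer unit ids, and the `"sil"` padding symbol are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Triphone = Tuple[str, str, str]  # (left phone, center phone, right phone)


def triphone_contexts(phones: List[str], pad: str = "sil") -> List[Triphone]:
    """Turn a phone sequence into one triphone context per phone.

    The padding symbol for utterance edges is an assumption of this sketch.
    """
    padded = [pad] + phones + [pad]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def candidates_for(
    triphone: Triphone,
    top_n_table: Dict[Triphone, List[int]],
    units_by_phoneme: Dict[str, List[int]],
) -> Tuple[List[int], bool]:
    """Return (candidate unit ids, whether the preselected list was used)."""
    preselected = top_n_table.get(triphone)
    if preselected:
        # Preselected candidates exist: a cost process would choose among them.
        return preselected, True
    # Fallback: all units of the center phoneme, regardless of triphone context.
    return units_by_phoneme.get(triphone[1], []), False


if __name__ == "__main__":
    table = {("sil", "k", "ae"): [3, 7]}
    inventory = {"k": [1, 3, 7, 9], "ae": [2, 4], "t": [5]}
    for ctx in triphone_contexts(["k", "ae", "t"]):
        print(ctx, candidates_for(ctx, table, inventory))
```

The point of the design is that the table lookup is cheap at synthesis time, because the expensive ranking happens before any input text arrives.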

15 Claims

1. A method comprising:
- receiving input text;
  
  when candidate phonemes for synthesizing speech based on the input text are available from a top N triphone units, applying, using a processor, a cost process to select a set of phonemes from the candidate phonemes, wherein the top N triphone units are determined, prior to receiving the input text, from a database comprising a plurality of triphone units, and wherein the top N triphone units comprise those triphone units having lowest target costs when each triphone unit is individually combined into a 5-phoneme combination;
  
  when no candidate phonemes are available in the top N triphone units, applying a single phoneme approach to select single phonemes for synthesis; and
  
  synthesizing speech using at least one of the set of phonemes from the candidate phonemes and the single phonemes, which, when used, are used independent of a triphone structure.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, wherein the plurality of triphone units in the database is generated by precalculating a list of all phonemes in a phoneme database that can be used in each of a plurality of triphone contexts.
  - 3. The method of claim 1, wherein applying the single phoneme approach to select phonemes for synthesis is performed using a complete set of phonemes of a given type.
  - 4. The method of claim 1, wherein a Viterbi search is applied as the cost process.
  - 5. The method of claim 1, wherein subsequent to the step of receiving input text, the method comprises parsing the received input text to recognizable units.
  - 6. The method of claim 5, wherein parsing the received text into recognizable units further comprises:
    - applying a text normalization process to parse the received text into known words and convert abbreviations into known words; and
      
      applying a syntactic process to perform a grammatical analysis of the known words and identify their associated parts of speech.
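
Claim 4 names a Viterbi search as the cost process. The following is a minimal sketch of such a search over per-position candidate lists, assuming the total cost is the sum of a per-unit target cost and a join cost between consecutive units. The function name `viterbi_select` and both cost callbacks are hypothetical; the claims do not specify how either cost is computed.

```python
from typing import Callable, List, Sequence


def viterbi_select(
    candidates: Sequence[Sequence[int]],
    target_cost: Callable[[int, int], float],
    join_cost: Callable[[int, int], float],
) -> List[int]:
    """Pick one unit per position, minimizing summed target and join costs.

    target_cost(position, unit) and join_cost(previous_unit, unit) are
    placeholders supplied by the caller.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[j]: cheapest cost of any path ending in the j-th candidate
    # of the current position.
    best = [target_cost(0, u) for u in candidates[0]]
    back: List[List[int]] = []  # back[t-1][j]: best predecessor index at t-1

    for t in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for u in candidates[t]:
            costs = [
                best[i] + join_cost(prev, u)
                for i, prev in enumerate(candidates[t - 1])
            ]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min] + target_cost(t, u))
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path backwards from the final position.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]
```

The work per position grows with the product of adjacent candidate-list sizes, which is why restricting each list to a preselected top N shortens the search.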

7. A system comprising:
- a processor;
  
  a non-transitory computer-readable storage medium storing instructions which, when executed on the processor, perform a method comprising:
  
  receiving input text;
  
  when candidate phonemes for synthesizing speech based on the input text are available from a top N triphone units, applying a cost process to select a set of phonemes from the candidate phonemes, wherein the top N triphone units are determined, prior to receiving the input text, from a database comprising a plurality of triphone units, and wherein the top N triphone units comprise those triphone units having lowest target costs when each triphone unit is individually combined into a 5-phoneme combination;
  
  when no candidate phonemes are available in the top N triphone units, applying a single phoneme approach to select single phonemes for synthesis; and
  
  synthesizing speech using at least one of the set of phonemes from the candidate phonemes and the single phonemes, which, when used, are used independent of a triphone structure.
- Dependent claims (8, 9, 10):
  - 8. The system of claim 7, wherein a Viterbi search is applied as the cost process.
  - 9. The system of claim 7, further comprising instructions to control the processor to parse received text into recognizable units.
  - 10. The system of claim 9, wherein parsing the received text in a recognizable unit further comprises:
    - applying a text normalization process to parse the received text into known words and convert abbreviations into known words; and
      
      applying a syntactic process to perform a grammatical analysis of the known words and identify their associated parts of speech.
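
The two parsing stages recited in claims 6 and 10 (text normalization, then a syntactic process that assigns parts of speech) can be illustrated with a toy sketch. Everything in it is an assumption: the abbreviation table, the part-of-speech lexicon, and the function names `normalize` and `tag` are hypothetical, and a dictionary lookup stands in for a real grammatical analysis.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical abbreviation table; a real system would need context to
# decide, for example, whether "St." means "street" or "saint".
ABBREVIATIONS: Dict[str, str] = {
    "dr.": "doctor",
    "mr.": "mister",
    "st.": "street",
}

# Hypothetical part-of-speech lexicon standing in for a syntactic analyzer.
POS_LEXICON: Dict[str, str] = {
    "doctor": "NOUN",
    "smith": "PROPN",
    "lives": "VERB",
    "on": "ADP",
    "main": "PROPN",
    "street": "NOUN",
}


def normalize(text: str) -> List[str]:
    """Split text into lowercase words and expand known abbreviations."""
    tokens = re.findall(r"[A-Za-z]+\.?", text.lower())
    words: List[str] = []
    for tok in tokens:
        if tok in ABBREVIATIONS:
            words.append(ABBREVIATIONS[tok])
        else:
            # Drop sentence-final periods from ordinary words.
            words.append(tok.rstrip("."))
    return words


def tag(words: List[str]) -> List[Tuple[str, str]]:
    """Attach a part-of-speech label to each word; unknown words get 'UNK'."""
    return [(w, POS_LEXICON.get(w, "UNK")) for w in words]


if __name__ == "__main__":
    print(tag(normalize("Dr. Smith lives on Main St.")))
```

Here `normalize("Dr. Smith lives on Main St.")` yields `["doctor", "smith", "lives", "on", "main", "street"]`, which `tag` then pairs with labels from the lexicon.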

11. A non-transitory computer-readable medium storing instructions which, when executed by a computing device, cause the computing device to perform steps comprising:
- receiving input text;
  
  when candidate phonemes are available in the top N triphone units applying a cost process to select a set of phonemes from the candidate phonemes, wherein the top N triphone units are determined, prior to receiving the input text, from a database comprising a plurality of triphone units, and wherein the top N triphone units comprise those triphone units having lowest target costs when each triphone unit is individually combined into a 5-phoneme combination;
  
  when no candidate phonemes are available in the top N triphone units, applying a single phoneme approach to select single phonemes for synthesis; and
  
  synthesizing speech using at least one of the set of phonemes from the candidate phonemes and the single phonemes, which, when used, are used independent of a triphone structure.
- Dependent claims (12, 13, 14, 15):
  - 12. The tangible computer-readable medium of claim 11, wherein subsequent to the step of receiving the input text the following step is performed:
    - parsing the received text into recognizable units.
  - 13. The non-transitory computer-readable medium of claim 12, wherein the parsing comprises the steps of:
    - applying a text normalization process to parse the input text into known words;
      
      convert abbreviations into the known words; and
      
      applying a syntactic process to perform a grammatical analysis of the known words and identify their associated part of speech.
  - 14. The non-transitory computer-readable storage medium of claim 11, wherein the plurality of triphone units in the triphone unit database is generated by precalculating a list of all phonemes in a phoneme database that can be used in each of a plurality of triphone contexts.
  - 15. The non-transitory computer-readable storage medium of claim 11, wherein applying a single phoneme approach further comprises using a complete set of phonemes of a given type.
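
The offline preselection recited in the independent claims (top N units per triphone, ranked by target cost in 5-phoneme combinations) and the precalculated per-context lists of claims 2 and 14 can be sketched as follows. This is one possible reading, not the patented procedure: the `Unit` record, the mismatch-based `target_cost`, its weights, and the choice to score each unit by its cheapest 5-phoneme extension are all assumptions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Sequence, Tuple

Triphone = Tuple[str, str, str]
Quinphone = Tuple[str, str, str, str, str]


@dataclass(frozen=True)
class Unit:
    uid: int
    context: Quinphone  # recorded 5-phoneme window; the unit is at index 2


def target_cost(unit: Unit, wanted: Quinphone) -> float:
    """Hypothetical target cost: weighted count of mismatched neighbors."""
    weights = (0.5, 1.0, 0.0, 1.0, 0.5)  # nearer neighbors weigh more
    return sum(w for w, a, b in zip(weights, unit.context, wanted) if a != b)


def build_top_n(
    units: Sequence[Unit], phone_set: Sequence[str], n: int
) -> Dict[Triphone, List[int]]:
    """For every triphone context, keep the n units with the lowest cost."""
    by_center: Dict[str, List[Unit]] = {}
    for u in units:
        by_center.setdefault(u.context[2], []).append(u)

    table: Dict[Triphone, List[int]] = {}
    for left, center, right in product(phone_set, repeat=3):
        pool = by_center.get(center, [])
        if not pool:
            continue  # no entry: synthesis would use the single-phoneme path
        scored = []
        for u in pool:
            # Extend the triphone to every 5-phoneme combination and keep
            # the unit's cheapest cost (an assumption of this sketch).
            best = min(
                target_cost(u, (ll, left, center, right, rr))
                for ll, rr in product(phone_set, repeat=2)
            )
            scored.append((best, u.uid))
        scored.sort()
        table[(left, center, right)] = [uid for _, uid in scored[:n]]
    return table
```

Because the table is built once before any input text is received, its cost (cubic in the phone-set size for the triphones, times the quadratic number of extensions) is paid offline rather than per utterance.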

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: AT&T Intellectual Property II LP (AT&T, Inc.)
Inventors: Conkie, Alistair D.
Primary Examiner(s): Lerner, Martin
Application Number: US12/325,809
Publication Number: US 20090094035A1
Time in Patent Office: 1,324 Days
Field of Search: 704/258, 704/260, 704/266, 704/269
US Class Current: 704/258
CPC Class Codes:
G10L 13/07 Concatenation rules
G10L 2015/022 Demisyllables, biphones or ...
