Method and system for preselection of suitable units for concatenative speech

US 6,684,187 B1
Filed: 06/30/2000
Issued: 01/27/2004
Est. Priority Date: 06/30/2000
Status: Expired due to Term


10 Assignments

0 Petitions

Abstract

A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate left and right neighbors) as the basic unit, instead of performing phoneme-by-phoneme synthesis. Prior to initiating the “real time” synthesis, a database is created of all possible triphones (there are approximately 10,000 in the English language) and their associated preselection costs. At run time, therefore, only the most likely candidates are selected from the triphone database, significantly reducing the calculations that must be performed in real time.
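As a rough illustration of the run-time half of this scheme, the Python sketch below looks up each phoneme's candidate units by its triphone context in a table assumed to have been built offline. The key format `left-center-right`, the `#` boundary symbol, the helper names, and the unit IDs are all hypothetical; the patent text here does not specify them.

```python
from typing import Dict, List, Sequence

# Hypothetical precomputed table (built offline): triphone key -> unit IDs, cheapest first.
TriphoneTable = Dict[str, List[int]]

BOUNDARY = "#"  # assumed padding symbol for utterance edges


def triphone_key(left: str, center: str, right: str) -> str:
    """Build the index key for one triphone (assumed "left-center-right" format)."""
    return f"{left}-{center}-{right}"


def candidates_for(phonemes: Sequence[str], table: TriphoneTable) -> List[List[int]]:
    """Return one candidate list per phoneme, looked up by its triphone context."""
    padded = [BOUNDARY, *phonemes, BOUNDARY]
    result: List[List[int]] = []
    for i in range(1, len(padded) - 1):
        key = triphone_key(padded[i - 1], padded[i], padded[i + 1])
        # An unseen triphone yields no candidates here; a real system needs a fallback.
        result.append(table.get(key, []))
    return result


table: TriphoneTable = {
    "#-k-ae": [12, 40],
    "k-ae-t": [7, 3, 19],
    "ae-t-#": [55],
}
print(candidates_for(["k", "ae", "t"], table))  # [[12, 40], [7, 3, 19], [55]]
```

Because the table is indexed by triphone key, each lookup is a single dictionary access; the expensive cost calculations are moved into the offline build, which is the saving the abstract describes.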

7 Claims

1. A method of synthesizing speech from an input text using phonemes, the method comprising the steps:

a) creating a triphone preselection cost database including a plurality of all likely triphone combinations and generating a key to index each triphone in the database, wherein creating the triphone preselection cost database further comprises:

   1) selecting a predetermined triphone sequence u₁-u₂-u₃; and

   2) calculating a preselection cost for each 5-phoneme sequence u_a-u₁-u₂-u₃-u_b, where u₂ is allowed to match any identically labeled phoneme in the database and the units u_a and u_b vary over the entire phoneme universe;

b) retrieving a portion of the input text for synthesis as a phoneme sequence;

c) comparing a retrieved phoneme, in context with its neighboring phonemes, with a plurality of N least cost triphone keys stored in the triphone preselection cost database;

d) choosing, as candidates for synthesis, a list of units from the triphone preselection cost database that comprise a matching triphone key;

e) repeating steps b) through d) for each phoneme in the input text;

f) selecting the least cost path through the network of candidates;

g) processing the phonemes selected in step f) into synthesized speech; and

h) outputting the synthesized speech to an output device.

2. The method as defined in claim 1 wherein in performing step a2), the preselection cost is the target cost or an element of the target cost.

3. The method as defined in claim 1, wherein creating a triphone preselection cost database further comprises:

4. The method as defined in claim 3, wherein in performing step a4), N=50.
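Step f) of claim 1 selects the least cost path through the network of candidates. A common way to do this is a Viterbi-style dynamic-programming search, sketched below in Python as an assumption rather than as the patent's stated procedure. The target and join cost functions are caller-supplied placeholders, and the toy costs in the usage lines are invented for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def least_cost_path(
    candidates: Sequence[Sequence[int]],
    target_cost: Callable[[int, int], float],  # (position, unit_id) -> cost
    join_cost: Callable[[int, int], float],    # (previous unit_id, unit_id) -> cost
) -> Tuple[float, List[int]]:
    """Viterbi-style dynamic programming over the lattice of candidate units."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[u] = (cumulative cost, path) of the cheapest path ending in unit u.
    best: Dict[int, Tuple[float, List[int]]] = {
        u: (target_cost(0, u), [u]) for u in candidates[0]
    }
    for pos in range(1, len(candidates)):
        new_best: Dict[int, Tuple[float, List[int]]] = {}
        for u in candidates[pos]:
            # Choose the predecessor minimizing cumulative cost plus join cost into u.
            prev = min(best, key=lambda p: best[p][0] + join_cost(p, u))
            cost = best[prev][0] + join_cost(prev, u) + target_cost(pos, u)
            new_best[u] = (cost, best[prev][1] + [u])
        best = new_best
    return min(best.values(), key=lambda entry: entry[0])


# Toy costs, invented purely for illustration.
unit_target = {1: 1.0, 2: 0.5, 3: 0.2, 4: 1.0, 5: 0.0}
cost, path = least_cost_path(
    [[1, 2], [3, 4], [5]],
    target_cost=lambda pos, u: unit_target[u],
    join_cost=lambda a, b: 0.0 if b == a + 2 else 1.0,
)
print(path)  # [1, 3, 5]
```

Keeping only the cheapest path into each candidate makes the search linear in the number of positions but proportional to the product of adjacent candidate-list sizes at each step, which is why pruning the candidate lists beforehand pays off.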

5. A method of creating a preselection cost database of triphones to be used in speech synthesis, the method comprising the steps of:

a) selecting a predetermined triphone sequence u₁-u₂-u₃;

b) calculating a preselection cost for each 5-phoneme sequence u_a-u₁-u₂-u₃-u_b, where u₂ is allowed to match any identically labeled phoneme in the database and the units u_a and u_b vary over the entire phoneme universe;

c) determining a plurality of N least cost database units for the particular 5-phoneme context;

d) performing the union of the plurality of N least cost database units determined in step c);

e) storing the union created in step d) in a triphone preselection cost database; and

f) repeating steps a)-e) for each possible triphone sequence.

6. The method as defined in claim 5 wherein in performing step d), a plurality of fifty least cost sequences and associated costs are stored.

7. The method as defined in claim 5 wherein in performing step b), the preselection cost is the target cost or an element of the target cost.
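The database construction in claim 5 can be sketched in Python as below. The `Unit` record, the phoneme inventory, and the `preselection_cost` function are hypothetical: claim 7 only says the preselection cost is the target cost or an element of it, so a weighted count of context mismatches stands in here. Claims 4 and 6 give N as fifty; the usage lines use `n=1` because the toy database holds only three units.

```python
import heapq
from itertools import product
from typing import Dict, List, NamedTuple, Sequence, Set, Tuple


class Unit(NamedTuple):
    """Hypothetical database unit: its own label plus its recorded context."""
    unit_id: int
    left2: str   # phoneme two to the left, compared with u_a
    left: str    # immediate left neighbour, compared with u_1
    center: str  # the unit's own label, must equal u_2
    right: str   # immediate right neighbour, compared with u_3
    right2: str  # phoneme two to the right, compared with u_b


def preselection_cost(unit: Unit, ctx: Tuple[str, str, str, str, str]) -> float:
    """Toy stand-in for the target cost: weighted count of context mismatches."""
    ua, u1, _u2, u3, ub = ctx
    return (
        2.0 * (unit.left != u1) + 2.0 * (unit.right != u3)
        + 1.0 * (unit.left2 != ua) + 1.0 * (unit.right2 != ub)
    )


def build_table(units: Sequence[Unit], phonemes: Sequence[str], n: int) -> Dict[str, List[int]]:
    table: Dict[str, List[int]] = {}
    for u1, u2, u3 in product(phonemes, repeat=3):       # steps a) and f): every triphone
        matching = [u for u in units if u.center == u2]  # u_2 matches any same-labeled unit
        if not matching:
            continue
        selected: Set[int] = set()
        for ua, ub in product(phonemes, repeat=2):       # step b): u_a, u_b over all phonemes
            ctx = (ua, u1, u2, u3, ub)
            # step c): N least cost units for this 5-phoneme context
            cheapest = heapq.nsmallest(n, matching, key=lambda u: preselection_cost(u, ctx))
            selected.update(u.unit_id for u in cheapest)  # step d): union over contexts
        table[f"{u1}-{u2}-{u3}"] = sorted(selected)      # step e): store under a triphone key
    return table


units = [
    Unit(1, "a", "a", "b", "a", "a"),
    Unit(2, "b", "b", "b", "b", "b"),
    Unit(3, "a", "b", "a", "b", "a"),
]
table = build_table(units, phonemes=["a", "b"], n=1)
print(table["a-b-a"], table["a-b-b"])  # [1] [1, 2]
```

The loops are deliberately exhaustive, mirroring the claim's enumeration of every u_a and u_b; that cost is paid once, offline, so that synthesis only has to read the stored union for each triphone key.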

Current Assignee
Cerence Operating Company (Cerence Inc.)
Original Assignee
AT&T Corporation (AT&T, Inc.)
Inventors
Conkie, Alistair D.
Primary Examiner(s)
AZAD, ABUL K

Application Number
US09/607,615
Time in Patent Office
1,306 Days
Field of Search
704/258, 704/260, 704/251-257, 704/239-242, 704/268, 704/266
US Class Current
704/260
CPC Class Codes
G10L 13/07 Concatenation rules
G10L 2015/022 Demisyllables, biphones or ...
