Synthesis-based pre-selection of suitable units for concatenative speech

US 6,505,158 B1
Filed: 07/05/2000
Issued: 01/07/2003
Est. Priority Date: 07/05/2000
Status: Expired due to Term


6 Assignments

0 Petitions

Abstract

A method and system for providing concatenative speech uses a speech synthesis input to populate a triphone-indexed database that is later used for searching and retrieval to create a phoneme string acceptable for a text-to-speech operation. Prior to initiating the “real time” synthesis, a database is created of all possible triphone contexts by inputting a continuous stream of speech. The speech data is then analyzed to identify all possible triphone sequences in the stream, and the various units chosen for each context. During a later text-to-speech operation, the triphone contexts in the text are identified and the triphone-indexed phonemes in the database are searched to retrieve the best-matched candidates.
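The retrieval half of the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. The phoneme symbols, the `pau` padding that gives the first and last phonemes a full triphone context, the toy index, and the helper names `input_triphones` and `preselect` are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Triphone = Tuple[str, str, str]


def input_triphones(phonemes: Sequence[str], pad: str = "pau") -> List[Triphone]:
    """Pair each phoneme with its left and right neighbours.

    The `pad` symbol (assumed here) stands in for silence at both edges.
    """
    padded = [pad, *phonemes, pad]
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def preselect(index: Dict[Triphone, List[int]], phonemes: Sequence[str], n: int) -> List[List[int]]:
    """Return up to n candidate unit numbers per input triphone.

    A triphone that is missing from the index yields an empty list.
    """
    return [index.get(tri, [])[:n] for tri in input_triphones(phonemes)]
```

With a toy index mapping `("pau", "k", "ae")` to `[12, 40, 77]`, `("k", "ae", "t")` to `[5, 9]`, and `("ae", "t", "pau")` to `[31]`, the call `preselect(index, ["k", "ae", "t"], 2)` returns `[[12, 40], [5, 9], [31]]`. This sketch does not handle back-off for unseen triphones.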

333 Citations

10 Claims

1. A method of synthesizing speech from text input using unit selection, the method comprising the steps of:
a) creating a triphone preselection database from an input stream of speech synthesis by collecting units observed to occur in particular triphone contexts, a triphone comprising a sequence of three phoneme units;

b) receiving a stream of input text to be synthesized;

c) converting the received input text into a sequence of phonemes by parsing the input text into identifiable syntactic phrases;

d) comparing the sequence of phonemes formed in step c), also considering neighboring phonemes so as to form input triphones, to a plurality of commonly occurring triphones stored in the triphone preselection database to select a plurality of N phoneme units as candidates for synthesis;

e) selecting a set of candidates of step d) by applying a cost process to each path through the plurality of N phoneme units associated with each phoneme sequence and choosing a least cost set of phoneme units;

f) processing the least cost phoneme units selected in step e) into synthesized speech; and

g) outputting the synthesized speech to an output device.

2. The method as defined in claim 1 wherein in performing step a) the following steps are performed:
3. The method as defined in claim 2 wherein in performing step a1), the continuous input stream continues for a time period of approximately two weeks.
4. The method as defined in claim 1 wherein in performing step c), the converting process uses half-phonemes to create phoneme sequences, with unit spacing between adjacent half-phonemes.
5. The method as defined in claim 1 wherein in performing step e), a Viterbi search mechanism is used.
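Step e) of claim 1 and claim 5 describe choosing a least-cost path through the N candidates at each position with a Viterbi search. The minimal Python sketch below shows that dynamic program. The claims only say that "a cost process" is applied. Splitting that cost into a per-unit `target_cost` and a pairwise `join_cost` is an assumption borrowed from common unit-selection practice, and both are passed in as hypothetical callables.

```python
from typing import Callable, List, Sequence, Tuple


def viterbi_select(
    candidates: Sequence[Sequence[int]],
    target_cost: Callable[[int, int], float],
    join_cost: Callable[[int, int], float],
) -> Tuple[float, List[int]]:
    """Return (total cost, unit sequence) of the least-cost path, one unit per position.

    target_cost(t, u): assumed cost of using unit u at position t.
    join_cost(a, b): assumed cost of concatenating unit a followed by unit b.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[j]: lowest cost of any path ending in candidates[t][j]
    best = [target_cost(0, u) for u in candidates[0]]
    # back[t - 1][j]: index of the best predecessor of candidates[t][j]
    back: List[List[int]] = []

    for t in range(1, len(candidates)):
        prev_units = candidates[t - 1]
        new_best: List[float] = []
        pointers: List[int] = []
        for u in candidates[t]:
            scores = [best[i] + join_cost(prev_units[i], u) for i in range(len(prev_units))]
            k = min(range(len(scores)), key=scores.__getitem__)
            new_best.append(scores[k] + target_cost(t, u))
            pointers.append(k)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final state.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [candidates[-1][j]]
    for t in range(len(candidates) - 1, 0, -1):
        j = back[t - 1][j]
        path.append(candidates[t - 1][j])
    path.reverse()
    return total, path
```

As a toy example, take candidates `[[1, 2], [3, 4]]` with target costs `{1: 0.0, 2: 0.25, 3: 0.25, 4: 0.0}`. Also assume joining is free only when the second unit number is the first plus one (cost 1.0 otherwise). The search then returns `(0.5, [2, 3])`, even though a greedy left-to-right choice would have started with unit 1. The search runs in O(T·N²) time for T positions with N candidates each, which is why limiting N through preselection matters.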

6. A method of creating a triphone preselection database for use in generating synthesized speech from a stream of input text, the method comprising the steps of:
a) providing a continuous input stream of synthesized speech for a predetermined time period t;

b) parsing the speech input stream into phoneme units;

c) finding the unique database unit number associated with each phoneme;

d) identifying all possible triphone combinations from the parsed phonemes; and

e) tabulating unit numbers for the identified phonemes so as to index the database by the identified triphones.

7. The method as defined in claim 6 wherein in performing step a), the continuous input stream continues for a time period of approximately two weeks.
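Claim 6 can be read as a tabulation pass over the synthesizer's own output. For each unit chosen while synthesizing the long input stream, the method records its unique unit number under the triphone in which it appeared. The Python sketch below assumes the stream has already been parsed into an ordered list of `(phoneme, unit_number)` pairs. That input format and the name `build_triphone_index` are hypothetical. Units at the very start and end of the stream lack a full left or right context, so the sketch skips them.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Triphone = Tuple[str, str, str]


def build_triphone_index(labelled_units: Sequence[Tuple[str, int]]) -> Dict[Triphone, List[int]]:
    """Map each (left, centre, right) phoneme context to the unit numbers seen there.

    `labelled_units` is assumed to be the synthesized stream, already parsed into
    ordered (phoneme label, unique database unit number) pairs.
    """
    index: Dict[Triphone, List[int]] = defaultdict(list)
    # Skip the first and last unit: they lack a left or right neighbour.
    for i in range(1, len(labelled_units) - 1):
        left = labelled_units[i - 1][0]
        centre, unit_number = labelled_units[i]
        right = labelled_units[i + 1][0]
        index[(left, centre, right)].append(unit_number)
    return dict(index)
```

A unit that the synthesizer chose repeatedly in the same context appears once per occurrence. The length of each list therefore reflects how often that context was seen. This sketch does not rank or truncate the lists.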

8. A system for synthesizing speech using phonemes, comprising:

a linguistic processor for receiving input text and converting said text into a sequence of phonemes;

a database of indexed phonemes, the index based on precalculated costs of phonemes in various triphone sequences;

a unit selector, coupled to both the linguistic process and the triphone database, for comparing each received phoneme, including its triphone context, to the indexed phonemes in said database and selecting a set of candidate phonemes for synthesis; and

a speech processor, coupled to the unit selector, for processing selected candidate phonemes into synthesized speech and providing as an output the synthesized speech to an output device.

9. A system as defined in claim 8 wherein the database comprises an indexed set of phonemes, based on triphone context, created from a stream of speech continuing from a predetermined period of time t.
10. A system as defined in claim 9 wherein the predetermined period of time t is approximately two weeks.
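The system of claim 8 chains three components. The Python sketch below wires toy versions together to show the data flow only. Several parts are simplifying assumptions: the lexicon-based linguistic processor, the `pau` edge padding, the rule of taking the first preselected candidate instead of running a cost search, and the representation of unit waveforms as lists of samples. Every class and function name is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Triphone = Tuple[str, str, str]


@dataclass
class LinguisticProcessor:
    """Converts text to phonemes via a hypothetical word -> phoneme-list lexicon."""
    lexicon: Dict[str, List[str]]

    def to_phonemes(self, text: str) -> List[str]:
        return [p for word in text.lower().split() for p in self.lexicon[word]]


@dataclass
class UnitSelector:
    """Looks up each phoneme in its triphone context and picks one unit number."""
    index: Dict[Triphone, List[int]]
    n: int = 5  # number of preselected candidates kept per position

    def select(self, phonemes: Sequence[str]) -> List[int]:
        padded = ["pau", *phonemes, "pau"]  # assumed silence padding at the edges
        chosen: List[int] = []
        for i in range(1, len(padded) - 1):
            triphone = (padded[i - 1], padded[i], padded[i + 1])
            candidates = self.index.get(triphone, [])[: self.n]
            if not candidates:
                raise KeyError(f"no units for triphone {triphone}")
            # Placeholder: a real selector would run a least-cost search here.
            chosen.append(candidates[0])
        return chosen


@dataclass
class SpeechProcessor:
    """Concatenates the stored samples of each selected unit number."""
    waveforms: Dict[int, List[float]]

    def render(self, units: Sequence[int]) -> List[float]:
        samples: List[float] = []
        for unit in units:
            samples.extend(self.waveforms[unit])
        return samples


def synthesize(text: str, lp: LinguisticProcessor, us: UnitSelector, sp: SpeechProcessor) -> List[float]:
    """Text -> phonemes -> selected units -> concatenated samples for an output device."""
    return sp.render(us.select(lp.to_phonemes(text)))
```

Keeping the three stages behind separate objects mirrors the claim's structure. A cost-based search such as the Viterbi pass of claim 5 could then replace the placeholder inside `UnitSelector.select` without touching the other two components.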


Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Conkie, Alistair D.
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US09/609,889
Time in Patent Office: 916 Days
Field of Search: 704/220, 704/258, 704/260, 704/254, 704/268, 704/255, 704/262
US Class Current: 704/260
CPC Class Codes: G10L 13/07 Concatenation rules
