Method and system for performing concatenative speech synthesis using half-phonemes

US 6,173,263 B1
Filed: 08/31/1998
Issued: 01/09/2001
Est. Priority Date: 08/31/1998
Status: Expired due to Term

4 Assignments

0 Petitions

Abstract

A method and system are provided for performing concatenative speech synthesis using half-phonemes, which allows both diphone synthesis and unit selection techniques to be fully exploited so that the output combines the intelligibility of diphone synthesis with the naturalness of unit selection. The concatenative speech synthesis system may include a speech synthesizer comprising a linguistic processor, a unit selector and a speech processor. A speech training module may provide trained speech to the unit selector off-line. The system may normalize the input text in order to distinguish sentence boundaries from abbreviations. The normalized text is then grammatically analyzed to identify the syntactic structure of each constituent phrase. Orthographic characters used in normal text are mapped into strings of phonetic segments representing units of sound and speech. Prosody is then determined, and timing and intonation patterns are assigned to each of the half-phonemes. Once the text is converted into half-phonemes, the unit selector compares the requested half-phoneme sequence with units stored in the database to generate a candidate list for each half-phoneme. The candidate lists are then input into a Viterbi searcher, which determines the best match for all half-phonemes in the sequence. The selected string of units is then output to the speech processor, which produces output audio for a speaker.
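The conversion of text into half-phonemes can be illustrated with a minimal Python sketch. It assumes the text has already been normalized and that a pronunciation lexicon is available; `TOY_LEXICON`, the ARPAbet-style phoneme symbols, and the `_L`/`_R` suffixes marking the left and right halves of each phoneme are hypothetical conventions, not taken from the patent. Splitting each phoneme at its midpoint means that a right half followed by the next phoneme's left half covers the same stretch as a diphone, which is one way to see how half-phonemes let a synthesizer use diphone-style joins as well as longer unit-selection chunks.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon; a real system would use a full pronunciation
# dictionary plus letter-to-sound rules.
TOY_LEXICON: Dict[str, List[str]] = {
    "hi": ["h", "ay"],
    "there": ["dh", "eh", "r"],
}


def to_half_phonemes(phonemes: List[str]) -> List[str]:
    """Split every phoneme into a left half ("_L") and a right half ("_R")."""
    halves: List[str] = []
    for ph in phonemes:
        halves.extend([f"{ph}_L", f"{ph}_R"])
    return halves


def text_to_half_phonemes(text: str, lexicon: Dict[str, List[str]] = TOY_LEXICON) -> List[str]:
    """Look up each word (trailing punctuation stripped) and expand it into half-phonemes."""
    phonemes: List[str] = []
    for word in text.lower().split():
        phonemes.extend(lexicon[word.strip(".,!?")])
    return to_half_phonemes(phonemes)


def diphone_spans(halves: List[str]) -> List[Tuple[str, str]]:
    """Pair each right half with the following left half (midpoint-to-midpoint spans)."""
    return [(halves[i], halves[i + 1]) for i in range(1, len(halves) - 1, 2)]
```

For example, `text_to_half_phonemes("Hi there.")` yields ten units, from `h_L` through `r_R`, and `diphone_spans` groups them into four diphone-like spans, the first being `("h_R", "ay_L")`.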

20 Claims

1. A method of synthesizing speech using half-phonemes, comprising:
- receiving input text;
- converting the input text into a sequence of half-phonemes;
- comparing the half-phonemes in the sequence with a plurality of half-phonemes stored in a database;
- selecting one of the plurality of half-phonemes from the database for each of the half-phonemes in the sequence based on statistical measurements;
- processing the selected half-phonemes into synthesized speech; and
- outputting the synthesized speech to an output device.
2. The method of claim 1, wherein the converting step comprises the steps of:
3. The method of claim 1, wherein the comparing step produces a pre-selected candidate list of half-phonemes from the database.
4. The method of claim 3, wherein the comparing step pre-selects candidate half-phonemes based on a predetermined threshold.
5. The method of claim 3, wherein the selecting step selects half-phonemes from the candidate half-phonemes using a Viterbi search mechanism.
6. The method of claim 1, wherein the selecting step selects half-phonemes based on the statistical measurements computed for individual half-phonemes and spectral measurements computed based on the relationship between half-phonemes in the sequence of half-phonemes.
7. The method of claim 1, further comprising:
- computing statistical measurements between half-phonemes of training speech contained in a database; and
- outputting the statistical measurements for performing the selecting step.
8. The method of claim 6, further comprising:
- indexing the half-phonemes in the database based on timing measurements.
9. The method of claim 1, wherein the processing step synthesizes speech using one of Linear Predictive Coding, Time-Domain Pitch-Synchronous Overlap Add, or Harmonic Plus Noise methods.
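Claims 3 through 6 describe a two-stage selection: a threshold-based preselection that builds a candidate list for each half-phoneme, followed by a Viterbi search that weighs per-unit statistical measurements against spectral measurements between neighboring units. The Python sketch below shows one way to implement that structure. Everything concrete in it is an assumption for illustration: the `Unit` and `Target` records, the use of duration and pitch for the per-unit (target) cost, Euclidean distance between boundary spectral vectors for the join cost, zero join cost for units that were adjacent in the same recording, the weights, and the fallback to the single best candidate when nothing passes the threshold.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """Hypothetical database record for one recorded half-phoneme."""
    label: str                      # half-phoneme name, e.g. "ay_L"
    utt: int                        # recording the unit was cut from
    index: int                      # position of the unit within that recording
    duration: float                 # seconds
    f0: float                       # mean pitch in Hz
    spec_start: Tuple[float, ...]   # spectral vector at the unit's start
    spec_end: Tuple[float, ...]     # spectral vector at the unit's end


@dataclass(frozen=True)
class Target:
    """Requested half-phoneme with the prosody assigned by the linguistic processor."""
    label: str
    duration: float
    f0: float


def target_cost(t: Target, u: Unit) -> float:
    # Per-unit mismatch in duration and pitch; the weights are arbitrary.
    return 10.0 * abs(t.duration - u.duration) + 0.05 * abs(t.f0 - u.f0)


def join_cost(a: Unit, b: Unit) -> float:
    # Units that were contiguous in the same recording join seamlessly.
    if a.utt == b.utt and b.index == a.index + 1:
        return 0.0
    return math.dist(a.spec_end, b.spec_start)


def preselect(t: Target, db: Sequence[Unit], threshold: float) -> List[Unit]:
    """Keep units with the right label whose target cost is within the threshold."""
    same_label = [u for u in db if u.label == t.label]
    kept = [u for u in same_label if target_cost(t, u) <= threshold]
    if not kept and same_label:
        kept = [min(same_label, key=lambda u: target_cost(t, u))]
    return kept


def viterbi_select(targets: Sequence[Target], db: Sequence[Unit],
                   threshold: float = 1.0) -> Tuple[List[Unit], float]:
    """Return the lowest-cost unit sequence and its total cost."""
    lattice = [preselect(t, db, threshold) for t in targets]
    if not lattice or any(not column for column in lattice):
        raise ValueError("empty target sequence or a half-phoneme with no candidates")

    # cost[j]: best accumulated cost of any path ending at lattice[i][j].
    cost = [target_cost(targets[0], u) for u in lattice[0]]
    backpointers: List[List[int]] = []
    for i in range(1, len(targets)):
        prev = lattice[i - 1]
        new_cost, pointers = [], []
        for u in lattice[i]:
            step = [cost[j] + join_cost(prev[j], u) for j in range(len(prev))]
            best_j = min(range(len(prev)), key=step.__getitem__)
            new_cost.append(step[best_j] + target_cost(targets[i], u))
            pointers.append(best_j)
        cost = new_cost
        backpointers.append(pointers)

    # Trace the best path backwards from the cheapest final candidate.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [lattice[-1][j]]
    for i in range(len(targets) - 1, 0, -1):
        j = backpointers[i - 1][j]
        path.append(lattice[i - 1][j])
    path.reverse()
    return path, total
```

In this sketch an `ay_L` unit recorded directly after the chosen `h_R` unit can win over a candidate whose duration and pitch match the target more closely, because its zero join cost outweighs a small target-cost penalty. Resolving that trade-off over the whole sequence, rather than unit by unit, is the job of the Viterbi search.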

10. A system for synthesizing speech using half-phonemes, comprising:
- a linguistic processor that receives input text and converts the input text into a sequence of half-phonemes;
- a unit selector, coupled to the linguistic processor, that compares the half-phonemes in the sequence with a plurality of half-phonemes stored in a database and selects one of the plurality of half-phonemes from the database for each of the half-phonemes in the sequence based on statistical measurements; and
- a speech processor, coupled to the unit selector, that processes the selected half-phonemes into synthesized speech and outputs the synthesized speech to an output device.
11. The system of claim 10, wherein the linguistic processor further comprises:
12. The system of claim 10, wherein the unit selector further comprises:
- a preselector that selects a candidate list of half-phonemes from the database.
13. The system of claim 12, wherein the preselector selects candidate half-phonemes based on a predetermined threshold.
14. The system of claim 13, wherein the unit selector further comprises:
- a Viterbi searcher, coupled to the preselector, that selects half-phonemes from the candidate half-phonemes using Viterbi search mechanisms.
15. The system of claim 14, wherein the Viterbi searcher selects half-phonemes based on the statistical measurements computed for individual half-phonemes and spectral measurements computed based on the relationship between half-phonemes in the sequence of half-phonemes.
16. The system of claim 10, further comprising:
- a speech training module, coupled to the unit selector, that computes statistical measurements between half-phonemes of training speech contained in a database, and outputs the statistical measurements to the unit selector.
17. The system of claim 16, wherein the speech training module indexes the half-phonemes in the database based on timing measurements.
18. The system of claim 10, wherein the speech processor synthesizes speech using one of Linear Predictive Coding, Time-Domain Pitch-Synchronous Overlap Add, or Harmonic Plus Noise methods.
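Claims 16 and 17 (and the parallel method claims 7 and 8) describe an off-line training module that computes statistics over the training speech and indexes the database by timing. The claim text reproduced here does not say which statistics or which timing measurements are used, so the Python sketch below assumes per-label duration mean and standard deviation as the statistics and unit duration as the timing key; `duration_stats`, `TimingIndex`, and the `(label, unit_id, duration)` record layout are hypothetical. Keeping each label's units sorted by duration lets a preselector fetch the units inside a duration window with two binary searches instead of scanning every unit.

```python
import bisect
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical training record: (half-phoneme label, unit id, duration in seconds).
Record = Tuple[str, int, float]


def duration_stats(records: Iterable[Record]) -> Dict[str, Tuple[float, float]]:
    """Mean and population standard deviation of duration for each label."""
    by_label: Dict[str, List[float]] = defaultdict(list)
    for label, _, duration in records:
        by_label[label].append(duration)
    return {label: (statistics.mean(ds), statistics.pstdev(ds))
            for label, ds in by_label.items()}


class TimingIndex:
    """Per-label list of (duration, unit id) pairs kept sorted by duration."""

    def __init__(self, records: Iterable[Record]) -> None:
        grouped: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
        for label, unit_id, duration in records:
            grouped[label].append((duration, unit_id))
        self._entries = {label: sorted(items) for label, items in grouped.items()}
        self._durations = {label: [d for d, _ in items]
                           for label, items in self._entries.items()}

    def near(self, label: str, duration: float, tolerance: float) -> List[int]:
        """Unit ids of `label` whose duration lies within +/- tolerance."""
        entries = self._entries.get(label, [])
        durations = self._durations.get(label, [])
        lo = bisect.bisect_left(durations, duration - tolerance)
        hi = bisect.bisect_right(durations, duration + tolerance)
        return [unit_id for _, unit_id in entries[lo:hi]]
```

For example, with three `ay_L` units of 0.08, 0.10 and 0.12 s, `near("ay_L", 0.10, 0.015)` returns only the 0.10 s unit, while widening the tolerance to 0.03 returns all three in duration order.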

19. A system for synthesizing speech using half-phonemes, comprising:
- linguistic processing means for receiving input text and converting the input text into a sequence of half-phonemes;
- unit selecting means for comparing the half-phonemes in the sequence with a plurality of half-phonemes stored in a database and selecting one of the plurality of half-phonemes from the database for each of the half-phonemes in the sequence based on statistical measurements; and
- speech processing means for processing the selected half-phonemes into synthesized speech and outputting the synthesized speech to an output device.
20. The system of claim 19, further comprising:

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Conkie, Alistair
Primary Examiner(s): Dorvil, Richemond
Application Number: US09/144,020
Time in Patent Office: 862 Days
Field of Search: 704/260, 704/268, 704/258, 704/255, 704/254, 704/262
US Class Current: 704/260
CPC Class Codes:
- G10L 13/04 Details of speech synthesis...
- G10L 13/07 Concatenation rules
