Speech synthesis system

US 5,682,501 A
Filed: 02/21/1995
Issued: 10/28/1997
Est. Priority Date: 06/22/1994
Status: Expired due to Fees


Abstract

A speech synthesis unit comprises a text processor which breaks down text into phonemes, a prosodic processor which assigns properties such as length and pitch to the phonemes based on context, and a synthesis unit which outputs an audio signal representing the sequence of phonemes according to the specified properties. The prosodic processor includes a Hidden Markov Model (HMM) to predict the durations of the phonemes. Each state of the HMM represents a duration, and the outputs are phonemes. The HMM is trained on a set of data consisting of phonemes of known identity and duration, to allow the state transition and output distributions to be calculated. The HMM can then be used for any given input sequence of phonemes to predict a most likely sequence of corresponding durations.
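The decoding step described in the abstract can be sketched as a standard Viterbi search in which the hidden states are duration values and the observations are phonemes. This is a minimal illustration under stated assumptions, not the patented implementation: it uses a first-order model, a fixed probability floor for unseen phonemes, and made-up toy probability tables. The function name `viterbi_durations`, the table layout, and the duration values 1 and 3 (arbitrary units) are all hypothetical.

```python
import math


def viterbi_durations(phonemes, durations, log_init, log_trans, log_emit):
    """Return the most likely duration sequence for a phoneme sequence.

    States are duration values; observations are phonemes.
    log_init[d], log_trans[d_prev][d] and log_emit[d][phoneme] hold
    log-probabilities (this table layout is an assumption of the sketch).
    """
    if not phonemes:
        return []
    floor = math.log(1e-12)  # assumed floor for phonemes unseen in a state

    # best[t][d]: best log-score of any state path ending in duration d at t
    best = [{d: log_init[d] + log_emit[d].get(phonemes[0], floor)
             for d in durations}]
    back = []  # back[t-1][d]: best predecessor of state d at position t
    for ph in phonemes[1:]:
        scores, pointers = {}, {}
        for d in durations:
            prev = max(durations, key=lambda p: best[-1][p] + log_trans[p][d])
            scores[d] = (best[-1][prev] + log_trans[prev][d]
                         + log_emit[d].get(ph, floor))
            pointers[d] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    state = max(durations, key=lambda d: best[-1][d])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy tables (invented for illustration only).
DURATIONS = [1, 3]
LOG_INIT = {d: math.log(0.5) for d in DURATIONS}
LOG_TRANS = {p: {d: math.log(0.5) for d in DURATIONS} for p in DURATIONS}
LOG_EMIT = {
    1: {"t": math.log(0.9), "a": math.log(0.1)},  # short state favours "t"
    3: {"t": math.log(0.2), "a": math.log(0.8)},  # long state favours "a"
}

print(viterbi_durations(["t", "a", "t"], DURATIONS,
                        LOG_INIT, LOG_TRANS, LOG_EMIT))  # prints [1, 3, 1]
```

With uniform transitions, as in the toy tables, each phoneme simply takes the duration under which it is most probable. Non-uniform transition tables would let neighbouring durations influence one another, which is the point of modelling the sequence jointly.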

68 Citations

10 Claims

1. A method for generating synthesized speech from input text, the method comprising the steps of:
- decomposing the input text into a sequence of speech units;
- estimating a duration value for each speech unit in the sequence of speech units;
- synthesizing speech based on said sequence of speech units and duration values;

characterized in that said estimating step utilizes a Hidden Markov Model (HMM) to determine the most likely sequence of duration values given said sequence of speech units, wherein each state of the HMM represents a duration value and each output from the HMM is a speech unit.

2. The method according to claim 1, wherein a state transition probability distribution of the HMM is dependent on one or more of the immediately preceding states.

3. The method according to claim 2, wherein the state transition probability distribution of the HMM is dependent on the identity of the two immediately preceding states.

4. The method according to claim 1, wherein an output probability distribution of the HMM is dependent on the current state of the HMM.

5. The method according to claim 1, further comprising the steps of:
- obtaining a set of speech data which has been decomposed into a sequence of speech units, each of which has been assigned a duration value;
- estimating a state transition probability distribution and an output probability distribution of the HMM from said set of speech data.

6. The method according to claim 5, wherein the step of estimating the state transition and output probability distributions of the HMM includes the step of smoothing the set of speech data to reduce any statistical fluctuations therein.

7. The method according to claim 6, wherein the set of speech data is obtained by means of a speech recognition system.

8. The method according to claim 7, wherein the determination of the most likely sequence of duration values is performed using the Viterbi algorithm.

9. The method according to claim 8, wherein each of said speech units is a phoneme.
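The estimation step of claims 5 and 6 can be illustrated by counting transitions and outputs in labeled data and normalising the counts. This is only a sketch under assumptions: the claims do not say which smoothing technique is used, so additive (add-k) smoothing is swapped in here as a simple stand-in, and the model is first-order. The function name `estimate_hmm`, the parameter `k`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def estimate_hmm(utterances, durations, phoneme_set, k=1.0):
    """Estimate log-probability tables from (phoneme, duration) sequences.

    utterances: list of sequences of (phoneme, duration) pairs.
    k: additive smoothing constant; must be > 0 so that unseen events
       keep non-zero probability (add-k smoothing is an assumption here,
       not a technique taken from the claims).
    """
    init, trans, emit = Counter(), Counter(), Counter()
    for utt in utterances:
        prev = None
        for ph, d in utt:
            emit[(d, ph)] += 1          # state d produced phoneme ph
            if prev is None:
                init[d] += 1            # first state of the utterance
            else:
                trans[(prev, d)] += 1   # transition prev -> d
            prev = d

    def normalise(count_of, keys):
        # Smoothed relative frequencies over `keys`, returned as logs.
        total = sum(count_of(x) for x in keys) + k * len(keys)
        return {x: math.log((count_of(x) + k) / total) for x in keys}

    log_init = normalise(lambda d: init[d], durations)
    log_trans = {p: normalise(lambda d, p=p: trans[(p, d)], durations)
                 for p in durations}
    log_emit = {d: normalise(lambda ph, d=d: emit[(d, ph)], phoneme_set)
                for d in durations}
    return log_init, log_trans, log_emit


# Toy labeled data (invented): durations are in arbitrary units.
TRAINING = [
    [("t", 1), ("a", 3)],
    [("t", 1), ("a", 3), ("t", 1)],
]
log_init, log_trans, log_emit = estimate_hmm(TRAINING, [1, 3], ["a", "t"])
print(round(math.exp(log_trans[1][3]), 2))  # prints 0.75
```

In the toy data the transition from duration 1 to duration 3 is seen twice and the transition from 1 to 1 never, so add-one smoothing gives (2 + 1) / (2 + 2) = 0.75 for the former and leaves 0.25 for the unseen one.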

10. A speech synthesis system for generating synthesized speech from input text comprising:
- a text processor for decomposing the input text into a sequence of speech units;
- a prosodic processor for estimating a duration value for each speech unit in the sequence of speech units;
- a synthesis unit for synthesizing speech based on said sequence of speech units and duration values;

and characterized in that said prosodic processor utilizes a Hidden Markov Model (HMM) to determine the most likely sequence of duration values given said sequence of speech units, wherein each state of the HMM represents a duration value and each output from the HMM is a speech unit.
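The three components named in claim 10 can be pictured as a simple data flow from text to speech units to durations to output. The sketch below only shows how the stages connect; every stage is a stand-in. The lexicon, the rule-based duration stub (used here in place of an HMM decoder), and the "synthesis unit" that merely returns a timing schedule are all hypothetical, as are the function names.

```python
from typing import Callable, Dict, List, Tuple

Phoneme = str
Duration = int  # arbitrary units in this sketch


def text_processor(text: str, lexicon: Dict[str, List[Phoneme]]) -> List[Phoneme]:
    """Decompose text into phonemes via a lookup table (toy lexicon).

    Raises KeyError for words missing from the lexicon.
    """
    phonemes: List[Phoneme] = []
    for word in text.lower().split():
        phonemes.extend(lexicon[word])
    return phonemes


def synthesize(
    text: str,
    lexicon: Dict[str, List[Phoneme]],
    prosodic_processor: Callable[[List[Phoneme]], List[Duration]],
    synthesis_unit: Callable[[List[Tuple[Phoneme, Duration]]], object],
):
    """Run text -> speech units -> durations -> output."""
    units = text_processor(text, lexicon)
    durations = prosodic_processor(units)
    if len(durations) != len(units):
        raise ValueError("expected one duration per speech unit")
    return synthesis_unit(list(zip(units, durations)))


# Stand-ins for illustration only.
TOY_LEXICON = {"hi": ["h", "ay"]}
VOWELS = {"ay"}


def stub_prosody(units: List[Phoneme]) -> List[Duration]:
    # Placeholder for an HMM-based duration predictor.
    return [2 if u in VOWELS else 1 for u in units]


def stub_synthesis(schedule: List[Tuple[Phoneme, Duration]]):
    # A real synthesis unit would produce audio; this returns the schedule.
    return schedule


print(synthesize("hi", TOY_LEXICON, stub_prosody, stub_synthesis))
# prints [('h', 1), ('ay', 2)]
```

Passing the prosodic processor in as a callable keeps the duration model replaceable, so a trained HMM decoder could be substituted for the stub without touching the other stages.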

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Sharman, Richard Anthony
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US08/391,731
Time in Patent Office: 980 Days
Field of Search: 381/41, 381/43, 395/2.65, 395/2.66, 395/2.75, 395/2.67, 395/2.7
US Class Current: 704/260
CPC Class Codes:
G10L 13/04 Details of speech synthesis...
G10L 13/08 Text analysis or generation...
