Phrase splicing and variable substitution using a trainable speech synthesizer

US 6,266,637 B1
Filed: 09/11/1998
Issued: 07/24/2001
Est. Priority Date: 09/11/1998
Status: Expired due to Term

Assignments: 2
Petitions: 0

Abstract

In accordance with the present invention, a method for providing generation of speech includes the steps of providing input to be acoustically produced, comparing the input to training data or application-specific splice files to identify words or word sequences corresponding to the input for constructing a phone sequence, using a search algorithm to identify a segment sequence to construct output speech according to the phone sequence, and concatenating segments and modifying characteristics of the segments to be substantially equal to requested characteristics. Application-specific data is advantageously used to make pertinent information available to synthesize both the phone sequence and the output speech. Also described is a system for performing operations in accordance with the disclosure.
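The last step in the abstract can be made concrete by computing the ratios a prosody-modification stage would have to apply to one segment. The following is a minimal Python sketch, not the patented implementation: it assumes the characteristics are the duration, energy and pitch listed in claim 2, treats energy as mean squared amplitude, and uses hypothetical names (`Prosody`, `modification_factors`, `mean_energy`, `apply_gain`). Only the energy change is applied to samples; changing duration and pitch would need a separate signal-processing method such as PSOLA, which is not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Prosody:
    """Duration in seconds, mean energy (mean squared amplitude), mean pitch in Hz."""
    duration_s: float
    energy: float
    pitch_hz: float


def modification_factors(actual: Prosody, requested: Prosody) -> Dict[str, float]:
    """Ratios that would bring a segment's characteristics to the requested values."""
    return {
        # Multiply the segment duration by this factor.
        "time_stretch": requested.duration_s / actual.duration_s,
        # Energy scales with amplitude squared, so the sample gain is the
        # square root of the energy ratio.
        "amplitude_gain": math.sqrt(requested.energy / actual.energy),
        # Multiply the fundamental frequency by this factor.
        "pitch_shift": requested.pitch_hz / actual.pitch_hz,
    }


def mean_energy(samples: List[float]) -> float:
    """Mean squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


def apply_gain(samples: List[float], gain: float) -> List[float]:
    """Apply the energy change only; duration and pitch changes need a
    method such as PSOLA, which this sketch does not implement."""
    return [gain * s for s in samples]


if __name__ == "__main__":
    samples = [0.1, -0.1, 0.1, -0.1]
    actual = Prosody(duration_s=0.08, energy=mean_energy(samples), pitch_hz=110.0)
    requested = Prosody(duration_s=0.10, energy=0.04, pitch_hz=121.0)
    factors = modification_factors(actual, requested)
    print(factors)
    print(mean_energy(apply_gain(samples, factors["amplitude_gain"])))
```

With a 0.08 s, 110 Hz segment and a request of 0.10 s and 121 Hz, the sketch yields a time-stretch factor of about 1.25 and a pitch factor of about 1.1.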

27 Claims

1. A method for providing generation of speech comprising the steps of:
- providing splice phrases including recorded human speech to be employed in synthesizing speech;
  
  constructing a splice file dictionary including every word and every word sequence for the splice phrases and including a phone sequence associated with every word and every word sequence for the splice phrases;
  
  providing input to be acoustically produced;
  
  comparing the input to training data in the splice file dictionary to identify one of words and word sequences corresponding to the input for constructing a phone sequence;
  
  comparing the input to a pronunciation dictionary when the input is not found in the training data of the splice file dictionary;
  
  identifying a segment sequence using a first search algorithm to construct output speech according to the phone sequence; and
  
  concatenating segments of the segment sequence and modifying characteristics of the segments to be substantially equal to requested characteristics.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method as recited in claim 1, wherein the characteristics include at least one of duration, energy and pitch.
  - 3. The method as recited in claim 1, wherein the step of comparing the input to training data includes the step of searching the training data using a second search algorithm.
  - 4. The method as recited in claim 3, wherein the second search algorithm includes a greedy algorithm.
  - 5. The method as recited in claim 1, wherein the first search algorithm includes a dynamic programming algorithm.
  - 6. The method as recited in claim 1, further comprising the step of outputting synthetic speech.
  - 7. The method as recited in claim 1, further comprising the step of using the first search algorithm, performing a search over the segments in decision tree leaves.
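Claim 1's splice file dictionary holds every word and every contiguous word sequence of each splice phrase, and claims 3 and 4 add a greedy search over it. The Python below is a minimal sketch under stated assumptions: splice phrases arrive already transcribed word by word into ARPAbet-style phones, the greedy search takes the longest matching sequence at each input position, and a word with no match falls back to a pronunciation dictionary. All names and data are hypothetical; the patent does not specify this particular greedy strategy.

```python
from typing import Dict, List, Tuple

# A splice phrase is a list of (word, phones) pairs; phones are ARPAbet-style
# labels. Both the format and the example data are hypothetical.
SplicePhrase = List[Tuple[str, List[str]]]
SpliceDict = Dict[Tuple[str, ...], List[str]]


def build_splice_dictionary(splice_phrases: List[SplicePhrase]) -> SpliceDict:
    """Map every word and every contiguous word sequence of each splice phrase
    to its phone sequence. If a sequence occurs in several phrases, this sketch
    keeps the first transcription only."""
    dictionary: SpliceDict = {}
    for phrase in splice_phrases:
        for start in range(len(phrase)):
            words: List[str] = []
            phones: List[str] = []
            for word, word_phones in phrase[start:]:
                words.append(word)
                phones.extend(word_phones)
                dictionary.setdefault(tuple(words), list(phones))
    return dictionary


def lookup_phones(
    words: List[str], splice_dict: SpliceDict, pronunciations: Dict[str, List[str]]
) -> List[Tuple[Tuple[str, ...], List[str], str]]:
    """Greedy longest-match cover of the input words.

    At each position, take the longest word sequence present in the splice
    dictionary; if none starts there, look the single word up in the
    pronunciation dictionary (a KeyError means a letter-to-sound fallback,
    not shown here, would be needed)."""
    max_len = max((len(key) for key in splice_dict), default=0)
    pieces = []
    i = 0
    while i < len(words):
        for length in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + length])
            if key in splice_dict:
                pieces.append((key, splice_dict[key], "splice"))
                i += length
                break
        else:
            # No splice entry starts at position i.
            pieces.append(((words[i],), pronunciations[words[i]], "pronunciation"))
            i += 1
    return pieces


if __name__ == "__main__":
    phrase = [("your", ["Y", "AO", "R"]), ("flight", ["F", "L", "AY", "T"]),
              ("leaves", ["L", "IY", "V", "Z"]), ("at", ["AE", "T"])]
    splice_dict = build_splice_dictionary([phrase])
    pieces = lookup_phones("your flight leaves at nine".split(), splice_dict,
                           {"nine": ["N", "AY", "N"]})
    # The phone sequence for synthesis is the concatenation of all pieces.
    phone_sequence = [p for _, phones, _ in pieces for p in phones]
    print(pieces)
    print(phone_sequence)
```

A phrase of n distinct words contributes n(n+1)/2 entries, so the four-word example phrase yields 10. The input "your flight leaves at nine" is then covered by one splice entry for the first four words plus a pronunciation-dictionary lookup for the variable "nine".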

8. A method for providing generation of speech comprising the steps of:
- providing splice phrases including recorded human speech to be employed in synthesizing speech;
  
  constructing a splice file dictionary including every word and every word sequence for the splice phrases and including a phone sequence associated with every word and every word sequence for the splice phrases;
  
  providing input to be acoustically produced;
  
  comparing the input to application specific splice files in the splice file dictionary to identify one of words and word sequences corresponding to the input for constructing a phone sequence;
  
  augmenting a generic segment inventory by adding segments corresponding to the identified words and word sequences;
  
  identifying a segment sequence, using a first search algorithm and the augmented generic segment inventory to construct output speech according to the phone sequence; and
  
  concatenating the segments of the segment sequence and modifying characteristics of the segments of the segment sequence to be substantially equal to requested characteristics.
- Dependent claims (9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20):
  - 9. The method as recited in claim 8, wherein the characteristics include at least one of duration, energy and pitch.
  - 10. The method as recited in claim 8, wherein the step of comparing includes the step of searching the application specific splice files using a second search algorithm and the splice file dictionary.
  - 11. The method as recited in claim 10, wherein the second search algorithm includes a greedy algorithm.
  - 12. The method as recited in claim 8, wherein the step of comparing includes the step of comparing the input to a pronunciation dictionary when the input is not found in the splice files in the splice file dictionary.
  - 13. The method as recited in claim 8, wherein the first search algorithm includes a dynamic programming algorithm.
  - 14. The method as recited in claim 8, further comprising the step of using the first search algorithm, performing a search over the segments in decision tree leaves.
  - 15. The method as recited in claim 8, further comprising the step of outputting synthetic speech.
  - 16. The method as recited in claim 8, wherein the step of identifying includes the step of bypassing costing of the characteristics of the segments from a splicing inventory against the requested characteristics.
  - 17. The method as recited in claim 8, wherein the step of identifying includes the step of applying pitch discontinuity costing across the segment sequence.
  - 18. The method as recited in claim 8, further comprising the step of selecting segments from a splicing inventory to provide the requested characteristics.
  - 19. The method as recited in claim 8, wherein the requested characteristics include pitch and further comprising the step of selecting segments from the generic segment inventory to provide the requested pitch characteristics.
  - 20. The method as recited in claim 19, further comprising the step of applying pitch discontinuity smoothing to the requested pitch characteristics provided by the selected segments from the generic segment inventory.
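Claims 13, 16 and 17 combine a dynamic programming search with two costing rules: segments taken from the splicing inventory are not costed against the requested characteristics, and pitch discontinuities at joins are penalized. The Python below sketches this as a Viterbi search over hypothetical per-phone candidate lists. The `Candidate` fields and the cost weights (100 per second of duration error, 0.1 per Hz of pitch error or pitch jump) are arbitrary assumptions chosen for illustration, not values from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical inventory segment that could realize one phone."""
    name: str
    duration_s: float
    pitch_start_hz: float
    pitch_end_hz: float
    from_splice: bool = False  # True if added to the inventory from a splice file


def target_cost(c: Candidate, req_duration_s: float, req_pitch_hz: float) -> float:
    """Mismatch against the requested characteristics (weights are assumptions)."""
    if c.from_splice:
        # Claim 16 style bypass: splice segments are not costed against the request.
        return 0.0
    mean_pitch = 0.5 * (c.pitch_start_hz + c.pitch_end_hz)
    return abs(c.duration_s - req_duration_s) * 100.0 + abs(mean_pitch - req_pitch_hz) / 10.0


def join_cost(prev: Candidate, nxt: Candidate) -> float:
    """Claim 17 style pitch discontinuity cost at the concatenation point."""
    return abs(prev.pitch_end_hz - nxt.pitch_start_hz) / 10.0


def select_segments(
    candidates: Sequence[Sequence[Candidate]],
    req_durations: Sequence[float],
    req_pitches: Sequence[float],
) -> List[Candidate]:
    """Viterbi search for the candidate path with minimum total cost."""
    best = [[target_cost(c, req_durations[0], req_pitches[0]) for c in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for c in candidates[t]:
            # Cost of reaching c from each candidate at the previous position.
            costs = [best[t - 1][j] + join_cost(p, c) for j, p in enumerate(candidates[t - 1])]
            j_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[j_best] + target_cost(c, req_durations[t], req_pitches[t]))
            ptr.append(j_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest candidate at the last position.
    i = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][i])
        i = back[t][i]
    return path[::-1]


if __name__ == "__main__":
    a = Candidate("A", 0.1, 100.0, 100.0)
    b = Candidate("B", 0.3, 150.0, 150.0, from_splice=True)
    c = Candidate("C", 0.1, 100.0, 100.0)
    d = Candidate("D", 0.1, 150.0, 150.0)
    print([s.name for s in select_segments([[a, b], [c, d]], [0.1, 0.1], [150.0, 150.0])])
```

Because the splice candidate carries no target cost, it can win even when its duration differs from the request, which is the effect of the bypass in claim 16.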

21. A system for generating synthetic speech comprising:
- a splice file dictionary including splice phrases of recorded human speech to be employed in synthesizing speech, the splice file dictionary including every word and every word sequence for the splice phrases and including a phone sequence associated with every word and every word sequence for the splice phrases;
  
  means for providing input to be acoustically produced;
  
  means for comparing the input to application specific splice files in the splice file dictionary to identify one of words and word sequences corresponding to the input for constructing a phone sequence;
  
  means for augmenting a generic segment inventory by adding segments corresponding to sentences including the identified words and word sequences;
  
  a synthesizer for utilizing a first search algorithm and the augmented generic inventory to identify a segment sequence to construct output speech according to the phone sequence; and
  
  means for concatenating segments of the segment sequence and modifying characteristics of the segments of the segment sequence to be substantially equal to requested characteristics.
- Dependent claims (22, 23, 24, 25, 26, 27):
  - 22. The system as recited in claim 21, wherein the generic segment inventory includes pre-recorded speaker data to train a set of decision-tree state-clustered hidden Markov models.
  - 23. The system as recited in claim 21, wherein the first search algorithm includes a dynamic programming algorithm.
  - 24. The system as recited in claim 21, wherein the means for comparing includes a second search algorithm.
  - 25. The system as recited in claim 24, wherein the second search algorithm includes a greedy algorithm.
  - 26. The system as recited in claim 21, wherein the means for comparing compares the input to a pronunciation dictionary when the input is not found in the splice files.
  - 27. The system as recited in claim 21, wherein the first search algorithm performs a search over the segments in decision tree leaves.
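Claims 22 and 27 build the generic inventory around decision-tree state-clustered hidden Markov models and run the search over segments stored in the tree leaves. As a minimal sketch, assuming one hand-built binary tree per phone (in a trained system the context questions would be chosen during clustering, and trees are typically built per HMM state rather than per phone), the Python below routes each phone in context to a leaf and returns that leaf's segments as the candidates for the search. The phone labels, the `VOWELS` subset, and the segment IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union

# Hypothetical ARPAbet-style vowel subset used by the example question.
VOWELS = {"AA", "AE", "AH", "AO", "AY", "EH", "IY", "OW", "UW"}


@dataclass
class Context:
    """A phone with its left and right neighbors (None at utterance edges)."""
    phone: str
    left: Optional[str]
    right: Optional[str]


@dataclass
class Leaf:
    """A tree leaf holding the IDs of inventory segments clustered there."""
    name: str
    segments: List[str] = field(default_factory=list)


@dataclass
class Question:
    """An internal node: a yes/no question about the phonetic context."""
    test: Callable[[Context], bool]
    yes: "Node"
    no: "Node"


Node = Union[Question, Leaf]


def find_leaf(node: Node, ctx: Context) -> Leaf:
    """Descend from the root, answering each question, until a leaf is reached."""
    while isinstance(node, Question):
        node = node.yes if node.test(ctx) else node.no
    return node


def candidate_lists(phones: List[str], trees: Dict[str, Node]) -> List[List[str]]:
    """For each phone, return the segments in the leaf its context reaches."""
    result = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        result.append(find_leaf(trees[phone], Context(phone, left, right)).segments)
    return result


if __name__ == "__main__":
    trees: Dict[str, Node] = {
        "T": Question(
            test=lambda c: c.left in VOWELS,
            yes=Leaf("T_after_vowel", ["T_0012", "T_0458"]),
            no=Leaf("T_other", ["T_0077"]),
        ),
        "AE": Leaf("AE_all", ["AE_0003"]),
        "S": Leaf("S_all", ["S_0101"]),
    }
    print(candidate_lists(["AE", "T"], trees))
    print(candidate_lists(["S", "T"], trees))
```

Restricting each phone's candidates to one leaf keeps the search space small: the search only compares segments that the tree has already judged acoustically similar for that context.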

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Franz, Martin; Roukos, Salim E.; Sorensen, Jeffrey; Donovan, Robert E.
Primary Examiner(s): Zele, Krista
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US09/152,178
Time in Patent Office: 1,047 days
Field of Search: 704/258, 704/260, 704/265
US Class Current: 704/258
CPC Class Codes: G10L 13/06 Elementary speech units use...
