Speech synthesis for synthesizing missing parts

US 8,214,216 B2
Filed: 06/03/2004
Issued: 07/03/2012
Est. Priority Date: 06/05/2003
Status: Active Grant

Abstract

A simply configured speech synthesis device and the like are provided for producing natural synthetic speech at high speed. When data representing a message template is supplied, a voice unit editor (5) searches a voice unit database (7) for voice unit data on a voice unit whose sound matches a voice unit in the message template. Further, the voice unit editor (5) predicts the cadence of the message template and, according to the cadence prediction result, selects from the retrieved voice unit data the best match for each voice unit in the message template, one at a time. For a voice unit for which no match can be selected, an acoustic processor (41) is instructed to supply waveform data representing the waveform of each unit voice. The selected voice unit data and the waveform data supplied by the acoustic processor (41) are combined to generate data representing synthetic speech.
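The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `VoiceUnit`, `build_speech`, `pick_best`, and `synthesize` are hypothetical, waveforms are plain lists of floats, and the two callbacks stand in for the voice unit editor's selection step and the acoustic processor.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VoiceUnit:
    reading: str            # phonetic reading of the recorded unit
    waveform: List[float]   # recorded samples (placeholder values here)


def build_speech(
    template: List[str],
    database: List[VoiceUnit],
    pick_best: Callable[[str, List[VoiceUnit]], Optional[VoiceUnit]],
    synthesize: Callable[[str], List[float]],
) -> List[float]:
    """Concatenate recorded units where a match exists, synthesized ones otherwise."""
    output: List[float] = []
    for reading in template:
        # Step 1: retrieve every stored unit whose reading matches.
        candidates = [u for u in database if u.reading == reading]
        # Step 2: let the (hypothetical) selector choose the best match, if any.
        best = pick_best(reading, candidates)
        if best is not None:
            output.extend(best.waveform)
        else:
            # Step 3: no acceptable match, so fall back to synthesis.
            output.extend(synthesize(reading))
    return output


def first_candidate(reading: str, candidates: List[VoiceUnit]) -> Optional[VoiceUnit]:
    # Trivial stand-in for cadence-based selection.
    return candidates[0] if candidates else None


def silent_synthesis(reading: str) -> List[float]:
    # Stand-in for the acoustic processor: one zero sample per character.
    return [0.0] * len(reading)


db = [VoiceUnit("ohayou", [0.1, 0.2]), VoiceUnit("gozaimasu", [0.3])]
speech = build_speech(["ohayou", "sekai"], db, first_candidate, silent_synthesis)
```

Here "ohayou" is taken from the database, while "sekai" has no stored unit and is produced by the fallback.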

12 Claims

1. A speech synthesis device, comprising:
- voice unit storage means for storing a plurality of pieces of voice unit data representing voice units;
  
  phoneme storage means for storing a plurality of pieces of phoneme data each of which is a phoneme or comprises phoneme fragments composing a phoneme;
  
  cadence prediction means for inputting sentence information representing a sentence to predict the cadence of voice units composing the sentence;
  
  selecting means using a processor for selecting voice unit data satisfying predetermined conditions out of the plurality of pieces of voice unit data stored in the voice unit storage means, wherein the predetermined conditions are that the voice unit data to be selected matches in its reading with the voice unit composing the sentence and has a correlation greater than a predetermined amount with a cadence prediction result by the cadence prediction means;
  
  missing part cadence prediction means using a processor for predicting the cadence of voice units which have been decided not to satisfy the predetermined conditions by the selection means;
  
  missing part synthesis means using a processor for specifying phonemes contained in the voice unit decided not to satisfy the predetermined condition by the selection means out of the voice units composing the sentence, for acquiring phoneme data representing the specified phoneme or phoneme fragments composing the specified phoneme from the phoneme storage means, for converting the acquired phoneme data so that the phoneme or phoneme fragments represented by the acquired phoneme data matches with a cadence prediction result by the missing part cadence prediction means, and for interconnecting the converted data, thereby synthesizing speech data representing a waveform of the voice unit; and
  
  creation means for interconnecting the voice unit data selected by the selection means and the speech data synthesized by the missing part synthesis means, thereby creating data representing synthesis speech.
  - 2. The speech synthesis device according to claim 1, wherein the selection means selects the voice unit data out of the plurality of pieces of voice unit data stored in the voice unit storage means under the predetermined conditions further including that the presence or absence of nasalization or devocalization of the voice unit data matches with the cadence prediction result by the cadence prediction means.
  - 3. The speech synthesis device according to claim 2, wherein the voice unit storage means operates to associate phonetic data representing a reading of voice unit with the voice unit data, and the selection means operates to handle voice unit data which is associated with phonetic data representing a reading matching with the reading of the voice unit composing the sentence, as voice unit whose reading is common with the voice unit.
  - 4. The speech synthesis device according to claim 2, wherein the device further comprises utterance speed conversion means for acquiring utterance speed data specifying conditions of a speed for producing the synthesis speech created by the creation means and for converting the voice unit data and/or speech data so as to represent a speech to be produced at a speed satisfying the conditions specified by the utterance speed data.
  - 5. The speech synthesis device according to claim 4, wherein the voice unit storage means operates to associate phonetic data representing a reading of voice unit with the voice unit data, and the selection means operates to handle voice unit data which is associated with phonetic data representing a reading matching with the reading of the voice unit composing the sentence, as voice unit whose reading is common with the voice unit.
  - 6. The speech synthesis device according to claim 4, wherein the utterance speed conversion means operates to convert the voice unit data and/or the speech data so as to represent a speech to be produced at a speed satisfying the conditions specified by the utterance speed data, by eliminating a segment representing a phoneme fragment from voice unit data and/or speech data composing data representing the synthesis speech or by adding a segment representing a phoneme fragment to the voice unit data and/or speech data.
  - 7. The speech synthesis device according to claim 6, wherein the voice unit storage means operates to associate phonetic data representing a reading of voice unit with the voice unit data, and the selection means operates to handle voice unit data which is associated with phonetic data representing a reading matching with the reading of the voice unit composing the sentence, as voice unit whose reading is common with the voice unit.
  - 8. The speech synthesis device according to claim 1, wherein the voice unit storage means operates to associate phonetic data representing a reading of voice unit with the voice unit data, and the selection means operates to handle voice unit data which is associated with phonetic data representing a reading matching with the reading of the voice unit composing the sentence, as voice unit whose reading is common with the voice unit.
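Claims 4 and 6 describe changing utterance speed by eliminating or adding segments that represent phoneme fragments. One way this could look is sketched below. The function name `change_speed`, the `rate` parameter, and the evenly spaced choice of which fragments to drop or repeat are assumptions for illustration; the claims do not specify how the segments are chosen.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def change_speed(fragments: Sequence[T], rate: float) -> List[T]:
    """Return a fragment sequence whose length is scaled by 1 / rate.

    rate > 1 speeds speech up by dropping fragments;
    rate < 1 slows it down by repeating fragments.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not fragments:
        return []
    n_in = len(fragments)
    n_out = max(1, round(n_in / rate))
    # Map each output slot back to an evenly spaced input fragment.
    return [fragments[min(n_in - 1, int(i * n_in / n_out))] for i in range(n_out)]


faster = change_speed(["a", "b", "c", "d"], 2.0)   # drops every other fragment
slower = change_speed(["a", "b", "c", "d"], 0.5)   # repeats each fragment once
```

Working on whole fragments rather than individual samples keeps each fragment's internal waveform untouched, which is the property the claim language appears to rely on.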

9. A speech synthesis method performed by a speech synthesis device having storage means and processing means, the method comprising the steps of:
- storing in the storage means a plurality of pieces of voice unit data representing voice units;
  
  storing in the storage means a plurality of pieces of phoneme data each of which is a phoneme or comprises phoneme fragments composing a phoneme;
  
  inputting in the processing means sentence information representing a sentence to predict the cadence of voice units composing the sentence;
  
  selecting, in the processing means, voice units satisfying predetermined conditions out of the plurality of pieces of voice unit data stored in the storage means, wherein the predetermined conditions are that the voice unit data to be selected matches in its reading with the voice unit composing the sentence and has a correlation greater than a predetermined amount with a cadence prediction result;
  
  predicting in the processing means the cadence of voice units which have been decided not to satisfy the predetermined conditions;
  
  in the processing means, specifying phonemes contained in the voice unit decided not to satisfy the predetermined conditions out of the voice units composing the sentence, acquiring phoneme data representing the specified phoneme or phoneme fragments composing the specified phoneme from the storage means, converting the acquired phoneme data so that the phoneme or phoneme fragments represented by the acquired phoneme data matches with a cadence prediction result, and interconnecting the converted data, thereby synthesizing speech data representing a waveform of the voice unit; and
  
  in the processing means, interconnecting the selected voice unit data and the synthesized speech data, thereby creating data representing synthesis speech.
  - 10. The speech synthesis method according to claim 9, wherein the processing means operates to select the voice unit data out of the plurality of pieces of voice unit data stored in the storage means under the predetermined conditions further including that the presence or absence of nasalization or devocalization of the voice unit data matches with the cadence prediction result.
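The selection conditions in claims 9 and 10 (matching reading, correlation with the cadence prediction above a predetermined amount, and matching nasalization or devocalization) can be illustrated with a short sketch. Several things here are assumptions rather than claim language: cadence is represented as a pitch contour sampled at the same points for every candidate, correlation is measured with the Pearson coefficient, and the names `Candidate`, `pearson`, and `select_unit` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    reading: str
    pitch_contour: List[float]  # assumed cadence representation
    nasalized: bool
    devocalized: bool


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a flat contour."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("contours must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_unit(
    reading: str,
    predicted_contour: List[float],
    predicted_nasalized: bool,
    predicted_devocalized: bool,
    candidates: List[Candidate],
    threshold: float,
) -> Optional[Candidate]:
    """Return the best candidate meeting all conditions, or None (a missing part)."""
    best, best_r = None, threshold
    for c in candidates:
        if c.reading != reading:
            continue
        if c.nasalized != predicted_nasalized or c.devocalized != predicted_devocalized:
            continue
        if len(c.pitch_contour) != len(predicted_contour):
            continue  # assumption: only equally sampled contours are comparable
        r = pearson(c.pitch_contour, predicted_contour)
        if r > best_r:  # strictly greater than the predetermined amount
            best, best_r = c, r
    return best


units = [
    Candidate("hai", [200.0, 240.0, 220.0], False, False),
    Candidate("hai", [110.0, 100.0, 120.0], False, False),
]
chosen = select_unit("hai", [100.0, 120.0, 110.0], False, False, units, 0.8)
missing = select_unit("hai", [100.0, 120.0, 110.0], True, False, units, 0.8)
```

A `None` result marks the voice unit as one that did not satisfy the conditions, which is the case the missing-part steps of the claim then handle.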

11. A computer readable medium which records a computer program causing a computer to operate as:
- voice unit storage means for storing a plurality of pieces of voice unit data representing voice units;
  
  phoneme storage means for storing a plurality of pieces of phoneme data each of which is a phoneme or comprises phoneme fragments composing a phoneme;
  
  cadence prediction means for inputting sentence information representing a sentence to predict the cadence of voice units comprising the sentence;
  
  selecting means for selecting voice unit data satisfying predetermined conditions out of the plurality of pieces of voice unit data stored in the voice unit storage means, wherein the predetermined conditions are that the voice unit data to be selected matches in its reading with the voice unit composing the sentence and has a correlation greater than a predetermined amount with a cadence prediction result by the cadence prediction means;
  
  missing part cadence prediction means for predicting the cadence of voice units which have been decided not to satisfy the predetermined conditions by the selection means;
  
  missing part synthesis means for specifying phonemes contained in the voice unit decided not to satisfy the predetermined condition by the selection means out of the voice units composing the sentence, for acquiring phoneme data representing the specified phoneme or phoneme fragments composing the specified phoneme from the phoneme storage means, for converting the acquired phoneme data so that the phoneme or phoneme fragments represented by the acquired phoneme data matches with a cadence prediction result by the missing part cadence prediction means, and for interconnecting the converted data, thereby synthesizing speech data representing a waveform of the voice unit; and
  
  creation means for interconnecting the voice unit data selected by the selection means and the speech data synthesized by the missing part synthesis means, thereby creating data representing synthesis speech.
  - 12. The computer readable medium according to claim 11, wherein the selection means selects the voice unit data out of the plurality of pieces of voice unit data stored in the voice unit storage means under the predetermined conditions further including that the presence or absence of nasalization or devocalization of the voice unit data matches with the cadence prediction result by the cadence prediction means.
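The missing part synthesis means recited above acquires phoneme fragments, converts them to match the predicted cadence, and interconnects the results. A toy sketch of that sequence follows. It assumes each stored fragment is one pitch cycle of samples, that the cadence prediction is given per phoneme as a target cycle length and cycle count, and that conversion is done by linear-interpolation resampling; none of these details come from the claims, and the names `resample` and `synthesize_missing` are hypothetical.

```python
from typing import Dict, List, Tuple


def resample(fragment: List[float], new_len: int) -> List[float]:
    """Stretch or shrink one fragment to new_len samples by linear interpolation."""
    if new_len < 1 or not fragment:
        raise ValueError("need a non-empty fragment and a positive length")
    if len(fragment) == 1 or new_len == 1:
        return [fragment[0]] * new_len
    step = (len(fragment) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(fragment) - 1)
        frac = pos - lo
        out.append(fragment[lo] * (1 - frac) + fragment[hi] * frac)
    return out


def synthesize_missing(
    phonemes: List[str],
    phoneme_db: Dict[str, List[List[float]]],
    targets: List[Tuple[int, int]],
) -> List[float]:
    """Build a waveform for a voice unit that had no acceptable recording.

    targets[i] = (samples per cycle, number of cycles) predicted for phonemes[i].
    """
    waveform: List[float] = []
    for phoneme, (period, cycles) in zip(phonemes, targets):
        fragments = phoneme_db[phoneme]
        for i in range(cycles):
            # Pick evenly spaced source fragments, then convert each to the target period.
            source = fragments[i * len(fragments) // cycles]
            waveform.extend(resample(source, period))
    return waveform


phoneme_db = {"k": [[1.0, 1.0, 1.0]], "a": [[0.0, 1.0]]}
unit = synthesize_missing(["k", "a"], phoneme_db, [(2, 1), (3, 2)])
```

Changing the cycle length alters pitch and changing the cycle count alters duration, so both aspects of the predicted cadence are reflected before the fragments are joined.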

Current Assignee: Rakuten Group, Inc.
Original Assignee: Kabushiki Kaisha Kenwood
Inventors: Sato, Yasushi
Primary Examiner(s): Lerner, Martin
Application Number: US10/559,571
Publication Number: US 20060136214A1
Time in Patent Office: 2,952 Days
Field of Search: 704/258, 704/260, 704/265, 704/266, 704/267, 704/268, 704/269, 704/263
US Class Current: 704/258
CPC Class Codes: G10L 13/027 Concept to speech synthesis...
