Speech synthesis device handling phoneme units of extended CV

US 6,847,932 B1
Filed: 09/28/2000
Issued: 01/25/2005
Est. Priority Date: 09/30/1999
Status: Expired due to Fees

Abstract

Given phonetic information is divided into speech units of Extended CV, each of which is a contiguous sequence of phonemes without clear distinction containing one or more vowels. The contour of the vocal tract transmission function of each phoneme in an Extended CV speech unit is obtained from a phoneme dictionary, which stores the contour of the vocal tract transmission function of each phoneme in association with phonetic information in units of Extended CV. Speech waveform data is generated from these contours and is then converted into an analog voice signal.

20 Claims

1. A speech synthesis device comprising:
- speech database storing means for storing a speech database created by way of dividing the sample speech waveform data obtained from recording human speech utterances into speech units, and associating the sample waveform data in each speech unit with their corresponding phonetic information;
  
  speech waveform composing means for dividing phonetic information into speech units upon receiving the phonetic information of speech sound to be synthesized, for obtaining sample speech waveform data from the speech database corresponding to the phonetic information in a speech unit, and for generating speech waveform data to be composed by means of concatenating the sample speech waveform data in the speech unit; and
  
  analog converting means for converting the speech waveform data received from the speech waveform composing means into analog signals;
  
  wherein the speech database storing means divides the sample speech waveform data into speech units of Extended CV, which is a contiguous sequence of phonemes without clear distinction containing a vowel or some vowels;
  
  wherein the speech waveform composing means divides the phonetic information into speech units of Extended CV;
  
  wherein the Extended CV contains at least one of a consonant C excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, a semi vowel y, a vowel V excluding a latter part of a long vowel and a second element of a diphthong, a latter part of a long vowel R, the second element of a diphthong J, a geminated sound Q, and a syllabic nasal N, and wherein the phoneme sequence with heavier syllable weight is selected first as the Extended CV, assuming the syllable weight of C and y to be “0”, and a syllable weight of V, R, J, Q and N to be “1”.
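
The weight assignment recited in claim 1 can be illustrated with a minimal sketch. The single-letter class labels follow the claim; the function name `syllable_weight` and the list-of-labels input format are assumptions made for illustration and do not come from the patent.

```python
# Weights as recited in claim 1: C and y weigh 0; V, R, J, Q and N weigh 1.
WEIGHT = {"C": 0, "y": 0, "V": 1, "R": 1, "J": 1, "Q": 1, "N": 1}


def syllable_weight(classes):
    """Sum the per-class weights of one candidate unit (hypothetical helper)."""
    return sum(WEIGHT[c] for c in classes)


print(syllable_weight(["C", "y", "V"]))       # 1 (light)
print(syllable_weight(["C", "V", "N"]))       # 2 (heavy)
print(syllable_weight(["C", "V", "R", "N"]))  # 3 (superheavy)
```

Under this scheme a candidate with a larger sum is the one selected first as the Extended CV.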

2. A speech synthesis device comprising:
- speech database storing means for storing a speech database created by way of dividing the sample speech waveform data obtained from recording human speech utterances into speech units, and associating the sample waveform data in each speech unit with their corresponding phonetic information;
  
  speech waveform composing means for dividing phonetic information into speech units upon receiving the phonetic information of speech sound to be synthesized, for obtaining sample speech waveform data from the speech database corresponding to the phonetic information in a speech unit, and for generating speech waveform data to be composed by means of concatenating the sample speech waveform data in the speech unit; and
  
  analog converting means for converting the speech waveform data received from the speech waveform composing means into analog signals;
  
  wherein the speech database storing means divides the sample speech waveform data into speech units of Extended CV, which is a contiguous sequence of phonemes without clear distinction containing a vowel or some vowels;
  
  wherein the Extended CV includes at least a heavy syllable with a syllable weight of “2” selected from a group consisting of (C)(y) VR, (C)(y) VJ, (C)(y) VN and (C)(y) VQ and a light syllable with the syllable weight of “1” as defined by (C)(y) V, wherein the heavy syllable is given a higher priority than the light syllable for being selected as Extended CV, wherein (C) denotes that C or some Cs are attached to V, wherein (y) denotes whether y or ys are attached to V, and wherein C is a consonant excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, y is a semi vowel, V is a vowel excluding a latter part of a long vowel and a second element of a diphthong, R is a latter part of a long vowel, J is the second element of a diphthong, Q is a geminated sound, and N is a syllabic nasal.

3. The speech synthesis device of claim 2, wherein the Extended CV further includes a superheavy syllable with a syllable weight of “3” such as (C)(y) VRN, (C)(y) VRQ, (C)(y) VJN, (C)(y) VJQ and (C)(y) VNQ, and wherein the heavy syllable is given a higher priority than the light syllable and the superheavy syllable takes precedence over the heavy syllable for being selected as Extended CV.
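
The priority rule in claims 2 and 3 (superheavy before heavy before light) can be sketched as a greedy left-to-right split. This is only one possible reading, not the patent's stated procedure: the input is assumed to be a list of phoneme class labels already assigned by some earlier step, the function name `split_extended_cv` is hypothetical, and the tail table contains only the patterns the claims list by name.

```python
from typing import List

# Tails allowed after V, heaviest first. Only the patterns named in
# claims 2 and 3 are listed; treating them as exhaustive is an assumption.
TAILS = [
    ("R", "N"), ("R", "Q"), ("J", "N"), ("J", "Q"), ("N", "Q"),  # superheavy, weight 3
    ("R",), ("J",), ("N",), ("Q",),                              # heavy, weight 2
    (),                                                          # light, weight 1
]


def split_extended_cv(classes: List[str]) -> List[List[str]]:
    """Greedily split class labels into (C)(y)V units, preferring heavier tails."""
    units = []
    i, n = 0, len(classes)
    while i < n:
        start = i
        while i < n and classes[i] == "C":  # optional consonants
            i += 1
        while i < n and classes[i] == "y":  # optional semi vowels
            i += 1
        if i >= n or classes[i] != "V":
            raise ValueError(f"no vowel found for the unit starting at index {start}")
        i += 1
        for tail in TAILS:  # the empty tail always matches, so a unit is always closed
            if tuple(classes[i:i + len(tail)]) == tail:
                i += len(tail)
                break
        units.append(classes[start:i])
    return units


print(split_extended_cv(["C", "V", "N", "C", "V", "J"]))
# [['C', 'V', 'N'], ['C', 'V', 'J']]
```

Because the tail table is ordered by weight, a sequence such as C V R N is kept together as one superheavy unit rather than being split after the vowel.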

4. A computer-readable storing medium for storing a program for executing speech synthesis by means of a computer using a speech database constructed with sample speech waveform data associated with its corresponding phonetic information, the program comprising the steps of:
- dividing phonetic information into Extended CVs upon receiving the phonetic information of speech sound to be synthesized;
  
  obtaining sample speech waveform data corresponding to the divided phonetic information in Extended CV from the speech database; and
  
  generating speech waveform data to be composed by means of concatenating the sample speech waveform data in Extended CV;
  
  wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV contains at least one of a consonant C excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, a semi vowel y, a vowel V excluding a latter part of a long vowel and a second element of a diphthong, a latter part of a long vowel R, the second element of a diphthong J, a geminated sound Q, and a syllabic nasal N, and wherein the phoneme sequence with heavier syllable weight is selected first as the Extended CV, assuming the syllable weight of C and y to be “0”, and a syllable weight of V, R, J, Q and N to be “1”.
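
The obtaining and concatenating steps of claim 4 can be pictured with a toy lookup. Everything concrete here is an assumption for illustration: the database is modeled as a plain dictionary keyed by a romanized unit string, samples are short integer lists, the division into units is taken as already done, and the joins are plain concatenation with no smoothing.

```python
from typing import Dict, List


def concatenate_units(units: List[str], database: Dict[str, List[int]]) -> List[int]:
    """Look up each Extended CV unit and join the stored samples in order."""
    waveform: List[int] = []
    for unit in units:
        if unit not in database:
            raise KeyError(f"no sample waveform stored for unit {unit!r}")
        waveform.extend(database[unit])
    return waveform


# Hypothetical toy database: two units with made-up sample values.
toy_db = {"kan": [1, 2, 3], "sai": [4, 5]}
print(concatenate_units(["kan", "sai"], toy_db))  # [1, 2, 3, 4, 5]
```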

5. A computer-readable storing medium for storing a program for executing speech synthesis by means of a computer using a speech database constructed with sample speech waveform data associated with its corresponding phonetic information, the program comprising the steps of:
- dividing phonetic information into Extended CVs upon receiving the phonetic information of speech sound to be synthesized;
  
  obtaining sample speech waveform data corresponding to the divided phonetic information in Extended CV from the speech database; and
  
  generating speech waveform data to be composed by means of concatenating the sample speech waveform data in Extended CV;
  
  wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV includes at least a heavy syllable with a syllable weight of “2” selected from a group consisting of (C)(y) VR, (C)(y) VJ, (C)(y) VN and (C)(y) VQ and a light syllable with the syllable weight of “1” as defined by (C)(y) V, wherein the heavy syllable is given a higher priority than the light syllable for being selected as Extended CV, wherein (C) denotes that C or some Cs are attached to V, wherein (y) denotes whether y or ys are attached to V, and wherein C is a consonant excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, y is a semi vowel, V is a vowel excluding a latter part of a long vowel and a second element of a diphthong, R is a latter part of a long vowel, J is the second element of a diphthong, Q is a geminated sound, and N is a syllabic nasal.

6. The computer-readable storage medium of claim 5, wherein the Extended CV further includes a superheavy syllable with a syllable weight of “3” such as (C)(y) VRN, (C)(y) VRQ, (C)(y) VJN, (C)(y) VJQ and (C)(y) VNQ, and wherein the heavy syllable is given a higher priority than the light syllable and the superheavy syllable takes precedence over the heavy syllable for being selected as Extended CV.

7. A speech synthesis device comprising:
- dividing means for dividing the phonetic information into Extended CVs upon receiving the phonetic information of speech sound to be synthesized;
  
  speech waveform composing means for generating speech waveform data in a unit of Extended CV divided with the dividing means, and for obtaining speech waveform data to be composed by means of concatenating the speech waveform data in a unit of each Extended CV; and
  
  analog converting means for converting the speech waveform data provided from the speech waveform composing means into analog signals of speech sound;
  
  wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV contains at least one of a consonant C excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, a semi vowel y, a vowel V excluding a latter part of a long vowel and a second element of a diphthong, a latter part of a long vowel R, the second element of a diphthong J, a geminated sound Q, and a syllabic nasal N, and wherein the phoneme sequence with heavier syllable weight is selected first as the Extended CV, assuming the syllable weight of C and y to be “0”, and a syllable weight of V, R, J, Q and N to be “1”.

8. A speech synthesis device comprising:
- dividing means for dividing the phonetic information into Extended CVs upon receiving the phonetic information of speech sound to be synthesized;
  
  speech waveform composing means for generating speech waveform data in a unit of Extended CV divided with the dividing means, and for obtaining speech waveform data to be composed by means of concatenating the speech waveform data in a unit of each Extended CV; and
  
  analog converting means for converting the speech waveform data provided from the speech waveform composing means into analog signals of speech sound;
  
  wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV includes at least a heavy syllable with a syllable weight of “2” selected from a group consisting of (C)(y) VR, (C)(y) VJ, (C)(y) VN and (C)(y) VQ and a light syllable with the syllable weight of “1” as defined by (C)(y) V, wherein the heavy syllable is given a higher priority than the light syllable for being selected as Extended CV, wherein (C) denotes that C or some Cs are attached to V, wherein (y) denotes whether y or ys are attached to V, and wherein C is a consonant excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, y is a semi vowel, V is a vowel excluding a latter part of a long vowel and a second element of a diphthong, R is a latter part of a long vowel, J is the second element of a diphthong, Q is a geminated sound, and N is a syllabic nasal.

9. The speech synthesis device of claim 8, wherein the Extended CV further includes a superheavy syllable with a syllable weight of “3” such as (C)(y) VRN, (C)(y) VRQ, (C)(y) VJN, (C)(y) VJQ and (C)(y) VNQ, and wherein the heavy syllable is given a higher priority than the light syllable and the superheavy syllable takes precedence over the heavy syllable for being selected as Extended CV.

10. A computer-readable storing medium for storing a program for executing speech synthesis using a computer, the program comprising the steps of:
- dividing phonetic information into Extended CVs upon receiving the phonetic information of speech sound to be synthesized;
  
  generating speech waveform data in a unit of Extended CV; and
  
  obtaining speech waveform data to be composed by means of concatenating the speech waveform data in a unit of each Extended CV;
  
  wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV contains at least one of a consonant C excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, a semi vowel y, a vowel V excluding a latter part of a long vowel and a second element of a diphthong, a latter part of a long vowel R, the second element of a diphthong J, a geminated sound Q, and a syllabic nasal N, and wherein the phoneme sequence with heavier syllable weight is selected first as the Extended CV, assuming the syllable weight of C and y to be “0”, and a syllable weight of V, R, J, Q and N to be “1”.

11. A computer-readable storing medium for storing a program for executing speech synthesis using a computer, the program comprising the steps of:
- dividing phonetic information into Extended CVs upon receiving the phonetic information of speech sound to be synthesized;
  
  generating speech waveform data in a unit of Extended CV; and
  
  obtaining speech waveform data to be composed by means of concatenating the speech waveform data in a unit of each Extended CV;
  
  wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV includes at least a heavy syllable with a syllable weight of “2” selected from a group consisting of (C)(y) VR, (C)(y) VJ, (C)(y) VN and (C)(y) VQ and a light syllable with the syllable weight of “1” as defined by (C)(y) V, wherein the heavy syllable is given a higher priority than the light syllable for being selected as Extended CV, wherein (C) denotes that C or some Cs are attached to V, wherein (y) denotes whether y or ys are attached to V, and wherein C is a consonant excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, y is a semi vowel, V is a vowel excluding a latter part of a long vowel and a second element of a diphthong, R is a latter part of a long vowel, J is the second element of a diphthong, Q is a geminated sound, and N is a syllabic nasal.

12. The computer-readable storing medium of claim 11, wherein the Extended CV further includes a superheavy syllable with a syllable weight of “3” such as (C)(y) VRN, (C)(y) VRQ, (C)(y) VJN, (C)(y) VJQ and (C)(y) VNQ, and wherein the heavy syllable is given a higher priority than the light syllable and the superheavy syllable takes precedence over the heavy syllable for being selected as Extended CV.

13. A computer-readable storing medium for storing a program for executing dividing process using a computer, the program comprising the step of:
- dividing phonetic information into Extended CVs defined as follows, upon receiving the phonetic information;
  
  wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV contains at least one of a consonant C excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, a semi vowel y, a vowel V excluding a latter part of a long vowel and a second element of a diphthong, a latter part of a long vowel R, the second element of a diphthong J, a geminated sound Q, and a syllabic nasal N, and wherein the phoneme sequence with heavier syllable weight is selected first as the Extended CV, assuming the syllable weight of C and y to be “0”, and a syllable weight of V, R, J, Q and N to be “1”.

14. A computer-readable storing medium for storing a program for executing dividing process using a computer, the program comprising the step of:
- dividing phonetic information into Extended CVs defined as follows, upon receiving the phonetic information;
  
  wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV includes at least a heavy syllable with a syllable weight of “2” selected from a group consisting of (C)(y) VR, (C)(y) VJ, (C)(y) VN and (C)(y) VQ and a light syllable with the syllable weight of “1” as defined by (C)(y) V, wherein the heavy syllable is given a higher priority than the light syllable for being selected as Extended CV, wherein (C) denotes that C or some Cs are attached to V, wherein (y) denotes whether y or ys are attached to V, and wherein C is a consonant excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, y is a semi vowel, V is a vowel excluding a latter part of a long vowel and a second element of a diphthong, R is a latter part of a long vowel, J is the second element of a diphthong, Q is a geminated sound, and N is a syllabic nasal.

15. A computer-readable storing medium for storing a speech database, the database comprising:
- a waveform data area that stores sample speech waveform data divided into Extended CV; and
  
  a phonetic information area that stores the phonetic information associated with sample speech waveform data in a unit of each Extended CV;
  
  wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV contains at least one of a consonant C excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, a semi vowel y, a vowel V excluding a latter part of a long vowel and a second element of a diphthong, a latter part of a long vowel R, the second element of a diphthong J, a geminated sound Q, and a syllabic nasal N, and wherein the phoneme sequence with heavier syllable weight is selected first as the Extended CV, assuming the syllable weight of C and y to be “0” and a syllable weight of V, R, J, Q and N to be “1”.

16. A computer-readable storing medium for storing a speech database, the database comprising:
- a waveform data area that stores sample speech waveform data divided into Extended CV; and
  
  a phonetic information area that stores the phonetic information associated with sample speech waveform data in a unit of each Extended CV;
  
  wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV includes at least a heavy syllable with a syllable weight of “2” selected from a group consisting of (C)(y) VR, (C)(y) VJ, (C)(y) VN and (C)(y) VQ and a light syllable with the syllable weight of “1” as defined by (C)(y) V, wherein the heavy syllable is given a higher priority than the light syllable for being selected as Extended CV, wherein (C) denotes that C or some Cs are attached to V, wherein (y) denotes whether y or ys are attached to V, and wherein C is a consonant excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, y is a semi vowel, V is a vowel excluding a latter part of a long vowel and a second element of a diphthong, R is a latter part of a long vowel, J is the second element of a diphthong, Q is a geminated sound, and N is a syllabic nasal.

17. A computer-readable storing medium for storing phonetic information data to be used for speech processing, wherein the phonetic information data is characterized by being handled in a unit of Extended CV provided with division information per Extended CV, wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV contains at least one of a consonant C excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, a semi vowel y, a vowel V excluding a latter part of a long vowel and a second element of a diphthong, a latter part of a long vowel R, the second element of a diphthong J, a geminated sound Q, and a syllabic nasal N, and wherein the phoneme sequence with heavier syllable weight is selected first as the Extended CV, assuming the syllable weight of C and y to be “0”, and a syllable weight of V, R, J, Q and N to be “1”.

18. A computer-readable storing medium for storing phonetic information data to be used for speech processing, wherein the phonetic information data is characterized by being handled in a unit of Extended CV provided with division information per Extended CV, and wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV includes at least a heavy syllable with a syllable weight of “2” selected from a group consisting of (C)(y) VR, (C)(y) VJ, (C)(y) VN and (C)(y) VQ and a light syllable with the syllable weight of “1” as defined by (C)(y) V, wherein the heavy syllable is given a higher priority than the light syllable for being selected as Extended CV, wherein (C) denotes that C or some Cs are attached to V, wherein (y) denotes whether y or ys are attached to V, and wherein C is a consonant excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, y is a semi vowel, V is a vowel excluding a latter part of a long vowel and a second element of a diphthong, R is a latter part of a long vowel, J is the second element of a diphthong, Q is a geminated sound, and N is a syllabic nasal.

19. A computer-readable storing medium for storing a phoneme dictionary to be used for speech processing, wherein the phoneme dictionary contains a contour of vocal tract transmission function of each phoneme associated with phonetic information in a unit of Extended CV, wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV contains at least one of a consonant C excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, a semi vowel y, a vowel V excluding a latter part of a long vowel and a second element of a diphthong, a latter part of a long vowel R, the second element of a diphthong J, a geminated sound Q, and a syllabic nasal N, and wherein the phoneme sequence with heavier syllable weight is selected first as the Extended CV, assuming the syllable weight of C and y to be “0”, and a syllable weight of V, R, J, Q and N to be “1”.

20. A computer-readable storing medium for storing a phoneme dictionary to be used for speech processing, wherein the phoneme dictionary contains a contour of vocal tract transmission function of each phoneme associated with phonetic information in a unit of Extended CV;
- wherein the Extended CV refers to a contiguous sequence of phonemes without clear distinction containing at least one vowel, wherein the Extended CV includes at least a heavy syllable with a syllable weight of “2” selected from a group consisting of (C)(y) VR, (C)(y) VJ, (C)(y) VN and (C)(y) VQ and a light syllable with the syllable weight of “1” as defined by (C)(y) V, wherein the heavy syllable is given a higher priority than the light syllable for being selected as Extended CV, and wherein (C) denotes that C or some Cs are attached to V, wherein (y) denotes whether y or ys are attached to V, and wherein C is a consonant excluding a geminated sound (Japanese SOKUON), a semi vowel, and a syllabic nasal, y is a semi vowel, V is a vowel excluding a latter part of a long vowel and a second element of a diphthong, R is a latter part of a long vowel, J is the second element of a diphthong, Q is a geminated sound, and N is a syllabic nasal.

Current Assignee: Arcadia Group Limited (Taveta Limited)
Original Assignee: Arcadia Group Limited (Taveta Limited)
Inventors: Ashimura, Kazuyuki; Tenpaku, Seiichi
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Storm, Donald L.
Application Number: US09/671,683
Time in Patent Office: 1,580 Days
Field of Search: 704/266, 704/254, 704/260, 704/258, 704/267
US Class Current: 704/266
CPC Class Codes: G10L 13/10 Prosody rules derived from ...
