METHOD, APPARATUS AND PROGRAM FOR SPEECH SYNTHESIS

US 20090204405A1
Filed: 09/04/2006
Published: 08/13/2009
Est. Priority Date: 09/06/2005
Status: Active Grant

Abstract

Apparatus and method for generating high quality synthesized speech having smooth waveform concatenation. The apparatus includes a pitch frequency calculation section, a pitch synchronization position calculation section, a unit waveform storage, a unit waveform selection section, a unit waveform generation section, and a waveform synthesis section. The unit waveform generation section includes a conversion ratio calculation section, a sampling rate conversion section, and a unit waveform re-selection section. The conversion ratio calculation section calculates a sampling rate conversion ratio from the pitch information and the position of pitch synchronization, and the sampling rate conversion section converts the sampling rate of the unit waveform, delivered as input, based on the sampling rate conversion ratio. The unit waveform re-selection section selects, from the sampling-rate-converted unit waveform, the unit waveform having a phase necessary to obtain a synthesized speech waveform which will exhibit smooth waveform concatenation.
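
The chain the abstract describes, from pitch frequency to position of pitch synchronization to the phase of the selected unit waveform, can be illustrated with a small sketch. It assumes a constant pitch frequency, an output sampling rate `fs_hz`, and an integer conversion ratio; the function names `pitch_sync_positions` and `split_position` are hypothetical and do not appear in the patent.

```python
from fractions import Fraction

def pitch_sync_positions(f0_hz, fs_hz, count):
    """Ideal pitch mark positions, in output samples, for a constant pitch."""
    period = Fraction(fs_hz, f0_hz)  # pitch period in samples, usually non-integer
    return [k * period for k in range(count)]

def split_position(position, ratio):
    """Quantize a position to the 1/ratio grid; return (sample index, phase)."""
    fine = round(position * ratio)   # nearest point on the rate-converted grid
    return divmod(fine, ratio)       # phase in 0..ratio-1 picks the waveform variant

marks = pitch_sync_positions(300, 16000, 4)
print([split_position(m, 3) for m in marks])
```

At 300 Hz and 16 kHz the pitch period is 160/3 samples, so a ratio of 3 places every pitch mark exactly on the finer grid, and the phase value indicates which sub-sample shift of the unit waveform would be needed at each mark.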

59 Claims

1-34. (canceled)

35. A speech synthesis apparatus for concatenating a plurality of unit waveforms to generate synthesized speech, said apparatus comprising:
- a conversion section that converts sampling rate of said unit waveform;
  
  a decimation section that decimates the unit waveform that undergoes the conversion of the sampling rate to the sampling rate of a synthesized speech; and
  
  a waveform synthesis section that generates the synthesized speech using the decimated unit waveform;
  
  wherein said conversion section changes the conversion ratio of the sampling rate based on input prosodic information.
- Dependent claims (36, 37):
  - 36. The speech synthesis apparatus according to claim 35, wherein said conversion section derives a pitch frequency from the prosodic information and increases the value of said conversion ratio to a higher value when the pitch frequency is of a relatively high value.
  - 37. The speech synthesis apparatus according to claim 35, wherein said conversion section derives a position of pitch synchronization from said pitch frequency and uses the value of the conversion ratio which relatively reduces an error in the position of pitch synchronization.
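
As a rough illustration of claims 35 and 36, the sketch below raises the sampling rate of a unit waveform, decimates it back to the output rate at a chosen phase, and picks the conversion ratio from the pitch frequency. Linear interpolation and the pitch thresholds in `choose_ratio` are assumptions made for brevity; the claims do not specify an interpolation filter or any threshold values, and all function names are hypothetical.

```python
def upsample_linear(x, ratio):
    """Raise the sampling rate by an integer ratio using linear interpolation."""
    out = []
    for a, b in zip(x, x[1:]):
        out.extend(a + (b - a) * k / ratio for k in range(ratio))
    out.append(x[-1])
    return out

def decimate(y, ratio, phase=0):
    """Return to the output rate, keeping every ratio-th sample from `phase`."""
    return y[phase::ratio]

def choose_ratio(f0_hz):
    """Hypothetical rule: a higher pitch frequency gets a higher ratio."""
    if f0_hz >= 300:
        return 8
    if f0_hz >= 150:
        return 4
    return 2

unit = [0.0, 1.0, 0.0, -1.0, 0.0]
ratio = choose_ratio(200)                  # 4 under the assumed thresholds
fine = upsample_linear(unit, ratio)
shifted = decimate(fine, ratio, phase=1)   # unit resampled at a quarter-sample offset
```

Decimating at phase 0 returns the original samples, while a non-zero phase yields the same waveform sampled at a sub-sample offset, which is what allows a unit waveform to be aligned to a non-integer position of pitch synchronization.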

38. A speech synthesis apparatus comprising:
- a plurality of compressed unit waveform storages which store a plurality of compressed unit waveforms in association with conversion ratio of the sampling rate;
  
  a compressed unit waveform storage selection section that selects one of said compressed unit waveform storages, based on input prosodic information;
  
  a compressed unit waveform selection section that selects the compressed unit waveform from the selected one of said compressed unit waveform storage, based on said prosodic information and phonological information;
  
  a unit waveform decompression section that decompresses said compressed unit waveform to obtain the unit waveform, based on identification information of the selected compressed unit waveform storage; and
  
  a waveform synthesis section that generates the synthesized speech based on said prosodic information and the decompressed unit waveform.
- Dependent claims (39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49):
  - 39. The speech synthesis apparatus according to claim 38, further comprising:
    - a unit waveform storage that stores at least one unit waveform; and
      
      a compressed unit waveform storage generation section that generates, out of the unit waveform in said unit waveform storage, a unit waveform that has a sampling-rate thereof converted to a sampling rate different from the sampling rate of said unit waveform, compresses the so generated sampling-rate-converted unit waveform and stores the compressed sampling-rate-converted unit waveform in said compressed unit waveform storage corresponding to the sampling rate conversion ratio.
  - 40. The speech synthesis apparatus according to claim 39, wherein said compressed unit waveform storage generation section includes:
    - a sampling rate conversion section that generates, from said unit waveform, a unit waveform that has a sampling-rate thereof converted to a sampling rate different from the sampling rate of said unit waveform;
      
      a unit waveform selection section that finds a plurality of unit waveforms, each having a different phase, from said sampling-rate-converted unit waveform; and
      
      a unit waveform compression section that compresses a plurality of said unit waveforms, each having a different phase, to generate a plurality of compressed unit waveforms.
  - 41. The speech synthesis apparatus according to claim 39, further comprising:
    - a compression method selection section that decides on a method for compression in accordance with the phase of the unit waveform.
  - 42. The speech synthesis apparatus according to claim 38, further comprising:
    - a compressed unit waveform storage generation section that generates compressed unit waveforms, stored in a plurality of said compressed unit waveform storages, from a speech waveform having the sampling rate higher than the sampling rate of said unit waveform.
  - 43. The speech synthesis apparatus according to claim 42, wherein said compressed unit waveform storage generation section includes:
    - a unit waveform selection section that finds a plurality of unit waveforms, each having a different phase, from a speech waveform, having a sampling rate higher than the sampling rate of a unit waveform; and
      
      a unit waveform compression section that compresses said unit waveforms, each having a different phase, to generate a plurality of compressed unit waveforms.
  - 44. The speech synthesis apparatus according to claim 43, wherein said unit waveform compression section includes a compression method selection section that selects a method for compression based on a ratio of the sampling rate of said sampling-rate-converted unit waveform to the sampling rate of said unit waveform.
  - 45. The speech synthesis apparatus according to claim 38, wherein, when a non-compressed unit waveform is selected, a unit waveform is generated by sampling rate conversion and, when a compressed unit waveform is input, the compressed unit waveform is decompressed by said unit waveform decompression section to generate a unit waveform.
  - 46. The speech synthesis apparatus according to claim 38, further comprising:
    - a unit waveform storage that stores a variety of unit waveforms needed for generating the synthesized speech and the attribute information of the unit waveforms;
      
      a compressed unit waveform storage generation section that processes and compresses the unit waveforms supplied from said unit waveform storage and that stores the compressed unit waveforms in the compressed unit waveform storage selected out of a plurality of said compressed unit waveform storages;
      
      a pitch frequency calculation section that computes the pitch frequency from the prosodic information;
      
      a pitch synchronization position calculation section that computes position of pitch synchronization, based on the pitch frequency supplied from said pitch frequency calculation section; and
      
      a compressed unit waveform storage selection section that computes a sampling rate conversion ratio, based on the pitch frequency supplied from the pitch frequency calculation section and on the position of pitch synchronization supplied from said pitch synchronization position calculation section, and selects the compressed unit waveform storage matched to the computed conversion ratio;
      
      wherein said compressed unit waveform selection section selects one of the compressed unit waveforms registered in the compressed unit waveform storage selected by said compressed unit waveform storage selection section, based on prosodic information, phonological information, pitch information supplied from said pitch frequency calculation section and the position of pitch synchronization supplied from said pitch synchronization position calculation section;
      
      said unit waveform decompression section decompresses the compressed unit waveform supplied from said compressed unit waveform selection section into a unit waveform; and
      
      said waveform synthesis section places and connects unit waveforms supplied from said unit waveform re-selection section on the position of pitch synchronization supplied from said pitch synchronization position calculation section to synthesize a waveform;
      
      said waveform synthesis section outputting a synthesized speech signal.
  - 47. The speech synthesis apparatus according to claim 46, wherein said compressed unit waveform storage generation section includes:
    - a conversion ratio control section that outputs a plurality of values of the conversion ratio for a sole unit waveform supplied to said compressed unit waveform storage generation section;
      
      a sampling rate conversion section that converts, with the conversion ratio supplied from said conversion ratio control section, the sampling rate of the sole unit waveform supplied;
      
      a unit waveform selection section that selects the unit waveform having the phase unregistered in said compressed unit waveform storage, out of the sampling-rate-converted unit waveforms generated by said sampling rate conversion section, as said unit waveform selection section references the conversion ratio supplied from said conversion ratio control section;
      
      a compression method selection section that decides on a method for compression, by referencing the conversion ratio supplied from said conversion ratio control section, and outputs information on the method for compression;
      
      a unit waveform compression section that compresses the unit waveform, supplied from said unit waveform selection section, based on the information on the compression method selected by said compression method selection section, and outputs the compressed unit waveform to the compressed unit waveform storage selection section; and
      
      a compressed unit waveform storage selection section that selects one of a plurality of said compressed unit waveform storages, by referencing the conversion ratio supplied from said conversion ratio control section, and outputs the compressed unit waveform, supplied from said unit waveform compression section, to said compressed unit waveform storage selected.
  - 48. The speech synthesis apparatus according to claim 42, wherein said compressed unit waveform storage generation section includes:
    - a high sampling rate unit waveform storage that stores a unit waveform sampled at a sampling rate higher than the sampling rate for the synthesized speech;
      
      a sampling rate storage that stores the sampling rate of a unit waveform registered in said high sampling rate unit waveform storage;
      
      a filter that receives the high sampling rate unit waveform, supplied from said high sampling rate unit waveform storage, said filter having a passband which is the same band as that for the synthesized speech;
      
      a unit waveform read position control section that decides on a position for reading the unit waveform having the same sampling rate as the sampling rate for the synthesized speech, from the high sampling rate unit waveform, by referencing the sampling rate stored in said sampling rate storage;
      
      a unit waveform selection section that adjusts the waveform read position of an output waveform of said filter, and samples said output waveform with the same sampling width as the sampling width of said unit waveform to generate a plurality of unit waveforms each having a different phase;
      
      a compression method selection section that decides on a method for compression, depending on the read position information output from said unit waveform read position control section, to output the information on the method for compression;
      
      a unit waveform compression section that compresses the unit waveform, supplied from said unit waveform selection section, based on the information on the compression method selected by said compression method selection section, to output the compressed unit waveform; and
      
      a compressed unit waveform storage selection section that selects one of a plurality of said compressed unit waveform storages, depending on the read position information output from said unit waveform read position control section, and outputs the compressed unit waveform, supplied from said unit waveform compression section, to said compressed unit waveform storage.
  - 49. The speech synthesis apparatus according to claim 46, further comprising:
    - a conversion ratio computing section that decides on the sampling rate conversion ratio, based on the pitch frequency supplied from said pitch frequency calculation section, and on the position of pitch synchronization supplied from said pitch synchronization position calculation section;
      
      a sampling rate conversion section that generates, from the unit waveform supplied from said unit waveform selection section, a unit waveform, the sampling rate of which has been converted to a value different from the sampling rate of said unit waveform, in accordance with the conversion ratio supplied from said conversion ratio computing section;
      
      a unit waveform re-selection section that selects a unit waveform, out of the sampling-rate-converted unit waveforms, supplied from said sampling rate conversion section, based on the position of pitch synchronization supplied from said pitch synchronization position calculation section; and
      
      a waveform generation processing switching section that determines, based on the identification information for the unit waveform storage, selected by said unit waveform storage selection section, whether the unit waveform supplied from said compressed unit waveform selection section is a compressed waveform or a non-compressed waveform;
      
      said waveform generation processing switching section outputting a unit waveform to said sampling rate conversion section if a non-compressed waveform is entered as an input;
      
      said waveform generation processing switching section outputting a compressed unit waveform to said unit waveform decompression section, if a compressed waveform is entered as an input.
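
The storage arrangement of claim 38 can be pictured as one table of compressed unit waveforms per conversion ratio. The sketch below is an assumption-laden illustration: `zlib` over 16-bit samples stands in for whatever codec an implementation would use, a single codec is used for every storage (whereas the claims allow the method to vary), and the pitch threshold in `select_storage` is invented for the example.

```python
import struct
import zlib

def compress(samples):
    """Stand-in codec: pack as 16-bit integers, then deflate."""
    return zlib.compress(struct.pack(f"<{len(samples)}h", *samples))

def decompress(blob):
    """Inverse of compress(): inflate, then unpack 16-bit integers."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))

def make_storage(variants):
    """Compress each (unit name, phase) -> samples entry of one storage."""
    return {key: compress(samples) for key, samples in variants.items()}

# One compressed storage per conversion ratio, keyed by (unit name, phase).
storages = {
    2: make_storage({("a", 0): [0, 4], ("a", 1): [2, 6]}),
    4: make_storage({("a", 0): [0, 4], ("a", 1): [1, 5],
                     ("a", 2): [2, 6], ("a", 3): [3, 7]}),
}

def select_storage(storages, f0_hz):
    """Hypothetical rule: use the finest storage for high pitch."""
    ratio = max(storages) if f0_hz >= 200 else min(storages)
    return ratio, storages[ratio]

ratio, store = select_storage(storages, f0_hz=250)
unit = decompress(store[("a", 3)])
```

Storing every phase ahead of time trades memory for run-time work: synthesis only has to look up and decompress a waveform instead of converting its sampling rate on the fly.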

50. A speech synthesis method for concatenating a plurality of unit waveforms to generate synthesized speech;
- said method comprising;
  
  a step of performing conversion that increases sampling rate of said unit waveform;
  
  a step of decimating the unit waveform that undergoes the conversion of the sampling rate to the sampling rate of a synthesized speech; and
  
  a step of generating the synthesized speech using the decimated unit waveform;
  
  wherein said step of performing conversion changes the conversion ratio of the sampling rate based on input prosodic information.
- Dependent claims (51, 52):
  - 51. The speech synthesis method according to claim 50, wherein said step of performing the conversion finds pitch frequency from the prosodic information and increases the value of said conversion ratio to a higher value in case of a higher value of the pitch frequency.
  - 52. The speech synthesis method according to claim 51, wherein said step of performing the conversion finds position of pitch synchronization from said pitch frequency and uses the value of the conversion ratio which reduces an error in the position of pitch synchronization to a smaller value.
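
One way to read claim 52 is as a search over candidate conversion ratios for one that keeps the pitch synchronization error small. The sketch below assumes a constant pitch frequency, a fixed list of candidate ratios, and an error tolerance expressed in output samples; none of these details, nor the names `max_sync_error` and `pick_ratio`, come from the claims.

```python
from fractions import Fraction

def max_sync_error(f0_hz, fs_hz, ratio, count=32):
    """Largest distance, in output samples, from an ideal pitch mark
    to the nearest point on the 1/ratio grid."""
    period = Fraction(fs_hz, f0_hz)
    worst = Fraction(0)
    for k in range(count):
        pos = k * period
        snapped = Fraction(round(pos * ratio), ratio)
        worst = max(worst, abs(pos - snapped))
    return worst

def pick_ratio(f0_hz, fs_hz, candidates, tolerance):
    """Smallest candidate ratio whose worst-case error is within tolerance."""
    for ratio in sorted(candidates):
        if max_sync_error(f0_hz, fs_hz, ratio) <= tolerance:
            return ratio
    return max(candidates)   # fall back to the finest grid available

print(pick_ratio(300, 16000, [1, 2, 3, 4], Fraction(1, 10)))
```

With a 300 Hz pitch at 16 kHz the worst-case error is 1/3 sample at ratio 1 and 1/6 sample at ratio 2, and it drops to zero at ratio 3, so the search settles on 3 rather than the largest candidate.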

53. A speech synthesis method comprising:
- a step of generating a plurality of compressed unit waveforms from a unit waveform storage in which unit waveforms are stored, and storing said compressed unit waveforms in a plurality of compressed unit waveform storages;
  
  a step of selecting one of said compressed unit waveform storages, based on the prosodic information;
  
  a step of selecting a compressed unit waveform, from the compressed unit waveform storage selected, based on the prosodic information and the phonological information;
  
  a step of decompressing the compressed unit waveform, based on the identification information of said unit waveform storage selected, to derive a unit waveform; and
  
  a step of generating the synthesized speech from said prosodic information and the decompressed unit waveform.
- Dependent claims (54):
  - 54. The speech synthesis method according to claim 53, further comprising:
    - a step of generating a plurality of compressed unit waveform storages from the speech waveform the sampling rate of which is higher than the sampling rate of the unit waveform.
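
Claim 54 starts from a speech waveform whose sampling rate is higher than that of the unit waveform. A minimal sketch of how such a waveform could yield unit waveforms of different phase is shown below. It assumes the high-rate waveform is already band-limited to the output band, so the filtering step is omitted, and the helper name `phase_variants` is hypothetical.

```python
def phase_variants(high_rate, factor):
    """Split a waveform sampled at factor x the output rate into
    `factor` output-rate waveforms, one per phase."""
    return [high_rate[phase::factor] for phase in range(factor)]

high = [0, 10, 20, 30, 40, 50, 60, 70]   # sampled at 4x the output rate
variants = phase_variants(high, 4)
print(variants)
```

Each variant has the output sampling rate but starts at a different sub-sample offset, so the set as a whole covers the phases that a rate conversion at synthesis time would otherwise have to compute.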

55. A program causing a computer, constituting a speech synthesis apparatus, to execute the processing of concatenating unit waveforms to generate a synthesized speech;
- wherein said program executes;
  
  the processing of performing conversion that increases sampling rate of said unit waveform and changes the conversion ratio of the sampling rate based on input prosodic information;
  
  the processing of decimating the unit waveform that undergoes the conversion of the sampling rate to the sampling rate of a synthesized speech; and
  
  the processing of generating the synthesized speech using the decimated unit waveform.
- Dependent claims (56, 57):
  - 56. The program according to claim 55, wherein said processing of performing the conversion finds pitch frequency from said prosodic information and increases the value of said conversion ratio to a higher value in case of a higher value of the pitch frequency.
  - 57. The program according to claim 56, wherein said processing of performing the conversion finds position of pitch synchronization from said pitch frequency and uses the value of the conversion ratio which reduces an error in the position of pitch synchronization to a smaller value.
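
The last processing step of claim 55, generating the synthesized speech from the decimated unit waveforms, is sketched below as a plain overlap-add at integer sample positions. Overlap-add is an assumed technique here, chosen because it is common in pitch-synchronous synthesis; the claims only say that the unit waveforms are used to generate the speech, and the function name `overlap_add` is hypothetical.

```python
def overlap_add(units, positions, length):
    """Sum unit waveforms into an output buffer at integer sample positions."""
    out = [0.0] * length
    for unit, start in zip(units, positions):
        for i, sample in enumerate(unit):
            if 0 <= start + i < length:   # ignore samples outside the buffer
                out[start + i] += sample
    return out

pulse = [0.5, 1.0, 0.5]
speech = overlap_add([pulse, pulse, pulse], positions=[0, 2, 4], length=7)
print(speech)
```

In this picture the integer part of each position of pitch synchronization decides where a unit waveform is placed, while the fractional part would be absorbed beforehand by choosing a unit waveform of the matching phase.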

58. A program causing a computer, constituting a speech synthesis apparatus, to execute:
- the processing of generating a plurality of compressed unit waveforms from a unit waveform storage in which unit waveforms are stored, and storing said compressed unit waveforms in a plurality of compressed unit waveform storages;
  
  the processing of selecting, based on the prosodic information, one of said compressed unit waveform storages;
  
  the processing of selecting a compressed unit waveform, from the compressed unit waveform storage selected, based on prosodic information and phonological information;
  
  the processing of decompressing the compressed unit waveform, based on the identification information of said unit waveform storage selected, to derive a unit waveform; and
  
  the processing of generating the synthesized speech from said prosodic information and the decompressed unit waveform.
- Dependent claims (59):
  - 59. The program according to claim 58, wherein the program causes the computer to further execute the processing of generating a plurality of compressed unit waveform storages from a speech waveform the sampling rate of which is higher than the sampling rate of the unit waveform.

Current Assignee: NEC Corporation
Original Assignee: NEC Corporation
Inventors: Tsukada, Satoshi; Kato, Masanori
Granted Patent: US 8,165,882 B2
US Class Current: 704/268
CPC Class Codes:
- G10L 13/07 Concatenation rules
- G10L 25/90 Pitch determination of spee...
