SPEECH PROCESSING APPARATUS AND SPEECH SYNTHESIS APPARATUS
Abstract
An information extraction unit extracts spectral envelope information of L-dimension from each frame of speech data. The spectral envelope information does not have a spectral fine structure. A basis storage unit stores N bases (L>N>1). Each basis is differently a frequency band having a maximum as a peak frequency in a spectral domain having L-dimension. A value corresponding to a frequency outside the frequency band along a frequency axis of the spectral domain is zero. Two frequency bands of which two peak frequencies are adjacent along the frequency axis partially overlap. A parameter calculation unit minimizes a distortion between the spectral envelope information and a linear combination of each basis with a coefficient by changing the coefficient, and sets the coefficient of each basis from which the distortion is minimized to a spectral envelope parameter of the spectral envelope information.
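The analysis side described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the triangular basis shape, the evenly spaced peak positions, the squared-error distortion measure, and the names `make_local_bases` and `fit_envelope` are all hypothetical choices made for the example. The abstract only requires that each basis peak at its own frequency, be zero outside its band, and partially overlap the bases with adjacent peaks.

```python
import numpy as np


def make_local_bases(L, N):
    """Build N triangular bases over an L-point frequency axis (L > N > 1).

    Assumption: peaks are evenly spaced and each basis is a triangle that
    falls to zero at the neighbouring peaks, so only bases with adjacent
    peaks overlap.
    """
    if not (L > N > 1):
        raise ValueError("expected L > N > 1")
    peaks = np.round(np.linspace(0, L - 1, N)).astype(int)
    bins = np.arange(L)
    bases = np.zeros((N, L))
    for i, p in enumerate(peaks):
        if i > 0:
            # Rising edge between the previous peak and this peak.
            left = peaks[i - 1]
            rising = (bins > left) & (bins < p)
            bases[i, rising] = (bins[rising] - left) / (p - left)
        if i < N - 1:
            # Falling edge between this peak and the next peak.
            right = peaks[i + 1]
            falling = (bins > p) & (bins < right)
            bases[i, falling] = (right - bins[falling]) / (right - p)
        bases[i, p] = 1.0  # maximum at the peak frequency
    return bases


def fit_envelope(envelope, bases):
    """Return the N coefficients minimizing ||envelope - bases.T @ c||^2.

    Assumption: distortion is measured as squared error, so ordinary
    least squares finds the minimizing coefficients.
    """
    coeffs, *_ = np.linalg.lstsq(bases.T, envelope, rcond=None)
    return coeffs
```

For example, `fit_envelope(envelope, make_local_bases(64, 8))` would reduce a 64-point envelope to 8 coefficients, which then serve as the spectral envelope parameter for that frame.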
20 Claims
1. An apparatus for a speech processing, comprising:
a frame extraction unit configured to extract a speech signal in each frame;
an information extraction unit configured to extract a spectral envelope information of L-dimension from each frame, the spectral envelope information not having a spectral fine structure;
a basis storage unit configured to store N bases (L > N > 1), each basis being differently a frequency band having a maximum as a peak frequency in a spectral domain having L-dimension, a value corresponding to a frequency outside the frequency band along a frequency axis of the spectral domain being zero, two frequency bands of which two peak frequencies are adjacent along the frequency axis partially overlapping; and
a parameter calculation unit configured to minimize a distortion between the spectral envelope information and a linear combination of each basis with a coefficient by changing the coefficient, and to set the coefficient of each basis from which the distortion is minimized to a spectral envelope parameter of the spectral envelope information.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10

11. An apparatus for a speech synthesis, comprising:
an acquisition unit configured to acquire a spectral envelope parameter corresponding to a pitch-cycle waveform of each speech unit to be synthesized as a speech, the spectral envelope parameter having L-dimension;
a basis storage unit configured to store N bases (L > N > 1), each basis being differently a frequency band having a maximum as a peak frequency in a spectral domain having L-dimension, a value corresponding to a frequency outside the frequency band along a frequency axis of the spectral domain being zero, two frequency bands of which two peak frequencies are adjacent along the frequency axis partially overlapping;
an envelope generation unit configured to generate a spectral envelope information by linearly combining the basis with the spectral envelope parameter;
a pitch-cycle waveform generation unit configured to generate a plurality of pitch-cycle waveforms by inverse-Fourier transform with a spectral of the spectral envelope information; and
a speech generation unit configured to generate a plurality of speech units by overlapping and adding the plurality of pitch-cycle waveforms, and to generate a speech waveform by concatenating the plurality of speech units.
Dependent claims: 12, 13, 14, 15, 16

17. A method for a speech processing, comprising:
dividing a speech signal into each frame;
extracting a spectral envelope information of L-dimension from each frame, the spectral envelope information not having a spectral fine structure;
storing N bases (L > N > 1) in a memory, each basis being differently a frequency band having a maximum as a peak frequency in a spectral domain having L-dimension, a value corresponding to a frequency outside the frequency band along a frequency axis of the spectral domain being zero, two frequency bands of which two peak frequencies are adjacent along the frequency axis partially overlapping;
minimizing a distortion between the spectral envelope information and a linear combination of each basis with a coefficient by changing the coefficient; and
setting the coefficient of each basis from which the distortion is minimized to a spectral envelope parameter of the spectral envelope information.

18. A method for a speech synthesis, comprising:
acquiring a spectral envelope parameter corresponding to a pitch-cycle waveform of each speech unit to be synthesized as a speech, the spectral envelope parameter having L-dimension;
storing N bases (L > N > 1) in a memory, each basis being differently a frequency band having a maximum as a peak frequency in a spectral domain having L-dimension, a value corresponding to a frequency outside the frequency band along a frequency axis of the spectral domain being zero, two frequency bands of which two peak frequencies are adjacent along the frequency axis partially overlapping;
generating a spectral envelope information by linearly combining the basis with the spectral envelope parameter;
generating a plurality of pitch-cycle waveforms by inverse-Fourier transform with a spectral of the spectral envelope information;
generating a plurality of speech units by overlapping and adding the plurality of pitch-cycle waveforms; and
generating a speech waveform by concatenating the plurality of speech units.

19. A computer program stored in a computer readable medium for causing a computer to perform a method for a speech processing, the method comprising:
dividing a speech signal into each frame;
extracting a spectral envelope information of L-dimension from each frame, the spectral envelope information not having a spectral fine structure;
storing N bases (L > N > 1) in a memory, each basis being differently a frequency band having a maximum as a peak frequency in a spectral domain having L-dimension, a value corresponding to a frequency outside the frequency band along a frequency axis of the spectral domain being zero, two frequency bands of which two peak frequencies are adjacent along the frequency axis partially overlapping;
minimizing a distortion between the spectral envelope information and a linear combination of each basis with a coefficient by changing the coefficient; and
setting the coefficient of each basis from which the distortion is minimized to a spectral envelope parameter of the spectral envelope information.

20. A computer program stored in a computer readable medium for causing a computer to perform a method for a speech synthesis, the method comprising:
acquiring a spectral envelope parameter corresponding to a pitch-cycle waveform of each speech unit to be synthesized as a speech, the spectral envelope parameter having L-dimension;
storing N bases (L > N > 1) in a memory, each basis being differently a frequency band having a maximum as a peak frequency in a spectral domain having L-dimension, a value corresponding to a frequency outside the frequency band along a frequency axis of the spectral domain being zero, two frequency bands of which two peak frequencies are adjacent along the frequency axis partially overlapping;
generating a spectral envelope information by linearly combining the basis with the spectral envelope parameter;
generating a plurality of pitch-cycle waveforms by inverse-Fourier transform with a spectral of the spectral envelope information;
generating a plurality of speech units by overlapping and adding the plurality of pitch-cycle waveforms; and
generating a speech waveform by concatenating the plurality of speech units.

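The synthesis steps recited in claims 11, 18, and 20 can be sketched as follows in Python with NumPy. This is an illustrative sketch only, under several assumptions that the claims do not fix: the envelope is treated as a linear amplitude spectrum with zero phase, the parameter holds one coefficient per basis, pitch marks are given as sample indices of waveform centres, and every function name here is hypothetical.

```python
import numpy as np


def envelope_from_params(params, bases):
    """Linearly combine the N bases (rows of an N x L matrix) with N coefficients."""
    return bases.T @ np.asarray(params, dtype=float)


def pitch_cycle_waveform(envelope):
    """Inverse-Fourier transform an L-point envelope into one pitch-cycle waveform.

    Assumption: the envelope is a zero-phase, one-sided amplitude spectrum,
    so the result is a symmetric waveform of length 2 * (L - 1), shifted so
    that its centre sits in the middle of the array.
    """
    return np.fft.fftshift(np.fft.irfft(envelope))


def overlap_add(waveforms, pitch_marks, length):
    """Add each waveform into the output, centred on its pitch mark."""
    out = np.zeros(length)
    for w, mark in zip(waveforms, pitch_marks):
        start = mark - len(w) // 2
        lo, hi = max(start, 0), min(start + len(w), length)
        if lo < hi:
            out[lo:hi] += w[lo - start:hi - start]
    return out


def synthesize_unit(param_frames, pitch_marks, length, bases):
    """Generate one speech unit from per-pitch-mark parameter vectors."""
    waves = [pitch_cycle_waveform(envelope_from_params(p, bases))
             for p in param_frames]
    return overlap_add(waves, pitch_marks, length)


def synthesize_speech(units, bases):
    """Concatenate speech units; each unit is (param_frames, pitch_marks, length)."""
    return np.concatenate([synthesize_unit(*u, bases) for u in units])
```

A real system would also need to handle phase, prosody control, and smoothing at unit boundaries, none of which this sketch attempts.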
Specification