Adaptive speech rate conversion without extension of input data duration, using speech interval detection
Abstract
When the delivery speed of speech being listened to (the speech speed) is slowed down, a connection order generator (8) continuously monitors, in predetermined processing units, the data length of the input speech, the output data length previously calculated by a conversion function for a preset scaling factor, and the data length of the actual output speech, and decides a connection order that causes no inconsistency among them. By controlling a speech data connector (9), the speech data and the connection data are connected without omitting any speech information. When the power of the input signal data is calculated to discriminate between speech intervals and non-speech intervals, a power threshold value is decided according to the maximum value of the power and the difference between the maximum value and the minimum value.
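The connection order generator described above can be sketched as length bookkeeping. In this minimal Python sketch (the function, its arguments and the decision rule are hypothetical, not taken from the patent), every input block is emitted in order, so no speech information is dropped, and connection data is appended after block i only when doing so keeps the actual output length at or below the target length, taken here as the scaling factor times the input length consumed so far.

```python
from typing import List


def connection_order(block_lengths: List[int],
                     conn_lengths: List[int],
                     scale: float) -> List[str]:
    """Hypothetical connection order generator.

    block_lengths[i]: length of input block i (samples)
    conn_lengths[i]:  length of the connection data between blocks i and i+1
    scale:            preset scaling factor (output/input length), >= 1 to slow down
    Returns tokens such as "B0" (block 0) and "C0" (connection data after block 0).
    """
    order: List[str] = []
    input_len = 0     # data length of input speech consumed so far
    output_len = 0    # data length of actual output speech so far
    for i, n in enumerate(block_lengths):
        order.append(f"B{i}")          # every block is output: nothing is omitted
        input_len += n
        output_len += n
        target = scale * input_len     # output length the conversion calls for
        # add connection data only if it keeps the actual output at or below target
        if i < len(conn_lengths) and output_len + conn_lengths[i] <= target:
            order.append(f"C{i}")
            output_len += conn_lengths[i]
    return order
```

For five 100-sample blocks at a scaling factor of 1.5, this yields `B0 B1 C1 B2 B3 C3 B4`: the actual output never runs ahead of the target, and any shortfall is made up at a later block.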
13 Citations
16 Claims
1. A speech interval detecting method comprising the steps of:
calculating a frame power of input signal data in units of a predetermined frame width at predetermined time intervals, and then holding the maximum value and the minimum value of the frame power within a predetermined past time period;
deciding a power threshold value that varies according to the held maximum value and the difference between the maximum value and the minimum value; and
comparing the threshold value with the power of a current frame to decide whether the current frame belongs to a speech interval or a non-speech interval.
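Written as formulas, the three steps of claim 1 might look as follows. This is a sketch under assumptions: the claim fixes neither the power measure nor how the threshold depends on the held values, so the mean-square power, the symbols x, N, H, K and the weight α below are illustrative choices, not the patent's.

```latex
\begin{aligned}
P_n &= \frac{1}{N}\sum_{k=0}^{N-1} x^2[nH + k] && \text{frame power, frame width } N,\ \text{hop } H\\
P^{\max}_n &= \max_{n-K < m \le n} P_m, \qquad P^{\min}_n = \min_{n-K < m \le n} P_m && \text{held over the last } K \text{ frames}\\
T_n &= P^{\max}_n - \alpha\,\bigl(P^{\max}_n - P^{\min}_n\bigr), \qquad 0 < \alpha < 1 && \text{power threshold}\\
\text{frame } n &\in \begin{cases} \text{speech interval}, & P_n > T_n\\ \text{non-speech interval}, & P_n \le T_n \end{cases}
\end{aligned}
```

Because T_n equals (1 − α)P^max_n + αP^min_n, it tracks both the recent peak level and the recent dynamic range, which is the dependence the claim describes.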
3. A speech interval detecting device comprising:
a power calculator (32) for calculating a frame power of input signal data in units of a predetermined frame width at predetermined time intervals;
an instantaneous power maximum value latch (33) for holding the maximum value of the frame power within a predetermined past time period;
an instantaneous power minimum value latch (34) for holding the minimum value of the frame power within the predetermined past time period;
a power threshold value decision portion (35) for deciding a power threshold value that varies according to the maximum value held in the instantaneous power maximum value latch and the difference between that maximum value and the minimum value held in the instantaneous power minimum value latch; and
a discriminator (36) for comparing the threshold value obtained by the power threshold value decision portion with the power of a current frame to decide whether the current frame belongs to a speech interval or a non-speech interval.
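The device of claim 3 maps onto a small streaming detector, one step per numbered component. A minimal Python sketch, assuming mean-square frame power, a history of `window_frames` frame powers (current frame included) standing in for both latches, and the hypothetical threshold rule `max - alpha * (max - min)`; the class, function and parameter names are illustrative, not from the patent.

```python
from collections import deque
from typing import Deque, Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Power calculator (32): mean-square value of one frame (assumed measure)."""
    return sum(s * s for s in frame) / len(frame)


class SpeechIntervalDetector:
    """Hypothetical streaming sketch of claim 3 (component numbers from the claim)."""

    def __init__(self, window_frames: int = 50, alpha: float = 0.9) -> None:
        # frame powers within the past time period; old values fall out automatically
        self.history: Deque[float] = deque(maxlen=window_frames)
        self.alpha = alpha

    def process(self, frame: Sequence[float]) -> bool:
        """Return True if the frame is judged to belong to a speech interval."""
        p = frame_power(frame)                            # power calculator (32)
        self.history.append(p)
        p_max = max(self.history)                         # maximum value latch (33)
        p_min = min(self.history)                         # minimum value latch (34)
        threshold = p_max - self.alpha * (p_max - p_min)  # threshold decision (35)
        return p > threshold                              # discriminator (36)
```

A loud passage raises the threshold and a quiet one lowers it. With a flat input (including digital silence) the maximum equals the minimum, and the strict `>` then classifies every frame as non-speech, which is one possible design choice among several.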
5. A speech speed converting method comprising the steps of:
reducing the extension time of output data relative to input data by any time period within that extension time when non-speech intervals appear in the output data obtained by extending/synthesizing the input data at an arbitrary time-varying ratio and the duration of the non-speech intervals exceeds a predetermined threshold value.
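Claim 5 shortens long pauses in the stretched output to win back accumulated delay. A minimal Python sketch, assuming per-frame speech/non-speech labels for the output (for instance from a detector like that of claim 1), a known total extension in frames, and that the first `min_pause_frames` of each pause are kept while later pause frames are dropped. The function name and these choices are hypothetical; the claim only requires that the reduction stay within the extension time and apply to non-speech intervals longer than a threshold.

```python
from typing import List, Sequence


def shorten_long_pauses(output_is_speech: Sequence[bool],
                        extension_frames: int,
                        min_pause_frames: int) -> List[int]:
    """Return indices of output frames to keep (hypothetical helper).

    Frames are dropped only from non-speech runs longer than
    `min_pause_frames`, and never more than `extension_frames` in total,
    so if `extension_frames` is the true extension the output is never
    made shorter than the input.
    """
    budget = max(0, extension_frames)   # remaining delay that may be recovered
    run = 0                             # length of the current non-speech run
    keep: List[int] = []
    for i, speech in enumerate(output_is_speech):
        if speech:
            run = 0
            keep.append(i)
            continue
        run += 1
        if run > min_pause_frames and budget > 0:
            budget -= 1                 # drop this pause frame
        else:
            keep.append(i)
    return keep
```

An online converter would update the budget as the extension grows instead of knowing it up front; the batch form simply keeps the bound visible.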
11. A speech speed converting device comprising:
a split processing/connection data generating means for splitting input data into block data and then generating connection data based on the respective block data; and
a connection processing means for deciding, based on an input desired speech speed, the connection order of the block data and connection data generated by the split processing/connection data generating means, and then connecting the block data and the connection data to generate output data;
wherein the connection processing means reduces the extension time of output data relative to input data by any time period within that extension time when non-speech intervals appear in the output data obtained by extending/synthesizing the input data at an arbitrary time-varying ratio and the duration of the non-speech intervals exceeds a predetermined threshold value.
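The split processing/connection data generation and connection steps of claim 11 can be illustrated with a crossfade sketch in the style of time-domain overlap-add speech stretching. The claim does not say how connection data is built, so the linear crossfade, the fixed block length and the function names below are assumptions. The choice of which blocks are followed by connection data (which the claim derives from the desired speech speed) is passed in as `insert_after` rather than computed.

```python
from typing import List, Sequence, Set


def split_blocks(x: Sequence[float], block_len: int) -> List[List[float]]:
    """Split input samples into consecutive blocks; the last block may be shorter."""
    return [list(x[i:i + block_len]) for i in range(0, len(x), block_len)]


def make_connection(cur: Sequence[float], nxt: Sequence[float]) -> List[float]:
    """Connection data for insertion between `cur` and `nxt` (equal lengths).

    It fades from `nxt` to `cur`: its start continues `cur` the way `nxt`
    would, and its end leads into `nxt` the way `cur` would.
    """
    n = len(cur)
    if len(nxt) != n or n < 2:
        raise ValueError("blocks must have equal length >= 2")
    return [(1 - k / (n - 1)) * nxt[k] + (k / (n - 1)) * cur[k] for k in range(n)]


def connect(blocks: List[List[float]], insert_after: Set[int]) -> List[float]:
    """Emit every block in order, inserting connection data after the chosen blocks."""
    out: List[float] = []
    for i, block in enumerate(blocks):
        out.extend(block)
        nxt = blocks[i + 1] if i + 1 < len(blocks) else None
        # connection data needs an equal-length successor block
        if i in insert_after and nxt is not None and len(nxt) == len(block):
            out.extend(make_connection(block, nxt))
    return out
```

Every input sample still appears exactly once and in order, so the lengthening comes only from the inserted connection data, consistent with connecting the data without omitting speech information.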
Specification