Speech processing device and method
Abstract
A speech processing device includes a processor; and a memory which stores a plurality of instructions, which when executed by the processor, cause the processor to execute: obtaining input speech, detecting a vowel segment contained in the input speech, estimating an accent segment contained in the input speech, calculating a first vowel segment length containing the accent segment and a second vowel segment length excluding the accent segment, and controlling at least one of the first vowel segment length and the second vowel segment length.
17 Citations
14 Claims
1. A speech processing device comprising:
a processor; and
a memory which stores a plurality of instructions, which when executed by the processor, cause the processor to execute:
obtaining input speech, the input speech including a plurality of vowel segments and a plurality of consonant segments,
detecting the vowel segments contained in the input speech,
estimating a stress segment among the plurality of vowel segments by comparing pitch variation rate or power variation rate per unit time of the plurality of vowel segments, respectively, the stress segment being a segment that has a strong trend of decrease in the pitch variation rate or the power variation rate per unit time,
detecting sound lengths of each of the plurality of vowel segments,
transforming the input speech so that a first sound length becomes longer than each of second sound lengths when the input speech includes at least one of the second sound lengths that is longer than the first sound length, the first sound length being a sound length of a vowel segment containing the stress segment, the second sound lengths being sound lengths of vowel segments excluding the stress segment, the transforming including extending the first sound length or shortening at least one of the second sound lengths, the first sound length being extended by inserting a part of segment obtained based on the vowel segment containing the stress segment into the vowel segment containing the stress segment, the at least one of the second sound lengths being shortened by deleting a part of segment from the at least one of the second sound lengths, a length to be inserted or to be shortened being determined based on the detected first sound length and the detected second sound length and a prescribed target scaling factor, and
outputting the transformed input speech in which the first sound length is extended or in which the at least one of the second sound lengths is shortened.
(Dependent claims: 2, 3, 4, 5, 6, 7)

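Claim 1 says only that the length to be inserted or deleted is determined from the detected first sound length, the detected second sound length, and a prescribed target scaling factor; it does not give a formula. The following is a minimal sketch under one possible reading, in which the scaling factor is the desired ratio of the stressed vowel length to the unstressed vowel lengths. The function name `plan_length_change`, the parameter `target_ratio`, and the formulas themselves are assumptions made for illustration, not the patented method.

```python
def plan_length_change(first_len, second_lens, target_ratio=1.25):
    """Return (extension, deletions) for one utterance.

    first_len    -- length of the vowel segment containing the stress segment
    second_lens  -- lengths of the other vowel segments (same unit, e.g. ms)
    target_ratio -- ASSUMED meaning of the "prescribed target scaling factor":
                    desired ratio of stressed length to unstressed length

    extension -- amount to insert into the stressed vowel (option A)
    deletions -- amount to delete from each other vowel (option B)
    The claim allows either option; both are computed here.
    """
    if target_ratio <= 1.0:
        raise ValueError("target_ratio must exceed 1 so the stressed vowel ends up longer")

    longest_second = max(second_lens, default=0.0)
    # The claim transforms only when some second length exceeds the first.
    if longest_second <= first_len:
        return 0.0, [0.0] * len(second_lens)

    # Option A: stretch the stressed vowel to target_ratio times the longest other vowel.
    extension = target_ratio * longest_second - first_len
    # Option B: trim each other vowel down to first_len / target_ratio if it is longer.
    limit = first_len / target_ratio
    deletions = [max(0.0, s - limit) for s in second_lens]
    return extension, deletions


if __name__ == "__main__":
    print(plan_length_change(80.0, [100.0, 60.0]))
```

With a stressed vowel of 80 ms and other vowels of 100 ms and 60 ms, either option leaves the stressed vowel longer than every other vowel, which is the condition the claim requires.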
8. A speech processing method comprising:
obtaining input speech, the input speech including a plurality of vowel segments and a plurality of consonant segments,
detecting the vowel segments contained in the input speech,
estimating a stress segment among the plurality of vowel segments by comparing pitch variation rate or power variation rate per unit time of the plurality of vowel segments, respectively, the stress segment being a segment that has a strong trend of decrease in the pitch variation rate or the power variation rate per unit time,
detecting sound lengths of each of the plurality of vowel segments,
transforming the input speech so that a first sound length becomes longer than each of second sound lengths when the input speech includes at least one of the second sound lengths that is longer than the first sound length, the first sound length being a sound length of a vowel segment containing the stress segment, the second sound lengths being sound lengths of vowel segments excluding the stress segment, the transforming including extending the first sound length or shortening at least one of the second sound lengths, the first sound length being extended by inserting a part of segment obtained based on the vowel segment containing the stress segment into the vowel segment containing the stress segment, the at least one of the second sound lengths being shortened by deleting a part of segment from the at least one of the second sound lengths, a length to be inserted or to be shortened being determined based on the detected first sound length and the detected second sound length and a prescribed target scaling factor, and
outputting the transformed input speech in which the first sound length is extended or in which the at least one of the second sound lengths is shortened.
(Dependent claims: 9, 10, 11, 12)

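The stress estimation step compares the pitch (or power) variation rate per unit time across the vowel segments and selects the segment with a strong decreasing trend. As a rough illustration, the sketch below fits a least-squares slope to each vowel's pitch track and picks the most negative one. The names `slope_per_second` and `estimate_stress_segment`, the 10 ms frame period, and the use of a linear fit as the "variation rate" are all assumptions; the claim does not specify how the rate is measured.

```python
def slope_per_second(values, frame_period_s):
    """Least-squares slope of evenly spaced values, in units per second."""
    n = len(values)
    if n < 2:
        return 0.0  # a single frame carries no trend
    times = [i * frame_period_s for i in range(n)]
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def estimate_stress_segment(tracks, frame_period_s=0.01):
    """Return the index of the vowel segment whose track falls fastest.

    tracks -- one list per vowel segment of per-frame pitch (Hz) or
              power values; the frame period is an ASSUMED parameter.
    """
    if not tracks:
        raise ValueError("at least one vowel segment is required")
    slopes = [slope_per_second(track, frame_period_s) for track in tracks]
    # The most negative slope is taken as the strongest decreasing trend.
    return min(range(len(slopes)), key=slopes.__getitem__)


if __name__ == "__main__":
    pitch_tracks = [
        [120.0, 121.0, 122.0],  # rising
        [150.0, 140.0, 130.0],  # falling fast
        [110.0, 109.0, 108.0],  # falling slowly
    ]
    print(estimate_stress_segment(pitch_tracks))
```

The same function can be fed per-frame power values instead of pitch, since the claim permits either quantity.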
13. A non-transitory computer-readable storage medium storing a speech processing program that causes a computer to execute a process comprising:
obtaining input speech, the input speech including a plurality of vowel segments and a plurality of consonant segments,
detecting the vowel segments contained in the input speech,
estimating a stress segment among the plurality of vowel segments by comparing pitch variation rate or power variation rate per unit time of the plurality of vowel segments, respectively, the stress segment being a segment that has a strong trend of decrease in the pitch variation rate or the power variation rate per unit time,
detecting sound lengths of each of the plurality of vowel segments,
transforming the input speech so that a first sound length becomes longer than each of second sound lengths when the input speech includes at least one of the second sound lengths that is longer than the first sound length, the first sound length being a sound length of a vowel segment containing the stress segment, the second sound lengths being sound lengths of vowel segments excluding the stress segment, the transforming including extending the first sound length or shortening at least one of the second sound lengths, the first sound length being extended by inserting a part of segment obtained based on the vowel segment containing the stress segment into the vowel segment containing the stress segment, the at least one of the second sound lengths being shortened by deleting a part of segment from the at least one of the second sound lengths, a length to be inserted or to be shortened being determined based on the detected first sound length and the detected second sound length and a prescribed target scaling factor, and
outputting the transformed input speech in which the first sound length is extended or in which the at least one of the second sound lengths is shortened.

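The transforming step lengthens the stressed vowel by inserting a part obtained from that same vowel, and shortens other vowels by deleting a part of them. The sketch below shows the two operations on a plain list of samples. The helper names `extend_segment` and `shorten_segment` and the choice to copy or cut from the middle of the segment are assumptions for illustration; a practical system would likely align the inserted or deleted part with pitch periods to avoid audible discontinuities, which this sketch does not attempt.

```python
def extend_segment(samples, start, end, extra):
    """Lengthen samples[start:end] by `extra` samples.

    A piece of length `extra` is copied from around the middle of the
    segment and re-inserted right after itself (ASSUMED placement).
    """
    seg_len = end - start
    if not 0 < extra <= seg_len:
        raise ValueError("extra must be between 1 and the segment length")
    mid = start + seg_len // 2
    # Keep the copied piece fully inside the segment.
    piece_start = max(start, min(mid - extra // 2, end - extra))
    piece_end = piece_start + extra
    piece = samples[piece_start:piece_end]
    return samples[:piece_end] + piece + samples[piece_end:]


def shorten_segment(samples, start, end, cut):
    """Shorten samples[start:end] by deleting `cut` samples from its middle."""
    seg_len = end - start
    if not 0 < cut < seg_len:
        raise ValueError("cut must be between 1 and the segment length minus 1")
    cut_start = start + (seg_len - cut) // 2
    return samples[:cut_start] + samples[cut_start + cut:]


if __name__ == "__main__":
    speech = list(range(10))          # stand-in for audio samples
    print(extend_segment(speech, 2, 8, 2))
    print(shorten_segment(speech, 2, 8, 2))
```

Both helpers return a new list and leave the samples outside the vowel segment untouched, so consonant segments keep their original length.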
14. A portable terminal device comprising:
a microphone that inputs a speaker's voice as input speech;
a processor; and
a memory which stores a plurality of instructions, which when executed by the processor, cause the processor to execute:
obtaining the input speech, the input speech including a plurality of vowel segments and a plurality of consonant segments,
detecting the vowel segments contained in the input speech,
estimating a stress segment among the plurality of vowel segments by comparing pitch variation rate or power variation rate per unit time of the plurality of vowel segments, respectively, the stress segment being a segment that has a strong trend of decrease in the pitch variation rate or the power variation rate per unit time,
detecting sound lengths of each of the plurality of vowel segments,
transforming the input speech so that a first sound length becomes longer than each of second sound lengths when the input speech includes at least one of the second sound lengths that is longer than the first sound length, the first sound length being a sound length of a vowel segment containing the stress segment, the second sound lengths being sound lengths of vowel segments excluding the stress segment, the transforming including extending the first sound length or shortening at least one of the second sound lengths, the first sound length being extended by inserting a part of segment obtained based on the vowel segment containing the stress segment into the vowel segment containing the stress segment, the at least one of the second sound lengths being shortened by deleting a part of segment from the at least one of the second sound lengths, a length to be inserted or to be shortened being determined based on the detected first sound length and the detected second sound length and a prescribed target scaling factor, and
outputting the transformed input speech in which the first sound length is extended or in which the at least one of the second sound lengths is shortened,
a speaker configured to output an output speech generated by controlling the input speech.
Specification