Non-speech section detecting method and non-speech section detecting device
Abstract
A non-speech section detecting device generating a plurality of frames having a given time length on the basis of sound data obtained by sampling sound, and detecting a non-speech section having a frame not containing voice data based on speech uttered by a person, the device including: a calculating part calculating a bias of a spectrum obtained by converting sound data of each frame into components on a frequency axis; a judging part judging whether the bias is greater than or equal to a given threshold or alternatively smaller than or equal to a given threshold; a counting part counting the number of consecutive frames judged as having a bias greater than or equal to the threshold or alternatively smaller than or equal to the threshold; a count judging part judging whether the obtained number of consecutive frames is greater than or equal to a given value.
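The counting step described in the abstract can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name `detect_runs`, its parameters, and the choice to report each sufficiently long run as a candidate section are assumptions made here, and the per-frame bias values are taken as already computed because the abstract does not define how the bias is obtained.

```python
from typing import List, Tuple


def detect_runs(biases: List[float], threshold: float, min_frames: int,
                above: bool = True) -> List[Tuple[int, int]]:
    """Find runs of consecutive frames whose bias is on one side of a threshold.

    Returns (start, end) frame index pairs, end exclusive, for every run
    whose length is at least min_frames. Hypothetical helper for illustration.
    """
    sections: List[Tuple[int, int]] = []
    run_start = None  # index of the first frame in the current run, if any
    for i, b in enumerate(biases):
        # "greater than or equal to" or, alternatively, "smaller than or equal to"
        hit = b >= threshold if above else b <= threshold
        if hit:
            if run_start is None:
                run_start = i
        else:
            # The run ended at frame i - 1; keep it only if it is long enough.
            if run_start is not None and i - run_start >= min_frames:
                sections.append((run_start, i))
            run_start = None
    # Handle a run that reaches the last frame.
    if run_start is not None and len(biases) - run_start >= min_frames:
        sections.append((run_start, len(biases)))
    return sections
```

With `threshold=0.5` and `min_frames=3`, a bias sequence such as `[0.1, 0.9, 0.9, 0.9, 0.2, 0.8, 0.8]` yields one run covering frames 1 to 3; the trailing two-frame run is too short to count.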
5 Claims
1. A non-speech section detecting device generating a plurality of frames having a given time length on the basis of sound data obtained by sampling sound, and detecting a non-speech section having a frame not including voice data based on speech uttered by a person, the device comprising:
a first calculating part configured to calculate, for each frame of the plurality of frames, a value, wherein the value is one of a power of sound data, a pitch of sound data, or a bias of a spectrum obtained by converting sound data into components on a frequency axis;
a second calculating part configured to calculate, for a pair of consecutive frames, a variation between the calculated values calculated for the frames in the pair and configured to judge whether the calculated variation is smaller than or equal to a given threshold, and performing, for each pair of consecutive frames in the plurality of frames, the calculating of a variation and the judging;
a counting part configured to count a number of variations judged as smaller than or equal to the threshold;
a count judging part configured to judge whether the counted number is greater than or equal to a given value; and
a detecting part configured to detect, when the counted number is judged as greater than or equal to the given value, a section of the sound data as a non-speech section.
Dependent claims: 2, 3, 4
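The chain of parts recited in claim 1 can be sketched roughly as below. This is an illustrative reading only: the function names are hypothetical, the variation is assumed to be the absolute difference between the values of two consecutive frames, and the whole sequence passed in is treated as the candidate section. The claim itself fixes none of these choices.

```python
from typing import Sequence


def count_small_variations(values: Sequence[float], threshold: float) -> int:
    """Count consecutive-frame pairs whose variation is <= threshold.

    `values` holds one calculated value per frame (power, pitch, or spectral
    bias). The variation is assumed here to be an absolute difference.
    """
    count = 0
    for prev, cur in zip(values, values[1:]):
        if abs(cur - prev) <= threshold:
            count += 1
    return count


def is_non_speech(values: Sequence[float], threshold: float,
                  min_count: int) -> bool:
    """Judge the section as non-speech when enough variations are small."""
    return count_small_variations(values, threshold) >= min_count
```

For per-frame values `[1.0, 1.1, 1.05, 3.0, 3.0]` and a threshold of `0.2`, three of the four pairs vary by no more than the threshold, so the section is judged non-speech when the given value is 3 and not when it is 4.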
5. A non-speech section detecting method of generating a plurality of frames having a given time length on the basis of sound data obtained by sampling sound, and detecting a non-speech section having a frame not including voice data based on speech uttered by a person, the method comprising:
calculating, for each frame of the plurality of frames, a value, wherein the value is one of a power of sound data, or a pitch of sound data, or a bias of a spectrum obtained by converting sound data into components on a frequency axis, using a processor;
calculating, for a pair of consecutive frames, a variation between the calculated values calculated for the frames in the pair and judging whether the calculated variation is smaller than or equal to a given threshold, and performing, for each pair of consecutive frames in the plurality of frames, the calculating of a variation and the judging using the processor;
counting a number of variations judged as smaller than or equal to the threshold using the processor;
judging whether the counted number of variations is greater than or equal to a given value using the processor; and
detecting, when the counted number of variations is judged as greater than or equal to the given value, a section of the sound data as a non-speech section using the processor.
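The first step of the method, generating frames of a given length and calculating a value for each, might look like the sketch below when the value is power. Several points are assumptions rather than anything the claim states: frames are taken as non-overlapping, a trailing partial frame is dropped, and power is computed as the mean of the squared samples. The names `split_frames` and `frame_power` are hypothetical.

```python
from typing import List, Sequence


def split_frames(samples: Sequence[float],
                 frame_len: int) -> List[Sequence[float]]:
    """Cut sampled sound data into non-overlapping frames of frame_len samples.

    A trailing partial frame is discarded (an assumption of this sketch).
    """
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len]
            for i in range(0, last_start + 1, frame_len)]


def frame_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame, used here as its 'power'."""
    return sum(x * x for x in frame) / len(frame)


def frame_powers(samples: Sequence[float], frame_len: int) -> List[float]:
    """One power value per frame, ready for the variation step."""
    return [frame_power(f) for f in split_frames(samples, frame_len)]
```

The resulting list holds one value per frame, which is the input the subsequent variation and counting steps operate on.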
Specification