Speech detecting device and speech detecting method
Abstract
The invention relates to a voice activity detecting device and a voice activity detecting method. An object of the invention is to adapt to various characteristics of noise that may be superimposed on an aural signal and thereby reliably discriminate between an active voice segment and a non-active voice segment. For this purpose, the voice activity detecting device comprises: a speech-segment inferring section 11 for determining the probability that each of active voice frames given in order of time sequence belongs to the active voice segment, based on the statistical characteristic of the aural signal; a quality monitoring section 12 for monitoring the quality of the aural signal for each active voice frame; and a speech-segment determining section 13 for weighting the determined probability with the above quality to obtain, for each active voice frame, the accuracy that the active voice frame belongs to the active voice segment.
33 Claims
1. A voice activity detecting device comprising:
a speech-segment inferring section for determining, for each of active voice frames as an aural signal given in order of time sequence, a probability that the active voice frame belongs to an active voice segment, the determining being made based on a statistical characteristic of the aural signal;
a quality monitoring section for monitoring quality of the aural signal for each of the active voice frames; and
a speech-segment determining section for determining, for each of the active voice frames as an aural signal given in order of time sequence, an accuracy that the active voice frame belongs to an active voice segment by weighting the probability determined by said speech-segment inferring section with the quality monitored by said quality monitoring section.
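The weighting recited in claim 1 can be illustrated with a minimal sketch. The claim does not say how the probability is weighted with the quality, so the plain product used here, the [0, 1] ranges, the decision threshold, and the function names are all assumptions made for illustration only.

```python
def speech_accuracy(probability: float, quality: float) -> float:
    """Weight a per-frame speech probability by a per-frame quality score.

    Assumption: both inputs lie in [0, 1] and the weighting is a product.
    The claim only requires that the probability be weighted with the quality.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must lie in [0, 1]")
    return probability * quality


def detect(probabilities, qualities, threshold=0.5):
    """Flag each frame as active voice when its accuracy reaches the threshold.

    The threshold is a hypothetical parameter, not taken from the claims.
    """
    return [
        speech_accuracy(p, q) >= threshold
        for p, q in zip(probabilities, qualities)
    ]
```

Under these assumptions, a frame with a high probability but poor monitored quality is not flagged as active voice, which is the behaviour the weighting is meant to produce.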
4. The voice activity detecting device according to claim 1, wherein
said quality monitoring section determines a feature of the active voice segment of the aural signal and/or a feature of the non-active voice segment of the aural signal to obtain the quality of the aural signal as one of the features or a difference between the features.
7. The voice activity detecting device according to claim 1, wherein
said quality monitoring section determines assessed noise-power for each of the active voice frames to obtain the quality of the aural signal as a monotone nonincreasing function of the assessed noise-power.
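As a rough illustration of claim 7, the sketch below maps an assessed noise power to a quality score with a clipped linear ramp, which is one example of a monotone nonincreasing function. The ramp shape, the `low` and `high` breakpoints, and the function name are assumptions; the claim does not prescribe any particular function.

```python
def quality_from_noise_power(noise_power: float,
                             low: float = 1e-4,
                             high: float = 1e-1) -> float:
    """Map assessed noise power to a quality score in [0, 1].

    Assumed shape: quality is 1 up to `low`, falls linearly to 0 at `high`,
    and stays 0 beyond it, so it never increases as noise power grows.
    """
    if noise_power < 0.0:
        raise ValueError("noise power must be non-negative")
    if high <= low:
        raise ValueError("high must exceed low")
    scaled = (noise_power - low) / (high - low)
    return min(1.0, max(0.0, 1.0 - scaled))
```

The flat regions at both ends show why "nonincreasing" rather than "decreasing" is the weaker and more general condition.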
10. The voice activity detecting device according to claim 1, wherein
said quality monitoring section determines, for each of the active voice frames, assessed noise-power and an assessed value of an SN ratio to obtain the quality of the aural signal as a monotone nonincreasing function and a monotone nondecreasing function, respectively.
13. The voice activity detecting device according to claim 1, wherein
said quality monitoring section determines a standardized random variable for each of the active voice frames to obtain the quality of the aural signal as a monotone decreasing function of the standardized random variable.
16. The voice activity detecting device according to claim 1, wherein
said quality monitoring section determines, for each of the active voice frames, a standardized random variable and an assessed value of an SN ratio to obtain the quality of the aural signal as a monotone nonincreasing function and a monotone nondecreasing function, respectively.
19. The voice activity detecting device according to claim 7, wherein
said quality monitoring section determines a peak value of instantaneous values of the aural signal contained in each of the active voice frames;
calculates amplitude normalized by a standard deviation of the probability density function by applying, to a probability density function approximating to amplitude distribution of the aural signal, the number of the instantaneous values and a probability at which the peak value appears; and
determines a standardized random variable as a ratio of the amplitude to the peak value.
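One possible reading of the three steps recited in claim 19 is sketched below. It rests on several assumptions that the claim does not state: the amplitude distribution is approximated by a zero-mean Gaussian, the peak of a frame of N samples is taken to appear with probability 1/N, and that probability is treated as a two-sided tail. The function name is hypothetical.

```python
from statistics import NormalDist


def standardized_variable(frame):
    """Estimate a standardized random variable for one frame.

    Steps (under the assumptions stated in the lead-in):
      1. find the peak absolute instantaneous value in the frame;
      2. find the amplitude, in units of standard deviation, that a unit
         Gaussian exceeds in magnitude with probability 1/N;
      3. return the ratio of that normalized amplitude to the peak.
    """
    n = len(frame)
    if n == 0:
        raise ValueError("frame must contain at least one sample")
    peak = max(abs(s) for s in frame)
    if peak == 0.0:
        raise ValueError("frame is silent; the ratio is undefined")
    tail_probability = 1.0 / n
    # Two-sided tail: P(|X| >= x) = tail_probability for a unit Gaussian.
    normalized_amplitude = NormalDist().inv_cdf(1.0 - tail_probability / 2.0)
    return normalized_amplitude / peak
```

With this reading, doubling every sample in the frame doubles the peak and halves the returned value, while the normalized amplitude depends only on the number of samples in the frame.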
22. The voice activity detecting device according to claim 10, wherein
said quality monitoring section determines a peak value of instantaneous values of the aural signal contained in each of the active voice frames;
calculates amplitude normalized by a standard deviation of the probability density function by applying, to a probability density function approximating to amplitude distribution of the aural signal, the number of the instantaneous values and a probability at which the peak value appears; and
determines a standardized random variable as a ratio of the amplitude to the peak value.
25. The voice activity detecting device according to claim 1, wherein
said quality monitoring section integrates the monitored quality of the aural signal in sequence to apply the resultant as normal quality.
28. The voice activity detecting device according to claim 1, wherein
said quality monitoring section integrates the monitored quality of the aural signal in sequence to apply as quality a value which is obtained as a monotone increasing function or a monotone nondecreasing function of the resultant.
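The integration recited in claims 25 and 28 can be sketched as follows. The claims do not specify the integrator, so the leaky (exponentially weighted) form, the `alpha` coefficient, and the clipping function used as the monotone nondecreasing mapping are all assumptions; a plain running sum or moving average would fit the wording equally well.

```python
def integrate_quality(qualities, alpha=0.9):
    """Integrate per-frame quality values in sequence with a leaky integrator.

    Assumption: state = alpha * previous_state + (1 - alpha) * quality,
    initialised with the first observed quality.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    integrated = []
    state = None
    for q in qualities:
        state = q if state is None else alpha * state + (1.0 - alpha) * q
        integrated.append(state)
    return integrated


def shaped_quality(integrated_value, ceiling=1.0):
    """Apply a monotone nondecreasing function (clipping) to the resultant."""
    return min(ceiling, max(0.0, integrated_value))
```

Integrating in this way smooths frame-to-frame fluctuations in the monitored quality before it is applied.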
2. A voice activity detecting device comprising:
a speech-segment determining section for determining, for each of active voice frames as an aural signal given in order of time sequence, an accuracy that the active voice frame belongs to an active voice segment, the determining being made based on a statistical characteristic of the aural signal; and
a quality monitoring section for monitoring quality of the aural signal for each of the active voice frames, and wherein said speech-segment determining section weights a sequence of instantaneous values of the aural signal contained in each of the active voice frames by a weighting given as a monotone decreasing function or a monotone nonincreasing function of the quality monitored by said quality monitoring section.
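A minimal sketch of the sample weighting in claim 2 is given below. The reciprocal form 1 / (1 + quality) is only one example of a monotone decreasing function of the quality; it, the non-negative quality range, and the function names are assumptions rather than anything the claim specifies.

```python
def sample_weight(quality: float) -> float:
    """Return a weight that decreases monotonically as quality increases.

    Assumption: quality is non-negative and the weight is 1 / (1 + quality).
    """
    if quality < 0.0:
        raise ValueError("quality must be non-negative")
    return 1.0 / (1.0 + quality)


def weight_frame(frame, quality):
    """Scale every instantaneous value in the frame by the same weight."""
    w = sample_weight(quality)
    return [w * s for s in frame]
```

The weighted sequence would then be passed to whatever statistical analysis the speech-segment determining section performs.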
5. The voice activity detecting device according to claim 2, wherein
said quality monitoring section determines a feature of the active voice segment of the aural signal and/or a feature of the non-active voice segment of the aural signal to obtain the quality of the aural signal as one of the features or a difference between the features.
8. The voice activity detecting device according to claim 2, wherein
said quality monitoring section determines assessed noise-power for each of the active voice frames to obtain the quality of the aural signal as a monotone nonincreasing function of the assessed noise-power.
11. The voice activity detecting device according to claim 2, wherein
said quality monitoring section determines, for each of the active voice frames, assessed noise-power and an assessed value of an SN ratio to obtain the quality of the aural signal as a monotone nonincreasing function and a monotone nondecreasing function, respectively.
14. The voice activity detecting device according to claim 2, wherein
said quality monitoring section determines a standardized random variable for each of the active voice frames to obtain the quality of the aural signal as a monotone decreasing function of the standardized random variable.
17. The voice activity detecting device according to claim 2, wherein
said quality monitoring section determines, for each of the active voice frames, a standardized random variable and an assessed value of an SN ratio to obtain the quality of the aural signal as a monotone nonincreasing function and a monotone nondecreasing function, respectively.
20. The voice activity detecting device according to claim 8, wherein
said quality monitoring section determines a peak value of instantaneous values of the aural signal contained in each of the active voice frames;
calculates amplitude normalized by a standard deviation of the probability density function by applying, to a probability density function approximating to amplitude distribution of the aural signal, the number of the instantaneous values and a probability at which the peak value appears; and
determines a standardized random variable as a ratio of the amplitude to the peak value.
23. The voice activity detecting device according to claim 11, wherein
said quality monitoring section determines a peak value of instantaneous values of the aural signal contained in each of the active voice frames;
calculates amplitude normalized by a standard deviation of the probability density function by applying, to a probability density function approximating to amplitude distribution of the aural signal, the number of the instantaneous values and a probability at which the peak value appears; and
determines a standardized random variable as a ratio of the amplitude to the peak value.
26. The voice activity detecting device according to claim 2, wherein
said quality monitoring section integrates the monitored quality of the aural signal in sequence to apply the resultant as normal quality.
29. The voice activity detecting device according to claim 2, wherein
said quality monitoring section integrates the monitored quality of the aural signal in sequence to apply as quality a value which is obtained as a monotone increasing function or a monotone nondecreasing function of the resultant.
3. A voice activity detecting device comprising:
a speech-segment determining section for determining an accuracy that individual active voice frames belong to an active voice segment by performing companding processing for each of the active voice frames given in order of time sequence and by analyzing, based on a statistical characteristic of an aural signal, a sequence of instantaneous values of the aural signal obtained in the companding processing; and
a quality monitoring section for monitoring quality of the aural signal for each of the active voice frames, and wherein said speech-segment determining section applies a companding characteristic to the companding processing for each of the active voice frames, the companding characteristic being given as a monotone decreasing function of the quality monitored by said quality monitoring section.
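The quality-dependent companding of claim 3 might look like the sketch below. The claim does not name a companding law, so the standard mu-law compression curve is swapped in as a stand-in, with the "companding characteristic" taken to be its mu parameter. The mapping from quality to mu, the `mu_max` value, and the function names are assumptions for illustration.

```python
import math


def mu_from_quality(quality: float, mu_max: float = 255.0) -> float:
    """Pick a compression parameter that decreases as quality increases.

    Assumption: quality is non-negative and mu = mu_max / (1 + quality).
    """
    if quality < 0.0:
        raise ValueError("quality must be non-negative")
    return mu_max / (1.0 + quality)


def compand(sample: float, mu: float) -> float:
    """Apply mu-law compression to one sample normalised to [-1, 1]."""
    if not -1.0 <= sample <= 1.0:
        raise ValueError("sample must lie in [-1, 1]")
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    magnitude = math.log1p(mu * abs(sample)) / math.log1p(mu)
    return math.copysign(magnitude, sample)


def compand_frame(frame, quality):
    """Compand every instantaneous value in a frame using the frame quality."""
    mu = mu_from_quality(quality)
    return [compand(s, mu) for s in frame]
```

In this sketch a lower monitored quality yields a larger mu, so small-amplitude samples are expanded more strongly before the statistical analysis.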
6. The voice activity detecting device according to claim 3, wherein
said quality monitoring section determines a feature of the active voice segment of the aural signal and/or a feature of the non-active voice segment of the aural signal to obtain the quality of the aural signal as one of the features or a difference between the features.
9. The voice activity detecting device according to claim 3, wherein
said quality monitoring section determines assessed noise-power for each of the active voice frames to obtain the quality of the aural signal as a monotone nonincreasing function of the assessed noise-power.
12. The voice activity detecting device according to claim 3, wherein
said quality monitoring section determines, for each of the active voice frames, assessed noise-power and an assessed value of an SN ratio to obtain the quality of the aural signal as a monotone nonincreasing function and a monotone nondecreasing function, respectively.
15. The voice activity detecting device according to claim 3, wherein
said quality monitoring section determines a standardized random variable for each of the active voice frames to obtain the quality of the aural signal as a monotone decreasing function of the standardized random variable.
18. The voice activity detecting device according to claim 3, wherein
said quality monitoring section determines, for each of the active voice frames, a standardized random variable and an assessed value of an SN ratio to obtain the quality of the aural signal as a monotone nonincreasing function and a monotone nondecreasing function, respectively.
21. The voice activity detecting device according to claim 9, wherein
said quality monitoring section determines a peak value of instantaneous values of the aural signal contained in each of the active voice frames;
calculates amplitude normalized by a standard deviation of the probability density function by applying, to a probability density function approximating to amplitude distribution of the aural signal, the number of the instantaneous values and a probability at which the peak value appears; and
determines a standardized random variable as a ratio of the amplitude to the peak value.
24. The voice activity detecting device according to claim 12, wherein
said quality monitoring section determines a peak value of instantaneous values of the aural signal contained in each of the active voice frames;
calculates amplitude normalized by a standard deviation of the probability density function by applying, to a probability density function approximating to amplitude distribution of the aural signal, the number of the instantaneous values and a probability at which the peak value appears; and
determines a standardized random variable as a ratio of the amplitude to the peak value.
27. The voice activity detecting device according to claim 3, wherein
said quality monitoring section integrates the monitored quality of the aural signal in sequence to apply the resultant as normal quality.
30. The voice activity detecting device according to claim 3, wherein
said quality monitoring section integrates the monitored quality of the aural signal in sequence to apply as quality a value which is obtained as a monotone increasing function or a monotone nondecreasing function of the resultant.
31. A voice activity detecting method comprising the steps of:
determining, for each of active voice frames as an aural signal given in order of time sequence, a probability that the active voice frame belongs to an active voice segment, the determining being made based on a statistical characteristic of the aural signal;
monitoring quality of the aural signal for each of the active voice frames; and
determining, for each of the active voice frames as an aural signal given in order of time sequence, an accuracy that the active voice frame belongs to an active voice segment by weighting the determined probability with the monitored quality.
32. A voice activity detecting method comprising the steps of:
determining, for each of the active voice frames as an aural signal given in order of time sequence, an accuracy that the active voice frame belongs to an active voice segment, the determining being made based on a statistical characteristic of the aural signal;
monitoring quality of the aural signal for each of the active voice frames; and
weighting a sequence of instantaneous values of the aural signal contained in each of the active voice frames by a weighting given as a monotone decreasing function or a monotone nonincreasing function of the monitored quality.
33. A voice activity detecting method comprising the steps of:
determining an accuracy that individual active voice frames belong to an active voice segment by performing companding processing for each of the active voice frames as an aural signal given in order of time sequence and by analyzing a sequence of instantaneous values of an aural signal obtained in the companding processing, the determining being made based on a statistical characteristic of the aural signal;
monitoring quality of the aural signal for each of the active voice frames; and
applying a companding characteristic to the companding processing for each of the active voice frames, the companding characteristic being given as a monotone decreasing function of the monitored quality.
Specification