Automated sound segment selection method and system
First Claim
1. A method for selecting a segment of a sound sample of a subject's voice that has been digitized by an analog to digital converter, comprising the steps of:
(a) receiving the sound sample comprising a time series of digital samples;
(b) determining a plurality of segments, each segment comprising a pre-defined number of consecutive digital samples;
(c) for each segment, calculating the value of a quality measure, the quality measure being a real-valued function of at least one parameter of the segment, the parameter reflecting the suitability of the segment for use in computing clinical metrics;
(d) selecting the segment that has the greatest value of the quality measure, the quality measure being greater for segments that are more suitable for use in computing clinical metrics; and
(e) storing the selected segment, or a pointer to the selected segment, in a memory;
wherein the quality measure is the product of functions w1, w2, w3, w4, and w5, where:
(a) w1 is a monotonically increasing, non-negative real-valued function computed on input value rN, where rN is the ratio of the sound pressure level of the segment to the estimated background noise, measured in decibels;
(b) w2 is a monotonically decreasing, non-negative real-valued function computed on input value CVSPL, where CVSPL is the ratio of the standard deviation of the sound pressure level for the segment to the mean of the sound pressure level for the segment;
(c) w3 is a monotonically decreasing, non-negative real-valued function computed on input value rc, where rc is the fraction of digital samples in the segment that are clipped by the analog-to-digital converter;
(d) w4 is a monotonically increasing, non-negative real-valued function computed on input value rv, where rv is the fraction of the segment that is voiced; and
(e) w5 is a monotonically decreasing, non-negative real-valued function computed on input value CVF0, where CVF0 is the ratio of the standard deviation of the fundamental frequency for the segment to the mean of the fundamental frequency for the segment.
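The product form of the quality measure in claim 1 can be illustrated with a minimal Python sketch. The claim only requires each of w1 to w5 to be non-negative and monotonic in the stated direction; the specific ramps, exponentials, and constants below (30 dB saturation, decay rate 10, clipping slope 20) are assumptions chosen for illustration, as are the names `SegmentParams`, `quality`, and `select_best`. The five input parameters are assumed to have been computed for each segment beforehand.

```python
import math
from dataclasses import dataclass


@dataclass
class SegmentParams:
    """Per-segment inputs named in the claim (hypothetical container)."""
    r_n: float     # rN: segment SPL relative to estimated background noise, in dB
    cv_spl: float  # CVSPL: std / mean of sound pressure level
    r_c: float     # rc: fraction of samples clipped by the ADC
    r_v: float     # rv: fraction of the segment that is voiced
    cv_f0: float   # CVF0: std / mean of fundamental frequency


# Illustrative weight functions only. Each is non-negative and monotonic
# in the direction the claim requires; the shapes and constants are assumed.
def w1(r_n: float) -> float:
    return min(max(r_n, 0.0), 30.0) / 30.0      # increasing, saturates at 30 dB


def w2(cv_spl: float) -> float:
    return math.exp(-10.0 * max(cv_spl, 0.0))   # decreasing


def w3(r_c: float) -> float:
    return max(0.0, 1.0 - 20.0 * r_c)           # decreasing, zero at 5% clipped


def w4(r_v: float) -> float:
    return min(max(r_v, 0.0), 1.0)              # increasing


def w5(cv_f0: float) -> float:
    return math.exp(-10.0 * max(cv_f0, 0.0))    # decreasing


def quality(p: SegmentParams) -> float:
    """Quality measure as the product w1 * w2 * w3 * w4 * w5."""
    return w1(p.r_n) * w2(p.cv_spl) * w3(p.r_c) * w4(p.r_v) * w5(p.cv_f0)


def select_best(params: list[SegmentParams]) -> int:
    """Return the index of the segment with the greatest quality value."""
    return max(range(len(params)), key=lambda i: quality(params[i]))


candidates = [
    SegmentParams(r_n=10.0, cv_spl=0.05, r_c=0.0, r_v=0.9, cv_f0=0.02),
    SegmentParams(r_n=30.0, cv_spl=0.01, r_c=0.0, r_v=1.0, cv_f0=0.01),
    SegmentParams(r_n=30.0, cv_spl=0.01, r_c=0.1, r_v=1.0, cv_f0=0.01),
]
best_index = select_best(candidates)  # a pointer (index) to the selected segment
```

Because the measure is a product, a single factor of zero (here, heavy clipping in the third candidate) drives the whole quality value to zero regardless of how good the other parameters are.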
Abstract
A computerized method and system is provided for automatically selecting from a digitized sound sample a segment of the sample that is optimal for the purpose of measuring clinical metrics for voice and speech assessment. A quality measure based on quality parameters of segments of the sound sample is applied to candidate segments to identify the highest quality segment within the sound sample. The invention can optionally provide feedback to the speaker to help the speaker increase the quality of the sound sample provided. The invention also can optionally perform sound pressure level calibration and noise calibration. The invention may optionally compute clinical metrics on the selected segment and may further include a normative database method or system for storing and analyzing clinical measurements.
24 Citations
28 Claims
Claims dependent on claim 1: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A non-transitory computer-readable memory having recorded thereon statements and instructions for execution by a processor for selecting a segment of a sound sample of a subject's voice that has been digitized by an analog to digital converter, said statements and instructions when executed by a processor cause the processor to perform the steps of:
(a) receiving the sound sample comprising a time series of digital samples;
(b) determining a plurality of segments, each segment comprising a pre-defined number of consecutive digital samples;
(c) for each segment, calculating the value of a quality measure, the quality measure being a real-valued function of at least one parameter of the segment, the parameter reflecting the suitability of the segment for use in computing clinical metrics;
(d) selecting the segment that has the greatest value of the quality measure, the quality measure being greater for segments that are more suitable for use in computing clinical metrics; and
(e) storing the selected segment, or a pointer to the selected segment, in a memory;
wherein the quality measure is the product of functions w1, w2, w3, w4, and w5, where:
(a) w1 is a monotonically increasing, non-negative real-valued function computed on input value rN, where rN is the ratio of the sound pressure level of the segment to the estimated background noise, measured in decibels;
(b) w2 is a monotonically decreasing, non-negative real-valued function computed on input value CVSPL, where CVSPL is the ratio of the standard deviation of the sound pressure level for the segment to the mean of the sound pressure level for the segment;
(c) w3 is a monotonically decreasing, non-negative real-valued function computed on input value rc, where rc is the fraction of digital samples in the segment that are clipped by the analog-to-digital converter;
(d) w4 is a monotonically increasing, non-negative real-valued function computed on input value rv, where rv is the fraction of the segment that is voiced; and
(e) w5 is a monotonically decreasing, non-negative real-valued function computed on input value CVF0, where CVF0 is the ratio of the standard deviation of the fundamental frequency for the segment to the mean of the fundamental frequency for the segment.
14. A method comprising transmitting over a communications medium instructions for execution by a processor for selecting a segment of a sound sample of a subject's voice that has been digitized by an analog to digital converter, said instructions when executed by a processor cause the processor to perform the steps of:
(a) receiving the sound sample comprising a time series of digital samples;
(b) determining a plurality of segments, each segment comprising a pre-defined number of consecutive digital samples;
(c) for each segment, calculating the value of a quality measure, the quality measure being a real-valued function of at least one parameter of the segment, the parameter reflecting the suitability of the segment for use in computing clinical metrics;
(d) selecting the segment that has the greatest value of the quality measure, the quality measure being greater for segments that are more suitable for use in computing clinical metrics; and
(e) storing the selected segment, or a pointer to the selected segment, in a memory;
wherein the quality measure is the product of functions w1, w2, w3, w4, and w5, where:
(a) w1 is a monotonically increasing, non-negative real-valued function computed on input value rN, where rN is the ratio of the sound pressure level of the segment to the estimated background noise, measured in decibels;
(b) w2 is a monotonically decreasing, non-negative real-valued function computed on input value CVSPL, where CVSPL is the ratio of the standard deviation of the sound pressure level for the segment to the mean of the sound pressure level for the segment;
(c) w3 is a monotonically decreasing, non-negative real-valued function computed on input value rc, where rc is the fraction of digital samples in the segment that are clipped by the analog-to-digital converter;
(d) w4 is a monotonically increasing, non-negative real-valued function computed on input value rv, where rv is the fraction of the segment that is voiced; and
(e) w5 is a monotonically decreasing, non-negative real-valued function computed on input value CVF0, where CVF0 is the ratio of the standard deviation of the fundamental frequency for the segment to the mean of the fundamental frequency for the segment.
15. A non-transitory computer-readable memory storing instructions for execution by a processor for selecting a segment of a sound sample of a subject's voice that has been digitized by an analog to digital converter, said instructions when executed by a processor cause the processor to perform the steps of:
(a) receiving the sound sample comprising a time series of digital samples;
(b) determining a plurality of segments, each segment comprising a pre-defined number of consecutive digital samples;
(c) for each segment, calculating the value of a quality measure, the quality measure being a real-valued function of at least one parameter of the segment, the parameter reflecting the suitability of the segment for use in computing clinical metrics;
(d) selecting the segment that has the greatest value of the quality measure, the quality measure being greater for segments that are more suitable for use in computing clinical metrics; and
(e) storing the selected segment, or a pointer to the selected segment, in a memory;
wherein the quality measure is the product of functions w1, w2, w3, w4, and w5, where:
(a) w1 is a monotonically increasing, non-negative real-valued function computed on input value rN, where rN is the ratio of the sound pressure level of the segment to the estimated background noise, measured in decibels;
(b) w2 is a monotonically decreasing, non-negative real-valued function computed on input value CVSPL, where CVSPL is the ratio of the standard deviation of the sound pressure level for the segment to the mean of the sound pressure level for the segment;
(c) w3 is a monotonically decreasing, non-negative real-valued function computed on input value rc, where rc is the fraction of digital samples in the segment that are clipped by the analog-to-digital converter;
(d) w4 is a monotonically increasing, non-negative real-valued function computed on input value rv, where rv is the fraction of the segment that is voiced; and
(e) w5 is a monotonically decreasing, non-negative real-valued function computed on input value CVF0, where CVF0 is the ratio of the standard deviation of the fundamental frequency for the segment to the mean of the fundamental frequency for the segment.
16. A voice segment selection system for selecting a segment of a sound sample of a subject's voice that has been digitized by an analog to digital converter into a time series of digital samples, the system comprising:
(a) a processor and computer readable memory;
(b) a segmentation module for determining, using the processor, a plurality of segments from the sound sample, each segment comprising a pre-defined number of consecutive digital samples;
(c) a quality module for calculating the value of a quality measure for each segment, the quality measure being a real-valued function of at least one parameter of the segment, the parameter reflecting the suitability of the segment for use in computing clinical metrics; and
(d) a selection module for selecting the segment that has the greatest value of the quality measure and storing the selected segment, or a pointer to the selected segment, in a memory, the quality measure being greater for segments that are more suitable for use in computing clinical metrics;
wherein the quality measure is the product of functions w1, w2, w3, w4, and w5, where:
(a) w1 is a monotonically increasing, non-negative real-valued function computed on input value rN, where rN is the ratio of the sound pressure level of the segment to the estimated background noise, measured in decibels;
(b) w2 is a monotonically decreasing, non-negative real-valued function computed on input value CVSPL, where CVSPL is the ratio of the standard deviation of the sound pressure level for the segment to the mean of the sound pressure level for the segment;
(c) w3 is a monotonically decreasing, non-negative real-valued function computed on input value rc, where rc is the fraction of digital samples in the segment that are clipped by the analog-to-digital converter;
(d) w4 is a monotonically increasing, non-negative real-valued function computed on input value rv, where rv is the fraction of the segment that is voiced; and
(e) w5 is a monotonically decreasing, non-negative real-valued function computed on input value CVF0, where CVF0 is the ratio of the standard deviation of the fundamental frequency for the segment to the mean of the fundamental frequency for the segment.
Claims dependent on claim 16: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
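As a rough illustration of what the segmentation module and part of the quality module in claim 16 might compute, the following Python sketch splits a sample into fixed-length segments and derives two of the claimed inputs, rc and CVSPL. Everything specific here is an assumption and not taken from the claims: the function names, the 16-bit full-scale value of 32767, the hop size, the frame-based level estimate, and the 90 dB calibration offset standing in for a real sound pressure level calibration. The voicing fraction rv and CVF0 would need a pitch tracker and are not shown.

```python
import math
import statistics


def segments(samples, seg_len, hop):
    """Yield (start, segment) for fixed-length runs of consecutive samples."""
    for start in range(0, len(samples) - seg_len + 1, hop):
        yield start, samples[start:start + seg_len]


def clipped_fraction(seg, full_scale=32767):
    """rc: fraction of samples at or beyond full scale (16-bit ADC assumed)."""
    return sum(1 for s in seg if abs(s) >= full_scale) / len(seg)


def frame_levels_db(seg, frame_len, cal_offset_db=90.0, full_scale=32767):
    """Per-frame level in dB: RMS relative to full scale plus an assumed
    calibration offset (a stand-in for a real SPL calibration)."""
    levels = []
    for i in range(0, len(seg) - frame_len + 1, frame_len):
        frame = seg[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # Floor the RMS at 1 to avoid log10(0) on silent frames.
        levels.append(20.0 * math.log10(max(rms, 1.0) / full_scale) + cal_offset_db)
    return levels


def cv(values):
    """Coefficient of variation: population standard deviation over mean."""
    return statistics.pstdev(values) / statistics.mean(values)


# Toy input: silence, two full-scale samples, then a quiet tail.
samples = [0, 0, 0, 0, 32767, 32767, 100, 100]
starts = [start for start, _ in segments(samples, seg_len=4, hop=2)]
clip = [clipped_fraction(seg) for _, seg in segments(samples, seg_len=4, hop=2)]
cv_spl_flat = cv(frame_levels_db([32767] * 8, frame_len=4))
```

Overlapping segments (a hop smaller than the segment length) give the selection step more candidates to choose from, at the cost of recomputing parameters over shared samples.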
Specification