Automated sound segment selection method and system

US 8,494,844 B2
Filed: 11/19/2009
Issued: 07/23/2013
Est. Priority Date: 11/19/2008
Status: Active Grant

3 Assignments

0 Petitions

Abstract

A computerized method and system are provided for automatically selecting, from a digitized sound sample, the segment of the sample that is optimal for measuring clinical metrics for voice and speech assessment. A quality measure based on quality parameters of segments of the sound sample is applied to candidate segments to identify the highest-quality segment within the sound sample. The invention can optionally provide feedback to the speaker to help the speaker increase the quality of the sound sample provided. The invention can also optionally perform sound pressure level calibration and noise calibration. The invention may optionally compute clinical metrics on the selected segment and may further include a normative database method or system for storing and analyzing clinical measurements.

24 Citations

28 Claims

1. A method for selecting a segment of a sound sample of a subject's voice that has been digitized by an analog to digital converter, comprising the steps of:
- (a) receiving the sound sample comprising a time series of digital samples;
  
  (b) determining a plurality of segments, each segment comprising a pre-defined number of consecutive digital samples;
  
  (c) for each segment, calculating the value of a quality measure, the quality measure being a real-valued function of at least one parameter of the segment, the parameter reflecting the suitability of the segment for use in computing clinical metrics;
  
  (d) selecting the segment that has the greatest value of the quality measure, the quality measure being greater for segments that are more suitable for use in computing clinical metrics; and
  
  (e) storing the selected segment, or a pointer to the selected segment, in a memory;
  
  wherein the quality measure is the product of functions w₁, w₂, w₃, w₄, and w₅, where:
  
  (a) w₁ is a monotonically increasing, non-negative real-valued function computed on input value r_N, where r_N is the ratio of the sound pressure level of the segment to the estimated background noise, measured in decibels;
  
  (b) w₂ is a monotonically decreasing, non-negative real-valued function computed on input value CV_SPL, where CV_SPL is the ratio of the standard deviation of the sound pressure level for the segment to the mean of the sound pressure level for the segment;
  
  (c) w₃ is a monotonically decreasing, non-negative real-valued function computed on input value r_c, where r_c is the fraction of digital samples in the segment that are clipped by the analog-to-digital converter;
  
  (d) w₄ is a monotonically increasing, non-negative real-valued function computed on input value r_v, where r_v is the fraction of the segment that is voiced; and
  
  (e) w₅ is a monotonically decreasing, non-negative real-valued function computed on input value CV_F0, where CV_F0 is the ratio of the standard deviation of the fundamental frequency for the segment to the mean of the fundamental frequency for the segment.
- - 2. The method of claim 1 wherein the function w₁ is defined to be
  - 3. The method of claim 2 wherein k₂ is 5, k₃ is 30, k₄ is 5, and k₅ is 20.
  - 4. The method of claim 3 wherein the sound pressure level is calibrated to produce an absolute sound pressure level in decibels and ρ, x₁ and x₂ are chosen so that 0.3 < ρ < 0.6, 20 dB < x₁ < 34 dB and 32 dB < x₂ < 46 dB.
  - 5. The method of claim 4 wherein ρ is 0.4, x₁ is 30 dB and x₂ is 42 dB.
  - 6. The method of claim 1 wherein the sound pressure level is calibrated to produce an absolute sound pressure level in decibels.
  - 7. The method of claim 6 wherein the value of r_N is calculated in real-time as the subject presents a sound to the analog-to-digital converter, based on the last N digital samples, where N is a pre-defined number, and an indication of the value of r_N is presented to the subject.
  - 8. The method of claim 7 wherein a graphical indication of the value of r_N is presented to the user along with an indication of the sufficiency of the value.
  - 9. The method of claim 1 wherein the method selects a plurality of segments, each segment comprising a pre-defined number of consecutive digital samples selected to be used to compute a particular clinical metric.
  - 10. The method of claim 9 wherein a plurality of quality measures are used, each quality measure being defined to select a segment suitable for the computation of a particular clinical metric.
  - 11. The method of claim 1 wherein the method computes clinical metrics on the selected segment.
  - 12. The method of claim 11 further comprising a normative database method for storing and analyzing clinical measurements.
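
The selection steps of claim 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the formula for w₁ in claim 2 is not reproduced in this record, so the weight functions `w1` to `w5` below are hypothetical stand-ins that merely satisfy the monotonicity and non-negativity conditions of the claim, and the names `SegmentParams`, `segment_starts`, `quality`, and `select_best` are invented for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SegmentParams:
    r_n: float     # sound pressure level relative to background noise, in dB
    cv_spl: float  # std / mean of the sound pressure level in the segment
    r_c: float     # fraction of clipped samples, 0..1
    r_v: float     # fraction of the segment that is voiced, 0..1
    cv_f0: float   # std / mean of the fundamental frequency in the segment


# Hypothetical weights (NOT the patent's formulas): each is non-negative and
# monotonic in the direction the claim requires. Constants are arbitrary.
def w1(r_n: float) -> float:
    return min(max(r_n, 0.0), 40.0) / 40.0   # non-decreasing in r_n

def w2(cv_spl: float) -> float:
    return math.exp(-10.0 * cv_spl)          # decreasing in cv_spl

def w3(r_c: float) -> float:
    return math.exp(-10.0 * r_c)             # decreasing in r_c

def w4(r_v: float) -> float:
    return min(max(r_v, 0.0), 1.0)           # non-decreasing in r_v

def w5(cv_f0: float) -> float:
    return math.exp(-10.0 * cv_f0)           # decreasing in cv_f0


def segment_starts(n_samples: int, seg_len: int, hop: int) -> List[int]:
    """Step (b): start indices of candidate segments of seg_len samples."""
    return list(range(0, n_samples - seg_len + 1, hop))


def quality(p: SegmentParams) -> float:
    """Step (c): quality measure as the product of the five weights."""
    return w1(p.r_n) * w2(p.cv_spl) * w3(p.r_c) * w4(p.r_v) * w5(p.cv_f0)


def select_best(params: Sequence[SegmentParams]) -> int:
    """Step (d): index of the segment with the greatest quality value."""
    if not params:
        raise ValueError("no candidate segments")
    return max(range(len(params)), key=lambda i: quality(params[i]))
```

The index returned by `select_best`, combined with the matching entry of `segment_starts`, plays the role of the "pointer to the selected segment" in step (e). Because the measure is a product, a single very poor parameter (for example heavy clipping) drives the whole score toward zero regardless of the others.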

13. A non-transitory computer-readable memory having recorded thereon statements and instructions for execution by a processor for selecting a segment of a sound sample of a subject's voice that has been digitized by an analog to digital converter, said statements and instructions when executed by a processor cause the processor to perform the steps of:
- (a) receiving the sound sample comprising a time series of digital samples;
  
  (b) determining a plurality of segments, each segment comprising a pre-defined number of consecutive digital samples;
  
  (c) for each segment, calculating the value of a quality measure, the quality measure being a real-valued function of at least one parameter of the segment, the parameter reflecting the suitability of the segment for use in computing clinical metrics;
  
  (d) selecting the segment that has the greatest value of the quality measure, the quality measure being greater for segments that are more suitable for use in computing clinical metrics; and
  
  (e) storing the selected segment, or a pointer to the selected segment, in a memory;
  
  wherein the quality measure is the product of functions w₁, w₂, w₃, w₄, and w₅, where:
  
  (a) w₁ is a monotonically increasing, non-negative real-valued function computed on input value r_N, where r_N is the ratio of the sound pressure level of the segment to the estimated background noise, measured in decibels;
  
  (b) w₂ is a monotonically decreasing, non-negative real-valued function computed on input value CV_SPL, where CV_SPL is the ratio of the standard deviation of the sound pressure level for the segment to the mean of the sound pressure level for the segment;
  
  (c) w₃ is a monotonically decreasing, non-negative real-valued function computed on input value r_c, where r_c is the fraction of digital samples in the segment that are clipped by the analog-to-digital converter;
  
  (d) w₄ is a monotonically increasing, non-negative real-valued function computed on input value r_v, where r_v is the fraction of the segment that is voiced; and
  
  (e) w₅ is a monotonically decreasing, non-negative real-valued function computed on input value CV_F0, where CV_F0 is the ratio of the standard deviation of the fundamental frequency for the segment to the mean of the fundamental frequency for the segment.

14. A method comprising transmitting over a communications medium instructions for execution by a processor for selecting a segment of a sound sample of a subject's voice that has been digitized by an analog to digital converter, said instructions when executed by a processor cause the processor to perform the steps of:
- (a) receiving the sound sample comprising a time series of digital samples;
  
  (b) determining a plurality of segments, each segment comprising a pre-defined number of consecutive digital samples;
  
  (c) for each segment, calculating the value of a quality measure, the quality measure being a real-valued function of at least one parameter of the segment, the parameter reflecting the suitability of the segment for use in computing clinical metrics;
  
  (d) selecting the segment that has the greatest value of the quality measure, the quality measure being greater for segments that are more suitable for use in computing clinical metrics; and
  
  (e) storing the selected segment, or a pointer to the selected segment, in a memory;
  
  wherein the quality measure is the product of functions w₁, w₂, w₃, w₄, and w₅, where:
  
  (a) w₁ is a monotonically increasing, non-negative real-valued function computed on input value r_N, where r_N is the ratio of the sound pressure level of the segment to the estimated background noise, measured in decibels;
  
  (b) w₂ is a monotonically decreasing, non-negative real-valued function computed on input value CV_SPL, where CV_SPL is the ratio of the standard deviation of the sound pressure level for the segment to the mean of the sound pressure level for the segment;
  
  (c) w₃ is a monotonically decreasing, non-negative real-valued function computed on input value r_c, where r_c is the fraction of digital samples in the segment that are clipped by the analog-to-digital converter;
  
  (d) w₄ is a monotonically increasing, non-negative real-valued function computed on input value r_v, where r_v is the fraction of the segment that is voiced; and
  
  (e) w₅ is a monotonically decreasing, non-negative real-valued function computed on input value CV_F0, where CV_F0 is the ratio of the standard deviation of the fundamental frequency for the segment to the mean of the fundamental frequency for the segment.

15. A non-transitory computer-readable memory storing instructions for execution by a processor for selecting a segment of a sound sample of a subject's voice that has been digitized by an analog to digital converter, said instructions when executed by a processor cause the processor to perform the steps of:
- (a) receiving the sound sample comprising a time series of digital samples;
  
  (b) determining a plurality of segments, each segment comprising a pre-defined number of consecutive digital samples;
  
  (c) for each segment, calculating the value of a quality measure, the quality measure being a real-valued function of at least one parameter of the segment, the parameter reflecting the suitability of the segment for use in computing clinical metrics;
  
  (d) selecting the segment that has the greatest value of the quality measure, the quality measure being greater for segments that are more suitable for use in computing clinical metrics; and
  
  (e) storing the selected segment, or a pointer to the selected segment, in a memory;
  
  wherein the quality measure is the product of functions w₁, w₂, w₃, w₄, and w₅, where:
  
  (a) w₁ is a monotonically increasing, non-negative real-valued function computed on input value r_N, where r_N is the ratio of the sound pressure level of the segment to the estimated background noise, measured in decibels;
  
  (b) w₂ is a monotonically decreasing, non-negative real-valued function computed on input value CV_SPL, where CV_SPL is the ratio of the standard deviation of the sound pressure level for the segment to the mean of the sound pressure level for the segment;
  
  (c) w₃ is a monotonically decreasing, non-negative real-valued function computed on input value r_c, where r_c is the fraction of digital samples in the segment that are clipped by the analog-to-digital converter;
  
  (d) w₄ is a monotonically increasing, non-negative real-valued function computed on input value r_v, where r_v is the fraction of the segment that is voiced; and
  
  (e) w₅ is a monotonically decreasing, non-negative real-valued function computed on input value CV_F0, where CV_F0 is the ratio of the standard deviation of the fundamental frequency for the segment to the mean of the fundamental frequency for the segment.
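
The independent claims define CV_SPL and r_c as statistics of the digital samples in a segment but do not say how the underlying values are obtained. The following Python sketch shows one way they might be computed. Everything about the method is an assumption made for illustration: the sound pressure level is estimated per frame as 20·log10 of the RMS amplitude plus a hypothetical calibration offset `offset_db`, a sample is treated as clipped when its magnitude reaches a hypothetical 16-bit `full_scale`, and all function names are invented.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def frame_spl_db(frame: Sequence[float], offset_db: float = 0.0) -> float:
    """Assumed SPL estimate for one frame: 20*log10(RMS) + calibration offset."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12)) + offset_db


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Ratio of the (population) standard deviation to the mean."""
    m = mean(values)
    # A positive mean is assumed (e.g. calibrated SPL in dB, or F0 in Hz).
    return pstdev(values) / m if m > 0 else math.inf


def clipped_fraction(segment: Sequence[int], full_scale: int = 32767) -> float:
    """r_c: fraction of samples at or beyond the assumed converter full scale."""
    clipped = sum(1 for x in segment if abs(x) >= full_scale)
    return clipped / len(segment)


def cv_spl(segment: Sequence[float], frame_len: int, offset_db: float = 0.0) -> float:
    """CV_SPL over non-overlapping frames of frame_len samples."""
    frames: List[Sequence[float]] = [
        segment[i:i + frame_len]
        for i in range(0, len(segment) - frame_len + 1, frame_len)
    ]
    if not frames:
        raise ValueError("segment shorter than one frame")
    return coefficient_of_variation([frame_spl_db(f, offset_db) for f in frames])
```

The same `coefficient_of_variation` helper could be applied to a list of per-frame fundamental frequency estimates to obtain CV_F0; pitch tracking itself is outside the scope of this sketch.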

16. A voice segment selection system for selecting a segment of a sound sample of a subject's voice that has been digitized by an analog to digital converter into a time series of digital samples, the system comprising:
- (a) a processor and computer readable memory;
  
  (b) a segmentation module for determining, using the processor, a plurality of segments from the sound sample, each segment comprising a pre-defined number of consecutive digital samples;
  
  (c) a quality module for calculating the value of a quality measure for each segment, the quality measure being a real-valued function of at least one parameter of the segment, the parameter reflecting the suitability of the segment for use in computing clinical metrics; and
  
  (d) a selection module for selecting the segment that has the greatest value of the quality measure and storing the selected segment, or a pointer to the selected segment, in a memory, the quality measure being greater for segments that are more suitable for use in computing clinical metrics;
  
  wherein the quality measure is the product of functions w₁, w₂, w₃, w₄, and w₅, where:
  
  (a) w₁ is a monotonically increasing, non-negative real-valued function computed on input value r_N, where r_N is the ratio of the sound pressure level of the segment to the estimated background noise, measured in decibels;
  
  (b) w₂ is a monotonically decreasing, non-negative real-valued function computed on input value CV_SPL, where CV_SPL is the ratio of the standard deviation of the sound pressure level for the segment to the mean of the sound pressure level for the segment;
  
  (c) w₃ is a monotonically decreasing, non-negative real-valued function computed on input value r_c, where r_c is the fraction of digital samples in the segment that are clipped by the analog-to-digital converter;
  
  (d) w₄ is a monotonically increasing, non-negative real-valued function computed on input value r_v, where r_v is the fraction of the segment that is voiced; and
  
  (e) w₅ is a monotonically decreasing, non-negative real-valued function computed on input value CV_F0, where CV_F0 is the ratio of the standard deviation of the fundamental frequency for the segment to the mean of the fundamental frequency for the segment.
- - 17. The system of claim 16 wherein the function w₁ is defined to be
  - 18. The system of claim 17 wherein k₂ is 5, k₃ is 30, k₄ is 5, and k₅ is 20.
  - 19. The system of claim 18 wherein the sound pressure level is calibrated to produce an absolute sound pressure level in decibels and ρ, x₁ and x₂ are chosen so that 0.3 < ρ < 0.6, 20 dB < x₁ < 34 dB and 32 dB < x₂ < 46 dB.
  - 20. The system of claim 19 wherein ρ is 0.4, x₁ is 30 dB and x₂ is 42 dB.
  - 21. The system of claim 16 wherein the sound pressure level is calibrated to produce an absolute sound pressure level in decibels.
  - 22. The system of claim 21 wherein the value of r_N is calculated in real-time as the subject presents a sound to the analog-to-digital converter, based on the last N digital samples, where N is a pre-defined number, and an indication of the value of r_N is presented to the subject.
  - 23. The system of claim 22 wherein a graphical indication of the value of r_N is presented to the user along with an indication of the sufficiency of the value.
  - 24. The system of claim 16 wherein the system selects a plurality of segments, each segment comprising a pre-defined number of consecutive digital samples selected to be used to compute a particular clinical metric.
  - 25. The system of claim 24 wherein a plurality of quality measures are used, each quality measure being defined to select a segment suitable for the computation of a particular clinical metric.
  - 26. The system of claim 16 wherein the system computes clinical metrics on the selected segment.
  - 27. The system of claim 26 wherein the system further comprises a normative database subsystem for storing and analyzing clinical measurements using the stored selected segment.
  - 28. The system of claim 16 wherein the system acts as a server that receives from a client a sound sample and a request to select a segment, wherein the system returns to the client an indication of the selected segment.
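
Claims 7, 8, 22 and 23 describe computing r_N in real time over the last N digital samples and showing the subject a graphical indication together with its sufficiency. A minimal Python sketch of that feedback loop is given below. It rests on assumptions not stated in the claims: r_N is taken as the window's estimated level in dB minus a previously measured noise level `noise_db`, the level is estimated from the RMS amplitude, and the class name `LiveNoiseRatio`, the threshold `sufficient_db`, and the text-bar display are all hypothetical.

```python
import math
from collections import deque


class LiveNoiseRatio:
    """Tracks r_N over the most recent n samples (illustrative only)."""

    def __init__(self, n: int, noise_db: float, sufficient_db: float) -> None:
        self.window = deque(maxlen=n)   # keeps only the last n samples
        self.noise_db = noise_db        # assumed background noise level, dB
        self.sufficient_db = sufficient_db  # hypothetical sufficiency threshold

    def push(self, sample: float) -> None:
        self.window.append(sample)

    def r_n(self) -> float:
        """Level of the current window above the noise estimate, in dB."""
        if not self.window:
            return float("-inf")
        rms = math.sqrt(sum(x * x for x in self.window) / len(self.window))
        return 20.0 * math.log10(max(rms, 1e-12)) - self.noise_db

    def indicator(self, width: int = 10, full_db: float = 40.0) -> str:
        """Text bar standing in for a graphical meter, plus a sufficiency flag."""
        value = self.r_n()
        filled = int(width * min(max(value, 0.0), full_db) / full_db)
        status = "OK" if value >= self.sufficient_db else "too quiet"
        return "[" + "#" * filled + "-" * (width - filled) + "] " + status
```

In use, each incoming sample would be passed to `push`, and `indicator()` would be redrawn periodically so that the speaker can raise their voice or move closer to the microphone when the bar reads "too quiet".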

Current Assignee: David N. Fernandes
Original Assignee: Human Centered Technologies Incorporated
Inventors: Fernandes, David N.
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Serrou, Abdelali
Application Number: US12/592,125
Publication Number: US 20100153101A1
Time in Patent Office: 1,342 Days
Field of Search: 704/207-209, 704/220, 704/231, 704/251
US Class Current: 704/220
CPC Class Codes:
G10L 17/26 Recognition of special voic...
G10L 25/69 for evaluating synthetic or...
