Controlling loudness of speech in signals that contain speech and other types of audio material

US 7,454,331 B2
Filed: 08/30/2002
Issued: 11/18/2008
Est. Priority Date: 08/30/2002
Status: Expired

Abstract

Mechanisms are known that allow receivers to control the loudness of speech in broadcast signals, but these mechanisms require that an estimate of speech loudness be inserted into the signal. Disclosed techniques provide improved estimates of loudness. According to one implementation, an indication of the loudness of an audio signal containing speech and other types of audio material is obtained by classifying segments of audio information as either speech or non-speech. The loudness of the speech segments is estimated and this estimate is used to derive the indication of loudness. The indication of loudness may be used to control audio signal levels so that variations in the loudness of speech between different programs are reduced. A preferred method for classifying speech segments is described.
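As a rough illustration of how an indication of speech loudness could be used to reduce program-to-program variation, the sketch below derives a gain per program from the loudness of its speech segments only. It is not taken from the specification: the function names, the `target_db` reference level, the power-domain averaging, and the example loudness values are all assumptions, and the speech/non-speech labels are supplied by hand rather than by a classifier.

```python
import math


def mean_power_db(levels_db):
    """Average dB levels in the power domain and return the result in dB."""
    powers = [10 ** (level / 10) for level in levels_db]
    return 10 * math.log10(sum(powers) / len(powers))


def speech_gain_db(segments, target_db):
    """Return the gain (dB) that moves the speech loudness to target_db.

    segments: list of (loudness_db, is_speech) pairs for one program.
    target_db: hypothetical reference level for speech (an assumption).
    """
    speech_levels = [level for level, is_speech in segments if is_speech]
    if not speech_levels:
        # Assumption: with no speech detected, leave the level unchanged.
        return 0.0
    return target_db - mean_power_db(speech_levels)


# Two hypothetical programs whose speech differs by 10 dB.
program_a = [(-20.0, True), (-20.0, True), (-8.0, False)]
program_b = [(-30.0, True), (-30.0, True), (-10.0, False)]

gain_a = speech_gain_db(program_a, target_db=-24.0)
gain_b = speech_gain_db(program_b, target_db=-24.0)
print(round(gain_a, 2), round(gain_b, 2))
```

Applying each gain to its whole program brings the speech in both programs to the same level, while the loud non-speech segments do not influence the gain.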

185 Citations

35 Claims

1. A method for signal processing that comprises:
- receiving an audio signal;
  
  extracting features of the audio signal;
  
  analyzing one or more of the extracted features to perform a speech determination;
  
  classifying segments within an interval of the audio signal as speech segments or non-speech segments based upon the speech determination, wherein each segment has a respective loudness, and the loudness of the speech segments is less than the loudness of one or more loud non-speech segments;
  
  analyzing one or more of the extracted features of the audio signal to obtain an estimated loudness of the speech segments; and
  
  providing an indication of the loudness of the interval of the audio signal by calculating control information from a weighted combination of the estimated loudness of the speech segments and the loudness of the non-speech segments in which the estimated loudness of the speech segments is weighted more heavily.
  - 2. The method according to claim 1 that comprises:
    - controlling the loudness of the interval of the audio signal in response to the control information so as to reduce variations in the loudness of the speech segments, wherein the loudness of the portions of the audio signal represented by the one or more loud non-speech segments is increased when the loudness of the portions of the audio signal represented by the speech segments is increased.
  - 3. The method according to claim 1 that comprises:
    - assembling a representation of the audio signal and the control information into an output signal and transmitting the output signal.
  - 4. A computer-readable storage medium storing instructions for instructing processing circuitry to perform any one of the methods of claims 1 through 3.
  - 5. An apparatus for signal processing that comprises:
    - an input terminal that receives an input signal;
      
      memory; and
      
      processing circuitry coupled to the input terminal and the memory;
      
      wherein the processing circuitry performs any one of the methods of claims 1 through 3.
  - 6. The method according to claim 1 or 2 that obtains the estimated loudness of the speech segments by calculating average power of a frequency-weighted version of the audio signal represented by the speech segments.
  - 7. The method according to claim 1 or 2 that obtains the estimated loudness of the speech segments by applying a psychoacoustic model of loudness to the audio information.
  - 8. The method according to claim 1 or 2 that classifies segments by deriving from the extracted features a plurality of characteristics of the audio signal, weighting each characteristic by a respective measure of importance, and classifying the segments according to a combination of the weighted characteristics.
  - 9. The method according to claim 1 or 2 that controls the loudness of the interval of the audio signal by adjusting the loudness only during intervals of the audio signal having a measure of audio energy less than a threshold.
  - 10. The method according to claim 1 or 2 wherein the weighting of the loudness of the non-speech segments in the weighted combination is zero.
  - 11. The method according to claim 1 or 2 that comprises analyzing one or more of the extracted features of the audio signal to obtain an estimate of the loudness of one or more non-speech segments.
  - 12. The method according to claim 1 or 2 that comprises:
    - providing a speech measure that indicates a degree to which the audio signal represented by a respective segment has characteristics of speech; and
      
      providing the indication of loudness by calculating the control information in response to the estimated loudness of respective segments according to the speech measures of the respective segments.
  - 13. The method according to claim 1 or 2 that comprises calculating the control information in response to the estimated loudness of respective segments according to time order of the segments.
  - 14. The method according to claim 1 or 2 that comprises adapting lengths of the segments in response to characteristics of the audio signal.
  - 15. A computer-readable storage medium storing instructions for instructing processing circuitry to perform the method of claim 6.
  - 16. A computer-readable storage medium storing instructions for instructing processing circuitry to perform the method of claim 7.
  - 17. A computer-readable storage medium storing instructions for instructing processing circuitry to perform the method of claim 8.
  - 18. A computer-readable storage medium storing instructions for instructing processing circuitry to perform the method of claim 9.
  - 19. A computer-readable storage medium storing instructions for instructing processing circuitry to perform the method of claim 10.
  - 20. A computer-readable storage medium storing instructions for instructing processing circuitry to perform the method of claim 11.
  - 21. A computer-readable storage medium storing instructions for instructing processing circuitry to perform the method of claim 12.
  - 22. A computer-readable storage medium storing instructions for instructing processing circuitry to perform the method of claim 13.
  - 23. A computer-readable storage medium storing instructions for instructing processing circuitry to perform the method of claim 14.
  - 24. An apparatus for signal processing that comprises:
    - an input terminal that receives an input signal;
      
      memory; and
      
      processing circuitry coupled to the input terminal and the memory;
      
      wherein the processing circuitry performs the method of claim 6.
  - 25. An apparatus for signal processing that comprises:
    - an input terminal that receives an input signal;
      
      memory; and
      
      processing circuitry coupled to the input terminal and the memory;
      
      wherein the processing circuitry performs the method of claim 7.
  - 26. An apparatus for signal processing that comprises:
    - an input terminal that receives an input signal;
      
      memory; and
      
      processing circuitry coupled to the input terminal and the memory;
      
      wherein the processing circuitry performs the method of claim 8.
  - 27. An apparatus for signal processing that comprises:
    - an input terminal that receives an input signal;
      
      memory; and
      
      processing circuitry coupled to the input terminal and the memory;
      
      wherein the processing circuitry performs the method of claim 9.
  - 28. An apparatus for signal processing that comprises:
    - an input terminal that receives an input signal;
      
      memory; and
      
      processing circuitry coupled to the input terminal and the memory;
      
      wherein the processing circuitry performs the method of claim 10.
  - 29. An apparatus for signal processing that comprises:
    - an input terminal that receives an input signal;
      
      memory; and
      
      processing circuitry coupled to the input terminal and the memory;
      
      wherein the processing circuitry performs the method of claim 11.
  - 30. An apparatus for signal processing that comprises:
    - an input terminal that receives an input signal;
      
      memory; and
      
      processing circuitry coupled to the input terminal and the memory;
      
      wherein the processing circuitry performs the method of claim 12.
  - 31. An apparatus for signal processing that comprises:
    - an input terminal that receives an input signal;
      
      memory; and
      
      processing circuitry coupled to the input terminal and the memory;
      
      wherein the processing circuitry performs the method of claim 13.
  - 32. An apparatus for signal processing that comprises:
    - an input terminal that receives an input signal;
      
      memory; and
      
      processing circuitry coupled to the input terminal and the memory;
      
      wherein the processing circuitry performs the method of claim 14.
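
The weighted combination recited in claim 1, and the zero-weight case of claim 10, might be sketched as follows. This is only an illustration under stated assumptions: the claims do not say how the combination is computed, so the power-domain weighted average, the function name `weighted_loudness_db`, the default weights, and the example segment values are all hypothetical.

```python
import math


def weighted_loudness_db(segments, speech_weight=1.0, non_speech_weight=0.25):
    """Combine per-segment loudness into one indication for the interval.

    segments: list of (loudness_db, is_speech) pairs.
    Speech segments receive speech_weight and non-speech segments receive
    non_speech_weight; choosing speech_weight > non_speech_weight weights
    the speech loudness more heavily. The weights here are assumptions.
    """
    if speech_weight < 0 or non_speech_weight < 0:
        raise ValueError("weights must be non-negative")
    weighted_power = 0.0
    total_weight = 0.0
    for loudness_db, is_speech in segments:
        weight = speech_weight if is_speech else non_speech_weight
        # Assumption: combine in the power domain rather than directly in dB.
        weighted_power += weight * 10 ** (loudness_db / 10)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no segment carries a non-zero weight")
    return 10 * math.log10(weighted_power / total_weight)


# Quiet speech alongside one loud non-speech segment (hypothetical values).
segments = [(-20.0, True), (-20.0, True), (-5.0, False)]

weighted = weighted_loudness_db(segments)
speech_only = weighted_loudness_db(segments, non_speech_weight=0.0)
print(round(weighted, 2), round(speech_only, 2))
```

With a non-speech weight of zero the indication depends on the speech segments alone; with a small non-zero weight the loud non-speech segment pulls the indication upward, but less than it would in an unweighted average.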

33. A method for signal processing that comprises:
- receiving an input audio signal;
  
  extracting features of the input audio signal, the extracted features representing an interval of the input audio signal;
  
  analyzing the extracted features to perform a speech determination;
  
  classifying the interval of the audio signal as speech or non-speech based upon the speech determination, wherein each interval has a respective loudness and the loudness of the interval classified as speech is less than the loudness of one or more other segments classified as non-speech;
  
  analyzing the extracted features of the interval classified as speech to obtain an estimated loudness of the interval classified as speech;
  
  calculating a loudness control parameter, the loudness control parameter being proportional to the difference between the estimated loudness of intervals classified as speech; and
  
  adjusting an estimated loudness of intervals classified as non-speech, the adjustment being proportional to the calculated loudness control parameter.
  - 34. A computer-readable storage medium storing instructions for instructing processing circuitry to perform the method of claim 33.
  - 35. An apparatus for signal processing that comprises:
    - an input terminal that receives an input signal;
      
      memory; and
      
      processing circuitry coupled to the input terminal and the memory;
      
      wherein the processing circuitry performs the method of claim 33.

Current Assignee: Cutanix Corporation, Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Original Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Inventors: Gundry, Kenneth James; Riedmiller, Jeffrey Charles; Vinton, Mark Stuart; Robinson, Charles Quito; Venezia, Steven Joseph
Primary Examiner(s): Opsasnick, Michael N.
Application Number: US10/233,073
Publication Number: US 20040044525A1
Time in Patent Office: 2,272 Days
Field of Search: 704/225, 704/201, 381/107, 381/96, 381/307, 381/18
US Class Current: 704/225
CPC Class Codes: H03G 5/165 Equalizers; Volume or gain ...
