Adapting masking thresholds for encoding a low frequency transient signal in audio data

US 7,627,481 B1
Filed: 04/19/2005
Issued: 12/01/2009
Est. Priority Date: 04/19/2005
Status: Active Grant


2 Assignments

0 Petitions

Abstract

An improved audio coding technique encodes audio that contains a low frequency transient signal using a long block, but with a set of adapted masking thresholds. Upon identifying an audio window that contains a low frequency transient signal, masking thresholds for the long block may be calculated as usual. A set of masking thresholds is also calculated for the 8 short blocks corresponding to the long block. The masking thresholds for low frequency critical bands are adapted based on the thresholds calculated for the short blocks, and the resulting adapted masking thresholds are used to encode the long block of audio data. The result is encoded audio with rich harmonic content and negligible coder noise resulting from the low frequency transient signal.
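
The adaptation step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the `band_map` structure, the `num_low_bands` cutoff, and the choice of taking the minimum short-block threshold (and only ever lowering the long-block value) are all assumptions made for the example. Any rescaling between short-block and long-block threshold units is also left out.

```python
from typing import List, Sequence


def adapt_long_block_thresholds(
    long_thr: Sequence[float],
    short_thr: Sequence[Sequence[float]],
    band_map: Sequence[Sequence[int]],
    num_low_bands: int,
) -> List[float]:
    """Adapt long-block masking thresholds in the low frequency bands.

    long_thr[b]      threshold of long-block critical band b
    short_thr[k][s]  threshold of critical band s in short block k
    band_map[b]      short-block bands assumed to overlap long band b
    num_low_bands    hypothetical count of bands treated as low frequency
    """
    adapted = list(long_thr)
    for b in range(min(num_low_bands, len(adapted))):
        # Collect the thresholds of every short block in the mapped bands.
        candidates = [block[s] for block in short_thr for s in band_map[b]]
        if candidates:
            # Assumption: use the strictest (smallest) short-block value,
            # and never raise the long-block threshold.
            adapted[b] = min(adapted[b], min(candidates))
    return adapted
```

Bands at or above `num_low_bands` keep their long-block thresholds unchanged, which mirrors the abstract's statement that only the low frequency critical bands are adapted.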

279 Citations

22 Claims

1. A volatile or non-volatile machine-readable storage medium storing one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform:
- in response to determining that a first window of audio data does not contain a low frequency transient signal, computing a first group of masking thresholds for a first long block that corresponds to the first window of audio data; and
  
  based on said first group of masking thresholds, encoding said first long block of audio data;
  
  in response to identifying a low frequency transient signal in a second window of audio data, computing a second group of masking thresholds for short blocks corresponding to the second window of audio data;
  
  selecting one or more particular masking thresholds, from the second group of masking thresholds, for use in encoding a second long block of audio data that corresponds to the second window of audio data; and
  
  encoding, based on the one or more particular masking thresholds, the second long block of audio data.
- - 2. The volatile or non-volatile machine-readable storage medium of claim 1, wherein the one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform:
    - computing a third group of masking thresholds for the second long block that corresponds to the second window of audio data;
      
      encoding the second long block of audio data using a quantization step that is based on a masking threshold between the one or more particular masking thresholds and a masking threshold from the third group of masking thresholds.
  - 3. The volatile or non-volatile machine-readable storage medium of claim 1, wherein the one or more particular masking thresholds correspond to one or more low frequency critical bands of the second long block of audio data.
  - 4. The machine-readable storage medium of claim 1, wherein the one or more particular masking thresholds correspond to a particular short block of the short blocks, and wherein each critical band associated with the particular short block corresponds to a particular masking threshold, and wherein the one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform:
    - mapping a critical band associated with the second long block to one or more particular critical bands associated with the particular short block;
      
      wherein selecting the one or more particular masking thresholds for use in encoding the second long block includes selecting one or more particular masking thresholds that correspond to the one or more particular critical bands, which map to the critical band associated with the second long block, that are associated with the particular short block; and
      
      encoding, based on the one or more particular masking thresholds that correspond to the one or more particular critical bands associated with the particular short block, the particular critical band associated with the second long block.
  - 5. The volatile or non-volatile machine-readable storage medium of claim 1, wherein the one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform:
    - wherein selecting the one or more particular masking thresholds for use in encoding the second long block includes selecting one or more minimum masking thresholds associated with the second long block, from the group of masking thresholds, for use in encoding the second long block of audio data.
  - 6. The volatile or non-volatile machine-readable storage medium of claim 1, wherein the one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform:
    - identifying the low frequency transient signal in the window of audio data.
  - 7. The volatile or non-volatile machine-readable storage medium of claim 6, wherein a low frequency transient signal is a signal having a frequency that is substantially at or below a threshold frequency value, wherein the threshold frequency value is within a range from 4 kHz to 6 kHz.
  - 8. The volatile or non-volatile machine-readable storage medium of claim 6, wherein the one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform identifying the low frequency transient signal by performing:
    - passing the audio data through a low pass filter;
      
      grouping the audio data that passes through the low pass filter into contiguous groups of samples;
      
      determining the maximum amplitude within each group of samples;
      
      comparing the maximum amplitude within a group of samples to a decayed maximum amplitude value within an adjacent previous group of samples; and
      
      if the ratio of the maximum amplitude within the group of samples and the decayed maximum amplitude value within the adjacent previous group of samples exceeds a particular threshold value, then determining that the audio data contains a low frequency transient signal.
  - 9. The volatile or non-volatile machine-readable storage medium of claim 1, wherein the one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform:
    - encoding, based on the one or more particular masking thresholds and in compliance with MPEG-4 Advanced Audio Coding standard specifications, the second long block of audio data.
  - 10. The volatile or non-volatile machine-readable storage medium of claim 1, wherein the group of masking thresholds comprises respective masking thresholds for each critical band of each of the short blocks corresponding to the window of audio data.
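
The detection steps recited in claim 8 can be sketched as below. This is only an illustration under stated assumptions: the one-pole low pass filter, the group size, the decay factor, the ratio threshold, and the small `floor` that guards against division by zero are hypothetical choices that do not come from the claims. "Decayed maximum amplitude" is interpreted here as the previous group's peak multiplied by a decay factor.

```python
from typing import List, Sequence


def one_pole_lowpass(samples: Sequence[float], alpha: float) -> List[float]:
    """Simple one-pole low pass filter (a stand-in for the claimed filter)."""
    y = 0.0
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def has_low_freq_transient(
    samples: Sequence[float],
    group_size: int = 4,
    alpha: float = 0.5,
    decay: float = 0.9,
    ratio_threshold: float = 4.0,
    floor: float = 1e-3,
) -> bool:
    """Return True if a group's peak jumps sharply over the previous group's."""
    filtered = one_pole_lowpass(samples, alpha)
    prev_peak = None
    # Walk the filtered signal in contiguous, non-overlapping groups.
    for start in range(0, len(filtered), group_size):
        group = filtered[start:start + group_size]
        peak = max(abs(v) for v in group)
        if prev_peak is not None:
            # Decayed peak of the adjacent previous group, floored so that
            # near-silence does not produce an unbounded ratio.
            reference = max(prev_peak * decay, floor)
            if peak / reference > ratio_threshold:
                return True
        prev_peak = peak
    return False
```

With these example parameters, a quiet passage followed by a sudden loud onset is flagged, while a steady signal is not.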

11. A computer-implemented method for determining a masking threshold for use in encoding audio data, the method comprising:
- in response to determining that a first window of audio data does not contain a low frequency transient signal, computing a first group of masking thresholds for a first long block that corresponds to the first window of audio data; and
  
  based on said first group of masking thresholds, encoding said first long block of audio data;
  
  in response to identifying a low frequency transient signal in a second window of audio data, computing a second group of masking thresholds for short blocks corresponding to the second window of audio data;
  
  selecting one or more particular masking thresholds, from the second group of masking thresholds, for use in encoding a second long block of audio data that corresponds to the second window of audio data;
  
  encoding, based on the one or more particular masking thresholds, the second long block of audio data;
  
  wherein the computer-implemented method is performed by one or more computing devices.
- - 12. The computer-implemented method of claim 11, further comprising:
    - computing a third group of masking thresholds for the second long block that corresponds to the second window of audio data;
      
      encoding the second long block of audio data using a quantization step that is based on a masking threshold between the one or more particular masking thresholds and a masking threshold from the third group of masking thresholds.
  - 13. The computer-implemented method of claim 11, wherein the one or more particular masking thresholds correspond to one or more low frequency critical bands of the second long block of audio data.
  - 14. The computer-implemented method of claim 11, wherein the one or more particular masking thresholds correspond to a particular short block of the short blocks, and wherein each critical band associated with the particular short block corresponds to a particular masking threshold, the method further comprising:
    - mapping a critical band associated with the second long block to one or more particular critical bands associated with the particular short block;
      
      wherein selecting the one or more particular masking thresholds for use in encoding the second long block includes selecting one or more particular masking thresholds that correspond to the one or more particular critical bands, which map to the critical band associated with the second long block, that are associated with the particular short block; and
      
      encoding, based on the one or more particular masking thresholds that correspond to the one or more particular critical bands associated with the particular short block, the particular critical band associated with the second long block.
  - 15. The computer-implemented method of claim 11:
    - wherein selecting the one or more particular masking thresholds for use in encoding the second long block includes selecting one or more minimum masking thresholds associated with the second long block, from the group of masking thresholds, for use in encoding the second long block of audio data.
  - 16. The computer-implemented method of claim 11, further comprising:
    - identifying the low frequency transient signal in the window of audio data.
  - 17. The computer-implemented method of claim 16, wherein a low frequency transient signal is a signal having a frequency that is substantially at or below a threshold frequency value, wherein the threshold frequency value is within a range from 4 kHz to 6 kHz.
  - 18. The computer-implemented method of claim 16, wherein identifying the low frequency transient signal comprises:
    - passing the audio data through a low pass filter;
      
      grouping the audio data that passes through the low pass filter into contiguous groups of samples;
      
      determining the maximum amplitude within each group of samples;
      
      comparing the maximum amplitude within a group of samples to a decayed maximum amplitude value within an adjacent previous group of samples; and
      
      if the ratio of the maximum amplitude within the group of samples and the decayed maximum amplitude value within the adjacent previous group of samples exceeds a particular threshold value, then determining that the audio data contains a low frequency transient signal.
  - 19. The computer-implemented method of claim 11, further comprising:
    - encoding, based on the one or more particular masking thresholds and in compliance with MPEG-4 Advanced Audio Coding standard specifications, the second long block of audio data.
  - 20. The computer-implemented method of claim 11, wherein the group of masking thresholds comprises respective masking thresholds for each critical band of each of the short blocks corresponding to the window of audio data.
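
The band mapping recited in claims 4 and 14 could be sketched as a frequency-overlap lookup. The claims do not say how the mapping is built, so the overlap rule, the function name, and the representation of bands as lists of edge frequencies in Hz are assumptions made purely for illustration.

```python
from typing import List, Sequence


def map_long_band_to_short_bands(
    long_edges: Sequence[float],
    short_edges: Sequence[float],
    long_band: int,
) -> List[int]:
    """Return indices of short-block bands overlapping one long-block band.

    long_edges / short_edges hold band edge frequencies in Hz, so band i
    spans edges[i] up to edges[i + 1] (hypothetical representation).
    """
    lo, hi = long_edges[long_band], long_edges[long_band + 1]
    return [
        i
        for i in range(len(short_edges) - 1)
        # Two half-open ranges overlap when each starts before the other ends.
        if short_edges[i] < hi and short_edges[i + 1] > lo
    ]
```

Because short blocks have coarser frequency resolution, several long-block bands may map to the same short-block band, and a long-block band that straddles a short-block edge maps to more than one.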

21. A volatile or non-volatile machine-readable storage medium storing one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform:
- in response to determining that a first window of audio data does not contain a low frequency transient signal, computing a first group of masking thresholds for a first long block that corresponds to the first window of audio data; and
  
  based on said first group of masking thresholds, encoding said first long block of audio data;
  
  in response to identifying a low frequency transient signal in a second window of digital audio samples, computing a second group of masking thresholds for a second long block that corresponds to the second window of audio samples;
  
  computing a third group of masking thresholds for short blocks corresponding to the second window of audio samples;
  
  selecting a final masking threshold that is between (a) one or more particular masking thresholds from the third group of masking thresholds and (b) one or more particular masking thresholds from the second group of masking thresholds; and
  
  based on said final masking threshold, encoding by a coder the second long block that corresponds to the window of audio samples.

22. A computer-implemented method comprising:
- in response to determining that a first window of audio data does not contain a low frequency transient signal, computing a first group of masking thresholds for a first long block that corresponds to the first window of audio data; and
  
  based on said first group of masking thresholds, encoding said first long block of audio data;
  
  in response to identifying a low frequency transient signal in a second window of digital audio samples, computing a second group of masking thresholds for a second long block that corresponds to the second window of audio samples;
  
  computing a third group of masking thresholds for short blocks corresponding to the second window of audio samples;
  
  selecting a final masking threshold that is between (a) one or more particular masking thresholds from the third group of masking thresholds and (b) one or more particular masking thresholds from the second group of masking thresholds; and
  
  based on said final masking threshold, encoding by a coder the second long block that corresponds to the window of audio samples;
  
  wherein the computer-implemented method is performed by one or more computing devices.
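
Claims 21 and 22 (like claims 2 and 12) call for a final masking threshold that lies between a short-block threshold and a long-block threshold, without saying how it is chosen. One way to produce such a value is a linear blend, sketched below. The function name and the `weight` parameter are hypothetical, and linear interpolation is an assumption rather than the method the patent prescribes.

```python
def blend_thresholds(short_thr: float, long_thr: float, weight: float = 0.5) -> float:
    """Pick a threshold between the short-block and long-block values.

    weight = 0.0 returns the short-block threshold, weight = 1.0 returns
    the long-block threshold, and values in between interpolate linearly.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return short_thr + weight * (long_thr - short_thr)
```

For example, `blend_thresholds(2.0, 10.0, 0.25)` leans toward the stricter short-block value while still relaxing it slightly toward the long-block threshold.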

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Kuo, Shyh-Shiaw; Baumgarte, Frank
Primary Examiner(s): Wozniak, James S
Application Number: US11/110,331
Time in Patent Office: 1,687 Days
Field of Search: 704/500-501
US Class Current: 704/500
CPC Class Codes: G10L 19/025 Detection of transients or ...
