Multiple range dynamic level control

US 9,171,552 B1
Filed: 01/17/2013
Issued: 10/27/2015
Est. Priority Date: 01/17/2013
Status: Active Grant

2 Assignments

0 Petitions

Abstract

An audio-based system may perform dynamic level adjustment by detecting voice activity in an input signal and evaluating voice levels during periods of voice activity. The current voice level is compared to a plurality of thresholds to determine a corresponding gain strategy, and the input signal is scaled in accordance with this gain strategy. Further adjustment to the signal is performed to reduce output clipping that might otherwise be produced.
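
The threshold comparison described in the abstract can be pictured as a table lookup. The sketch below is illustrative only: the threshold values, the per-range gains, and the names `THRESHOLDS_DB`, `RANGE_GAINS_DB`, `select_range`, `select_gain_db`, and `scale` are assumptions made for this example and are not taken from the patent.

```python
from bisect import bisect_right

# Hypothetical thresholds (dBFS) that split the level axis into four ranges.
THRESHOLDS_DB = [-50.0, -30.0, -15.0]
# Hypothetical gain (dB) assigned to each range, lowest range first.
RANGE_GAINS_DB = [0.0, 12.0, 6.0, 0.0]


def select_range(voice_level_db: float) -> int:
    """Return the index of the level range that contains voice_level_db."""
    return bisect_right(THRESHOLDS_DB, voice_level_db)


def select_gain_db(voice_level_db: float) -> float:
    """Pick the gain assigned to the range the voice level falls in."""
    return RANGE_GAINS_DB[select_range(voice_level_db)]


def scale(samples: list[float], gain_db: float) -> list[float]:
    """Scale samples by a gain given in decibels."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]
```

With these example values, a voice level of -40 dBFS falls in the second range and would be boosted by 12 dB, while a level above -15 dBFS would be left unchanged.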

34 Citations

20 Claims

1. A computing device, comprising:
- a processor;
  
  one or more microphones configured to generate an input audio signal;
  
  one or more speakers; and
  
  memory, accessible by the processor and storing instructions that are executable by the processor to perform acts in multiple repetitions, the acts of each repetition comprising:
  
  detecting voice presence in the input audio signal;
  
  determining a voice level associated with the voice presence in the input audio signal;
  
  comparing the voice level to at least one of a plurality of threshold amplitudes, each threshold amplitude of the plurality of threshold amplitudes corresponding to one of multiple level ranges;
  
  identifying one of the multiple level ranges to which the voice level corresponds based at least in part on the comparing;
  
  selecting an audio gain based at least in part on the identified one of the multiple level ranges;
  
  smoothing the selected audio gain over time;
  
  scaling the input audio signal by the selected and smoothed audio gain to produce an intermediate audio signal; and
  
  attenuating the intermediate audio signal to reduce clipping, wherein the attenuating produces an output audio signal for output by the one or more speakers.
  - 2. The computing device of claim 1, wherein detecting the voice presence comprises performing noise activity detection (NAD) with respect to the input audio signal.
  - 3. The computing device of claim 1, wherein detecting the voice presence comprises estimating a signal envelope and a noise floor of the input audio signal.
  - 4. The computing device of claim 1, wherein:
    - the smoothing is performed by a first order low-pass filter having a first time constant that limits the rate of change of the selected and smoothed audio gain over time; and
      
      the attenuating is applied to peaks of the intermediate audio signal with a compressor having a second time constant that is shorter than the first time constant.
  - 5. The computing device of claim 1, wherein:
    - the input audio signal comprises a left input audio signal and a right input audio signal corresponding to left and right stereo channels, respectively; and
      
      determining the voice level comprises determining a maximum of: (i) a voice level of the left input audio signal, and (ii) a voice level of the right input audio signal.
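
Claim 4 pairs a slow first-order low-pass filter on the gain with a faster compressor on the peaks of the intermediate signal. The following is a minimal sketch of that arrangement under stated assumptions: per-sample processing, a 16 kHz sample rate, time constants of 0.5 s and 0.005 s, and a simple peak limiter with immediate attack and exponential release standing in for the compressor. None of these values or class names come from the patent.

```python
import math


def one_pole_coeff(time_constant_s: float, sample_rate_hz: float) -> float:
    """Feedback coefficient of a first-order low-pass filter."""
    return math.exp(-1.0 / (time_constant_s * sample_rate_hz))


class GainSmoother:
    """First-order low-pass filter that limits how fast the gain changes."""

    def __init__(self, time_constant_s: float, sample_rate_hz: float, initial: float = 1.0):
        self.a = one_pole_coeff(time_constant_s, sample_rate_hz)
        self.gain = initial

    def step(self, target_gain: float) -> float:
        self.gain = self.a * self.gain + (1.0 - self.a) * target_gain
        return self.gain


class PeakLimiter:
    """Attenuates peaks above `ceiling`; recovery uses a short time constant."""

    def __init__(self, time_constant_s: float, sample_rate_hz: float, ceiling: float = 1.0):
        self.a = one_pole_coeff(time_constant_s, sample_rate_hz)
        self.ceiling = ceiling
        self.atten = 1.0  # current attenuation factor; 1.0 means no attenuation

    def step(self, sample: float) -> float:
        peak = abs(sample)
        needed = self.ceiling / peak if peak > self.ceiling else 1.0
        if needed < self.atten:
            self.atten = needed  # attack immediately so the peak cannot clip
        else:
            self.atten = self.a * self.atten + (1.0 - self.a) * needed  # release
        return sample * self.atten


def process(samples, target_gains, sample_rate_hz=16000.0):
    """Scale by the smoothed gain, then attenuate peaks of the intermediate signal."""
    smoother = GainSmoother(0.5, sample_rate_hz)   # hypothetical first time constant
    limiter = PeakLimiter(0.005, sample_rate_hz)   # hypothetical, shorter second one
    out = []
    for x, g in zip(samples, target_gains):
        intermediate = x * smoother.step(g)
        out.append(limiter.step(intermediate))
    return out
```

Keeping the limiter's time constant much shorter than the smoother's lets the gain drift slowly toward its target while brief peaks are still caught before they exceed full scale.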

6. A method of dynamically controlling an audio level, comprising:
- specifying a plurality of thresholds to define multiple level ranges and corresponding gain strategies;
  
  detecting voice presence in one or more audio signals, the one or more audio signals including the voice presence and other noise;
  
  determining a voice level associated with the voice presence in the one or more audio signals;
  
  comparing the voice level to the plurality of thresholds to identify one of the multiple level ranges to which the determined voice level corresponds; and
  
  selecting an audio gain based at least in part on the identified one of the multiple level ranges.
  - 7. The method of claim 6, further comprising applying the selected audio gain to the one or more audio signals to create one or more output audio signals.
  - 8. The method of claim 6, further comprising smoothing the selected audio gain over time.
  - 9. The method of claim 6, further comprising:
    - applying the selected audio gain to the one or more audio signals to create one or more intermediate audio signals; and
      
      attenuating peaks of the one or more intermediate audio signals to reduce clipping.
  - 10. The method of claim 6, further comprising:
    - smoothing the selected audio gain over time using a first time constant;
      
      applying the selected and smoothed audio gain to produce one or more intermediate audio signals; and
      
      attenuating peaks of the one or more intermediate audio signals to reduce clipping, wherein the attenuating is performed using a second time constant that is shorter than the first time constant.
  - 11. The method of claim 6, wherein detecting the voice presence comprises performing noise activity detection (NAD) with respect to the one or more audio signals.
  - 12. The method of claim 6, wherein detecting the voice presence comprises estimating a signal envelope and a noise floor of the one or more audio signals.
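
Claim 12 detects voice presence by estimating a signal envelope and a noise floor. One possible shape for such a detector is sketched below. It is an assumption-laden illustration, not the patented method: the envelope follows the rectified signal with a fast attack and slow release, the noise floor follows decreases immediately and drifts upward slowly, and voice is declared when the envelope exceeds the floor by a fixed ratio. All coefficients, the ratio of 4, and the names `VoiceDetector` and `voice_level` are hypothetical, and the tracker needs a short settling period on real input.

```python
class VoiceDetector:
    """Envelope and noise-floor tracker; all constants are illustrative."""

    def __init__(self, attack=0.5, release=0.01, floor_rise=0.0005, ratio=4.0):
        self.attack = attack          # envelope coefficient when the signal rises
        self.release = release        # envelope coefficient when the signal falls
        self.floor_rise = floor_rise  # slow upward drift of the noise floor
        self.ratio = ratio            # envelope must exceed ratio * floor for voice
        self.envelope = 0.0
        self.noise_floor = None

    def step(self, sample: float) -> bool:
        """Update both estimates with one sample and report voice presence."""
        mag = abs(sample)
        coeff = self.attack if mag > self.envelope else self.release
        self.envelope += coeff * (mag - self.envelope)
        if self.noise_floor is None or self.envelope < self.noise_floor:
            self.noise_floor = self.envelope  # follow decreases immediately
        else:
            self.noise_floor += self.floor_rise * (self.envelope - self.noise_floor)
        return self.envelope > self.ratio * self.noise_floor


def voice_level(samples):
    """Mean envelope over samples flagged as voice, or None if none were."""
    det = VoiceDetector()
    levels = []
    for s in samples:
        if det.step(s):
            levels.append(det.envelope)
    return sum(levels) / len(levels) if levels else None
```

Measuring the level only while voice is flagged keeps steady background noise from pulling the estimate down, which is the point of evaluating levels during periods of voice activity.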

13. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
- detecting voice presence in one or more audio signals, the one or more audio signals including the voice presence and other noise;
  
  determining a voice level associated with the voice presence in the one or more audio signals;
  
  specifying a plurality of thresholds to define multiple level ranges and corresponding gain strategies;
  
  comparing the voice level to the plurality of thresholds to identify one of multiple level ranges to which the voice level corresponds;
  
  selecting an audio gain based at least in part on the identified one of the multiple level ranges; and
  
  applying the selected audio gain to the one or more audio signals.
  - 14. The one or more non-transitory computer-readable media of claim 13, further comprising smoothing the selected audio gain over time.
  - 15. The one or more non-transitory computer-readable media of claim 13, wherein applying the selected audio gain produces one or more intermediate audio signals, the acts further comprising attenuating peaks of the one or more intermediate audio signals to reduce clipping.
  - 16. The one or more non-transitory computer-readable media of claim 13, wherein applying the selected audio gain produces one or more intermediate audio signals, the acts further comprising:
    - smoothing the selected audio gain over time using a first time constant; and
      
      attenuating peaks of the one or more intermediate audio signals to reduce clipping, wherein the attenuating is performed using a second time constant that is shorter than the first time constant.
  - 17. The one or more non-transitory computer-readable media of claim 13, wherein detecting the voice presence comprises performing noise activity detection (NAD) with respect to the one or more audio signals.
  - 18. The one or more non-transitory computer-readable media of claim 13, wherein detecting the voice presence comprises estimating a signal envelope and a noise floor of the one or more audio signals.
  - 19. The one or more non-transitory computer-readable media of claim 13, wherein the one or more audio signals comprise left and right audio signals corresponding to left and right stereo channels, respectively.
  - 20. The one or more non-transitory computer-readable media of claim 13, wherein the other noise includes stationary noise.
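
Claim 19 covers left and right stereo inputs. One way to handle them, sketched here as an assumption rather than as the patented implementation, is to derive a single voice level from both channels (the larger of the two, as the device claims describe) and apply one shared gain to both so the stereo balance is preserved. The RMS level measure and the function names are hypothetical.

```python
import math


def rms_db(samples, floor_db=-120.0):
    """RMS level of a block in dB relative to full scale (1.0)."""
    if not samples:
        return floor_db
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0.0 else floor_db


def stereo_voice_level_db(left, right):
    """Voice level for a stereo block: the maximum of the two channel levels."""
    return max(rms_db(left), rms_db(right))


def apply_shared_gain(left, right, gain_db):
    """Scale both channels by one gain so the left/right balance is unchanged."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in left], [s * factor for s in right]
```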

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Yang, Jun
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US13/744,134
Time in Patent Office: 1,013 Days
Field of Search: 704/200, 704/200.1, 704/201, 704/225, 348/462, 348/736, 348/738, 381/104, 381/107
US Class Current: 1/1
CPC Class Codes:
G10L 21/0316 by changing the amplitude
G10L 25/84 for discriminating voice fr...
