APPARATUSES AND METHODS FOR ENHANCED SPEECH RECOGNITION IN VARIABLE ENVIRONMENTS

US 20170110142A1
Filed: 10/18/2015
Published: 04/20/2017
Est. Priority Date: 10/18/2015
Status: Active Grant

4 Assignments

0 Petitions

Abstract

Systems, apparatuses, and methods are described to increase a signal-to-noise ratio difference between a main channel and reference channel. The increased signal-to-noise ratio difference is accomplished with an adaptive threshold for a desired voice activity detector (DVAD) and shaping filters. The DVAD includes averaging an output signal of a reference microphone channel to provide an estimated average background noise level. A threshold value is selected from a plurality of threshold values based on the estimated average background noise level. The threshold value is used to detect desired voice activity on a main microphone channel.
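
The threshold selection step described in the abstract can be sketched as a table lookup. This is a minimal illustration, not the patented implementation: the band edges, the threshold values, the dB units, and the choice that the threshold rises with noise are all assumptions, and `select_threshold` is a hypothetical name.

```python
from bisect import bisect_right

# Hypothetical table (not from the patent): upper edges of the noise-level
# bands in dB, and the threshold assigned to each band.
NOISE_BAND_EDGES_DB = [40.0, 60.0, 80.0]
THRESHOLDS = [0.5, 1.0, 2.0, 3.0]  # one more entry than there are edges


def select_threshold(noise_level_db, edges=NOISE_BAND_EDGES_DB,
                     thresholds=THRESHOLDS):
    """Return the threshold whose noise band contains noise_level_db."""
    if len(thresholds) != len(edges) + 1:
        raise ValueError("need exactly one more threshold than band edges")
    # bisect_right returns the index of the band the level falls into;
    # a level equal to an edge is placed in the band above that edge.
    return thresholds[bisect_right(edges, noise_level_db)]
```

With this table, an estimated level of 70 dB falls in the third band, so `select_threshold(70.0)` returns `2.0`.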

80 Citations

30 Claims

1. An integrated circuit device, comprising:
- a background noise estimation module, the background noise estimation module to receive an input signal from a reference microphone, when voice activity is not detected the background noise estimation module to average the input signal from the reference microphone to form an estimated average background noise level;
  
  at least two threshold values, each of the at least two threshold values to correspond to a different estimated average background noise level; and
  
  selection logic, the selection logic to assign a particular estimated average background noise level to a threshold value from the at least two threshold values, wherein the threshold value is adapted to the particular estimated average background noise level, the threshold value is to be used by the desired voice activity detector (DVAD) to detect when desired voice activity is present.
- - 2. The integrated circuit device of claim 1, wherein a normalized main signal is compared against a signal which includes the threshold value to detect the presence of desired voice activity.
  - 3. The integrated circuit device of claim 1, wherein a plurality of threshold values are associated with a range of estimated average background noise levels to provide a threshold value as a function of estimated average background noise level to the desired voice activity detector.
  - 4. The integrated circuit device of claim 1, wherein the input signal is to be filtered by a shaping filter, the shaping filter is selected to filter a noise component from the input signal thereby increasing a signal-to-noise ratio of the input signal before the input signal is averaged by the background noise estimation module.
  - 5. The integrated circuit device of claim 1, the background noise estimation module further comprising:
    - a buffer, the buffer is electrically coupled to receive the input signal;
      
      a signal compressor, the signal compressor is coupled to receive the input signal from the buffer and to scale a magnitude of the input signal; and
      
      a smoothing stage, the smoothing stage reduces high frequency content of the input signal.
  - 6. The integrated circuit device of claim 5, wherein the signal compressor applies a compression function selected from the group consisting of log base 10, log base 2, natural log (ln), square root, and a user defined compression function f(x).
  - 7. The integrated circuit device of claim 1, further comprising:
    - a second input signal from a second reference microphone, when voice activity is not detected, the background noise estimation module to use the second input signal and the input signal to form an estimated average background noise level.
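
The buffer, compressor, and smoothing stages of claims 5 and 6 can be sketched as follows. This is an illustration under stated assumptions: the class name `BackgroundNoiseEstimator`, the mean-absolute-value frame measure, the one-pole smoother, and the parameters `alpha` and `floor` are hypothetical choices that the claims do not specify. The voice activity flag is assumed to be supplied by some other detector.

```python
import math

# Compression functions named in claim 6; a user-defined f(x) could be
# added to this table as well.
COMPRESSORS = {
    "log10": math.log10,
    "log2": math.log2,
    "ln": math.log,
    "sqrt": math.sqrt,
}


class BackgroundNoiseEstimator:
    """Averages the reference signal only while no voice activity is flagged."""

    def __init__(self, compressor="log10", alpha=0.1, floor=1e-12):
        self.compress = COMPRESSORS[compressor]
        self.alpha = alpha    # assumed smoothing factor, 0 < alpha <= 1
        self.floor = floor    # keeps the log functions away from zero
        self.level = None     # current estimate, in compressed units

    def update(self, frame, voice_active):
        """Consume one buffered frame and return the current estimate."""
        if voice_active or not frame:
            return self.level  # hold the estimate while voice is present
        magnitude = sum(abs(s) for s in frame) / len(frame)
        compressed = self.compress(max(magnitude, self.floor))
        if self.level is None:
            self.level = compressed
        else:
            # One-pole low-pass: reduces high frequency content of the level.
            self.level += self.alpha * (compressed - self.level)
        return self.level
```

A second reference microphone, as in claim 7, could be handled by feeding frames from both microphones into the same estimator; how the two inputs are combined is not stated in the claim.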

8. An apparatus, comprising:
- an adaptive threshold module;
  
  the adaptive threshold module comprising:
  
  a background noise estimation module, the background noise estimation module to receive an input signal from a reference microphone, when voice activity is not detected the background noise estimation module to average the input signal from the reference microphone to form an estimated average background noise level;
  
  logic, the logic to assign an estimated background noise level to a threshold value;
  
  a first shaping filter, the first shaping filter to filter the reference signal to remove a noise component to provide a filtered reference signal with enhanced signal-to-noise ratio;
  
  a second shaping filter, the second shaping filter to filter a main signal from a main microphone, to remove the noise component to provide a filtered main signal with enhanced signal-to-noise ratio;
  
  a desired voice activity detector, the desired voice activity detector utilizes the filtered main signal, normalized by the filtered reference signal, and the threshold value to obtain a desired voice activity signal with enhanced signal-to-noise ratio difference; and
  
  a noise cancellation module, the noise cancellation module is electrically coupled to the desired voice activity detector, the desired voice activity signal is to be used by the noise cancellation module to identify desired speech during noise cancellation.
- - 9. The apparatus of claim 8, wherein the first shaping filter and the second shaping filter have programmable filter characteristics.
  - 10. The apparatus of claim 8, wherein the programmable filter characteristics are selected from the group consisting of a low pass filter, a band pass filter, a notch filter, a lower corner frequency, an upper corner frequency, a notch width, a roll-off slope and a user defined characteristic.
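
One way to picture a programmable shaping filter is a band-pass biquad whose lower and upper corner frequencies are parameters. The sketch below is an assumption, not the filter the patent describes: it uses the widely known "Audio EQ Cookbook" band-pass formulas, the names `make_shaping_filter` and `apply_filter` are hypothetical, and the corner frequencies are only approximate after the digital mapping.

```python
import math


def make_shaping_filter(f_lo_hz, f_hi_hz, fs_hz):
    """Band-pass biquad coefficients (b0, b1, b2, a1, a2) with a0 scaled to 1."""
    f0 = math.sqrt(f_lo_hz * f_hi_hz)   # geometric center frequency
    q = f0 / (f_hi_hz - f_lo_hz)        # quality factor from the bandwidth
    w0 = 2.0 * math.pi * f0 / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    return (alpha / a0, 0.0, -alpha / a0,
            -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)


def apply_filter(coeffs, signal):
    """Run a direct form I biquad over a sequence of samples."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


# Assumed example settings: a 300 Hz to 3400 Hz band at 16 kHz sampling.
# The same coefficients would be applied to the main and reference signals.
SPEECH_BAND = make_shaping_filter(300.0, 3400.0, 16000.0)
```

Passing the same coefficient tuple to `apply_filter` for both microphone signals keeps the filtering identical on the two paths, so a later ratio of the two is not skewed by the filter itself.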

11. A method, comprising:
- averaging an output signal of a reference microphone channel to provide an estimated average background noise level;
  
  selecting a threshold value from a plurality of threshold values based on the estimated average background noise level; and
  
  using the threshold value to detect desired voice activity on a main microphone channel.
- - 12. The method of claim 11, further comprising:
    - comparing a normalized main signal against a signal which includes the threshold value to detect the presence of desired voice activity.
  - 13. The method of claim 11, further comprising:
    - filtering the output signal with a shaping filter, the shaping filter is selected to filter a noise component from the output signal thereby increasing a signal-to-noise ratio of the output signal before the averaging.
  - 14. The method of claim 11, the averaging further comprising:
    - accepting the input signal for a period of time;
      
      compressing the input signal; and
      
      smoothing the input signal to reduce high frequency content.
  - 15. The method of claim 14, wherein the compressing applies a compression function selected from the group consisting of log base 10, log base 2, natural log (ln), square root, and a user defined compression function f(x).
  - 16. The method of claim 11, wherein the averaging includes utilizing an output signal from a second reference microphone channel to provide the average background noise estimation level.
  - 17. The method of claim 14, wherein the period of time represents one or more frames of data.
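
The three steps of claim 11, together with the comparison of claim 12, can be sketched frame by frame. Everything specific here is an assumption made for illustration: levels are measured as mean power in dB, the main signal is normalized by subtracting the reference level in the log domain, the first frame is treated as noise only, and `run_dvad`, `frame_level_db`, and `pick_threshold` are hypothetical names.

```python
import math


def frame_level_db(frame, floor=1e-12):
    """Mean power of one frame in dB (assumed level measure)."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(power, floor))


def run_dvad(main_frames, ref_frames, pick_threshold, alpha=0.1):
    """Return one desired-voice decision per frame pair.

    pick_threshold maps an estimated noise level in dB to a threshold in dB.
    """
    noise_db = None
    decisions = []
    for main, ref in zip(main_frames, ref_frames):
        ref_db = frame_level_db(ref)
        if noise_db is None:
            noise_db = ref_db  # assumption: the first frame holds noise only
        # Step 2: select the threshold from the estimated noise level.
        threshold_db = pick_threshold(noise_db)
        # Step 3: compare the normalized main level against the threshold.
        voiced = frame_level_db(main) - ref_db > threshold_db
        if not voiced:
            # Step 1: average the reference level only when no voice is seen.
            noise_db += alpha * (ref_db - noise_db)
        decisions.append(voiced)
    return decisions
```

For example, with `pick_threshold = lambda n: 6.0 if n < -30.0 else 12.0`, a quiet frame pair at equal levels is rejected, while a frame in which the main microphone is about 14 dB above the reference is flagged as desired voice.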

18. An apparatus, comprising:
- a first signal path configured to receive a main microphone signal;
  
  a first shaping filter coupled to the first signal path, the first shaping filter to filter the main microphone signal, wherein the first shaping filter filters a noise component from the main microphone signal to increase a signal-to-noise ratio of the main microphone signal;
  
  a second signal path configured to receive a reference microphone signal;
  
  a second shaping filter coupled to the second signal path, the second shaping filter to filter the reference microphone signal, wherein the second shaping filter to increase a signal-to-noise ratio of the reference microphone signal and the second shaping filter to provide substantially the same filtering as the first shaping filter;
  
  a desired voice activity detector (DVAD), the DVAD is coupled to an output of the first shaping filter and an output of the second shaping filter, the DVAD to form a normalized main signal with increased signal-to-noise ratio, the normalized main signal is to be used during identification of desired voice activity.
- - 19. The apparatus of claim 18, further comprising:
    - an adaptive threshold module, the second signal path is coupled to the adaptive threshold module, the adaptive threshold module further comprising:
      
      a background noise estimation module, the background noise estimation module receives an output of the second shaping filter and averages the output to obtain an estimated average background noise level; and
      
      selection logic, wherein the selection logic is configured to select a threshold value corresponding to the estimated average background noise level from at least two threshold values.
  - 20. The apparatus of claim 19, wherein the DVAD to utilize the threshold value to create a desired voice activity signal, and the apparatus further comprising:
    - a noise cancellation module, the noise cancellation module is controlled by the desired voice activity detection signal, wherein a greater degree of noise cancellation accuracy is achieved because of the increased signal-to-noise ratio provided by the shaping filters.
  - 21. The apparatus of claim 18, wherein filter characteristics of the first shaping filter and the second shaping filter are programmable.
  - 22. The apparatus of claim 21, wherein the programmable filter characteristics are selected from the group consisting of a low pass filter, a band pass filter, a notch filter, a lower corner frequency, an upper corner frequency, a notch width, a roll-off slope and a user defined characteristic.

23. A system, comprising:
- a data processing system, the data processing system is configured to process acoustic signals; and
  
  a computer readable medium containing executable computer program instructions, which when executed by the data processing system, cause the data processing system to perform a method comprising:
  
  averaging an output signal of a reference microphone channel to provide an estimated average background noise level;
  
  selecting a threshold value from a plurality of threshold values based on the estimated average background noise level; and
  
  using the threshold value to detect desired voice activity on a main microphone channel.
- - 24. The system of claim 23, the method performed by the data processing system, further comprising:
    - comparing a normalized main signal against a signal which includes the threshold value to detect a presence of desired voice activity.
  - 25. The system of claim 23, the method performed by the data processing system, further comprising:
    - filtering the output signal with a shaping filter, the shaping filter is selected to filter a noise component from the output signal thereby increasing a signal-to-noise ratio of the output signal before the averaging.
  - 26. The system of claim 23, the method performed by the data processing system, further comprising:
    - accepting the input signal for a period of time;
      
      compressing the input signal; and
      
      smoothing the input signal to reduce high frequency content.
  - 27. The system of claim 26, wherein the compressing applies a compression function selected from the group consisting of log base 10, log base 2, natural log (ln), square root, and a user defined compression function f(x).
  - 28. The system of claim 23, wherein the averaging includes utilizing an output signal from a second reference microphone channel to provide the average background noise estimation level.
  - 29. The system of claim 26, wherein the period of time represents one or more frames of data.
  - 30. The system of claim 23, wherein the averaging utilizes an output signal from a main microphone channel to provide the average background noise estimation level instead of the output signal from the reference microphone channel.

Current Assignee: Solos Technology Limited
Original Assignee: Kopin Corporation
Inventors: Fan, Dashen; Chen, Xi; Bao, Hua
Granted Patent: US 11,631,421 B2
CPC Class Codes

G10L 2021/02165   Two microphones, one receiv...

G10L 2025/786   Adaptive threshold

G10L 21/0216   characterised by the method...

G10L 25/84   for discriminating voice fr...
