System and method for filtering and eliminating noise from natural language utterances to improve speech recognition and parsing

US 8,140,327 B2
Filed: 04/22/2010
Issued: 03/20/2012
Est. Priority Date: 06/03/2002
Status: Expired due to Term



Assignments: 5
Petitions: 0

Abstract

The systems and methods described herein may filter and eliminate noise from natural language utterances to improve accuracy associated with speech recognition and parsing capabilities. In particular, the systems and methods described herein may use a microphone array to provide directional signal capture, noise elimination, and cross-talk reduction associated with an input speech signal. Furthermore, a filter arranged between the microphone array and a speech coder may use band shaping, notch filtering, and adaptive echo cancellation to optimize a signal-to-noise ratio associated with the speech signal. The speech signal may then be sent to the speech coder, which may use adaptive lossy audio compression to optimize bandwidth requirements associated with transmitting the speech signal to a main unit that provides the speech recognition, parsing, and other natural language processing capabilities.
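The abstract and claims describe the microphone array only functionally: a beam steered toward the talker, with nulls placed on point noise sources. As a minimal sketch, assuming a uniform linear array (the one-dimensional case of claims 10 and 23), a single narrowband frequency, and free-field propagation at an assumed 343 m/s, weights with unit gain toward the talker and zero gain toward known noise directions can be computed as the minimum-norm solution to those constraints. The function names, array size, and spacing are hypothetical, and this linearly constrained design is a textbook technique swapped in for illustration, not the method disclosed in the patent.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed free-field value


def steering_vector(num_mics, spacing_m, angle_deg, freq_hz):
    """Narrowband steering vector of a uniform linear array; 0 degrees is broadside."""
    positions = np.arange(num_mics) * spacing_m
    delays = positions * np.sin(np.deg2rad(angle_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def null_steering_weights(num_mics, spacing_m, look_deg, null_degs, freq_hz):
    """Minimum-norm weights w with w^H a(look) = 1 and w^H a(null) = 0 for each null."""
    # Columns of C are the constrained directions: the talker first, then each noise source.
    C = np.column_stack([steering_vector(num_mics, spacing_m, ang, freq_hz)
                         for ang in [look_deg, *null_degs]])
    f = np.zeros(C.shape[1], dtype=complex)
    f[0] = 1.0  # unit gain toward the talker, zero toward every null direction
    # w = C (C^H C)^{-1} f is the smallest-norm vector satisfying C^H w = f.
    return C @ np.linalg.solve(C.conj().T @ C, f)


def array_response(weights, spacing_m, angle_deg, freq_hz):
    """Magnitude of the beam pattern |w^H a(angle)| at one angle."""
    a = steering_vector(len(weights), spacing_m, angle_deg, freq_hz)
    return abs(np.vdot(weights, a))  # np.vdot conjugates its first argument


if __name__ == "__main__":
    w = null_steering_weights(8, 0.04, look_deg=0.0, null_degs=[40.0], freq_hz=1000.0)
    print(array_response(w, 0.04, 0.0, 1000.0))   # close to 1.0 (talker direction)
    print(array_response(w, 0.04, 40.0, 1000.0))  # close to 0.0 (nulled noise source)
```

A broadband front end would typically repeat this design per frequency bin, and a fixed design like this assumes the noise directions are already known; an adaptive beamformer would estimate them instead.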


26 Claims

1. A method for filtering and eliminating noise from natural language utterances, comprising:
   - receiving a natural language utterance at a microphone array that adds one or more nulls to a beam pattern steered to point in a direction associated with a user speaking the natural language utterance, wherein the one or more nulls notch out point or limited area noise sources from an input speech signal corresponding to the natural language utterance;
   - comparing environmental noise to the input speech signal corresponding to the natural language utterance to set one or more parameters associated with an adaptive filter coupled to the microphone array;
   - passing the input speech signal corresponding to the natural language utterance to the adaptive filter, wherein the adaptive filter uses band shaping and notch filtering to remove narrow-band noise from the input speech signal corresponding to the natural language utterance according to the one or more parameters;
   - suppressing cross-talk and environmentally caused echoes in the input speech signal corresponding to the natural language utterance using adaptive echo cancellation in the adaptive filter;
   - sending the input speech signal passed through the adaptive filter to a speech coder that uses adaptive lossy audio compression to remove momentary gaps from the input speech signal and variable rate sampling to compress and digitize the input speech signal, wherein the speech coder optimizes the adaptive lossy audio compression and the variable rate sampling to only preserve components in the input speech signal that will be input to a speech recognition engine; and
   - transmitting the digitized input speech signal from a buffer in the speech coder to the speech recognition engine, wherein the speech coder transmits the digitized input speech signal to the speech recognition engine at a rate that depends on available bandwidth between the speech coder and the speech recognition engine.
2. The method recited in claim 1, wherein the speech coder uses the adaptive lossy audio compression to remove the momentary gaps from the input speech signal to prevent errors in the speech recognition engine.
3. The method recited in claim 1, wherein the speech coder uses the variable rate sampling to maximize a fidelity associated with the digitized input speech signal and minimize the available bandwidth used to transmit the digitized input speech signal to the speech recognition engine.
4. The method recited in claim 3, wherein the speech coder adapts a rate associated with digitizing and compressing the input speech signal to minimize the available bandwidth used to transmit the digitized input speech signal to the speech recognition engine.
5. The method recited in claim 1, wherein the adaptive filter has a pass band set to optimize a signal to noise ratio associated with the components in the input speech signal that will be input to the speech recognition engine.
6. The method recited in claim 1, wherein a main unit that includes the speech recognition engine and a speech unit that includes the microphone array, the adaptive filter, and the speech coder are physically separate and communicate over a wireless link.
7. The method recited in claim 1, wherein a wired link connects a main unit that includes the speech recognition engine to a speech unit that includes the microphone array, the adaptive filter, and the speech coder.
8. The method recited in claim 1, further comprising adjusting the beam pattern to maximize gain associated with the input speech signal in the direction associated with the user speaking the natural language utterance.
9. The method recited in claim 1, further comprising complementing the microphone array with a Voice over IP speech interface that allows the user to connect to and interact with a system that includes one or more of the microphone array, the adaptive filter, the speech coder, or the speech recognition engine from a remote location.
10. The method recited in claim 1, wherein the microphone array includes multiple microphones arranged in a one-dimensional linear shape.
11. The method recited in claim 10, further comprising adjusting the beam pattern to maximize gain associated with the input speech signal in the direction associated with the user speaking the natural language utterance.
12. The method recited in claim 1, wherein the microphone array includes multiple microphones arranged in a two-dimensional circular, square, or triangular shape.
13. The method recited in claim 1, wherein the microphone array includes multiple physically distributed microphones arranged to create a three-dimensional array.
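Claims 1 and 5 place band shaping, notch filtering, and adaptive echo cancellation in a single adaptive filter whose parameters are set by comparing environmental noise with the input signal, but they name no particular algorithms. A minimal sketch, assuming NumPy and swapping in three standard components: the notch frequency is taken from the strongest spectral peak of a noise-only segment (a hypothetical parameter-setting rule), the notch is a second-order IIR section with RBJ audio-EQ-cookbook coefficients, and echo cancellation is a normalized LMS (NLMS) FIR filter driven by a far-end reference signal. Band shaping is omitted, and every name and default value is illustrative.

```python
import numpy as np


def dominant_noise_freq(noise_only, fs):
    """Strongest spectral peak (Hz) in a noise-only segment; a hypothetical rule for
    setting the notch parameter from the environmental noise."""
    windowed = noise_only * np.hanning(len(noise_only))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(noise_only), d=1.0 / fs)
    return freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin


def notch(x, fs, f0, q=30.0):
    """Second-order IIR notch at f0 Hz (RBJ audio-EQ-cookbook coefficients)."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b1, b2 = 1 / a0, -2 * np.cos(w0) / a0, 1 / a0
    a1, a2 = -2 * np.cos(w0) / a0, (1 - alpha) / a0
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2  # direct form I
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


def nlms_echo_cancel(mic, ref, taps=64, mu=0.5, eps=1e-6):
    """Normalized LMS: adapt an FIR model of the echo path from the far-end
    reference and subtract the predicted echo from the microphone signal."""
    w = np.zeros(taps)
    history = np.zeros(taps)  # most recent reference samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        history = np.roll(history, 1)
        history[0] = ref[n]
        error = mic[n] - w @ history  # residual after subtracting the echo estimate
        w += mu * error * history / (eps + history @ history)
        out[n] = error
    return out


def adaptive_filter(mic, ref, noise_only, fs):
    """Echo cancellation first, then a notch tuned from the noise-only segment."""
    cleaned = nlms_echo_cancel(mic, ref)
    return notch(cleaned, fs, dominant_noise_freq(noise_only, fs))
```

Running the NLMS stage before the notch keeps the echo path it must model short; applying the IIR notch to the microphone signal first would fold the notch's long impulse response into that path.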

14. A system for filtering and eliminating noise from natural language speech utterances, comprising:
   - a microphone array configured to add one or more nulls to a beam pattern steered to point in a direction associated with a user speaking a natural language utterance to capture an input speech signal corresponding to the natural language utterance, wherein the one or more nulls notch out point or limited area noise sources from the input speech signal;
   - an adaptive filter coupled to the microphone array, wherein the adaptive filter is configured to:
     - receive the input speech signal corresponding to the natural language utterance from the microphone array and compare environmental noise to the input speech signal to set one or more parameters associated with the adaptive filter;
     - use band shaping and notch filtering to remove narrow-band noise from the input speech signal received from the microphone array according to the one or more parameters; and
     - suppress cross-talk and environmentally caused echoes in the input speech signal received from the microphone array using adaptive echo cancellation;
   - a speech coder arranged between the adaptive filter and a speech recognition engine, wherein the speech coder is configured to receive the input speech signal passed through the adaptive filter and use adaptive lossy audio compression to remove momentary gaps from the input speech signal and variable rate sampling to compress and digitize the input speech signal, wherein the speech coder optimizes the adaptive lossy audio compression and the variable rate sampling to only preserve components in the input speech signal that will be input to the speech recognition engine; and
   - a transceiver configured to communicate the digitized input speech signal from a buffer in the speech coder to the speech recognition engine at a rate that depends on available bandwidth associated with a communication link that connects the transceiver and the speech recognition engine.
15. The system recited in claim 14, wherein the speech coder is configured to use the adaptive lossy audio compression to remove the momentary gaps from the input speech signal to prevent errors in the speech recognition engine.
16. The system recited in claim 14, wherein the speech coder is configured to use the variable rate sampling to maximize a fidelity associated with the digitized input speech signal and minimize the available bandwidth that the transceiver uses to transmit the digitized input speech signal to the speech recognition engine.
17. The system recited in claim 16, wherein the speech coder is further configured to adapt a rate associated with digitizing and compressing the input speech signal to minimize the available bandwidth that the transceiver uses to transmit the digitized input speech signal to the speech recognition engine.
18. The system recited in claim 14, wherein the adaptive filter has a pass band set to optimize a signal to noise ratio associated with the components in the input speech signal that will be input to the speech recognition engine.
19. The system recited in claim 14, further comprising:
   - a main unit that includes the speech recognition engine; and
   - a speech unit that includes the microphone array, the adaptive filter, and the speech coder, wherein the main unit and the speech unit are physically separate and communicate wirelessly over the communication link.
20. The system recited in claim 14, further comprising:
   - a main unit that includes the speech recognition engine; and
   - a speech unit that includes the microphone array, the adaptive filter, and the speech coder, wherein the communication link includes one or more wires that connect the main unit and the speech unit.
21. The system recited in claim 14, wherein the microphone array is further configured to adjust the beam pattern to maximize gain associated with the input speech signal in the direction associated with the user speaking the natural language utterance.
22. The system recited in claim 14, further comprising a Voice over IP speech interface configured to complement the microphone array, and allow the user to connect to and interact with the system from a remote location.
23. The system recited in claim 14, wherein the microphone array includes multiple microphones arranged in a one-dimensional linear shape.
24. The system recited in claim 23, wherein the microphone array is further configured to adjust the beam pattern to maximize gain associated with the input speech signal in the direction associated with the user speaking the natural language utterance.
25. The system recited in claim 14, wherein the microphone array includes multiple microphones arranged in a two-dimensional circular, square, or triangular shape.
26. The system recited in claim 14, wherein the microphone array includes multiple physically distributed microphones arranged to create a three-dimensional array.
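Claims 14 through 17 require the speech coder to drop momentary gaps, adapt its sampling rate to the link, and hand a buffer to a transceiver that sends at a bandwidth-dependent rate, without fixing any formulas. A minimal sketch in plain Python, with every constant assumed: 20 ms frames at 8 kHz, a -40 dBFS energy gate as a hypothetical gap detector, a fixed menu of candidate sample rates, 8-bit linear quantization standing in for the lossy coder, and a per-tick byte budget derived from the current bandwidth estimate. Resampling itself is omitted; `choose_sample_rate` only illustrates how the rate could follow the link.

```python
import math
from collections import deque

FRAME = 160  # 20 ms at an assumed 8 kHz sample rate


def drop_gaps(samples, frame=FRAME, threshold_db=-40.0):
    """Drop frames whose RMS level falls below threshold_db (hypothetical gap detector)."""
    kept = []
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if rms > 0 and 20 * math.log10(rms) > threshold_db:
            kept.extend(chunk)
    return kept


def choose_sample_rate(bandwidth_bps, bits_per_sample=8, rates=(16000, 11025, 8000)):
    """Highest candidate rate whose raw bit rate fits the link; else the lowest rate."""
    for rate in rates:
        if rate * bits_per_sample <= bandwidth_bps:
            return rate
    return rates[-1]


def to_bytes(samples):
    """8-bit linear quantization of samples in [-1, 1], a stand-in for a real lossy coder."""
    return bytes(min(255, max(0, round((s + 1.0) * 127.5))) for s in samples)


class PacedSender:
    """Coder output buffer drained at a rate set by the current bandwidth estimate."""

    def __init__(self):
        self.buffer = deque()

    def push(self, data):
        self.buffer.extend(data)  # iterating bytes yields ints 0..255

    def tick(self, bandwidth_bps, tick_s=0.02):
        """Release at most bandwidth_bps * tick_s bits' worth of whole bytes."""
        budget = int(bandwidth_bps * tick_s) // 8
        count = min(budget, len(self.buffer))
        return bytes(self.buffer.popleft() for _ in range(count))


if __name__ == "__main__":
    speech = [0.5] * FRAME + [0.0] * FRAME + [0.5] * FRAME
    voiced = drop_gaps(speech)            # the silent middle frame is removed
    rate = choose_sample_rate(100_000)    # 11025 * 8 = 88200 bit/s fits the link
    sender = PacedSender()
    sender.push(to_bytes(voiced))
    packet = sender.tick(bandwidth_bps=64_000)  # up to 160 bytes per 20 ms tick
```

Keeping the budget per tick rather than per packet lets the sender follow a bandwidth estimate that changes between ticks, which is the behavior the transceiver element of claim 14 describes.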


Current Assignee: Dialect, LLC
Original Assignee: VoiceBox Technologies, Inc. (Microsoft Corporation)
Inventors: Kennewick, Robert A.; Locke, David; Kennewick, Michael R. Sr.; Kennewick, Michael R. Jr.; Kennewick, Richard; Freeman, Tom
Primary Examiner(s): Lerner, Martin
Application Number: US12/765,753
Publication Number: US 20100204986A1
Time in Patent Office: 698 Days
Field of Search: 704/205, 704/210, 704/215, 704/226, 704/257, 704/233, 381/71.1, 381/71.11, 381/71.14, 381/92, 379/406.05, 379/406.08
US Class Current: 704/226
CPC Class Codes:
- G10L 15/1822: Parsing for meaning underst...
- G10L 15/22: Procedures used during a sp...
- G10L 2015/228: of application context
- Y10S 707/99933: Query processing, i.e. sear...
