Speech recognition using microphone antenna array
Abstract
A system and method of audio processing provides enhanced speech recognition. Audio input is received at a plurality of microphones. The multi-channel audio signal from the microphones may be processed by a beamforming network to generate a single-channel enhanced audio signal, on which voice activity is detected. Audio signals from the microphones are additionally processed by an adaptable noise cancellation filter having variable filter coefficients to generate a noise-suppressed audio signal. The variable filter coefficients are updated during periods of voice inactivity. A speech recognition engine may apply a speech recognition algorithm to the noise-suppressed audio signal and generate an appropriate output. The operation of the speech recognition engine and the adaptable noise cancellation filter may advantageously be controlled based on voice activity detected in the single-channel enhanced audio signal from the beamforming network.
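The first stage described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a delay-and-sum beamformer with known integer sample delays and a frame-energy threshold as the voice activity detector; the abstract names neither technique. The function names `delay_and_sum` and `frame_vad`, the frame length, and the threshold are hypothetical.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Combine a multi-channel signal into a single channel.

    Assumption: channel i lags the desired source by delays[i] samples,
    so reading channel i at index t + delays[i] time-aligns the channels.
    """
    # Only produce samples for which every shifted channel has data.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]


def frame_vad(signal: Sequence[float], frame_len: int,
              threshold: float) -> List[int]:
    """Return one binary flag per full frame: 1 = voice, 0 = no voice.

    Assumption: mean frame energy above `threshold` counts as voice.
    """
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        flags.append(1 if energy > threshold else 0)
    return flags
```

In this sketch the per-frame flags play the role of the voice activity indication that the abstract says controls both the noise cancellation filter and the speech recognition engine.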
11 Claims
1. A method of recognizing speech, comprising:
receiving speech input via a plurality of microphones;
processing a corresponding plurality of audio signals from said microphones with a beamforming network to generate a first improved audio signal;
detecting voice activity on said first improved audio signal;
processing said plurality of audio signals from said microphones with an adaptive noise cancellation filter having variable filter coefficients to generate a second improved audio signal;
repeatedly updating said variable filter coefficients during periods of voice inactivity as detected on said first improved audio signal to minimize non-speech audio components in said second improved audio signal; and
performing speech recognition on said second improved audio signal.

4. A method of speech recognition, comprising:
receiving audio input at a microphone array comprising a plurality of microphones and generating a multi-channel audio signal;
processing said multi-channel audio signal into a first single channel audio signal by a beamforming network;
detecting voice activity in said first single channel audio signal;
processing said multi-channel audio signal into a second single channel audio signal by an adaptable audio filter;
selectively updating parameters associated with said adaptable audio filter;
timing said updating of parameters based on said voice activity detection; and
performing speech recognition on said second single channel audio signal.

6. A system for speech recognition, comprising:
a microphone array comprising a plurality of microphones for receiving audio input, said microphone array generating a multi-channel audio signal;
a beamforming network receiving said multi-channel audio signal and generating a first single channel audio signal;
a voice activity detection module receiving said first single channel audio signal and generating a binary signal indicative of the presence or absence of a voice component in said first single channel audio signal;
an adaptive noise cancellation filter having variable filter coefficients receiving said multi-channel audio signal and said binary signal, and generating a second single channel audio signal, said filter coefficients being updated in response to said binary signal;
a speech recognition engine receiving said second single channel audio signal, operative to interpret speech in said second single channel audio signal in response to said binary signal.
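
The gating of coefficient updates recited in the claims can be sketched as follows. This is an illustration under stated assumptions, not the claimed filter: it uses a normalized LMS (NLMS) update and reduces the multi-channel input to one primary and one noise-reference channel, neither of which the claims specify. The names `gated_nlms`, `taps`, `mu`, and `eps` are hypothetical.

```python
from typing import List, Sequence, Tuple


def gated_nlms(primary: Sequence[float],
               reference: Sequence[float],
               voice_active: Sequence[int],
               taps: int = 4,
               mu: float = 0.5,
               eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Subtract a filtered noise reference from the primary channel.

    Assumption: `voice_active` holds one binary flag per sample
    (1 = voice present). The coefficients adapt only while the flag
    is 0, so the filter learns the noise path and is frozen during speech.
    """
    w = [0.0] * taps      # variable filter coefficients
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x, active in zip(primary, reference, voice_active):
        buf = [x] + buf[:-1]
        noise_estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - noise_estimate    # noise-suppressed output sample
        out.append(e)
        if not active:
            # NLMS step: reduce the residual during voice inactivity.
            norm = sum(b * b for b in buf) + eps
            w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
    return out, w
```

Freezing the coefficients while the flag is 1 keeps the filter from adapting to, and cancelling, the speech itself; the returned `out` corresponds to the single-channel signal that would be passed to the speech recognition engine.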
Specification