Speech recognition apparatus for consumer electronic applications
First Claim
1. A method for recognizing an utterance spoken by a user comprising the steps of:
capturing the utterance as an input audio signal;
converting the input audio signal to a digitized representation;
using a single pole digital difference filter repeatedly to obtain a plurality of filtered waveforms from the digitized representation, wherein said single pole digital difference filter is in the form;
Y(n) = AY(n-1) + BX(n) + CX(n-1);
extracting estimates of a plurality of acoustic parameters from the plurality of filtered waveforms at successive sampling points;
determining an end time of the spoken utterance and a duration of the spoken utterance;
thereafter time-normalizing said estimates so that the spoken utterance extends over a predetermined number of time intervals; and
analyzing the time-normalized estimates to identify the utterance.
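The time-normalizing step can be illustrated with a minimal Python sketch. The claim does not say how the estimates are mapped onto the predetermined number of intervals, so the equal-width averaging used here is an assumption, and the names `extract_utterance`, `time_normalize`, `end_index`, `duration`, and `num_intervals` are hypothetical.

```python
from typing import List, Sequence


def extract_utterance(estimates: Sequence[Sequence[float]],
                      end_index: int, duration: int) -> List[Sequence[float]]:
    """Keep only the frames that belong to the utterance.

    end_index is exclusive (one past the last utterance frame) and
    duration is the utterance length in frames. Both are assumed to come
    from a separate end-of-utterance detector.
    """
    start = end_index - duration
    if duration <= 0 or start < 0 or end_index > len(estimates):
        raise ValueError("end_index/duration do not fit the estimate buffer")
    return list(estimates[start:end_index])


def time_normalize(frames: Sequence[Sequence[float]],
                   num_intervals: int) -> List[List[float]]:
    """Map a variable number of frames onto exactly num_intervals frames.

    Assumption: each output interval is the average of the input frames
    that fall inside it. When the utterance has fewer frames than
    intervals, the nearest input frame is reused.
    """
    if num_intervals <= 0 or len(frames) == 0:
        raise ValueError("need at least one frame and one interval")
    n = len(frames)
    width = len(frames[0])
    normalized: List[List[float]] = []
    for k in range(num_intervals):
        start = (k * n) // num_intervals
        stop = ((k + 1) * n) // num_intervals
        if stop <= start:
            stop = start + 1  # short utterance: reuse a single frame
        chunk = frames[start:stop]
        normalized.append(
            [sum(frame[i] for frame in chunk) / len(chunk) for i in range(width)]
        )
    return normalized
```

With this sketch, a fast and a slow rendition of the same word both yield the same number of frames, which is what allows a fixed-size pattern to be handed to the classifier.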
Abstract
A spoken word or phrase recognition device. The device does not require a digital signal processor, large RAM, or extensive analog circuitry. The input audio signal is digitized and passed recursively through a digital difference filter to produce a multiplicity of filtered output waveforms. These waveforms are processed in real time by a microprocessor to generate a pattern that is recognized by a neural network pattern classifier that operates in software in the microprocessor. By application of additional techniques, this device has been shown to recognize an unknown speaker saying a digit from zero through nine with an accuracy greater than 99%. Because of the recognition accuracy and cost-effective design, the device may be used in cost sensitive applications such as toys, electronic learning aids, and consumer electronic products.
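As an illustration of the filtering stage, the difference equation recited in the claims, Y(n) = AY(n-1) + BX(n) + CX(n-1), can be sketched in Python as follows. This is a sketch under stated assumptions, not the patented implementation: the function names and the coefficient values are hypothetical, and "repeatedly" is read here as running the same routine once per coefficient set. Cascading the output of one pass into the next is another possible reading.

```python
from typing import List, Sequence, Tuple


def difference_filter(x: Sequence[float],
                      a: float, b: float, c: float) -> List[float]:
    """Apply Y(n) = a*Y(n-1) + b*X(n) + c*X(n-1) with zero initial state."""
    y: List[float] = []
    prev_x = 0.0
    prev_y = 0.0
    for sample in x:
        out = a * prev_y + b * sample + c * prev_x
        y.append(out)
        prev_x = sample
        prev_y = out
    return y


def filter_bank(x: Sequence[float],
                coefficient_sets: Sequence[Tuple[float, float, float]]
                ) -> List[List[float]]:
    """Reuse the one filter routine to produce several filtered waveforms."""
    return [difference_filter(x, a, b, c) for (a, b, c) in coefficient_sets]


# Hypothetical coefficient sets, chosen only for illustration.
LOW_PASS = (0.5, 0.25, 0.25)    # DC gain (b + c) / (1 - a) = 1
HIGH_PASS = (0.5, 0.75, -0.75)  # DC gain 0 because b + c = 0

if __name__ == "__main__":
    step = [1.0] * 8
    low, high = filter_bank(step, [LOW_PASS, HIGH_PASS])
    print(low)   # rises toward 1.0
    print(high)  # decays toward 0.0
```

For cross-checking, the same recurrence corresponds to `scipy.signal.lfilter([b, c], [1, -a], x)`. The pure-Python loop is used here only to keep the sketch free of dependencies.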
21 Claims
1. A method for recognizing an utterance spoken by a user comprising the steps of:
capturing the utterance as an input audio signal;
converting the input audio signal to a digitized representation;
using a single pole digital difference filter repeatedly to obtain a plurality of filtered waveforms from the digitized representation, wherein said single pole digital difference filter is in the form;
Y(n) = AY(n-1) + BX(n) + CX(n-1);
extracting estimates of a plurality of acoustic parameters from the plurality of filtered waveforms at successive sampling points;
determining an end time of the spoken utterance and a duration of the spoken utterance;
thereafter time-normalizing said estimates so that the spoken utterance extends over a predetermined number of time intervals; and
analyzing the time-normalized estimates to identify the utterance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18
15. Apparatus for recognizing an utterance spoken by a user, said apparatus comprising;
an analog-to-digital converter that converts an audio signal into a digitized representation;
a repeatedly accessed single pole digital difference filter that obtains a plurality of filtered waveforms from the digitized representation, wherein said single pole digital difference filter is in the form;
Y(n) = AY(n-1) + BX(n) + CX(n-1);
a feature extractor that extracts estimates of a plurality of acoustic parameters from the plurality of filtered waveforms at successive sampling points;
a timer that determines an end time of the spoken utterance and a duration of the spoken utterance;
a time-normalizer that time-normalizes said estimates so that the spoken utterance extends over a predetermined number of time intervals; and
a classifier that analyzes the time-normalized estimates to identify the spoken utterance.
Dependent claims: 16, 17, 21
19. A method for recognizing an utterance spoken by a user comprising the steps of:
capturing the utterance as an input audio signal;
converting the input audio signal to a digitized representation;
using a digital difference filter to obtain a plurality of filtered waveforms from the digitized representation, wherein said single pole digital difference filter is in the form;
Y(n) = AY(n-1) + BX(n) + CX(n-1);
extracting, concurrently with said capturing step, estimates of a plurality of acoustic parameters from the plurality of filtered waveforms at successive sampling points, concurrently with said capturing step;
determining an end time of the spoken utterance and a duration of the spoken utterance;
thereafter time-normalizing said estimates so that the spoken utterance extends over a predetermined number of time intervals; and
analyzing the time-normalized estimates to identify the utterance.
20. Apparatus for recognizing an utterance spoken by a user, said apparatus comprising;
an audio input device that accepts speech input from the user including the utterance and provides an electrical signal responsive to the speech input;
an analog-to-digital converter that converts the electrical signal into a digitized representation;
a single pole digital difference filter that obtains a plurality of filtered waveforms from the digitized representation, wherein said single pole digital difference filter is in the form;
Y(n) = AY(n-1) + BX(n) + CX(n-1);
a feature extractor that extracts, estimates of a plurality of acoustic parameters from the plurality of filtered waveforms at successive sampling points;
a timer that determines an end time of the spoken utterance and a duration of the spoken utterance;
a time-normalizer that time-normalizes said estimates so that the spoken utterance extends over a predetermined number of time intervals; and
a classifier that analyzes the time-normalized estimates to identify the spoken utterance.
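The claims recite determining an end time and a duration of the spoken utterance but do not say how this is done. One common approach, assumed here purely for illustration, is to compare a per-frame energy estimate against a threshold and declare the utterance finished after a run of quiet frames. The function name `find_endpoint` and the parameters `threshold` and `min_silence` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def find_endpoint(energies: Sequence[float], threshold: float,
                  min_silence: int) -> Optional[Tuple[int, int]]:
    """Return (end_index, duration) in frames, or None if no utterance ended.

    end_index is exclusive: one past the last frame whose energy exceeded
    the threshold. The utterance is considered finished once min_silence
    consecutive frames at or below the threshold follow it.
    """
    start: Optional[int] = None
    last_active = 0
    silence = 0
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i          # first loud frame marks the onset
            last_active = i
            silence = 0            # a brief pause inside a word is forgiven
        elif start is not None:
            silence += 1
            if silence >= min_silence:
                end_index = last_active + 1
                return end_index, end_index - start
    return None


if __name__ == "__main__":
    frame_energy = [0.0, 0.0, 5.0, 6.0, 0.0, 7.0, 0.0, 0.0, 0.0]
    print(find_endpoint(frame_energy, threshold=1.0, min_silence=2))
```

Because the loop inspects one frame at a time and keeps only three small state variables, the same logic could run while audio is still being captured, which fits the low-memory goal stated in the abstract.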
Specification