User independent, real-time speech recognition system and method
Abstract
A system and method for identifying the phoneme sound types contained within an audio speech signal are disclosed. The system includes a microphone and associated conditioning circuitry for receiving an audio speech signal and converting it to a representative electrical signal. The electrical signal is then sampled and converted to a digital audio signal by an analog-to-digital converter. The digital audio signal is input to a programmable digital sound processor, which digitally processes the sound to extract various time domain and frequency domain sound characteristics. These characteristics are input to a programmable host sound processor, which compares them to standard sound data. Based on this comparison, the host sound processor identifies the specific phoneme sounds contained within the audio speech signal. The programmable host sound processor further includes linguistic processing program methods that convert the phoneme sounds into English words or words of another natural language. These words are input to a host processor, which then uses them as either data or commands.
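The comparison against standard sound data described in the abstract can be illustrated with a minimal sketch. It assumes a nearest-template match using Euclidean distance, which the text above does not specify; the table `STANDARD_SOUND_DATA`, its feature layout, its numeric values, and the function `identify_phoneme` are all hypothetical.

```python
import math

# Hypothetical "standard sound data": one reference feature vector per phoneme.
# Layout (assumed): (zero-crossing rate, normalized amplitude, dominant frequency in kHz).
# The numbers are illustrative placeholders, not values taken from the patent.
STANDARD_SOUND_DATA = {
    "AA": (0.10, 0.80, 0.7),
    "IY": (0.15, 0.70, 2.3),
    "S": (0.60, 0.20, 5.0),
}


def identify_phoneme(features, templates=STANDARD_SOUND_DATA):
    """Return the phoneme whose reference vector is closest to `features`."""
    return min(templates, key=lambda name: math.dist(features, templates[name]))


# A measured vector close to the "S" template is labeled "S".
label = identify_phoneme((0.55, 0.25, 4.6))
```

A real comparison would likely weight or normalize the individual characteristics; the unweighted distance here only shows the shape of the lookup.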
36 Claims
1. A sound recognition system for essentially real-time identification of, and in an essentially speaker independent manner, phoneme sound types that are contained within an audio speech signal, the sound recognition system comprising:
audio processor means for receiving an audio speech signal and for converting the audio speech signal into a representative audio electrical signal;
analog-to-digital converter means for digitizing the audio electrical signal at a predetermined sampling rate so as to produce a digitized audio signal; and
sound recognition means for identifying phoneme sound types contained within the audio speech signal, said sound recognition means comprising:
means for performing time domain analysis on a plurality of segmentized portions of the digitized audio signal so as to identify a plurality of time domain characteristics of the audio signal;
means for filtering each of the segmentized portions using a plurality of filter bands having predetermined high and low cutoff frequencies so as to identify thereby at least one frequency domain characteristic of each filtered segmentized portion; and
means for processing said time domain and frequency domain characteristics so as to identify therefrom the phonemes contained within the audio speech signal.
Dependent claims: 2, 3, 4, 5, 6
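As an illustration of the time domain analysis recited in claim 1, the sketch below computes two characteristics for one segmentized portion. The claim does not name the characteristics, so the choice of zero-crossing count and mean absolute amplitude is an assumption, and `time_domain_characteristics` is a hypothetical helper.

```python
def time_domain_characteristics(segment):
    """Return (zero_crossings, mean_abs_amplitude) for one segment of samples.

    Both characteristics are assumed examples; the claim only requires
    "a plurality of time domain characteristics".
    """
    if not segment:
        raise ValueError("segment must contain at least one sample")
    # A zero crossing is counted whenever consecutive samples differ in sign
    # (a sample equal to zero is treated as non-negative).
    zero_crossings = sum(
        1 for prev, cur in zip(segment, segment[1:]) if (prev < 0) != (cur < 0)
    )
    mean_abs_amplitude = sum(abs(s) for s in segment) / len(segment)
    return zero_crossings, mean_abs_amplitude


# Usage: an alternating segment crosses zero between every pair of samples.
crossings, amplitude = time_domain_characteristics([1, -1, 1, -1])
```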
7. A sound recognition system for identifying the phoneme sound types that are contained within an audio speech signal, the sound recognition system comprising:
audio processor means for receiving an audio speech signal and for converting the audio speech signal into a representative audio electrical signal;
analog-to-digital converter means for digitizing the audio electrical signal at a predetermined sampling rate so as to produce a digitized audio signal;
filter means for providing a plurality of filter bands having predetermined high and low cutoff frequencies through which segmentized portions of the digitized audio signal are passed; and
sound recognition means for programmably carrying out the following program steps:
(a) performing a time domain analysis on the segmentized portions of the digitized audio signal so as to identify at least one time domain sound characteristic of said audio speech signal;
(b) filtering the segmentized portions of the digitized audio signal through each of the plurality of filter bands;
(c) measuring at least one frequency domain sound characteristic of each of said filtered segmentized portions; and
(d) based on the at least one time domain characteristic and the at least one frequency domain characteristic, identifying at least one phoneme sound type contained within the audio speech signal.
Dependent claims: 8, 9, 10, 11, 12, 13, 14
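Steps (b) and (c) of claim 7 can be sketched as a small filter bank. The claim does not say how the bands are implemented or which characteristic is measured, so this example assumes a standard second-order (biquad) band-pass filter per band and uses mean squared amplitude as the measured characteristic. The band edges, the sample rate, and the names `bandpass` and `band_energies` are hypothetical.

```python
import math


def bandpass(samples, low_hz, high_hz, sample_rate):
    """Apply a second-order band-pass (biquad) filter between the two cutoffs."""
    center = math.sqrt(low_hz * high_hz)      # geometric center of the band
    q = center / (high_hz - low_hz)           # narrower band gives a higher Q
    w0 = 2.0 * math.pi * center / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha                    # b1 is zero for this filter type
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    out = []
    x1 = x2 = y1 = y2 = 0.0                   # previous inputs and outputs
    for x in samples:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def band_energies(segment, bands, sample_rate):
    """Return the mean squared amplitude of `segment` after each band filter."""
    energies = []
    for low_hz, high_hz in bands:
        filtered = bandpass(segment, low_hz, high_hz, sample_rate)
        energies.append(sum(v * v for v in filtered) / len(filtered))
    return energies


# Usage: a 1000 Hz tone should put most of its energy in the 800-1250 Hz band.
SAMPLE_RATE = 8000                                   # assumed sampling rate in Hz
BANDS = [(300, 600), (800, 1250), (2000, 3000)]      # assumed (low, high) cutoffs
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(400)]
energies = band_energies(tone, BANDS, SAMPLE_RATE)
```

The biquad is chosen only because it is compact; any band-pass design with the stated high and low cutoffs would fit the claim language equally well.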
15. A sound recognition system for identifying the phoneme sound types that are contained within an audio speech signal, the sound recognition system comprising:
audio processor means for receiving an audio speech signal and for converting the audio speech signal into a representative audio electrical signal;
analog-to-digital converter means for digitizing the audio electrical signal at a predetermined sampling rate so as to produce a digitized audio signal;
filter means for providing a plurality of filter bands having predetermined high and low cutoff frequencies through which segmentized portions of the digitized audio signal are passed;
digital sound processor means for (a) performing a time domain analysis on the segmentized portions of the digitized audio signal so as to identify at least one time domain sound characteristic of said audio speech signal, and for (b) measuring at least one frequency domain sound characteristic of each of the filtered segmentized portions; and
host sound processor means for identifying at least one phoneme sound type contained within the audio speech signal based on the at least one time domain characteristic and the at least one frequency domain characteristic, and for translating said at least one phoneme sound type into at least one representative word of a preselected language.
Dependent claims: 16, 17, 18, 19, 20, 21, 22
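The translation of phoneme sound types into words of a preselected language, recited for the host sound processor means in claim 15, might look like the following sketch. It assumes a greedy longest-match lookup in a pronunciation lexicon, which is one possible approach and is not stated in the claim; `LEXICON`, its entries, and `phonemes_to_words` are hypothetical.

```python
# Hypothetical pronunciation lexicon mapping phoneme sequences to words.
LEXICON = {
    ("HH", "EH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
    ("HH", "EH", "L", "P"): "help",
}


def phonemes_to_words(phonemes, lexicon=LEXICON):
    """Greedily translate a phoneme sequence into words, longest match first."""
    max_len = max(len(key) for key in lexicon)
    words = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate starting at position i, then shorter ones.
        for length in range(min(max_len, len(phonemes) - i), 0, -1):
            candidate = tuple(phonemes[i:i + length])
            if candidate in lexicon:
                words.append(lexicon[candidate])
                i += length
                break
        else:
            i += 1  # no word starts here; skip this phoneme
    return words


# Usage: eight phonemes resolve to two lexicon words.
words = phonemes_to_words(["HH", "EH", "L", "OW", "W", "ER", "L", "D"])
```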
23. A sound recognition system for identifying the phoneme sound types that are contained within an audio speech signal, the sound recognition system comprising:
audio processor means for receiving an audio speech signal and for converting the audio speech signal into a representative audio electrical signal;
analog-to-digital converter means for digitizing the audio electrical signal at a predetermined sampling rate so as to produce a digitized audio signal;
filter means for providing a plurality of filter bands having predetermined high and low cutoff frequencies through which segmentized portions of the digitized audio signal are passed; and
digital sound processor means for programmably carrying out the following program steps:
(a) performing a time domain analysis on the segmentized portions of the digitized audio signal so as to identify at least one time domain sound characteristic of said audio speech signal;
(b) successively filtering the segmentized portions of the digitized audio signal;
(c) measuring at least one frequency domain sound characteristic from each of said filtered portions; and
host sound processor means for programmably carrying out the following program steps:
(a) based on the at least one time domain characteristic and the at least one frequency domain characteristic, identifying at least one phoneme sound type contained within the audio speech signal; and
(b) translating said at least one phoneme sound type into at least one representative word of a preselected language.
Dependent claims: 24, 25, 26, 27, 28, 29, 30
31. A method for identifying the phoneme sound types that are contained within an audio speech signal, the method comprising the steps of:
(a) receiving an audio speech signal;
(b) converting the audio speech signal into a representative audio electrical signal;
(c) digitizing the audio electrical signal at a predetermined sampling rate so as to produce a digitized audio signal that is segmentized to form a plurality of separate time sliced signals;
(d) performing a time domain analysis on the digitized audio signal so as to identify at least one time domain sound characteristic of said audio speech signal;
(e) using a plurality of filter bands having predetermined cutoff frequencies to successively filter the time sliced signals of the digitized audio signal;
(f) measuring at least one frequency domain sound characteristic from each of said filtered time sliced signals; and
(g) based on the at least one time domain characteristic and the at least one frequency domain characteristic, identifying at least one phoneme sound type contained within the audio speech signal.
Dependent claims: 32, 33, 34, 35
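The segmentizing part of step (c) in claim 31 can be sketched as cutting the digitized signal into fixed-length time slices. The claim gives no slice duration or sampling rate, so the values in the usage lines are assumptions, and `time_slices` is a hypothetical helper that drops any incomplete trailing slice.

```python
def time_slices(samples, sample_rate, slice_ms):
    """Split `samples` into consecutive, non-overlapping slices of `slice_ms` each.

    `sample_rate` is in Hz and `slice_ms` in milliseconds (both integers).
    Samples left over after the last full slice are discarded.
    """
    slice_len = sample_rate * slice_ms // 1000
    if slice_len <= 0:
        raise ValueError("slice duration is shorter than one sample period")
    last_start = len(samples) - slice_len
    return [samples[i:i + slice_len] for i in range(0, last_start + 1, slice_len)]


# Usage: at an assumed 1000 Hz rate, 4 ms slices hold four samples each.
slices = time_slices(list(range(10)), sample_rate=1000, slice_ms=4)
```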
36. A computer program product for use in a computerized sound recognition system that is adapted for receiving an audio speech signal and converting the audio speech signal into a representative audio electrical signal that is digitized, the computer program product comprising:
a computer readable medium for storing computer readable code means which, when executed by the computerized sound recognition system, will enable the system to identify phoneme sound types that are contained within the audio speech signal; and
wherein the computer readable code means is comprised of computer readable instructions for causing the computerized sound recognition system to execute a method comprising the steps of:
performing a time domain analysis on the digitized audio signal so as to identify a plurality of time sound characteristics of said audio speech signal;
performing a frequency domain analysis on the digitized audio signal so as to identify a plurality of frequency domain sound characteristics of said audio speech signal; and
based on the time domain characteristics and the frequency domain characteristics, identifying the phoneme sound types contained within the audio speech signal.
Specification