User independent, real-time speech recognition system and method

US 5,640,490 A
Filed: 11/14/1994
Issued: 06/17/1997
Est. Priority Date: 11/14/1994
Status: Expired due to Fees


1 Assignment


0 Petitions


Abstract

A system and method for identifying the phoneme sound types that are contained within an audio speech signal is disclosed. The system includes a microphone and associated conditioning circuitry for receiving an audio speech signal and converting it to a representative electrical signal. The electrical signal is then sampled and converted to a digital audio signal with an analog-to-digital converter. The digital audio signal is input to a programmable digital sound processor, which digitally processes the sound so as to extract various time domain and frequency domain sound characteristics. These characteristics are input to a programmable host sound processor, which compares the sound characteristics to standard sound data. Based on this comparison, the host sound processor identifies the specific phoneme sounds that are contained within the audio speech signal. The programmable host sound processor further includes linguistic processing program methods to convert the phoneme sounds into English words or other natural language words. These words are input to a host processor, which then utilizes the words as either data or commands.
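As a rough illustration of the front end described in the abstract (conditioning circuitry followed by analog-to-digital conversion), the conditioning stage that claims 3, 9, 17 and 25 recite (amplify, limit, then restrict to a maximum frequency of interest) can be modeled numerically. This is a sketch under stated assumptions, not the patented circuit: the "analog" input is represented by a finely sampled list of floats in [-1, 1], the low-pass is a one-pole RC model, and the gain, limiter ceiling, 4 kHz cutoff, 11.025 kHz output rate and 8-bit depth are all hypothetical values.

```python
import math

# Hypothetical parameters; none of these values are specified in the patent text.
GAIN = 4.0           # amplification factor for the amplifier stage
LIMIT = 1.0          # limiter ceiling, as a fraction of full scale
CUTOFF_HZ = 4000.0   # assumed "maximum frequency of interest"


def condition(samples, sample_rate, gain=GAIN, limit=LIMIT, cutoff_hz=CUTOFF_HZ):
    """Amplify, hard-limit, then low-pass filter a signal (one-pole RC model)."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor of the discrete RC low-pass
    out, y = [], 0.0
    for x in samples:
        limited = max(-limit, min(limit, gain * x))  # amplifier followed by limiter
        y += alpha * (limited - y)                   # y[n] = y[n-1] + alpha * (x[n] - y[n-1])
        out.append(y)
    return out


def quantize(samples, bits=8):
    """Map values in [-1, 1] to signed integers, as an ADC would."""
    full_scale = 2 ** (bits - 1) - 1
    return [int(round(max(-1.0, min(1.0, s)) * full_scale)) for s in samples]


fine_rate = 44100  # finely sampled grid standing in for the continuous analog signal
tone = [0.5 * math.sin(2 * math.pi * 440 * n / fine_rate) for n in range(fine_rate // 10)]
digital = quantize(condition(tone, fine_rate)[::4])  # keep every 4th sample: 11.025 kHz
```

Decimating only after the low-pass stage mirrors the ordering in the claims, where the filter limits the signal to a maximum frequency before sampling. A one-pole filter rolls off gently, so a practical design would use a steeper anti-aliasing filter.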


36 Claims

1. A sound recognition system for essentially real-time identification of, and in an essentially speaker independent manner, phoneme sound types that are contained within an audio speech signal, the sound recognition system comprising:
- audio processor means for receiving an audio speech signal and for converting the audio speech signal into a representative audio electrical signal;
  
  analog-to-digital converter means for digitizing the audio electrical signal at a predetermined sampling rate so as to produce a digitized audio signal; and
  
  sound recognition means for identifying phoneme sound types contained within the audio speech signal, said sound recognition means comprising:
  
  means for performing time domain analysis on a plurality of segmentized portions of the digitized audio signal so as to identify a plurality of time domain characteristics of the audio signal;
  
  means for filtering each of the segmentized portions using a plurality of filter bands having predetermined high and low cutoff frequencies so as to identify thereby at least one frequency domain characteristic of each filtered segmentized portion; and
  
  means for processing said time domain and frequency domain characteristics so as to identify therefrom the phonemes contained within the audio speech signal.
- Dependent Claims (2, 3, 4, 5, 6)
- - 2. A sound recognition system as defined in claim 1 wherein the audio processor means comprises:
    - means for inputting the audio speech signal and for converting it to an audio electrical signal; and
      
      means for conditioning the audio electrical signal so that it is in a representative electrical form that is suitable for digital sampling.
  - 3. A sound recognition system as defined in claim 2 wherein the conditioning means comprises:
    - signal amplification means for amplifying the audio electrical signal to a predetermined level;
      
      means for limiting the level of the amplified audio electrical signal to a predetermined output level; and
      
      filter means, connected to the limiting means, for limiting the audio electrical signal to a predetermined maximum frequency of interest and thereby providing the representative audio electrical signal.
  - 4. A sound recognition system as defined in claim 1, further comprising electronic means for receiving at least one word in a preselected language corresponding to the at least one phoneme sound type contained within the audio speech signal, and for programmably processing the at least one word as either a data input or as a command input.
  - 5. A sound recognition system as defined in claim 1, wherein the time domain characteristics include at least one of the following:
    - an average amplitude of the audio speech signal;
      
      an absolute difference average of the audio speech signal; and
      
      a zero crossing rate of the audio speech signal.
  - 6. A sound recognition system as defined in claim 1, wherein the at least one frequency domain characteristic includes at least one of the following:
    - a frequency of at least one of said filtered segmentized portions; and
      
      an amplitude of at least one of said filtered segmentized portions.
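Claim 5 names three time domain characteristics, but the claim text does not define them. A minimal sketch, assuming the common textbook definitions (mean absolute sample value, mean absolute difference between consecutive samples, and the fraction of consecutive sample pairs that change sign), computed over one segmentized portion of the digitized signal:

```python
def average_amplitude(segment):
    """Mean absolute sample value of one segment."""
    return sum(abs(x) for x in segment) / len(segment)


def absolute_difference_average(segment):
    """Mean absolute difference between consecutive samples (needs >= 2 samples)."""
    diffs = [abs(b - a) for a, b in zip(segment, segment[1:])]
    return sum(diffs) / len(diffs)


def zero_crossing_rate(segment):
    """Fraction of consecutive sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(segment) - 1)


def time_domain_characteristics(segment):
    """Bundle the three time domain characteristics listed in claim 5."""
    return {
        "average_amplitude": average_amplitude(segment),
        "absolute_difference_average": absolute_difference_average(segment),
        "zero_crossing_rate": zero_crossing_rate(segment),
    }


features = time_domain_characteristics([3, 1, -2, -2, 0])
# average_amplitude 1.6, absolute_difference_average 1.75, zero_crossing_rate 0.5
```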

7. A sound recognition system for identifying the phoneme sound types that are contained within an audio speech signal, the sound recognition system comprising:
- audio processor means for receiving an audio speech signal and for converting the audio speech signal into a representative audio electrical signal;
  
  analog-to-digital converter means for digitizing the audio electrical signal at a predetermined sampling rate so as to produce a digitized audio signal;
  
  filter means for providing a plurality of filter bands having predetermined high and low cutoff frequencies through which segmentized portions of the digitized audio signal are passed; and
  
  sound recognition means for programmably carrying out the following program steps:
  
  (a) performing a time domain analysis on the segmentized portions of the digitized audio signal so as to identify at least one time domain sound characteristic of said audio speech signal;
  
  (b) filtering the segmentized portions of the digitized audio signal through each of the plurality of filter bands;
  
  (c) measuring at least one frequency domain sound characteristic of each of said filtered segmentized portions; and
  
  (d) based on the at least one time domain characteristic and the at least one frequency domain characteristic, identifying at least one phoneme sound type contained within the audio speech signal.
- Dependent Claims (8, 9, 10, 11, 12, 13, 14)
- - 8. A sound recognition system as defined in claim 7 wherein the audio processor means comprises:
    - means for inputting the audio speech signal and for converting it to an audio electrical signal; and
      
      means for conditioning the audio electrical signal so that it is in a representative electrical form that is suitable for digital sampling.
  - 9. A sound recognition system as defined in claim 8 wherein the conditioning means comprises:
    - signal amplification means for amplifying the audio electrical signal to a predetermined level;
      
      means for limiting the level of the amplified audio electrical signal to a predetermined output level; and
      
      filter means, connected to the limiting means, for limiting the audio electrical signal to a predetermined maximum frequency of interest and thereby providing the representative audio electrical signal.
  - 10. A sound recognition system as defined in claim 9, wherein the at least one time domain characteristic includes at least one of the following:
    - an average amplitude of the audio speech signal;
      
      an absolute difference average of the audio speech signal; and
      
      a zero crossing rate of the audio speech signal.
  - 11. A sound recognition system as defined in claim 10, wherein the at least one frequency domain characteristic includes at least one of the following:
    - a frequency of at least one of said filtered segmentized portions; and
      
      an amplitude of at least one of said filtered segmentized portions.
  - 12. A sound recognition system as defined in claim 11, wherein the at least one phoneme sound type contained within the audio speech signal is identified by comparing the at least one measured frequency domain characteristic to a plurality of sound standards each having an associated phoneme sound type and at least one corresponding standard frequency domain characteristic, wherein the at least one identified sound type is the sound standard type having a standard frequency domain characteristic that matches the measured frequency domain characteristic most closely.
  - 13. A sound recognition system as defined in claim 12, wherein the at least one measured frequency domain characteristic, and the plurality of standard frequency domain characteristics are expressed in terms of a chromatic scale.
  - 14. A sound recognition system as defined in claim 13, further comprising electronic means for receiving at least one word in a preselected language corresponding to the at least one phoneme sound type contained within the audio speech signal, and for programmably processing the at least one word as either a data input or as a command input.
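Claims 12 and 13 describe matching measured frequency domain characteristics against stored sound standards, with both expressed on a chromatic scale. One plausible reading, sketched below, converts each frequency to semitones relative to a reference pitch, s = 12 log2(f / f_ref), and picks the standard with the smallest summed semitone distance. The 440 Hz reference, the sum-of-absolute-differences metric, and the SOUND_STANDARDS table (labels and values) are hypothetical placeholders, not data from the patent.

```python
import math

A4_HZ = 440.0  # assumed reference pitch for the chromatic scale


def to_semitones(freq_hz, ref_hz=A4_HZ):
    """Express a frequency as a (fractional) number of semitones relative to ref_hz."""
    return 12.0 * math.log2(freq_hz / ref_hz)


# Hypothetical sound standards: phoneme label -> one standard frequency per filter band.
# Illustrative placeholder values only; they are not data from the patent.
SOUND_STANDARDS = {
    "AA": [730.0, 1090.0, 2440.0],
    "IY": [270.0, 2290.0, 3010.0],
    "UW": [300.0, 870.0, 2240.0],
}


def closest_phoneme(measured_hz, standards=SOUND_STANDARDS):
    """Return the phoneme whose standard frequencies lie nearest on the chromatic scale."""
    measured = [to_semitones(f) for f in measured_hz]

    def distance(standard_hz):
        # Sum of absolute semitone differences, band by band (assumed metric).
        return sum(abs(m - to_semitones(s)) for m, s in zip(measured, standard_hz))

    return min(standards, key=lambda label: distance(standards[label]))


print(closest_phoneme([700.0, 1100.0, 2500.0]))  # AA
```

On a chromatic (logarithmic) scale a fixed frequency ratio is a fixed distance, so a 50 Hz error near 300 Hz weighs far more than the same error near 3 kHz. That is one plausible motivation for comparing on this scale rather than in hertz.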

15. A sound recognition system for identifying the phoneme sound types that are contained within an audio speech signal, the sound recognition system comprising:
- audio processor means for receiving an audio speech signal and for converting the audio speech signal into a representative audio electrical signal;
  
  analog-to-digital converter means for digitizing the audio electrical signal at a predetermined sampling rate so as to produce a digitized audio signal;
  
  filter means for providing a plurality of filter bands having predetermined high and low cutoff frequencies through which segmentized portions of the digitized audio signal are passed;
  
  digital sound processor means for (a) performing a time domain analysis on the segmentized portions of the digitized audio signal so as to identify at least one time domain sound characteristic of said audio speech signal, and for (b) measuring at least one frequency domain sound characteristic of each of the filtered segmentized portions; and
  
  host sound processor means for identifying at least one phoneme sound type contained within the audio speech signal based on the at least one time domain characteristic and the at least one frequency domain characteristic, and for translating said at least one phoneme sound type into at least one representative word of a preselected language.
- Dependent Claims (16, 17, 18, 19, 20, 21, 22)
- - 16. A sound recognition system as defined in claim 15 wherein the audio processor means comprises:
    - means for inputting the audio speech signal and for converting it to an audio electrical signal; and
      
      means for conditioning the audio electrical signal so that it is in a representative electrical form that is suitable for digital sampling.
  - 17. A sound recognition system as defined in claim 16 wherein the conditioning means comprises:
    - signal amplification means for amplifying the audio electrical signal to a predetermined level;
      
      means for limiting the level of the amplified audio electrical signal to a predetermined output level; and
      
      filter means, connected to the limiting means, for limiting the audio electrical signal to a predetermined maximum frequency of interest and thereby providing the representative audio electrical signal.
  - 18. A sound recognition system as defined in claim 15, further comprising electronic means for receiving at least one word in a preselected language corresponding to the at least one phoneme sound type contained within the audio speech signal, and for programmably processing the at least one word as either a data input or as a command input.
  - 19. A sound recognition system as defined in claim 15, wherein the at least one time domain characteristic includes at least one of the following:
    - an average amplitude of the audio speech signal;
      
      an absolute difference average of the audio speech signal; and
      
      a zero crossing rate of the audio speech signal.
  - 20. A sound recognition system as defined in claim 15, wherein the at least one frequency domain characteristic includes at least one of the following:
    - a frequency of at least one of said filtered segmentized portions; and
      
      an amplitude of at least one of said filtered segmentized portions.
  - 21. A sound recognition system as defined in claim 15, wherein the digital sound processor means comprises:
    - first programmable means for programmably executing a predetermined series of program steps;
      
      program memory means for storing the predetermined series of program steps utilized by said first programmable means; and
      
      data memory means for providing a digital storage area for use by said first programmable means.
  - 22. A sound recognition system as defined in claim 15, wherein the host sound processor means comprises:
    - second programmable means for programmably executing a predetermined series of program steps;
      
      program memory means for storing the predetermined series of program steps utilized by said second programmable means; and
      
      data memory means for providing a digital storage area for use by said second programmable means.
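The filter means and the frequency domain measurement of claims 15 and 20 (a frequency and an amplitude for each filtered segment) can be approximated in a few lines. This sketch swaps in ideal rectangular band-pass filtering in the frequency domain, via a naive discrete Fourier transform, rather than whatever filter implementation the digital sound processor actually used; the FILTER_BANDS cutoffs and the peak-picking rule are assumptions.

```python
import cmath
import math

# Hypothetical filter bands (low_hz, high_hz); the claims only say the cutoffs are predetermined.
FILTER_BANDS = [(200.0, 800.0), (800.0, 2000.0), (2000.0, 4000.0)]


def dft_magnitudes(segment):
    """Single-sided amplitude spectrum, bins 0..N/2, via a naive O(N^2) DFT."""
    n = len(segment)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(segment))
        # 2/N scaling makes a sinusoid's bin read its amplitude (not valid for DC or Nyquist,
        # which the bands above exclude).
        mags.append(abs(acc) * 2.0 / n)
    return mags


def band_characteristics(segment, sample_rate, bands=FILTER_BANDS):
    """Ideal band-pass each band in the frequency domain; report (peak Hz, peak amplitude)."""
    mags = dft_magnitudes(segment)
    bin_hz = sample_rate / len(segment)
    results = []
    for low, high in bands:
        in_band = [k for k in range(len(mags)) if low <= k * bin_hz < high]
        if not in_band:
            results.append((0.0, 0.0))
            continue
        k_peak = max(in_band, key=lambda k: mags[k])
        results.append((k_peak * bin_hz, mags[k_peak]))
    return results


rate, n = 8000, 256
segment = [0.8 * math.sin(2 * math.pi * 500 * i / rate)
           + 0.3 * math.sin(2 * math.pi * 1500 * i / rate) for i in range(n)]
result = band_characteristics(segment, rate)
# first two bands peak near (500.0 Hz, 0.8) and (1500.0 Hz, 0.3); the third band is ~0
```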

23. A sound recognition system for identifying the phoneme sound types that are contained within an audio speech signal, the sound recognition system comprising:
- audio processor means for receiving an audio speech signal and for converting the audio speech signal into a representative audio electrical signal;
  
  analog-to-digital converter means for digitizing the audio electrical signal at a predetermined sampling rate so as to produce a digitized audio signal;
  
  filter means for providing a plurality of filter bands having predetermined high and low cutoff frequencies through which segmentized portions of the digitized audio signal are passed; and
  
  digital sound processor means for programmably carrying out the following program steps:
  
  (a) performing a time domain analysis on the segmentized portions of the digitized audio signal so as to identify at least one time domain sound characteristic of said audio speech signal;
  
  (b) successively filtering the segmentized portions of the digitized audio signal;
  
  (c) measuring at least one frequency domain sound characteristic from each of said filtered portions; and
  
  host sound processor means for programmably carrying out the following program steps:
  
  (a) based on the at least one time domain characteristic and the at least one frequency domain characteristic, identifying at least one phoneme sound type contained within the audio speech signal; and
  
  (b) translating said at least one phoneme sound type into at least one representative word of a preselected language.
- Dependent Claims (24, 25, 26, 27, 28, 29, 30)
- - 24. A sound recognition system as defined in claim 23 wherein the audio processor means comprises:
    - means for inputting the audio speech signal and for converting it to an audio electrical signal; and
      
      means for conditioning the audio electrical signal so that it is in a representative electrical form that is suitable for digital sampling.
  - 25. A sound recognition system as defined in claim 24 wherein the conditioning means comprises:
    - signal amplification means for amplifying the audio electrical signal to a predetermined level;
      
      means for limiting the level of the amplified audio electrical signal to a predetermined output level; and
      
      filter means, connected to the limiting means, for limiting the audio electrical signal to a predetermined maximum frequency of interest and thereby providing the representative audio electrical signal.
  - 26. A sound recognition system as defined in claim 25, wherein the at least one time domain characteristic includes at least one of the following:
    - an average amplitude of the audio speech signal;
      
      an absolute difference average of the audio speech signal; and
      
      a zero crossing rate of the audio speech signal.
  - 27. A sound recognition system as defined in claim 26, wherein the at least one frequency domain characteristic includes at least one of the following:
    - a frequency of at least one of said filtered portions; and
      
      an amplitude of at least one of said filtered portions.
  - 28. A sound recognition system as defined in claim 27, wherein the at least one phoneme sound type contained within the audio speech signal is identified by comparing the at least one measured frequency domain characteristic to a plurality of sound standards each having an associated phoneme sound type and at least one corresponding standard frequency domain characteristic, wherein the at least one identified sound type is the sound standard type having a standard frequency domain characteristic that matches the measured frequency domain characteristic most closely.
  - 29. A sound recognition system as defined in claim 28, wherein the at least one measured frequency domain characteristic, and the plurality of standard frequency domain characteristics are expressed in terms of a chromatic scale.
  - 30. A sound recognition system as defined in claim 29, further comprising electronic means for receiving the at least one representative word, and for programmably processing the at least one word as either a data input or as a command input.
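Claim 23(b) translates identified phonemes into words of a preselected language, and claim 30 processes each word as either data or a command. A minimal sketch of both steps, assuming a hypothetical pronunciation lexicon (LEXICON), ARPAbet-style phoneme labels, a greedy longest-match segmentation, and a hypothetical COMMAND_WORDS vocabulary; the patent's own linguistic processing is not described in these claims.

```python
# Hypothetical pronunciation lexicon: phoneme sequence -> English word.
LEXICON = {
    ("OW", "P", "AH", "N"): "open",
    ("F", "AY", "L"): "file",
    ("S", "EH", "V", "AH", "N"): "seven",
}
COMMAND_WORDS = {"open", "close", "save"}  # hypothetical command vocabulary


def phonemes_to_words(phonemes, lexicon=LEXICON):
    """Greedy longest-match segmentation of a phoneme stream into words."""
    max_len = max(len(key) for key in lexicon)
    words, i = [], 0
    while i < len(phonemes):
        for length in range(min(max_len, len(phonemes) - i), 0, -1):
            key = tuple(phonemes[i:i + length])
            if key in lexicon:
                words.append(lexicon[key])
                i += length
                break
        else:
            i += 1  # skip a phoneme that starts no known word
    return words


def route(words, commands=COMMAND_WORDS):
    """Tag each word as a command input or a data input."""
    return [("command" if w in commands else "data", w) for w in words]


words = phonemes_to_words(["OW", "P", "AH", "N", "F", "AY", "L"])
print(route(words))  # [('command', 'open'), ('data', 'file')]
```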

31. A method for identifying the phoneme sound types that are contained within an audio speech signal, the method comprising the steps of:
- (a) receiving an audio speech signal;
  
  (b) converting the audio speech signal into a representative audio electrical signal;
  
  (c) digitizing the audio electrical signal at a predetermined sampling rate so as to produce a digitized audio signal that is segmentized to form a plurality of separate time sliced signals;
  
  (d) performing a time domain analysis on the digitized audio signal so as to identify at least one time domain sound characteristic of said audio speech signal;
  
  (e) using a plurality of filter bands having predetermined cutoff frequencies to successively filter the time sliced signals of the digitized audio signal;
  
  (f) measuring at least one frequency domain sound characteristic from each of said filtered time sliced signals; and
  
  (g) based on the at least one time domain characteristic and the at least one frequency domain characteristic, identifying at least one phoneme sound type contained within the audio speech signal.
- Dependent Claims (32, 33, 34, 35)
- - 32. A method as defined in claim 31, wherein the at least one time domain characteristic includes at least one of the following:
    - an average amplitude of the audio speech signal;
      
      an absolute difference average of the audio speech signal; and
      
      a zero crossing rate of the audio speech signal.
  - 33. A method as defined in claim 31, wherein the at least one frequency domain characteristic includes at least one of the following:
    - a frequency of at least one of said filtered time sliced signals; and
      
      an amplitude of at least one of said filtered time sliced signals.
  - 34. A method as defined in claim 31, wherein the at least one phoneme sound type contained within the audio speech signal is identified by comparing the at least one measured frequency domain characteristic to a plurality of sound standards each having an associated phoneme sound type and at least one corresponding standard frequency domain characteristic, wherein the at least one identified sound type is the sound standard type having a standard frequency domain characteristic that matches the measured frequency domain characteristic most closely.
  - 35. A method as defined in claim 34, wherein the at least one measured frequency domain characteristic, and the plurality of standard frequency domain characteristics are expressed in terms of a chromatic scale.
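Step (c) of claim 31 produces a digitized signal "segmentized to form a plurality of separate time sliced signals". A minimal framing sketch, assuming fixed-length, overlapping slices; the 20 ms slice length and 10 ms hop are hypothetical defaults, not values from the claims:

```python
def segmentize(samples, sample_rate, slice_ms=20.0, hop_ms=10.0):
    """Split a digitized signal into fixed-length, possibly overlapping time slices.

    Trailing samples that do not fill a whole slice are dropped.
    """
    slice_len = int(sample_rate * slice_ms / 1000.0)
    hop = int(sample_rate * hop_ms / 1000.0)
    if slice_len <= 0 or hop <= 0:
        raise ValueError("slice_ms and hop_ms must each span at least one sample")
    return [samples[start:start + slice_len]
            for start in range(0, len(samples) - slice_len + 1, hop)]


slices = segmentize(list(range(100)), sample_rate=1000)
# 9 slices of 20 samples each, starting at 0, 10, ..., 80
```

Overlap is a common choice so that a sound straddling a slice boundary still appears whole in some slice; setting hop_ms equal to slice_ms gives non-overlapping slices.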

36. A computer program product for use in a computerized sound recognition system that is adapted for receiving an audio speech signal and converting the audio speech signal into a representative audio electrical signal that is digitized, the computer program product comprising:
- a computer readable medium for storing computer readable code means which, when executed by the computerized sound recognition system, will enable the system to identify phoneme sound types that are contained within the audio speech signal; and
  
  wherein the computer readable code means is comprised of computer readable instructions for causing the computerized sound recognition system to execute a method comprising the steps of:
  
  performing a time domain analysis on the digitized audio signal so as to identify a plurality of time domain sound characteristics of said audio speech signal;
  
  performing a frequency domain analysis on the digitized audio signal so as to identify a plurality of frequency domain sound characteristics of said audio speech signal; and
  
  based on the time domain characteristics and the frequency domain characteristics, identifying the phoneme sound types contained within the audio speech signal.
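Claim 36 identifies phonemes from the time domain and frequency domain characteristics together. One simple way the two kinds of features could interact, sketched here as an assumption rather than the patent's decision logic, is to gate on the time domain values first (low average amplitude suggests silence, a high zero crossing rate suggests an unvoiced, noise-like sound) and consult the frequency domain characteristics only for the remaining, presumably voiced, segments. The thresholds, the match_voiced callback, and toy_matcher are hypothetical.

```python
# Hypothetical thresholds; the claims give no numeric values.
SILENCE_AMPLITUDE = 0.02  # average amplitude below this is treated as silence
UNVOICED_ZCR = 0.3        # zero crossing rate above this is treated as unvoiced


def classify_segment(avg_amplitude, zcr, band_freqs_hz, match_voiced):
    """Combine one segment's time domain and frequency domain characteristics.

    avg_amplitude, zcr: time domain characteristics of the segment.
    band_freqs_hz: frequency domain characteristics (e.g. one peak frequency per band).
    match_voiced: callable mapping band_freqs_hz to a phoneme label.
    """
    if avg_amplitude < SILENCE_AMPLITUDE:
        return "silence"
    if zcr > UNVOICED_ZCR:
        return "unvoiced"
    return match_voiced(band_freqs_hz)


def toy_matcher(freqs):
    """Hypothetical stand-in for a real comparison against sound standards."""
    return "AA" if freqs[0] > 500.0 else "IY"


labels = [
    classify_segment(0.01, 0.10, [700.0], toy_matcher),  # silence
    classify_segment(0.40, 0.55, [700.0], toy_matcher),  # unvoiced
    classify_segment(0.40, 0.05, [700.0], toy_matcher),  # AA
    classify_segment(0.40, 0.05, [280.0], toy_matcher),  # IY
]
```

A fuller system would go on to distinguish individual unvoiced phonemes as well; the gate here only shows how the two feature types can divide the work.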


Current Assignee
Fonix Corp.
Original Assignee
Fonix Corp.
Inventors
Moncur, Robert Brian, Hansen, C. Hal, Shepherd, Dale Lynn
Primary Examiner(s)
MacDonald, Allen R.
Assistant Examiner(s)
Chawan, Vijay B.

Application Number
US08/339,902
Time in Patent Office
946 Days
Field of Search
381/41-46, 395/2.63
US Class Current
704/254
CPC Class Codes
G10L 15/02   Feature extraction for spee...
G10L 15/10   using distance or distortio...
G10L 25/09   the extracted parameters be...
G10L 25/18   the extracted parameters be...
G10L 25/93   Discriminating between voic...
