Signal injection coupling into the human vocal tract for robust audible and inaudible voice recognition

US 20030061050A1
Filed: 11/27/2002
Published: 03/27/2003
Est. Priority Date: 07/06/1999
Status: Active Grant

0 Assignments

0 Petitions

Abstract

A means and method are provided for enhancing or replacing the natural excitation of the human vocal tract by artificial excitation means, wherein the artificially created acoustics present additional spectral, temporal, or phase data useful for (1) enhancing the machine recognition robustness of audible speech or (2) enabling more robust machine-recognition of relatively inaudible mouthed or whispered speech. The artificial excitation (a) may be arranged to be audible or inaudible, (b) may be designed to be non-interfering with another user's similar means, (c) may be used in one or both of a vocal content-enhancement mode or a complementary vocal tract-probing mode, and/or (d) may be used for the recognition of audible or inaudible continuous speech or isolated spoken commands.
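The abstract's central idea is a known artificial excitation that the vocal tract shapes before it leaves the mouth. A toy source-filter model can illustrate it. This sketch is an illustration under stated assumptions, not the inventors' method:

- a single two-pole resonator stands in for one vocal-tract formant;
- a linear sine sweep stands in for the artificial exciter;
- the hypothetical helper `estimate_resonance` reads the formant back from where the swept response is loudest, a crude form of the tract-probing mode.

```python
import math


def linear_chirp(f0, f1, duration, fs):
    """Sine sweep whose instantaneous frequency rises linearly from f0 to f1 Hz."""
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    n = int(duration * fs)
    return [math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]


def resonator(x, fc, bw, fs):
    """Two-pole IIR resonator: a crude stand-in (assumption) for one formant."""
    r = math.exp(-math.pi * bw / fs)  # pole radius set by the bandwidth
    a1 = -2 * r * math.cos(2 * math.pi * fc / fs)
    a2 = r * r
    y, y1, y2 = [], 0.0, 0.0
    for s in x:
        out = s - a1 * y1 - a2 * y2
        y.append(out)
        y1, y2 = out, y1  # shift the two-sample output history
    return y


def estimate_resonance(response, f0, f1, duration, fs, win=320):
    """Map the most energetic window of the swept response back to a sweep frequency."""
    k = (f1 - f0) / duration
    best_i, best_e = 0, -1.0
    for i in range(len(response) // win):
        e = sum(v * v for v in response[i * win:(i + 1) * win])
        if e > best_e:
            best_i, best_e = i, e
    t_center = (best_i + 0.5) * win / fs
    return f0 + k * t_center


fs, f0, f1, dur = 16000, 200.0, 4000.0, 2.0
excitation = linear_chirp(f0, f1, dur, fs)
tract_output = resonator(excitation, fc=1000.0, bw=100.0, fs=fs)
print(round(estimate_resonance(tract_output, f0, f1, dur, fs)))  # roughly 1000
```

The sweep is slow relative to the resonator's decay time, so each window sees a quasi-steady response. The window's centre frequency is then a reasonable formant estimate, with resolution limited to about one window's worth of sweep (about 38 Hz here).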

59 Claims

1. A speech recognition system for processing sounds emanating from a living body's vocal tract, said sounds including sounds or sound components excited by at least one artificial exciter coupled, either directly or indirectly, into said vocal tract to introduce artificial excitations, said at least one artificial excitation modified or modulated by said vocal tract and emanating therefrom.
- Dependent claims 2 to 22:
  - 2. The speech recognition system of claim 1 wherein said artificially excited sounds permit inaudible speaking or command-giving to a computer, computer-coupled device or computer-containing device.
  - 3. The speech recognition system of claim 1 wherein said sounds are one of continuous speech, command-style speech, or an utterance.
  - 4. The speech recognition system of claim 1 adapted for processing sounds that are both naturally excited and artificially excited, said sounds, or signal representations thereof, being substantially processed as one of separate or separated signals or signal-components or as a combined signal.
  - 5. The speech recognition system of claim 4 wherein said artificially excited sounds permit improved recognition-accuracy or improved recognition-speed of natural speech, sounds or utterances.
  - 6. The speech recognition system of claim 4 wherein said artificially excited and naturally excited speech sounds emanating from said tract temporally overlap at least part of the time.
  - 7. The speech recognition system of claim 4 wherein said artificially excited and naturally excited speech sounds emanating from said tract are not identical in spectral content at least part of the time.
  - 8. The speech recognition system of claim 4 wherein said artificially excited signal, before or after tract modification or modulation, includes at least one of the following aspects:
    - (a) said artificially excited signal contains a harmonic or sub-harmonic of a natural formant, (b) said artificially excited signal contains phase information which is utilized in the recognizer, (c) said artificially excited signal is broadband in nature, (d) said artificially excited signal is selected or set as a function of any natural signal parameter, (e) said artificially excited signal contains tones or frequency components which interact with each other as a function of a vocal tract parameter, (f) said artificially excited signal contains at least one tone or frequency component which is modulated or modified by any portion of the vocal tract anatomy, (g) said artificially excited signal is generally inaudible to the unaided ear of a separate listener, or (h) said artificially excited signal is swept in frequency.
  - 9. The speech recognition system of claim 1 wherein said vocal tract includes at least one element selected from the group consisting of vocal cords, larynx, laryngeal valve, the glottal opening, the glottis, the arytenoids, the pharynx, the esophagus, the tongue, the pharyngeal walls, the velum, the hard palate, the alveolar ridge, the lips, teeth, gums, cheeks or any nasal cavity, said at least one element modifying or modulating said artificial excitation as the speaker articulates speech either audibly or inaudibly.
  - 10. The speech recognition system of claim 1 further including a training data means capable of supporting training using at least the artificially excited speech signals.
  - 11. The speech recognition system of claim 1 further including means for directing at least a first modified or modulated artificially-excited speech signal to a first speech representation means which samples at least said first signal to produce a first sequence of speech representation vectors, representative, at least in part, of said artificially excited signal.
  - 12. The speech recognition system of claim 11 further including means for modeling or classifying said first sequence of vectors.
  - 13. The speech recognition system of claim 12 further including means for subjecting said modeled or classified vectors to a search in a search module, said search module having access to at least one of an acoustic model, a lexical model, or a language model.
  - 14. The speech recognition system of claim 13 wherein two search modules operate, one arranged to process naturally excited signals and the other to process artificially excited signals, said system utilizing the results of both modules to decide what speech took place or what words were articulated.
  - 15. The speech recognition system of claim 11 wherein both the artificially excited signal and the naturally excited signal are represented by a single set of representation vectors.
  - 16. The speech recognition system of claim 11 further including means for directing at least a naturally excited second modified or modulated signal to a speech representation means which samples said naturally excited signal to produce a second sequence of speech representation vectors, representative at least in part of said natural speech signal.
  - 17. The speech recognition system of claim 16 further including second means for modeling or classifying said second sequence of vectors representative, at least in part, of said naturally excited speech signal.
  - 18. The speech recognition system of claim 17 further including second means for subjecting said modeled or classified natural speech vectors to a search in a second search module, said search module having access to at least one of an acoustic model, a lexical model or a language model.
  - 19. The speech recognition system of claim 1 wherein training means are provided for both naturally excited signals and artificially excited signals, said means being one of independent or the same means, said signals being one of separate or combined.
  - 20. The speech recognition system of claim 1 wherein artificial excitations are adapted to an individual user.
  - 21. The speech recognition system of claim 20 wherein said adapted excitations are portable across at least one of multiple recognition systems, computers, networks, and speech-conversant devices.
  - 22. The speech recognition system of claim 1 further including a separator, deconvolution, or subtraction means to discern naturally excited sounds or sound components from artificially excited sounds or sound components.
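Claims 11 and 22 describe two operations on the artificially excited signal:

- turning it into a sequence of representation vectors;
- separating it from naturally excited sound.

One simple way to do both assumes the exciter emits a few known probe tones placed away from the dominant voice band. The received power at exactly those tones is then measured frame by frame with the Goertzel algorithm. The probe frequencies, frame length and helper names below are hypothetical illustration choices; the patent text does not specify them.

```python
import math


def goertzel_power(frame, freq, fs):
    """Power of `frame` at a single frequency (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def probe_feature_vectors(signal, probe_freqs, fs, frame_len=320):
    """One vector per frame: received power at each known probe tone.

    Energy at other frequencies (e.g. the naturally excited voice) is
    largely ignored, which acts as a crude separator for the artificial part.
    """
    vectors = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        vectors.append([goertzel_power(frame, f, fs) for f in probe_freqs])
    return vectors


fs = 16000
n = range(640)  # two 20 ms frames
voice = [0.8 * math.sin(2 * math.pi * 300 * i / fs) for i in n]   # natural stand-in
probe = [0.5 * math.sin(2 * math.pi * 5000 * i / fs) for i in n]  # hypothetical probe tone
mixed = [v + p for v, p in zip(voice, probe)]
print(probe_feature_vectors(mixed, [5000, 6000], fs))
```

A frame length of 320 samples at 16 kHz puts both probe frequencies on exact analysis bins. As a result, the 300 Hz stand-in voice leaks essentially nothing into the probe features. A real system placing probes in the ultrasonic range, as the claims allow, would need a correspondingly higher sampling rate.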

23. A speech recognition system for processing sounds emanating from a living body's vocal tract, said sounds including sounds excited by at least one artificial exciter coupled, either directly or indirectly, into said vocal tract to introduce artificial excitations, said at least one artificial excitation modified or modulated by said vocal tract and emanating therefrom, said speech recognition system including:
- means for representation, modeling or classification, and searching of artificially excited speech signals or signal components;
- means for representation, modeling or classification, and searching of naturally excited speech signals or signal components;
- at least one of said searching means having access to at least one of an acoustic model, lexical model or language model; and
- at least one training means.
- Dependent claims 24 to 33:
  - 24. The speech recognition system of claim 23 wherein said artificially excited sounds permit inaudible speaking or command-giving to a computer, computer-coupled device, or computer-containing device.
  - 25. The speech recognition system of claim 23 adapted for processing sounds that are both naturally excited and artificially excited, said sounds, or signal representations thereof, being substantially processed as one of separate or separated signals or signal-components or as a combined signal.
  - 26. The speech recognition system of claim 25 wherein said artificially excited sounds permit improved recognition-accuracy or improved recognition-speed of natural speech, sounds or utterances.
  - 27. The speech recognition system of claim 25 wherein said artificially excited and naturally excited speech sounds emanating from said tract temporally overlap at least part of the time.
  - 28. The speech recognition system of claim 25 wherein said artificially excited and naturally excited speech sounds emanating from said tract are not identical in spectral content at least part of the time.
  - 29. The speech recognition system of claim 25 wherein said artificially excited signal, before or after tract modification or modulation, includes at least one of the following aspects:
    - (a) said artificially excited signal contains a harmonic or subharmonic of a natural formant, (b) said artificially excited signal contains phase information which is utilized in the recognizer, (c) said artificially excited signal is broadband in nature, (d) said artificially excited signal is selected or set as a function of any natural signal parameter, (e) said artificially excited signal contains tones or frequency components which interact with each other as a function of a vocal tract parameter, (f) said artificially excited signal contains at least one tone or frequency component which is modulated or modified by any portion of the vocal tract anatomy, (g) said artificially excited signal is generally inaudible to the unaided ear of a separate listener, and (h) said artificially excited signal is swept in frequency.
  - 30. The speech recognition system of claim 23 wherein said vocal tract includes at least one element selected from the group consisting of vocal cords, larynx, laryngeal valve, the glottal opening, the glottis, the arytenoids, the pharynx, the esophagus, the tongue, the pharyngeal walls, the velum, the hard palate, the alveolar ridge, the lips, teeth, gums, cheeks or any nasal cavity, said at least one element modifying or modulating said artificial excitation as the speaker articulates speech either audibly or inaudibly.
  - 31. The speech recognition system of claim 23 further including means for directing at least a first modified or modulated artificially excited speech signal to a first speech representation means which samples at least said first signal to produce a first sequence of speech representation vectors, representative, at least in part, of said artificially excited signal.
  - 32. The speech recognition system of claim 31 further including means for modeling or classifying said first sequence of vectors.
  - 33. The speech recognition system of claim 23 further including a training data means capable of supporting training using at least the artificially excited speech signals.
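Claim 23 gives the artificially excited stream and the naturally excited stream their own representation, classification and search means. The sketch below is an illustration only and makes two simplifications:

- it swaps in classic dynamic time warping (DTW) template matching for the acoustic, lexical and language models a real recognizer would search;
- it fuses the two streams by a weighted sum of per-word distances.

The template dictionaries, the weight `w_art` and the function names are hypothetical.

```python
import math


def dtw(a, b):
    """Dynamic-time-warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def fused_decision(nat_seq, art_seq, nat_templates, art_templates, w_art=0.5):
    """Choose the word minimising a weighted sum of both streams' DTW distances."""
    words = sorted(nat_templates.keys() & art_templates.keys())

    def score(word):
        return ((1 - w_art) * dtw(nat_seq, nat_templates[word])
                + w_art * dtw(art_seq, art_templates[word]))

    return min(words, key=score)


# The natural stream alone cannot tell the words apart; the artificial one can.
nat_templates = {"on": [[1.0]], "off": [[1.0]]}
art_templates = {"on": [[0.0, 1.0], [1.0, 0.0]], "off": [[1.0, 0.0], [0.0, 1.0]]}
print(fused_decision([[1.0]], [[0.0, 1.0], [1.0, 0.0]], nat_templates, art_templates))  # on
```

Late fusion of distances is only one option. Claims 15 and 44 also contemplate a single combined representation, which would correspond to concatenating the two feature vectors per frame before matching.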

34. A method of performing speech recognition on silently-mouthed, silently-articulated or whispered speech from a living body's vocal tract, comprising:
- providing a source of artificial acoustic excitation;
- coupling said artificial acoustic excitation, directly or indirectly, into said vocal tract of a speaker;
- allowing said artificial acoustic excitation to be modified or modulated by said speaker's mouthing, articulation or whispering action by a state of at least a portion of said speaker's vocal tract; and
- performing speech-recognition processing on at least a portion of or component of said modified acoustic excitation to contribute to the identification of said speech or utterance.
- Dependent claims 35 to 41:
  - 35. The method of claim 34 wherein said speech is silently mouthed and any modified acoustic excitation is primarily sourced from said artificial excitation.
  - 36. The method of claim 34 wherein said speech is whispered and the modified acoustic excitation is sourced both by said artificial excitation as well as by, at least in part, natural aspiration excitation.
  - 37. The method of claim 34 wherein said speech is one of continuous speech or command-style discrete speech.
  - 38. The method of claim 34 wherein said coupling is acoustic coupling of a sonic or ultrasonic transducing device, directly or indirectly, to at least one portion of said vocal tract.
  - 39. The method of claim 34 wherein said coupling is one of tissue coupling or air-coupling.
  - 40. The method of claim 34 wherein said recognized speech or utterance is at least one of recorded, converted to text, spoken into a telephony link, or otherwise transmitted to a remote recipient.
  - 41. The method of claim 34 wherein said artificial excitation is itself inaudible at least to an external observer.
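In the silently mouthed case of claim 35, nearly all received sound comes from the artificial exciter. The recognizer therefore observes how the momentary tract configuration alters a signal whose transmitted form is known. The minimal sketch below makes two assumptions:

- the transmitted and received waveforms are time-aligned;
- the probe tones fall on exact DFT bins of the frame.

Under those assumptions it estimates one complex gain per probe tone. Its magnitude and phase (compare the phase information of claim 8(b)) could serve as per-frame articulatory features. The function names, tone choices and simulated tract effect are hypothetical.

```python
import cmath
import math

FS = 16000
N = 320  # one 20 ms frame; 1000 Hz and 2500 Hz fall on exact DFT bins


def tone(freq, phase=0.0):
    """Unit-amplitude sine over one frame."""
    return [math.sin(2 * math.pi * freq * i / FS + phase) for i in range(N)]


def dft_bin(frame, freq, fs):
    """Complex DFT coefficient of `frame` at a single frequency."""
    return sum(x * cmath.exp(-2j * math.pi * freq * n / fs)
               for n, x in enumerate(frame))


def probe_transfer(sent, received, probe_freqs, fs):
    """Received/sent ratio at each probe tone: a crude per-tone tract response."""
    return [dft_bin(received, f, fs) / dft_bin(sent, f, fs) for f in probe_freqs]


sent = [a + b for a, b in zip(tone(1000), tone(2500))]
# Simulated (hypothetical) tract effect: 1 kHz scaled by 0.5,
# 2.5 kHz scaled by 0.1 with a 90-degree phase shift.
received = [0.5 * a + 0.1 * b for a, b in zip(tone(1000), tone(2500, math.pi / 2))]
h = probe_transfer(sent, received, (1000, 2500), FS)
for f, g in zip((1000, 2500), h):
    print(f, round(abs(g), 3), round(math.degrees(cmath.phase(g)), 1))
```

Dividing received by transmitted spectra is a very simple form of the deconvolution mentioned in claim 22. With real recordings the frames would need windowing, alignment and averaging, because exact-bin orthogonality no longer holds.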

42. A method of enhancing the accuracy or speed of speech recognition of the speech or utterances emanating from a living body's vocal tract, comprising:
- coupling artificial acoustic excitation, directly or indirectly, into said vocal tract of a speaker;
- allowing said speaker to audibly speak;
- at least during portions of said audible speech, allowing said artificial acoustic excitation to be modified or modulated by said speaker's mouthing, articulation or whispering action by a state of at least a portion of said speaker's vocal tract to provide an artificially excited output of said speaker; and
- performing speech-recognition processing on at least a portion of said artificially excited output of said speaker, to thereby provide enhanced accuracy or speed of said speech or utterance recognition.
- Dependent claims 43 to 50:
  - 43. The method of claim 42 wherein said speech recognition processing is performed using at least portions of both naturally excited and artificially excited outputs of said speaker.
  - 44. The method of claim 43 wherein the acoustic output of said vocal tract containing both types of acoustic outputs is speech-recognition processed, at least in part, as a combined signal.
  - 45. The method of claim 43 wherein the acoustic output of said vocal tract containing both types of acoustic outputs is speech-recognition processed, at least in part, as separate natural and artificial signals.
  - 46. The method of claim 42 wherein said speech is one of continuous speech or command-style discrete speech or utterance.
  - 47. The method of claim 42 wherein said artificial acoustic excitation is temporally overlaid or interleaved, at least in part, with a natural tract excitation.
  - 48. The method of claim 42 wherein said artificial acoustic excitation is applied using feedback information relating to a state of a natural excitation or of an articulatory position or state.
  - 49. The method of claim 42 wherein said artificially excited acoustic output is recognition-processed when naturally-produced acoustic output is determined insufficient to alone identify said speech or utterance with a desired accuracy or speed.
  - 50. The method of claim 42 wherein an artificial acoustic excitation is triggered by a state of natural excitation or a state of an articulator or vocal tract element.
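Claims 49 and 50 make the artificial channel conditional. It is consulted when the natural output is not good enough, and the exciter can be triggered by the state of natural excitation. The sketch below shows one hypothetical policy, not one taken from the specification:

- enable the exciter for frames whose natural RMS level falls below a floor (for example, whispered or unvoiced stretches);
- consult the artificially excited recognizer only when the natural hypothesis falls short of a confidence target.

The `Hypothesis` type, the thresholds and the recognizer callables are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Hypothesis:
    text: str
    confidence: float  # assumed to lie in [0, 1]


def exciter_gate(frames: Sequence[Sequence[float]], rms_floor: float = 0.05) -> List[bool]:
    """True for frames whose natural level is below the floor: switch the exciter on."""
    return [math.sqrt(sum(x * x for x in f) / len(f)) < rms_floor for f in frames]


def recognize_with_fallback(natural: Callable[[], Hypothesis],
                            artificial: Callable[[], Hypothesis],
                            target: float = 0.8) -> Hypothesis:
    """Use the natural result if confident enough; otherwise also run the artificial one."""
    nat = natural()
    if nat.confidence >= target:
        return nat  # artificial recognizer is never invoked
    art = artificial()
    return art if art.confidence > nat.confidence else nat


print(exciter_gate([[0.0] * 4, [0.5, -0.5, 0.5, -0.5]]))  # [True, False]
```

The recognizers are passed as callables so the artificially excited path costs nothing when the natural path is already confident. That is one way to read the "speed" benefit claimed in claim 49.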

51. A method of minimizing degradation in the accuracy or speed of speech-recognition of a first speaker's speech or utterance caused by at least one second interfering background speaker or voice comprising:
- coupling artificial acoustic excitation, directly or indirectly, into the vocal tract of the first speaker;
- allowing said first speaker to audibly speak in the potential acoustic presence of said at least one second background speaker, thereby modifying or modulating said first speaker's artificial acoustic excitation as well as said first speaker's natural excitation; and
- processing at least a portion of said first speaker's artificially-produced acoustic output by a speech recognition means;
- wherein said first speaker's output is known to be that of said first speaker due to its identifiable artificial acoustic content;
- or wherein said second speaker's interfering output is ignored or rejected because it does not contain the first speaker's identifying artificial excitations.
- Dependent claims 52 to 57:
  - 52. The method of claim 51 wherein at least two said equipped speakers are one of (a) speaking as part of a conversing group of at least two or (b) speaking to each other locally or from remote locations.
  - 53. The method of claim 51 wherein speech recognition means process at least portions of both naturally-excited and artificially-excited output of said speaker.
  - 54. The method of claim 53 wherein temporally and/or spectrally unique artificial excitations are provided to two or more thus-equipped speakers such that all such equipped speakers may speak and be recognized without recognition-interference with each other, said unique excitations associable with particular speakers.
  - 55. The method of claim 54 wherein a thus-equipped speaker's recognition system is arranged to ignore or reject inputs containing modifications of, modulations of, or elements of a potentially interfering speaker's different artificial excitation.
  - 56. The method of claim 54 wherein a computer provides or assigns said unique artificial excitations.
  - 57. The method of claim 56 wherein information regarding at least one unique artificial excitation, or assignment thereof, is delivered by one of a computer network, telecommunications network, wireless signal, or is inputted manually or via speech-input.
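Claims 54 to 56 give each equipped speaker a temporally and/or spectrally unique artificial excitation, assigned by a computer. Each recognizer can then accept its own user and reject others. The spectral-only sketch below assumes interleaved probe tones on a fixed grid and Goertzel power detection. The tone plan, the power floor and the helper names are hypothetical.

```python
import math


def goertzel_power(frame, freq, fs):
    """Power of `frame` at a single frequency (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def assign_tones(speakers, base=4000.0, spacing=100.0, per_speaker=2):
    """Computer-assigned plan: interleave tones so no two speakers share a frequency."""
    k = len(speakers)
    return {s: [base + (i + j * k) * spacing for j in range(per_speaker)]
            for i, s in enumerate(speakers)}


def attribute(frame, plan, fs, floor=10.0):
    """Speaker whose own tones carry the most power, or None if no known tones are present."""
    scores = {s: sum(goertzel_power(frame, f, fs) for f in tones)
              for s, tones in plan.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > floor else None


fs, n = 16000, range(320)
plan = assign_tones(["alice", "bob"])  # alice: 4000, 4200 Hz; bob: 4100, 4300 Hz
voice = [0.8 * math.sin(2 * math.pi * 300 * i / fs) for i in n]
bob_frame = [v + 0.2 * math.sin(2 * math.pi * 4100 * i / fs)
             + 0.2 * math.sin(2 * math.pi * 4300 * i / fs) for v, i in zip(voice, n)]
print(attribute(bob_frame, plan, fs), attribute(voice, plan, fs))  # bob None
```

Alice's recognizer would accept only frames attributed to "alice". That rejects Bob's speech and un-equipped background voices alike, which is the behaviour described in claims 51 and 55. A temporal scheme (time-slotted excitation) could be layered on top of the same idea.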

58. A method of providing a speech-recognition based security function for user identification or validation comprising:
- (a) coupling, directly or indirectly, an artificial acoustic exciter into a user's vocal tract;
- (b) having the user speak, articulate or mouth an utterance wherein said utterance, at least in part, comprises a portion of the artificial excitation as modified or modulated by said user's vocal tract;
- (c) applying speech recognition processing means to identify or validate said user, said means processing at least a portion of said artificially excited speech, utterance or signal-representation thereof; and
- (d) storing information relating to at least one characteristic of said user's vocal tract, or of its function, being used in said user identification or validation process.
- Dependent claim 59:
  - 59. The method of claim 58 wherein said user speaks or utters at least one designated entry-utterance for the purpose of said identification or validation, said audible or inaudible entry-utterance comprising at least one of:
    - (a) including at least a portion of said user's name or alias;
    - (b) including a welcoming greeting;
    - (c) being revealed to said user only at the time of attempted entry; and
    - (d) being revealed to said user after its random selection.
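Claims 58 and 59 combine a stored characteristic of the user's vocal tract with an entry utterance. That utterance may be randomly chosen and revealed only at the attempt. A minimal sketch might look like the code below. It assumes the stored characteristic is a vector of per-probe tract gains (for instance, received-to-transmitted magnitude ratios at a few probe tones) and that a cosine-similarity threshold is an acceptable match rule. The phrase list, the threshold and the function names are hypothetical. A real system would also have to recognize that the challenge phrase was actually spoken.

```python
import math
import secrets


def choose_challenge(phrases):
    """Randomly select the entry utterance, revealed only at the attempt (claim 59(c), (d))."""
    return secrets.choice(phrases)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def verify(stored_profile, observed_profile, threshold=0.95):
    """Accept if the observed tract profile is close enough to the enrolled one."""
    return cosine_similarity(stored_profile, observed_profile) >= threshold


enrolled = [0.52, 0.31, 0.12, 0.08]  # hypothetical per-probe gains stored at enrolment
challenge = choose_challenge(["open sesame", "good morning", "blue harbor"])
print("Say:", challenge)
print(verify(enrolled, [0.50, 0.33, 0.11, 0.09]), verify(enrolled, [0.10, 0.15, 0.45, 0.40]))  # True False
```

`secrets.choice` is used rather than `random.choice` because the challenge is a security function; its unpredictability is what makes a replayed recording less useful.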

Current Assignee
Carol A. Tosaya, John W. Sliwa Jr
Original Assignee
Carol A. Tosaya, John W. Sliwa Jr
Inventors
Tosaya, Carol A., Sliwa, John W. Jr.

Granted Patent

US 7,082,395 B2
US Class Current

704/261
CPC Class Codes

G10L 15/20 Speech recognition techniqu...

G10L 19/08 Determination or coding of ...
