Signal injection coupling into the human vocal tract for robust audible and inaudible voice recognition
First Claim
1. A speech recognition system for processing sounds emanating from a living body's vocal tract, said sounds including sounds excited by at least one artificial exciter coupled, either directly or indirectly, into said vocal tract to introduce artificial excitations, said at least one artificial excitation modified or modulated by said vocal tract and emanating therefrom, said speech recognition system including:
means for representation, modeling or classification or both, and searching of artificially excited speech signals or signal components;
means for representation, modeling or classification or both, and searching of naturally excited speech signals or signal components;
at least one of said searching means having access to at least one of an acoustic model, lexical model or language model; and
at least one training means.
Abstract
A means and method are provided for enhancing or replacing the natural excitation of the human vocal tract by artificial excitation means, wherein the artificially created acoustics present additional spectral, temporal, or phase data useful for (1) enhancing the machine recognition robustness of audible speech or (2) enabling more robust machine recognition of relatively inaudible mouthed or whispered speech. The artificial excitation (a) may be arranged to be audible or inaudible, (b) may be designed to be non-interfering with another user's similar means, (c) may be used in one or both of a vocal content-enhancement mode or a complementary vocal tract-probing mode, and/or (d) may be used for the recognition of audible or inaudible continuous speech or isolated spoken commands.
56 Citations
64 Claims
1. A speech recognition system for processing sounds emanating from a living body's vocal tract, said sounds including sounds excited by at least one artificial exciter coupled, either directly or indirectly, into said vocal tract to introduce artificial excitations, said at least one artificial excitation modified or modulated by said vocal tract and emanating therefrom, said speech recognition system including:
means for representation, modeling or classification or both, and searching of artificially excited speech signals or signal components;
means for representation, modeling or classification or both, and searching of naturally excited speech signals or signal components;
at least one of said searching means having access to at least one of an acoustic model, lexical model or language model; and
at least one training means.
Dependent claims: 2-23.

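As a rough illustration of claim 1, the sketch below pairs a recognizer for naturally excited features with one for artificially excited features and searches a shared lexicon for the best word. It is a minimal sketch under stated assumptions, not the patented implementation: the template-matching "acoustic models", the Euclidean distance, the `weight` parameter, and every name in the code are hypothetical and do not come from the claim text.

```python
import math

def recognize(natural_feat, artificial_feat,
              natural_templates, artificial_templates,
              lexicon, weight=0.5):
    """Search the lexicon for the word whose stored templates are closest
    to the observed natural and artificial feature vectors.

    Assumption: each "acoustic model" is just one template vector per word,
    learned beforehand by some training step not shown here.
    """
    best_word, best_cost = None, math.inf
    for word in lexicon:
        # Distance in the naturally excited channel.
        d_nat = math.dist(natural_feat, natural_templates[word])
        # Distance in the artificially excited channel.
        d_art = math.dist(artificial_feat, artificial_templates[word])
        cost = weight * d_nat + (1.0 - weight) * d_art
        if cost < best_cost:
            best_word, best_cost = word, cost
    return best_word

# Hypothetical two-word vocabulary with made-up template vectors.
LEXICON = ["stop", "go"]
NATURAL_TEMPLATES = {"stop": [1.0, 0.0], "go": [0.0, 1.0]}
ARTIFICIAL_TEMPLATES = {"stop": [2.0, 2.0], "go": [5.0, 5.0]}
```

With a whispered utterance the natural features may be ambiguous, and the artificial channel is then what separates the candidates; lowering `weight` would shift trust toward that channel.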
24. A method of performing speech recognition on silently-mouthed, silently-articulated or whispered speech or utterances from a living body's vocal tract, comprising:
providing a source of artificial acoustic excitation;
coupling said artificial acoustic excitation, directly or indirectly, into said vocal tract of a speaker;
allowing said artificial acoustic excitation to be modified or modulated by said speaker's mouthing, articulation or whispering action by a state of at least a portion of said speaker's vocal tract;
performing speech-recognition processing on at least a portion of or component of said modified acoustic excitation to contribute to the identification of said speech or utterance; and
choosing said at least one artificial excitation based on an optimized correlation between it and known words or utterances made available during training,
wherein said speech-recognition processing includes processing said modified acoustic excitation through representation, modeling or classification or both, and searching to produce identified words or utterances.
Dependent claims: 25-31.

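The "choosing" step of claim 24 can be read in more than one way. The sketch below shows one possible reading, offered only as an assumption: for each candidate excitation, training recordings of known words are compared pairwise, and the excitation is chosen whose recordings correlate most strongly within a word and least strongly across words. The scoring rule, the function names, and the excitation labels are all hypothetical.

```python
from itertools import combinations
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equal-length feature vectors."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0

def separation_score(training):
    """training maps word -> list of feature vectors recorded under ONE excitation.

    Assumes at least two words and at least two recordings of some word,
    so that both the same-word and cross-word lists are non-empty.
    """
    labeled = [(word, vec) for word, vecs in training.items() for vec in vecs]
    same, cross = [], []
    for (w1, v1), (w2, v2) in combinations(labeled, 2):
        (same if w1 == w2 else cross).append(pearson(v1, v2))
    return mean(same) - mean(cross)

def choose_excitation(candidates):
    """candidates maps excitation name -> training data for that excitation."""
    return max(candidates, key=lambda name: separation_score(candidates[name]))

# Hypothetical training data: two excitations, two words, two recordings each.
CANDIDATES = {
    "tone_sweep": {"yes": [[1, 2, 3, 4], [1, 2, 3, 5]],
                   "no": [[4, 3, 2, 1], [5, 3, 2, 1]]},
    "white_noise": {"yes": [[1, 2, 3, 4], [4, 1, 3, 2]],
                    "no": [[1, 2, 3, 5], [2, 4, 1, 3]]},
}
```

Here `choose_excitation(CANDIDATES)` favors the excitation whose recordings of "yes" and "no" are internally consistent and mutually distinct.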
32. A method of enhancing the accuracy or speed of speech recognition of the speech or utterances emanating from a living body's vocal tract, comprising:
coupling artificial acoustic excitation, directly or indirectly, into said vocal tract of a speaker;
allowing said speaker to audibly speak;
at least during portions of said audible speech, allowing said artificial acoustic excitation to be modified or modulated by said speaker's mouthing, articulation or whispering action by a state of at least a portion of said speaker's vocal tract to provide an artificially excited output of said speaker; and
performing speech-recognition processing using at least portions of both naturally excited and artificially excited outputs of said speaker, to thereby provide enhanced accuracy or speed of said speech or utterance recognition;
wherein said speech-recognition processing includes processing said modified acoustic excitation through representation, modeling or classification or both, and searching to produce identified words or utterances and wherein the acoustic output of said vocal tract containing both types of acoustic outputs is speech-recognition processed, at least in part, as separate natural and artificial signals.
Dependent claims: 33-39.

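Claim 32 requires that the natural and artificial outputs be processed, at least in part, as separate signals, but does not say how they are separated. One simple possibility, assumed here purely for illustration, is that the artificial excitation sits above the audible band so that a frequency cutoff splits the two. The naive DFT, the cutoff value, and all names below are hypothetical.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive O(n^2) DFT magnitudes for bins 0 .. n//2 - 1."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]

def split_natural_artificial(frame, sample_rate_hz, cutoff_hz):
    """Split one frame's spectrum into a 'natural' part (below cutoff_hz)
    and an 'artificial' part (at or above cutoff_hz).

    Assumption: the artificial excitation occupies only the upper band.
    """
    n = len(frame)
    natural, artificial = [], []
    for k, magnitude in enumerate(dft_magnitudes(frame)):
        freq_hz = k * sample_rate_hz / n
        (natural if freq_hz < cutoff_hz else artificial).append(magnitude)
    return natural, artificial

def make_test_frame(n=64, sample_rate_hz=64000):
    """Synthetic frame: a 2 kHz 'voice' tone plus a weaker 24 kHz 'exciter' tone."""
    return [
        math.sin(2 * math.pi * 2000 * t / sample_rate_hz)
        + 0.5 * math.sin(2 * math.pi * 24000 * t / sample_rate_hz)
        for t in range(n)
    ]

natural_band, artificial_band = split_natural_artificial(make_test_frame(), 64000, 20000)
```

Each band can then be fed to its own representation and classification stage. A real system would use an FFT and proper filtering rather than this quadratic-time transform.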
40. A system for improving the accuracy, robustness, or speed of the speech-recognition of utterances made by a speaker comprising:
a subject speaker emanating at least one naturally audible utterance from his/her vocal tract;
an artificial acoustic excitation means for speech-recognition enhancement of natural and audible speech and utterances coupled, either directly or indirectly, into said speaker's vocal tract, without interfering with or modifying breathing or aspiration, for use at least in healthy vocal tracts;
the excitation means operated by said system to excite at least one additional of (a) an artificial audible emanation from the speaker's modulating tract and (b) an artificial inaudible emanation from the speaker's modulating tract;
the system utilizing both the natural audible signal and at least one of the additional artificial audible or artificial inaudible signals to perform speech recognition;
the system having representation means and at least one of modeling means and classification means, a searching means, a training means and a directly or indirectly tract-coupled exciter; and
the system being teachable by the speaker.
Dependent claims: 41-53.

54. A system for providing speech or utterance-recognition processing of improved accuracy or robustness, said speech or utterances comprising one or more of audible or inaudible (a) speech, (b) commands, (c) whispers, (d) mouthings or sounds, said system comprising:
a speech recognition system capable of recognizing at least one natural or artificially induced audible or inaudible acoustic emanation emanating from a speaker's vocal tract, said speech recognition system including an artificial exciter for speech-recognition enhancement of natural audible speech and utterances that does not interfere with or modify breathing or aspiration, for use at least in healthy vocal tracts, said speech recognition system further including representation means and at least one of modeling means and classification means, a searching means, a training means and a directly or indirectly tract-coupled exciter, and
a vocal tract acoustic-probing means which measures at least one of (a) an acoustic impedance or admittance, (b) an acoustic resonance or harmonic, (c) a vocal tract spectral fingerprint,
wherein a combination of the first speech recognition process and at least one second acoustic measurement process provide complementary information allowing for a more accurate or robust overall speech recognition process.
Dependent claims: 55, 56.

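The probing means of claim 54 measures quantities such as an acoustic resonance, without specifying a procedure. A minimal sketch of one way such a measurement might look, assuming the probe and received spectra are already available as magnitudes on a common frequency grid: divide received by emitted magnitude to get a gain curve, then report local maxima as candidate resonances. The function names, the `min_gain` threshold, and the sample numbers are hypothetical.

```python
def transfer_magnitude(probe_mag, received_mag, floor=1e-12):
    """Per-frequency gain |received| / |probe|, guarding against division by zero."""
    return [r / max(p, floor) for p, r in zip(probe_mag, received_mag)]

def resonance_peaks(freqs_hz, gain, min_gain=1.0):
    """Return frequencies where the gain curve has a local maximum
    of at least min_gain. Endpoints are never reported as peaks."""
    peaks = []
    for i in range(1, len(gain) - 1):
        if gain[i] > gain[i - 1] and gain[i] >= gain[i + 1] and gain[i] >= min_gain:
            peaks.append(freqs_hz[i])
    return peaks

# Made-up measurement: a flat probe spectrum and a received spectrum with two bumps.
FREQS_HZ = [250, 500, 750, 1000, 1250, 1500, 1750]
PROBE = [2.0] * 7
RECEIVED = [1.0, 4.0, 1.0, 1.0, 3.0, 1.0, 0.5]

peaks = resonance_peaks(FREQS_HZ, transfer_magnitude(PROBE, RECEIVED))
```

The resulting peak list could serve as an extra feature alongside the ordinary recognition features, which is one way to picture the combination of the two processes that the claim describes.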
57. An arrangement for improving the accuracy, robustness, or speed of the speech-recognition of natural audible speech and utterances made by a speaker, said arrangement comprising:
(a) a teachable speech recognition system comprising at least a speech sampling means and a search means, said system capable of detecting modulations or modulation activity of said speaker's vocal tract and relating said modulations to learned utterances; and
(b) an artificial acoustic excitation means and an associated artificial acoustic reception means, represented collectively by one or more body-contacting, skin-contacting or bone-contacting transducers for speech-recognition enhancement of natural audible speech and utterances that does not interfere with or modify breathing or aspiration, for use at least in healthy vocal tracts;
wherein:
(1) the artificial acoustic excitation is transmitted directly or indirectly into the speaker's vocal tract and is modulated by said utterances;
(2) said modulated artificial excitations being detected by at least one said contacting transducer reception means; and
(3) said speech recognition system utilizing at least said artificial modulated excitations and representation means, at least one of modeling means and classification means, a training means and a directly or indirectly tract-coupled exciter for recognition processing.
Dependent claims: 58.

59. An arrangement for providing speech or utterance-recognition processing of improved accuracy or robustness, said speech or utterances comprising one or more of audible or inaudible (a) speech, (b) commands, (c) whispers, (d) mouthings or sounds, said arrangement comprising:
(a) a speech recognition system capable of recognizing at least one of natural or artificially induced audible or inaudible acoustic modulations of a speaker's vocal tract, and
(b) at least one artificial acoustic emitter, exciter or transducer and at least one acoustic receiver or pickup, or a single acoustic device serving both emitter/exciter and receiver functions, the artificial emitter, exciter or transducer capable of speech-recognition enhancement of natural audible speech and utterances that does not interfere with or modify breathing or aspiration, for use at least in healthy vocal tracts,
wherein:
(1) at least one of said acoustic devices is skin or bone coupled,
(2) the emitter/exciter exciting artificial audible or inaudible acoustics which are modulated, at least in part, by the speaker's vocal tract,
(3) the receiver detecting said modulated artificial acoustics and making them available in support of recognition-processing by the speech recognition system, and
(4) the system being taught to recognize using utterances which comprise, at least in part, said modulated artificial acoustics,
the system further including representation means and at least one of modeling means and classification means, a searching means, a training means and a directly or indirectly tract-coupled exciter.
Dependent claims: 60-64.

Specification