Signal injection coupling into the human vocal tract for robust audible and inaudible voice recognition
Abstract
A means and method are provided for enhancing or replacing the natural excitation of the human vocal tract by artificial excitation means, wherein the artificially created acoustics present additional spectral, temporal, or phase data useful for (1) enhancing the machine recognition robustness of audible speech or (2) enabling more robust machine-recognition of relatively inaudible mouthed or whispered speech. The artificial excitation (a) may be arranged to be audible or inaudible, (b) may be designed to be non-interfering with another user's similar means, (c) may be used in one or both of a vocal content-enhancement mode or a complementary vocal tract-probing mode, and/or (d) may be used for the recognition of audible or inaudible continuous speech or isolated spoken commands.
69 Citations
32 Claims
1. A speech recognition system for processing sounds emanating from a living body's vocal tract, said sounds including sounds excited by at least one artificial exciter coupled, either directly or indirectly, into said vocal tract to introduce artificial excitations, said at least one artificial excitation modified or modulated by said vocal tract and emanating therefrom, said speech recognition system including:
means for representation, modeling or classification or both, and searching of artificially excited speech signals or signal components;
means for representation, modeling or classification or both, and searching of naturally excited speech signals or signal components;
at least one said searching means having access to at least one of an acoustic model, lexical model or language model;
at least one training means; and
means for directing at least a first modified or modulated artificially excited speech signal to a first speech representation means which samples at least said first signal to produce a first sequence of speech representation vectors, representative at least in part, of said artificially excited signal, wherein both the artificially excited signal and the naturally excited signal are represented by a single set of representation vectors.
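As an illustration of the last element of claim 1, the following minimal sketch produces one sequence of representation vectors from a signal that contains both a naturally excited and an artificially excited component. It is an assumption-laden example, not the representation means described in the patent: the log band-power features, the Goertzel recurrence, the names `goertzel_power`, `representation_vectors`, `frame_len` and `band_hz`, and the 300 Hz and 20 kHz test components are all hypothetical choices made for illustration.

```python
import math


def goertzel_power(frame, sample_rate, freq_hz):
    """Return the squared DFT magnitude of `frame` at `freq_hz` (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s1 = s2 = 0.0
    for x in frame:
        # s1 holds the newest state, s2 the one before it.
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def representation_vectors(signal, sample_rate, frame_len, band_hz):
    """Cut `signal` into non-overlapping frames; one log-power vector per frame."""
    vectors = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        vectors.append([math.log(goertzel_power(frame, sample_rate, f) + 1e-12)
                        for f in band_hz])
    return vectors


# Hypothetical input: a 300 Hz "natural" component plus a 20 kHz injected tone.
RATE = 48000
mixed = [math.sin(2 * math.pi * 300 * n / RATE)
         + 0.5 * math.sin(2 * math.pi * 20000 * n / RATE) for n in range(4800)]

# Each vector covers a natural band, an empty band, and the excitation band.
vectors = representation_vectors(mixed, RATE, 480, [300, 1000, 20000])
```

Each frame yields a single vector whose entries span both the natural and the artificial excitation bands, so a downstream model sees one set of vectors rather than two separate streams. A practical front end would more likely use cepstral or filter-bank features.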
22. A method of minimizing degradation in the accuracy or speed of speech-recognition of a first speaker's speech or utterance caused by at least one second interfering background speaker, voice, or sound, said method comprising:
coupling artificial acoustic excitation, directly or indirectly, into the vocal tract of the first speaker;
allowing said first speaker to audibly speak in the potential acoustic presence of said at least one second background speaker or sound, thereby modifying or modulating said first speaker's artificial acoustic excitation as well as said first speaker's natural excitation; and
processing at least a portion of said first speaker's artificially-produced acoustic output by a speech recognition means, said speech recognition means comprising;
means for representation, modeling or classification, and searching of artificially excited speech signals or signal components;
means for representation, modeling or classification, and searching of naturally excited speech signals or signal components;
at least one of said searching means having access to at least one of an acoustic model, lexical model or language model; and
at least one training means;
wherein said first speaker's output is known to be that of said first speaker due to its identifiable artificial acoustic content, or wherein said second speaker's or sound's interfering output is ignored or rejected because it does not contain first speaker's identifying artificial excitations.
choosing said at least one artificial excitation based on an optimized correlation between it and known words or utterances made available during training.
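The rejection step in the wherein clause of claim 22 can be pictured as a gate that passes a frame to the recognizer only if it carries the first speaker's excitation signature. The sketch below is one hypothetical way to do that, assuming the excitation is a single known tone; the function `carries_excitation`, the parameter `min_fraction`, and the 20 kHz tone are illustrative assumptions and do not come from the patent.

```python
import math


def goertzel_power(frame, sample_rate, freq_hz):
    """Return the squared DFT magnitude of `frame` at `freq_hz` (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s1 = s2 = 0.0
    for x in frame:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def carries_excitation(frame, sample_rate, tone_hz, min_fraction=0.05):
    """True if roughly `min_fraction` or more of the frame energy sits at `tone_hz`."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return False  # silence carries no signature
    # For a tone on an exact DFT bin, 2*|X|^2 / (N * energy) is its energy share.
    fraction = 2.0 * goertzel_power(frame, sample_rate, tone_hz) / (len(frame) * energy)
    return fraction >= min_fraction


RATE, TONE = 48000, 20000  # hypothetical sample rate and excitation tone
first_speaker = [math.sin(2 * math.pi * 300 * n / RATE)
                 + 0.5 * math.sin(2 * math.pi * TONE * n / RATE) for n in range(480)]
background = [math.sin(2 * math.pi * 300 * n / RATE) for n in range(480)]

accept_first = carries_excitation(first_speaker, RATE, TONE)
accept_background = carries_excitation(background, RATE, TONE)
```

Frames from the background talker contain no energy at the excitation tone and are dropped before recognition. A coded or swept excitation would call for a matched filter instead of a single-tone test.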
30. A method of providing a speech-recognition based security function for user identification or validation comprising:
(a) coupling, directly or indirectly, an artificial acoustic exciter into a user's vocal tract;
(b) having the user speak, articulate or mouth an utterance wherein said utterance, at least in part, comprises a portion of the artificial excitation as-modified or modulated by said user's vocal tract;
(c) applying speech recognition processing means to identify or validate said user, said means processing at least a portion of said artificially excited speech, utterance or signal-representation thereof; and
(d) storing information relating to at least one characteristic of said user's vocal tract, or of its function, being used in said user identification or validation process, wherein said speech-recognition processing includes processing said modified acoustic excitation through representation, modeling or classification or both, and searching to produce identified words.
(a) including at least a portion of said user's name or alias;
(b) including a welcoming greeting;
(c) being revealed to said user only at the time of attempted entry; and
(d) being revealed to said user after its random selection.
(4) Improves security for speech-based user-identification or user-validation.
32. The method of claim 30 further comprising:
choosing said at least one artificial excitation based on an optimized correlation between it and known words or utterances made available during training.
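One possible reading of the selection step in claim 32 is sketched below: each candidate excitation is scored by how strongly a feature measured under it correlates with known training labels, and the best-scoring candidate is kept. This is an interpretation for illustration only. The candidate names, the scalar feature, the binary labels, and the use of Pearson correlation are all assumptions, and the training numbers are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def choose_excitation(training):
    """Return the candidate whose feature correlates most strongly with the labels."""
    return max(training, key=lambda name: abs(pearson(*training[name])))


# Hypothetical training data: per candidate, (feature per utterance, known label).
training = {
    "tone_18k": ([0.10, 0.40, 0.35, 0.80], [0, 1, 0, 1]),
    "chirp":    ([0.10, 0.50, 0.20, 0.90], [0, 1, 0, 1]),
}
best = choose_excitation(training)
```

In a real system the score would more plausibly be recognition accuracy on held-out training utterances; correlation on a single feature is used here only to keep the example short.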
Specification