CONTEXT SENSITIVE MULTI-STAGE SPEECH RECOGNITION
First Claim
10. A method of enrolling a voice segment comprising:
detecting a speech signal representing a verbal utterance;
digitizing the detected speech signal;
generating a phonetic representation of the speech signal that is designated a first recognition result;
generating variants of the phonetic representation based on a plurality of context information provided for the phonetic representation;
selecting one or more variants of the phonetic representation that is designated a second recognition result;
matching the second recognition result with stored phonetic representations of entries of one or more stored lexical lists; and
adding the second recognition result to the stored phonetic representations when none of the phonetic representations of the entries of the one or more stored lexical list match the second recognition result better than a predetermined matching threshold.
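The final enrollment step of the claim can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: phonetic representations are modeled as tuples of phoneme labels, the matching score is Python's `difflib.SequenceMatcher` ratio (a stand-in for whatever matching measure is actually used), and the names `similarity`, `enroll`, and the default threshold of 0.8 are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score in [0, 1]; 1.0 means identical phoneme sequences.

    Assumption: SequenceMatcher's ratio stands in for the matching measure.
    """
    return SequenceMatcher(None, a, b).ratio()


def enroll(candidate, lexicon, threshold=0.8):
    """Add `candidate` to `lexicon` unless a stored entry already matches it
    better than `threshold`. Returns True if the candidate was added.

    `candidate` plays the role of the second recognition result;
    `lexicon` is a list of stored phonetic representations.
    """
    best = max((similarity(candidate, entry) for entry in lexicon), default=0.0)
    if best > threshold:
        return False  # an existing entry matches well enough; do not enroll
    lexicon.append(candidate)
    return True


lexicon = [("k", "ae", "t")]
added_same = enroll(("k", "ae", "t"), lexicon)  # identical entry already stored
added_new = enroll(("d", "ao", "g"), lexicon)   # nothing similar is stored
```

In this sketch the first call leaves the lexicon unchanged because the stored entry matches perfectly, while the second call appends the new representation.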
Abstract
A system enables devices to recognize and process speech. The system includes a database that retains one or more lexical lists. A speech input detects a verbal utterance and generates a speech signal corresponding to the detected verbal utterance. A processor generates a phonetic representation of the speech signal that is designated a first recognition result. The processor generates variants of the phonetic representation based on context information provided by the phonetic representation. One or more of the variants of the phonetic representation selected by the processor are designated as a second recognition result. The processor matches the second recognition result with stored phonetic representations of one or more of the stored lexical lists.
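The two-stage flow described in the abstract can be sketched as follows. Everything here is illustrative: the rule table `CONTEXT_RULES`, the use of the following phoneme as the "context information", the edit-distance matcher, and the function names are assumptions made for the example, not details taken from the patent. For simplicity, all generated variants are kept as the second recognition result.

```python
from itertools import product

# Hypothetical context rules: (phoneme, following phoneme) -> possible confusions.
# None as the following phoneme means the phoneme is utterance-final.
CONTEXT_RULES = {
    ("m", "p"): ["n"],
    ("t", None): ["d"],
}


def generate_variants(phonemes):
    """Expand a first recognition result into context-dependent variants.

    The first variant returned is always the unmodified input.
    """
    options = []
    for i, p in enumerate(phonemes):
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        options.append([p] + CONTEXT_RULES.get((p, nxt), []))
    return list(product(*options))


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def match(variants, lexicon):
    """Return (word, distance) for the lexicon entry closest to any variant.

    `lexicon` maps a word to its stored phonetic representation.
    """
    return min(
        ((word, edit_distance(v, stored))
         for v in variants
         for word, stored in lexicon.items()),
        key=lambda pair: pair[1],
    )


first_result = ("b", "ae", "t")
variants = generate_variants(first_result)
lexicon = {"bad": ("b", "ae", "d"), "cat": ("k", "ae", "t")}
best = match(variants, lexicon)
```

Here the utterance-final rule produces a second variant ending in "d", which matches the stored entry for "bad" exactly even though the first recognition result did not.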
21 Claims
10. A method of enrolling a voice segment comprising:
detecting a speech signal representing a verbal utterance;
digitizing the detected speech signal;
generating a phonetic representation of the speech signal that is designated a first recognition result;
generating variants of the phonetic representation based on a plurality of context information provided for the phonetic representation;
selecting one or more variants of the phonetic representation that is designated a second recognition result;
matching the second recognition result with stored phonetic representations of entries of one or more stored lexical lists; and
adding the second recognition result to the stored phonetic representations when none of the phonetic representations of the entries of the one or more stored lexical list match the second recognition result better than a predetermined matching threshold.
Dependent claims: 11
12. A computer-readable storage medium that stores instructions that, when executed by processor, cause the processor to recognize speech by executing software that causes the following act comprising:
digitizing a speech signal representing a verbal utterance;
generating a phonetic representation of the speech signal that is designated as a first recognition result;
generating variants of the phonetic representation based on a plurality of context information provided for the phonetic representation;
selecting one or more variants of the phonetic representation that is designated as a second recognition result;
matching the second recognition result with stored phonetic representations of entries of one or more stored lexical lists; and
adding the second recognition result to the stored phonetic representations when the phonetic representations of the entries of the one or more stored lexical list do not match the second recognition result better than a predetermined matching threshold.
Dependent claims: 13
14. Speech recognition device, comprising
a database comprising one or more lexical lists;
a speech input interface configured to detect a verbal utterance and configured to generate a speech signal corresponding to the detected verbal utterance;
a processor programmed to;
generate a phonetic representation of the speech signal as a first recognition result;
generate variants of the phonetic representation based on context information provided for the phonetic representation;
select one or more of the variants of the phonetic representation as a second recognition result; and
match the second recognition result with stored phonetic representations of entries of the one or more stored lexical lists.
Dependent claims: 1, 2, 3, 4, 5, 6, 7, 8, 9, 15, 16, 17, 18
18-1. The speech recognition system of claim 14, where the processor is further configured to divide the verbal utterance into intervals.
20. Speech recognition device, comprising
a database comprising one or more lexical lists;
a speech input interface configured to detect a verbal utterance and configured to generate a speech signal corresponding to the detected verbal utterance;
a signal processor in communication with the speech input to beamform the verbal utterance received through a sensor array; and
a processor programmed to;
generate a phonetic representation of the speech signal as a first recognition result;
generate variants of the phonetic representation based on context information provided for the phonetic representation;
select one or more of the variants of the phonetic representation as a second recognition result; and
match the second recognition result with stored phonetic representations of entries of at least one stored lexical list.
Dependent claims: 21
Specification