Various apparatus and methods for a speech recognition system
First Claim
1. A continuous speech recognition engine, comprising:
an input subsystem configured to convert input audio data into a time coded sequence of sound feature frames for speech recognition;
a fine speech recognizer to apply a speech recognition process to the sound feature frames and determine at least a candidate recognized word that corresponds to the sound feature frames;
a coarse sound representation generator to output a series of individual phonemes occurring within a time duration of the recognized word as a coarse sound representation of the recognized word; and
at least one processor to:
compare the coarse sound representation of the recognized word to a known sound of the recognized word in a database, and
assign a confidence level parameter to the recognized word from the fine speech recognizer according to the comparing.
2 Assignments
0 Petitions
Abstract
A method, apparatus, and system are described for a continuous speech recognition engine that includes a fine speech recognizer model, a coarse sound representation generator, and a coarse match generator. The fine speech recognizer model receives a time coded sequence of sound feature frames, applies a speech recognition process to the sound feature frames and determines at least a best guess at each recognizable word that corresponds to the sound feature frames. The coarse sound representation generator generates a coarse sound representation of the recognized word. The coarse match generator determines a likelihood of the coarse sound representation actually being the recognized word based on comparing the coarse sound representation of the recognized word to a database containing the known sound of that recognized word and assigns the likelihood as a robust confidence level parameter to that recognized word.
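The coarse match step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the comparison is scored, so the normalized edit distance used here is an assumption, as are the names `KNOWN_SOUNDS`, `edit_distance`, and `confidence` and the ARPAbet-style phoneme symbols.

```python
# Hypothetical pronunciation database: word -> known phoneme sequence.
# The entries and the phoneme symbols are illustrative only.
KNOWN_SOUNDS = {
    "speech": ["S", "P", "IY", "CH"],
    "beach": ["B", "IY", "CH"],
}


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def confidence(word, coarse_phonemes, known_sounds=KNOWN_SOUNDS):
    """Score in [0, 1]: how closely the coarse phonemes match the known sound.

    Assumption: similarity is 1 minus the length-normalized edit distance.
    A word missing from the database gets 0.0.
    """
    known = known_sounds.get(word)
    if known is None:
        return 0.0
    longest = max(len(known), len(coarse_phonemes))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(coarse_phonemes, known) / longest
```

For example, `confidence("speech", ["S", "P", "IY", "CH"])` scores a perfect match, while a coarse representation that sounds more like "beach" lowers the confidence assigned to the fine recognizer's "speech" hypothesis.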
37 Citations
23 Claims
1. A continuous speech recognition engine, comprising:
an input subsystem configured to convert input audio data into a time coded sequence of sound feature frames for speech recognition;
a fine speech recognizer to apply a speech recognition process to the sound feature frames and determine at least a candidate recognized word that corresponds to the sound feature frames;
a coarse sound representation generator to output a series of individual phonemes occurring within a time duration of the recognized word as a coarse sound representation of the recognized word; and
at least one processor to:
compare the coarse sound representation of the recognized word to a known sound of the recognized word in a database, and
assign a confidence level parameter to the recognized word from the fine speech recognizer according to the comparing.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A method for speech recognition, comprising:
converting, by a system having a processor, audio data into a time coded sequence of sound feature frames for speech recognition;
receiving, by the system, the time coded sequence of sound feature frames and applying a speech recognition process of a first speech recognizer to the sound feature frames to determine at least one candidate recognized word that corresponds to the sequence of sound feature frames;
generating, by a coarse sound representation generator in the system, a coarse sound representation that contains a series of individual phonemes occurring within a time duration of the recognized word; and
comparing, by the system, the coarse sound representation to a known sound of the recognized word in a database and then assigning a confidence level parameter to the recognized word produced by the first speech recognizer based on the comparison.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
21. A non-transitory computer readable storage medium storing instructions that upon execution cause a system to:
convert audio data into a time coded sequence of sound feature frames for speech recognition;
receive the time coded sequence of sound feature frames and apply a speech recognition process of a first speech recognizer to the sound feature frames to determine at least one candidate recognized word that corresponds to the sequence of sound feature frames;
generate, using a coarse sound representation generator, a coarse sound representation of the recognized word that contains a series of individual phonemes occurring within a time duration of the recognized word; and
compare the coarse sound representation to a known sound of the recognized word in a database and then assign a confidence level parameter to the recognized word produced by the first speech recognizer based on the comparison.
Dependent claims: 22, 23
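The generating step recited in claims 12 and 21 (collecting the individual phonemes that occur within the time duration of the recognized word) might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the types `TimedPhoneme` and `RecognizedWord`, the function `coarse_representation`, and the rule that a phoneme belongs to the word when its midpoint falls inside the word's time span are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedPhoneme:
    """One phoneme from a time-coded phoneme stream (hypothetical type)."""
    symbol: str
    start: float  # seconds
    end: float    # seconds


@dataclass(frozen=True)
class RecognizedWord:
    """A word hypothesis with its start and end time (hypothetical type)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def coarse_representation(word, phoneme_stream):
    """Return the symbols of the phonemes that fall within the word's duration.

    Assumption: a phoneme counts as inside the word when its temporal
    midpoint lies in the half-open interval [word.start, word.end).
    """
    return [
        p.symbol
        for p in phoneme_stream
        if word.start <= (p.start + p.end) / 2 < word.end
    ]


# Illustrative stream for the utterance "the speech i(s)...".
stream = [
    TimedPhoneme("DH", 0.00, 0.05),
    TimedPhoneme("AH", 0.05, 0.12),
    TimedPhoneme("S", 0.12, 0.20),
    TimedPhoneme("P", 0.20, 0.26),
    TimedPhoneme("IY", 0.26, 0.38),
    TimedPhoneme("CH", 0.38, 0.50),
    TimedPhoneme("IH", 0.50, 0.56),
]
word = RecognizedWord("speech", 0.12, 0.50)
phonemes = coarse_representation(word, stream)
```

The resulting phoneme series is what the comparing step would then check against the known sound of the recognized word in the database.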
Specification