Method and system for hybrid decoding for enhanced end-user privacy and low latency
Abstract
Methods described herein provide functionality for automatic speech recognition (ASR). One such embodiment performs speech recognition using received speech recognition result candidates, where the received candidates were generated by performing Statistical Language Model (SLM) based speech recognition on one or more frames of audio data. In turn, such an embodiment transmits results of the speech recognition, performed using the received speech recognition result candidates, to a user device via a communications network. Results of the speech recognition are available with lower latency than purely cloud-based ASR solutions.
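The server-side flow summarized above (receive candidates from the device, perform recognition using them, return the results) can be illustrated with a minimal Python sketch. The abstract does not say how the received candidates are used, so the score interpolation below is an assumption made for illustration only, and `Candidate`, `rescore`, `handle_request`, and the toy language-model scorer are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Candidate:
    """One recognition hypothesis sent by the user device (hypothetical shape)."""
    text: str
    device_score: float  # log-probability-like score assigned on the device


def rescore(candidates: List[Candidate],
            server_lm_score: Callable[[str], float],
            weight: float = 0.5) -> List[str]:
    """Re-rank device candidates with a server-side score.

    Assumption: the second pass is a linear interpolation of the device
    score and a server language-model score. The source does not specify this.
    """
    scored = [
        (weight * c.device_score + (1.0 - weight) * server_lm_score(c.text), c.text)
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


def handle_request(candidates: List[Candidate],
                   server_lm_score: Callable[[str], float]) -> Dict[str, List[str]]:
    """Build the payload that would be transmitted back to the user device."""
    return {"results": rescore(candidates, server_lm_score)}


# Toy server-side scorer standing in for a large cloud language model.
TOY_LM = {"call mom": -0.5, "call mum": -3.0}


def toy_lm_score(text: str) -> float:
    return TOY_LM.get(text, -10.0)


response = handle_request(
    [Candidate("call mum", -1.0), Candidate("call mom", -1.2)],
    toy_lm_score,
)
```

In this sketch only text hypotheses and scores cross the network, which is one way to read the privacy and latency goals in the title; whether the described system also sends other data is not stated in this section.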
20 Claims
1. A computer-implemented method of performing automatic speech recognition, the method implemented by a processor executing program instructions stored in memory and comprising:
receiving speech recognition result candidates from a user device, the received speech recognition result candidates generated by performing speech recognition on one or more frames of audio data on the user device, the speech recognition on the one or more frames of audio data combining two parallel layers of speech recognition that utilize language models capable of being stored on the user device, wherein the two parallel layers include a command-oriented finite state (FST) based recognizer layer and a Statistical Language Model (SLM) based pre-filter layer configured to utilize device information local to the user device;
performing speech recognition using the received speech recognition result candidates; and
transmitting to the user device, via a communications network, results of the speech recognition performed using the received speech recognition result candidates.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer system for performing automatic speech recognition, the computer system comprising:
a processor; and
a memory with computer code instructions stored thereon, the processor and the memory, with the computer code instructions being configured to cause the system to:
receive speech recognition result candidates from a user device, the received speech recognition result candidates generated by performing speech recognition on one or more frames of audio data on the user device, the speech recognition on the one or more frames of audio data combining two parallel layers of speech recognition that utilize language models capable of being stored on a user device, wherein the two parallel layers include a command-oriented finite state (FST) based recognizer layer and a Statistical Language Model (SLM) based pre-filter layer configured to utilize device information local to the user device;
perform speech recognition using the received speech recognition result candidates; and
transmit to the user device, via a communications network, results of the speech recognition performed using the received speech recognition result candidates.
Dependent claims: 13, 14, 15, 16, 17, 18, 19
20. A computer program product for performing automatic speech recognition, the computer program product comprising:
one or more non-transitory computer-readable tangible storage devices and program instructions stored on at least one of the one or more tangible storage devices, the program instructions, when loaded and executed by a processor, cause an apparatus associated with the processor to:
receive speech recognition result candidates from a user device, the received speech recognition result candidates generated by performing speech recognition on one or more frames of audio data on the user device, the speech recognition on the one or more frames of audio data combining two parallel layers of speech recognition that utilize language models capable of being stored on a user device, wherein the two parallel layers include a command-oriented finite state (FST) based recognizer layer and a Statistical Language Model (SLM) based pre-filter layer configured to utilize device information local to the user device;
perform speech recognition using the received speech recognition result candidates; and
transmit to the user device, via a communications network, results of the speech recognition performed using the received speech recognition result candidates.
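The device-side step recited in the independent claims, two parallel recognition layers whose outputs become the candidates sent to the server, might be sketched as follows. This is a toy illustration under stated assumptions, not the claimed implementation: real layers would consume audio frames and acoustic scores, whereas this sketch starts from already-tokenized text; `COMMANDS`, `fst_layer`, `slm_prefilter_layer`, and `device_candidates` are hypothetical names; and the contact-name set stands in for "device information local to the user device".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Set, Tuple

Hypothesis = Tuple[str, float]  # (text, score); higher score is better

# Hypothetical command vocabulary standing in for a command-oriented FST grammar.
COMMANDS = {"call", "play", "open"}


def fst_layer(tokens: List[str]) -> List[Hypothesis]:
    """Stand-in for the FST recognizer: accept '<command> <rest...>' only."""
    if tokens and tokens[0] in COMMANDS:
        return [(" ".join(tokens), 0.0)]
    return []


def slm_prefilter_layer(tokens: List[str], local_names: Set[str]) -> List[Hypothesis]:
    """Stand-in for the SLM pre-filter: penalize tokens unknown on the device."""
    score = sum(0.0 if t in local_names else -1.0 for t in tokens)
    return [(" ".join(tokens), score)]


def device_candidates(tokens: List[str], local_names: Set[str]) -> List[Hypothesis]:
    """Run both layers in parallel and merge their hypotheses."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fst_future = pool.submit(fst_layer, tokens)
        slm_future = pool.submit(slm_prefilter_layer, tokens, local_names)
        merged = fst_future.result() + slm_future.result()

    # Keep the best score per distinct text, then sort best-first.
    best = {}
    for text, score in merged:
        if text not in best or score > best[text]:
            best[text] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


command_result = device_candidates(["call", "anna"], {"anna"})
free_text_result = device_candidates(["hello", "anna"], {"anna"})
```

The merge rule (best score per distinct text) is a design choice made for this sketch; the claims state only that the two layers are combined, not how.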
Specification