SYSTEM AND METHOD FOR BUILDING AND EVALUATING AUTOMATIC SPEECH RECOGNITION VIA AN APPLICATION PROGRAMMER INTERFACE
3 Assignments
0 Petitions
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for building an automatic speech recognition system through an Internet API. A network-based automatic speech recognition server configured to practice the method receives feature streams, transcriptions, and parameter values as inputs from a network client independent of knowledge of internal operations of the server. The server processes the inputs to train an acoustic model and a language model, and transmits the acoustic model and the language model to the network client. The server can also generate a log describing the processing and transmit the log to the client. On the server side, a human expert can intervene to modify how the server processes the inputs. The inputs can include an additional feature stream generated from speech by algorithms in the client's proprietary feature extraction.
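The abstract names the inputs a client sends (feature streams, transcriptions, parameter values) but specifies no wire format. The following minimal sketch shows one way a client might package those inputs for a single API call. The class `TrainingRequest`, its field names, the validation rules, and the JSON encoding are all hypothetical illustrations, not the disclosed interface.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class TrainingRequest:
    # Hypothetical payload layout; the patent does not define field names.
    feature_streams: list[list[list[float]]]  # per utterance: frames x feature dims
    transcriptions: list[str]                 # one transcription per utterance
    parameters: dict[str, object] = field(default_factory=dict)

    def validate(self) -> None:
        # Each feature stream must pair with exactly one transcription.
        if len(self.feature_streams) != len(self.transcriptions):
            raise ValueError("each feature stream needs exactly one transcription")
        # Frames within one stream are assumed to share a dimensionality.
        for stream in self.feature_streams:
            if len({len(frame) for frame in stream}) > 1:
                raise ValueError("frames in a stream must have equal length")

    def to_json(self) -> str:
        # The client sees only this payload, never the server's internals.
        self.validate()
        return json.dumps(asdict(self))
```

Because the request carries only data and parameter values, the client needs no knowledge of how the server trains its models, which mirrors the separation the abstract describes.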
20 Claims
1. A method of generating speech models for a remote client, the method comprising:
receiving, at a network-based automatic speech recognition system, feature streams, transcriptions, and parameter values as inputs from a network client independent of knowledge of internal operations of the automatic speech recognition system;
processing, at the automatic speech recognition system, the inputs to train an acoustic model and a language model; and
transmitting the acoustic model and the language model to the network client.
8. A client device for interfacing with a system that generates models for use in automatic speech recognition via an application programming interface call over a network, the client device comprising:
a processor;
a first module configured to control the processor to receive input speech and input text;
a second module configured to control the processor to extract features from the input speech and the input text based on configuration parameters;
a third module configured to control the processor to transmit, via the application programmer interface call, the features, the input speech, the input text, and configuration parameter values to the system; and
a fourth module configured to control the processor to receive from the automatic speech recognition system an acoustic model and a language model generated based on the features, the input speech, the input text, and the configuration parameter values.
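The four modules of claim 8 map onto a short client-side flow. The following is a minimal sketch under stated assumptions. The claim does not specify the feature type, so per-frame log energy stands in for the client's own (possibly proprietary) extractor. The configuration keys `frame_length` and `frame_shift`, the request field names, and the `call_api` transport callable are hypothetical.

```python
import math
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical transport: any callable that sends a request dict to the
# server's API and returns the decoded response dict.
ApiCall = Callable[[Dict[str, Any]], Dict[str, Any]]


def extract_features(samples: Sequence[float], frame_length: int,
                     frame_shift: int) -> List[List[float]]:
    """Stand-in extractor: one log-energy value per full frame."""
    features = []
    for start in range(0, len(samples) - frame_length + 1, frame_shift):
        frame = samples[start:start + frame_length]
        energy = sum(x * x for x in frame)
        features.append([math.log(energy + 1e-10)])  # floor avoids log(0) on silence
    return features


def request_models(samples: Sequence[float], text: str,
                   config: Dict[str, Any], call_api: ApiCall) -> Tuple[Any, Any]:
    # Second module: extract features according to the configuration parameters.
    features = extract_features(samples, config["frame_length"], config["frame_shift"])
    # Third module: send features, raw speech, text, and parameter values in one call.
    response = call_api({
        "features": features,
        "speech": list(samples),
        "text": text,
        "config": config,
    })
    # Fourth module: receive the acoustic and language models from the server.
    return response["acoustic_model"], response["language_model"]
```

Passing the transport in as a callable keeps the sketch independent of any particular HTTP library and lets the exchange be exercised against a stub server.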
17. A non-transitory computer-readable storage medium storing instructions which, when executed by a network-based computing device, cause the computing device to provide an application programming interface for client access to the network-based computing device for generating speech models, the instructions comprising:
receiving, via a call to the application programming interface, feature streams, transcriptions, and parameter values as inputs from a client device, wherein the application programming interface hides internal operations of generating speech models from the client device;
processing the feature streams and transcription according to the parameter values to train an acoustic model and a language model;
generating a log describing at least part of the processing without revealing the internal operations of generating speech models; and
transmitting the acoustic model, the language model, and the log to the network client in response to the call.
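Claim 17 pairs model training with a log that describes the processing without exposing internal operations. The following minimal sketch shows that split on the server side. The models are deliberately toy stand-ins and are not the disclosed training method: a single diagonal Gaussian over all frames serves as the "acoustic model," and pruned bigram counts serve as the "language model." The function name, the `min_count` parameter, and the log wording are hypothetical. Internal details go to a server-side `logging` logger; only summary lines are returned to the client.

```python
import logging
from collections import Counter
from statistics import fmean, pvariance
from typing import Any, Dict, List, Sequence, Tuple

logger = logging.getLogger("asr_training_server")  # server-side only


def handle_training_call(
    feature_streams: Sequence[Sequence[Sequence[float]]],
    transcriptions: Sequence[str],
    parameters: Dict[str, Any],
) -> Tuple[Dict[str, List[float]], Dict[str, int], List[str]]:
    client_log: List[str] = []  # sanitized summary returned to the client

    # Toy acoustic model: per-dimension mean and variance over all frames.
    frames = [frame for stream in feature_streams for frame in stream]
    if not frames:
        raise ValueError("no feature frames supplied")
    dims = len(frames[0])
    acoustic_model = {
        "mean": [fmean([f[d] for f in frames]) for d in range(dims)],
        "var": [pvariance([f[d] for f in frames]) for d in range(dims)],
    }
    logger.debug("diagonal Gaussian fit, dims=%d", dims)  # internal detail, not returned
    client_log.append(f"acoustic model trained on {len(frames)} frames")

    # Toy language model: bigram counts, pruned by a client-supplied threshold.
    min_count = int(parameters.get("min_count", 1))
    bigrams: Counter = Counter()
    for text in transcriptions:
        tokens = ["<s>"] + text.split() + ["</s>"]
        bigrams.update(zip(tokens, tokens[1:]))
    language_model = {f"{a} {b}": n for (a, b), n in bigrams.items() if n >= min_count}
    logger.debug("bigram pruning at min_count=%d", min_count)  # internal detail
    client_log.append(f"language model built from {len(transcriptions)} transcriptions")

    return acoustic_model, language_model, client_log
```

The parameter values shape the processing (here, the pruning threshold), yet the returned log reports only what was trained and on how much data.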
Specification