SYSTEM AND METHOD FOR BUILDING AND EVALUATING AUTOMATIC SPEECH RECOGNITION VIA AN APPLICATION PROGRAMMER INTERFACE

US 20120130709A1
Filed: 11/23/2010
Published: 05/24/2012
Est. Priority Date: 11/23/2010
Status: Active Grant

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for building an automatic speech recognition system through an Internet API. A network-based automatic speech recognition server configured to practice the method receives feature streams, transcriptions, and parameter values as inputs from a network client independent of knowledge of internal operations of the server. The server processes the inputs to train an acoustic model and a language model, and transmits the acoustic model and the language model to the network client. The server can also generate a log describing the processing and transmit the log to the client. On the server side, a human expert can intervene to modify how the server processes the inputs. The inputs can include an additional feature stream generated from speech by algorithms in the client's proprietary feature extraction.
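The exchange described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `TrainingRequest`, `TrainingResponse`, `train`, and the `smoothing` parameter are hypothetical, and the two "models" are deliberately toy stand-ins (a global mean feature vector and a unigram word distribution) chosen only to show the shape of the inputs and outputs.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TrainingRequest:
    """Inputs named in the abstract; all field names are hypothetical."""
    feature_streams: list[list[list[float]]]  # per utterance: frames x dimensions
    transcriptions: list[str]                 # one transcription per stream
    parameter_values: dict[str, float] = field(default_factory=dict)


@dataclass
class TrainingResponse:
    acoustic_model: list[float]       # toy stand-in: global mean feature vector
    language_model: dict[str, float]  # toy stand-in: unigram probabilities
    log: list[str]                    # description of the processing


def train(request: TrainingRequest) -> TrainingResponse:
    """Toy server-side processing; the client sees only inputs and outputs."""
    if len(request.feature_streams) != len(request.transcriptions):
        raise ValueError("need one transcription per feature stream")
    frames = [frame for stream in request.feature_streams for frame in stream]
    if not frames:
        raise ValueError("feature streams contain no frames")

    # "Acoustic model": mean of every frame across all streams.
    dims = len(frames[0])
    acoustic = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]

    # "Language model": add-k smoothed unigram estimates over the transcriptions.
    k = request.parameter_values.get("smoothing", 0.0)
    counts = Counter(w for t in request.transcriptions for w in t.lower().split())
    total = sum(counts.values()) + k * len(counts)
    language = {w: (c + k) / total for w, c in counts.items()}

    log = [
        f"received {len(frames)} frames in {len(request.feature_streams)} streams",
        f"estimated unigram model over {len(counts)} words",
    ]
    return TrainingResponse(acoustic, language, log)
```

A real service would replace both estimators with actual acoustic and language model training; the point here is only that the client supplies data and parameters and receives models and a log.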

20 Claims

1. A method of generating speech models for a remote client, the method comprising:
- receiving, at a network-based automatic speech recognition system, feature streams, transcriptions, and parameter values as inputs from a network client independent of knowledge of internal operations of the automatic speech recognition system;
  
  processing, at the automatic speech recognition system, the inputs to train an acoustic model and a language model; and
  
  transmitting the acoustic model and the language model to the network client.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, further comprising generating a log of the processing.
  - 3. The method of claim 2, further comprising transmitting the log to the network client.
  - 4. The method of claim 1, wherein the inputs are received via an application programming interface call.
  - 5. The method of claim 1, further comprising modifying how the inputs are processed based on an intervention from a human expert.
  - 6. The method of claim 1, wherein the inputs further comprise a feature stream.
  - 7. The method of claim 1, wherein processing the inputs is based on an algorithm for at least one of estimating an acoustic model, adapting an acoustic model, estimating a language model, generating recognizer outputs, and accuracy evaluation.

8. A client device for interfacing with a system that generates models for use in automatic speech recognition via an application programming interface call over a network, the client device comprising:
- a processor;
  
  a first module configured to control the processor to receive input speech and input text;
  
  a second module configured to control the processor to extract features from the input speech and the input text based on configuration parameters;
  
  a third module configured to control the processor to transmit, via the application programmer interface call, the features, the input speech, the input text, and configuration parameter values to the system; and
  
  a fourth module configured to control the processor to receive from the automatic speech recognition system an acoustic model and a language model generated based on the features, the input speech, the input text, and the configuration parameter values.
- Dependent claims (9, 10, 11, 12, 13, 14, 15, 16):
  - 9. The system of claim 8, wherein the configuration parameter values comprise a specific task for the language model.
  - 10. The system of claim 8, wherein the third module is further configured to control the processor to transmit at least one of the features, the input speech, the input text, and configuration parameter values via a secured connection.
  - 11. The system of claim 10, wherein the secured connection is encrypted.
  - 12. The system of claim 8, further comprising a fifth module configured to control the processor to establish a contractual agreement regarding privacy of at least one of the features, the input speech, the input text, and the configuration parameter values.
  - 13. The system of claim 8, wherein the fourth module is further configured to control the processor to receive a log associated with the acoustic model and the language model.
  - 14. The system of claim 13, wherein the log describes events associated with creation of the acoustic model and the language model.
  - 15. The system of claim 8, wherein the automatic speech recognition system further modifies how the input is processed based on an intervention from a human expert.
  - 16. The system of claim 8, wherein the automatic speech recognition system further processes the input based on an algorithm for at least one of estimating an acoustic model, adapting an acoustic model, estimating a language model, generating recognizer outputs, and accuracy evaluation.
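The four modules recited in claim 8 can be pictured as a short client-side pipeline. The sketch below is an illustration under stated assumptions, not the claimed device: `extract_features`, `build_models`, the payload keys, and the `frame_size` configuration parameter are all hypothetical, the features are a toy (per-frame mean and energy, plus lowercase tokens), and the API call is abstracted as a callable so that no particular transport is implied.

```python
from typing import Callable

# Hypothetical transport: takes the request payload, returns the response.
ApiCall = Callable[[dict], dict]


def extract_features(speech: list[float], text: str, config: dict) -> dict:
    """Second module (toy): frame the samples and tokenize the text."""
    size = config["frame_size"]
    # Non-overlapping frames; a trailing partial frame is dropped.
    frames = [speech[i:i + size] for i in range(0, len(speech) - size + 1, size)]
    # Per-frame [mean, energy] as a stand-in for real acoustic features.
    speech_features = [[sum(f) / size, sum(x * x for x in f) / size] for f in frames]
    return {"speech": speech_features, "text": text.lower().split()}


def build_models(speech: list[float], text: str, config: dict, api_call: ApiCall):
    """Modules one to four: take input, extract, transmit, receive models."""
    features = extract_features(speech, text, config)
    payload = {
        "features": features,
        "input_speech": speech,
        "input_text": text,
        "config": config,
    }
    response = api_call(payload)  # third module: the API call itself
    # Fourth module: the models come back from the remote system.
    return response["acoustic_model"], response["language_model"]
```

Passing the transport in as `api_call` keeps the sketch testable with a stub and leaves open whether the call travels over a secured connection, as claims 10 and 11 contemplate.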

17. A non-transitory computer-readable storage medium storing instructions which, when executed by a network-based computing device, cause the computing device to provide an application programming interface for client access to the network-based computing device for generating speech models, the instructions comprising:
- receiving, via a call to the application programming interface, feature streams, transcriptions, and parameter values as inputs from a client device, wherein the application programming interface hides internal operations of generating speech models from the client device;
  
  processing the feature streams and transcription according to the parameter values to train an acoustic model and a language model;
  
  generating a log describing at least part of the processing without revealing the internal operations of generating speech models; and
  
  transmitting the acoustic model, the language model, and the log to the network client in response to the call.
- Dependent claims (18, 19, 20):
  - 18. The non-transitory computer-readable storage medium of claim 17, wherein the parameter values comprise a specific task for the language model.
  - 19. The non-transitory computer-readable storage medium of claim 17, wherein at least one of the feature streams, the transcriptions, and parameter values is received via a secured connection.
  - 20. The non-transitory computer-readable storage medium of claim 17, the instructions further comprising establishing a contractual agreement with the network client regarding privacy of at least one of the feature streams, the transcriptions, and parameter values.
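Claim 17 pairs an API that hides internal operations with a log that describes the processing without revealing them. One way to picture that, offered only as a sketch, is to tag each server event as internal or public and return just the public ones. The names `Event`, `public_log`, `handle_call`, and `Trainer`, the payload keys, and the `task` parameter key are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    message: str
    internal: bool = False  # True for details the API should not expose


# Hypothetical trainer: (feature_streams, transcriptions, parameter_values)
# -> (acoustic_model, language_model, events recorded during training).
Trainer = Callable[[list, list, dict], tuple[object, object, list[Event]]]


def public_log(events: list[Event]) -> list[str]:
    """Keep only the events that are safe to show to the client."""
    return [e.message for e in events if not e.internal]


def handle_call(inputs: dict, trainer: Trainer) -> dict:
    """Handle one API call: train, then return the models and a filtered log."""
    params = inputs["parameter_values"]
    task = params.get("task", "unspecified")
    events = [Event(f"training started for task: {task}")]

    acoustic, language, trainer_events = trainer(
        inputs["feature_streams"], inputs["transcriptions"], params
    )
    events.extend(trainer_events)
    events.append(Event("training finished"))

    return {
        "acoustic_model": acoustic,
        "language_model": language,
        "log": public_log(events),
    }
```

Filtering at the API boundary means the training code can record as much detail as it likes for server-side experts, while the client receives only the events marked as public.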

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Schroeter, Horst J.; Bocchieri, Enrico; Dimitriadis, Dimitrios

Granted Patent: US 9,484,018 B2
US Class Current: 704/201
CPC Class Codes:
G10L 15/063   Training
G10L 15/065   Adaptation
G10L 15/30   Distributed recognition, e....
