System and method for speech recognition services

US 7,711,568 B2
Filed: 04/03/2003
Issued: 05/04/2010
Est. Priority Date: 04/03/2003
Status: Active Grant

Abstract

A digital speech enabled middleware module is disclosed that facilitates interaction between a large number of client devices and network-based automatic speech recognition (ASR) resources. The module buffers feature vectors associated with speech received from the client devices when the number of client devices is greater than the available ASR resources. When an ASR decoder becomes available, the module transmits the feature vectors to the ASR decoder and a recognition result is returned.
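
The buffering behavior described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: `DecoderPool`, `Middleware`, and their method names are hypothetical, and the decoders are reduced to a simple counter. The sketch only shows feature vectors being held in a queue while every decoder is busy and forwarded once one is free.

```python
from collections import deque


class DecoderPool:
    """Hypothetical stand-in for a fixed number of ASR decoders."""

    def __init__(self, size):
        self.free = size

    def try_acquire(self):
        # Claim a decoder if one is idle; report whether that succeeded.
        if self.free > 0:
            self.free -= 1
            return True
        return False

    def release(self):
        self.free += 1


class Middleware:
    """Hypothetical middleware that buffers feature vectors when decoders are busy."""

    def __init__(self, pool):
        self.pool = pool
        self.pending = deque()  # (client_id, feature_vectors) waiting for a decoder
        self.sent = []          # record of what was forwarded to a decoder

    def submit(self, client_id, vectors):
        if self.pool.try_acquire():
            self.sent.append((client_id, vectors))
        else:
            # More clients than decoders: hold the vectors until one frees up.
            self.pending.append((client_id, vectors))

    def decoder_finished(self):
        """Called when a decoder has returned its result and is idle again."""
        self.pool.release()
        if self.pending and self.pool.try_acquire():
            self.sent.append(self.pending.popleft())
```

With a pool of one decoder and two clients, the second client's vectors stay in `pending` until `decoder_finished()` is called, at which point they are forwarded in arrival order.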

28 Claims

1. A method of processing speech data received from a mobile device, the method comprising:
- receiving at a speech server a speech request from a mobile device to transmit an audio segment;
  
  notifying a session object communicating with the mobile device regarding the arrival of the audio segment;
  
  generating from the session object a handler to process the audio segment, the handler acquiring a decoder proxy for the audio segment from a decoder proxy cache;
  
  obtaining an automatic speech recognition (ASR) decoder result associated with the audio segment, the ASR decoder result being passed to the decoder proxy;
  
  communicating a recognized phrase associated with the ASR decoder result or a failure code from the decoder proxy to the handler; and
  
  issuing from the handler a query to a web server using the ASR decoder result.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of processing speech data of claim 1, wherein the speech server receives the speech request using a custom protocol.
  - 3. The method of processing speech data of claim 1, wherein a speech server dispatcher notifies the session object regarding the arrival of the audio segment.
  - 4. The method of processing speech data of claim 1, wherein the session object further analyzes the audio segment to determine a type of application-specific handler to generate.
  - 5. The method of processing speech data of claim 4, wherein the session object further is used as a repository for any client states that span a duration of a mobile device session.
  - 6. The method of processing speech data of claim 5, wherein the session object further stores transient acoustic information.
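
The sequence of steps in claim 1 can be sketched roughly as follows. All class and function names here are hypothetical illustrations, the decoder and web server are replaced by toy functions, and the failure codes are invented for the example; the claim itself does not specify any of these details.

```python
class DecoderProxy:
    """Local representative of a (hypothetical) decoder process."""

    def __init__(self, decode_fn):
        self._decode_fn = decode_fn

    def recognize(self, audio_segment):
        # Returns (phrase, None) on success or (None, failure_code) on failure.
        try:
            return self._decode_fn(audio_segment), None
        except ValueError:
            return None, "DECODE_FAILED"


class DecoderProxyCache:
    def __init__(self, proxies):
        self._free = list(proxies)

    def acquire(self):
        return self._free.pop() if self._free else None

    def release(self, proxy):
        self._free.append(proxy)


class Handler:
    def __init__(self, cache, query_web_server):
        self._cache = cache
        self._query = query_web_server

    def process(self, audio_segment):
        proxy = self._cache.acquire()
        if proxy is None:
            return "NO_DECODER"  # invented code; buffering is not modeled here
        try:
            phrase, failure = proxy.recognize(audio_segment)
        finally:
            self._cache.release(proxy)
        if failure is not None:
            return failure
        # Use the recognized phrase to query the web server.
        return self._query(phrase)


class Session:
    """Notified when an audio segment arrives; generates a handler for it."""

    def __init__(self, cache, query_web_server):
        self._cache = cache
        self._query = query_web_server

    def on_audio_segment(self, audio_segment):
        return Handler(self._cache, self._query).process(audio_segment)


def toy_decode(segment):
    # Toy decoder: treats the bytes as ASCII text; empty input is a failure.
    if not segment:
        raise ValueError("empty segment")
    return segment.decode("ascii")


def toy_web_query(phrase):
    return "results for: " + phrase


session = Session(DecoderProxyCache([DecoderProxy(toy_decode)]), toy_web_query)
```

Calling `session.on_audio_segment(b"weather boston")` walks the whole path from notification to web query, while an empty segment surfaces the failure code instead of a query result.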

7. A method of processing speech data received from a mobile device, the method comprising:
- receiving at a server dispatcher a speech request from a mobile device to transmit an audio segment;
  
  notifying, by the server dispatcher, a session object communicating with the mobile device regarding the arrival of the audio segment; and
  
  generating by the session object a handler to process the audio segment, the handler attempting to acquire a decoder proxy for processing the audio segment from a decoder proxy cache, wherein:
  
  if the handler is successful in its attempt to acquire the decoder proxy, then the handler streams computed cepstrum vectors to a decoding process; and
  
  if the handler is not successful in its attempt to acquire the decoder proxy, then the handler buffers the computed cepstrum vectors and transmits them as soon as a decoder proxy becomes available.
- Dependent claims (8, 9, 10, 11, 12, 13, 14, 15):
  - 8. The method of claim 7, further comprising, after decoding the audio segment, returning control from the handler to the server dispatcher for servicing other mobile devices.
  - 9. The method of claim 8, wherein the server dispatcher manages speech requests from multiple mobile devices on a speech segment-by-speech segment basis.
  - 10. The method of claim 7, further comprising:
    - notifying the server dispatcher of a recognition result from the decoder process;
      
      transmitting a decoder result to the associated decoder proxy;
      
      transmitting a recognized phrase from the decoded result from the decoder proxy to the handler; and
      
      using the recognized phrase to query an application server.
  - 11. The method of claim 7, further comprising:
    - applying at least one acoustic compensation algorithm to the audio segment.
  - 12. The method of claim 11, wherein the at least one acoustic compensation algorithm comprises frequency warping-based speaker normalization.
  - 13. The method of claim 11, wherein the at least one acoustic compensation algorithm comprises constrained model adaptation.
  - 14. The method of claim 11, wherein the at least one acoustic compensation algorithm comprises speaker adaptive training.
  - 15. The method of claim 11, wherein the at least one acoustic compensation algorithm comprises cepstrum and variance normalization.
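
As an illustration of one of the acoustic compensation algorithms named above, the following is a minimal sketch of per-utterance cepstral mean and variance normalization, which is one common reading of claim 15. The function name `cmvn`, the list-of-lists frame layout, and the `eps` guard are assumptions made for the example; the claims do not say how the normalization is computed.

```python
import math


def cmvn(frames, eps=1e-8):
    """Normalize each cepstral coefficient to zero mean and unit variance.

    `frames` is a non-empty list of equal-length lists, one per analysis frame.
    """
    n = len(frames)
    dim = len(frames[0])
    # Per-coefficient mean over all frames of the utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Per-coefficient (population) standard deviation.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # eps avoids division by zero when a coefficient is constant.
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]
```

After normalization each coefficient has approximately zero mean and unit variance across the utterance, which reduces the effect of channel and speaker level differences on the features.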

16. A system for managing speech segments received from a plurality of mobile devices, the system comprising:
- a server dispatcher that detects and routes all system I/O events;
  
  a decoder proxy cache containing a plurality of decoder proxies that are local representations of decoder processes;
  
  a session object that processes communications from each of the plurality of mobile devices, the session object being notified by the dispatcher when one of the plurality of mobile devices has initiated a speech request;
  
  a handler initiated by the session object, the handler acquiring one of the plurality of decoder proxies to process a speech segment associated with the speech request; and
  
  a decoder processing module that receives the speech segment from the acquired decoder proxy and returns an automatic speech recognition (ASR) result to the acquired decoder proxy, wherein the decoder proxy transmits the ASR result to the handler for use in querying an application server, whereupon the handler passes control of further speech requests from the plurality of mobile devices back to the server dispatcher.
- Dependent claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26):
  - 17. The system for managing speech segments of claim 16, wherein the handler further performs cepstrum feature analysis on the speech segment.
  - 18. The system for managing speech segments of claim 17, wherein the handler further performs acoustic feature space normalizations and transformations on the speech segment.
  - 19. The system for managing speech segments of claim 18, wherein parameters used for the acoustic feature space normalizations and transformations on the speech segment are estimated using speech obtained from previous utterances spoken by the user of the mobile device.
  - 20. The system for managing speech segments of claim 19, wherein approximately from one second to one minute of speech obtained by the user of the mobile device is used.
  - 21. The system for managing speech segments of claim 18, wherein the acoustic feature space normalization process further comprises frequency warping based speaker normalization.
  - 22. The system for managing speech segments of claim 21, wherein the frequency warping-based speaker normalization comprises selecting a single linear warping function using adaptation utterances for a given speaker to maximize the likelihood of the adaptation speech with respect to a hidden Markov model (HMM).
  - 23. The system for managing speech segments of claim 22, wherein during speech recognition by the decoder module, a warping factor is retrieved and applied to scaling a frequency axis in mel-frequency cepstrum coefficient (MFCC) based feature analysis.
  - 24. The system for managing speech segments of claim 18, wherein acoustic feature space normalization further comprises applying cepstrum mean normalization and cepstrum variance normalization.
  - 25. The system for managing speech segments of claim 18, wherein acoustic feature space normalization further comprises computing normalization vectors from adaptation utterances for each speaker and using the vectors to initialize estimates of normalization vectors of each speech segment.
  - 26. The system for managing speech segments of claim 22, wherein during speech recognition by the decoder module, a warping factor is retrieved and applied to scaling a frequency axis in mel-frequency cepstrum coefficient (MFCC) based feature analysis.
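
The warping-factor selection described in claim 22 can be sketched as a search over candidate factors. This is only an illustration under stated assumptions: `select_warp_factor`, `warp_frequency`, and `CANDIDATE_FACTORS` are hypothetical names, the candidate grid of 0.88 to 1.12 is an assumed range, and `log_likelihood` is a caller-supplied function standing in for scoring warped features against an HMM, which the sketch does not implement.

```python
# Assumed candidate grid of linear warping factors (not taken from the patent).
CANDIDATE_FACTORS = [0.88 + 0.02 * i for i in range(13)]


def select_warp_factor(utterances, log_likelihood, candidates=CANDIDATE_FACTORS):
    """Pick the single warping factor that maximizes total log-likelihood.

    `log_likelihood(utterance, alpha)` is a hypothetical scoring function that
    would, in a real system, score the warped features against an HMM.
    """
    best_alpha, best_score = None, float("-inf")
    for alpha in candidates:
        # Sum over all adaptation utterances for this speaker.
        score = sum(log_likelihood(u, alpha) for u in utterances)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha


def warp_frequency(freq_hz, alpha):
    """Linear warping of the frequency axis: f -> alpha * f."""
    return alpha * freq_hz
```

The selected factor would then be stored per speaker and applied to the frequency axis during MFCC feature analysis at recognition time, as claims 23 and 26 describe.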

27. A method of recognizing speech from at least one mobile device, the method comprising:
- receiving at a speech server a speech request from a mobile device to transmit an audio segment;
  
  initiating a handler to process the audio segment, the handler applying acoustic algorithms in feature space to the audio segment;
  
  acquiring a decoder proxy for the audio segment from a decoder proxy cache;
  
  obtaining an automatic speech recognition (ASR) decoder result associated with the audio segment, the ASR decoder result being passed to the handler via the decoder proxy; and
  
  issuing from the handler a query to a web server using the ASR decoder result.
- Dependent claims (28):
  - 28. The method of claim 27, wherein acoustic compensation parameters used in the acoustic algorithms applied by the handler are estimated off-line from adaptation utterances.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Parthasarathy, Sarangarajan; Arizmendi, Iker; Rose, Richard Cameron
Primary Examiner(s): Armstrong, Angela A
Application Number: US10/406,368
Publication Number: US 20040199393A1
Time in Patent Office: 2,588 Days
Field of Search: 704/270, 704/270.1
US Class Current: 704/270.1
CPC Class Codes:
G10L 15/285 Memory allocation or algori...
G10L 15/30 Distributed recognition, e....
