Techniques to provide a standard interface to a speech recognition platform

US 9,570,078 B2
Filed: 06/19/2009
Issued: 02/14/2017
Est. Priority Date: 06/19/2009
Status: Active Grant

Abstract

Techniques and systems to provide speech recognition services over a network using a standard interface are described. In an embodiment, a technique includes accepting a speech recognition request that includes at least audio input, via an application program interface (API). The speech recognition request may also include additional parameters. The technique further includes performing speech recognition on the audio according to the request and any specified parameters; and returning a speech recognition result as a hypertext protocol (HTTP) response. Other embodiments are described and claimed.
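
As an illustration only, a request of the kind described above (a reference to the audio plus optional parameters) could be encoded as an HTTP query string. The endpoint URL and the parameter names `audio`, `grammar`, `weight`, and `silence` below are hypothetical; the source does not define a concrete wire format.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; the source does not name one.
BASE_URL = "http://speech.example.com/recognize"


def build_request_url(audio_uri, grammar_uri, grammar_weight=1.0, silence_ms=500):
    """Encode a recognition request as an HTTP query string.

    All parameter names are assumptions made for this sketch.
    """
    params = {
        "audio": audio_uri,        # URI link to the audio input
        "grammar": grammar_uri,    # URI link to the grammar
        "weight": grammar_weight,  # relative weight given to the grammar
        "silence": silence_ms,     # length of silence to observe, in ms
    }
    # urlencode percent-escapes the embedded URIs so they survive transport.
    return BASE_URL + "?" + urlencode(params)


url = build_request_url(
    "http://media.example.com/utterance.wav",
    "http://grammars.example.com/cities.grxml",
    grammar_weight=0.8,
)

# Decoding the query string recovers the original parameter values as text.
decoded = parse_qs(urlsplit(url).query)
```

A server receiving such a URL would decode the parameters in the same way before handing them to the recognizer.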

19 Claims

1. A computer-implemented method, comprising:
- accepting a speech recognition request via an application program interface (API), the request comprising an audio input and parameters including a uniform resource identifier (URI) link to a length of silence to observe, a grammar, and a grammar weight;
  
  performing speech recognition on the audio input according to the request; and
  
  upon observing the length of silence in the audio input, returning a plurality of speech recognition results based on the request as hypertext protocol (HTTP) responses comprising an extensible markup language (XML) document formatted in a format that includes a status attribute indicating an overall success or failure of speech recognition on the audio input, wherein at least one of the plurality of speech recognition results and the status attribute are returned prior to performing speech recognition on all of the audio input in the request.
- View Dependent Claims (2, 3, 4, 5, 6)
- - 2. The method of claim 1, wherein the speech recognition request is formatted as at least one of:
    - an HTTP query string;
      
      an HTTP POST entity body; and
      
      at least one HTTP POST entity body part.
  - 3. The method of claim 1, wherein the speech recognition request comprises parameters including at least one of:
    - an in-line grammar;
      
      an in-line audio input;
      
      a URI link to an audio input;
      
      a timeout;
      
      a finalize timeout;
      
      a confidence level;
      
      a sensitivity level;
      
      a speed level;
      
      an accuracy level;
      
      a speaker parameter; and
      
      a recognizer specific parameter.
  - 4. The method of claim 1, wherein the request is an inline streaming request, wherein the audio input is in a raw format.
  - 5. The method of claim 1, comprising returning the speech recognition results as a streamed result.
  - 6. The method of claim 1, wherein the request is a streamed request.
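
To make the response side of claim 1 concrete, the following is a minimal sketch of a client reading an XML result document that carries a status attribute. Only the existence of a status attribute comes from the claim; the element names `recognition` and `result`, the `confidence` attribute, and the value `"success"` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body; the actual schema is not given in the source.
SAMPLE = """<recognition status="success">
  <result confidence="0.92">boston</result>
  <result confidence="0.41">austin</result>
</recognition>"""


def parse_response(xml_text):
    """Return (ok, results) from a recognition response document.

    ok      -- True when the root status attribute reports success
    results -- list of (text, confidence) pairs in document order
    """
    root = ET.fromstring(xml_text)
    ok = root.get("status") == "success"
    results = [
        (node.text, float(node.get("confidence")))
        for node in root.findall("result")
    ]
    return ok, results


ok, results = parse_response(SAMPLE)
```

Checking the status attribute first lets a caller distinguish an overall failure from a successful recognition that merely produced low-confidence results.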

7. An article comprising a storage memory unit containing instructions that when executed enable a system to:
- provide an application program interface (API) operative to receive from a computing device a speech recognition request comprising an audio input and at least two parameters including a uniform resource identifier (URI) link to a grammar, a grammar weight, and a length of silence to observe in the audio input; and
  
  return, from a speech recognition engine to the computing device, a first speech recognition result associated with the streamed speech recognition request as a first hypertext protocol (HTTP) response after the length of silence is observed and prior to returning a second speech recognition result associated with the streamed speech recognition request as a second HTTP response, the first and second HTTP responses comprising an extensible markup language (XML) document formatted in a format that includes a status flag indicating an overall success or failure of the speech recognition result for the audio input.
- View Dependent Claims (8, 9, 10, 11, 12, 13)
- - 8. The article of claim 7, wherein the speech recognition request is a streamed speech recognition request.
  - 9. The article of claim 7, further comprising instructions that when executed enable the system to return speech recognition results from a request comprising at least one of:
    - an HTTP query string;
      
      an HTTP POST entity body; and
      
      at least one HTTP POST entity body part.
  - 10. The article of claim 7, further comprising instructions that when executed enable the system to recognize speech from the audio input according to the speech recognition request.
  - 11. The article of claim 10, further comprising instructions that when executed enable the system to recognize speech according to at least one parameter in the speech recognition request, the at least one parameter comprising at least one of:
    - an in-line grammar;
      
      an in-line audio input;
      
      a URI link to an audio input;
      
      a timeout;
      
      a finalize timeout;
      
      a confidence level;
      
      a sensitivity level;
      
      a speed level;
      
      an accuracy level;
      
      a speaker parameter; and
      
      a recognizer specific parameter.
  - 12. The article of claim 7, further comprising instructions that when executed enable the system to accept the request as an inline streaming request, wherein the audio input is in a raw format.
  - 13. The article of claim 7, further comprising instructions that when executed enable the system to return the speech recognition results as a streamed result.
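
The ordering described in claim 7, where a first result is returned once the length of silence is observed and before a second result, can be sketched with a generator. This is a simplified model, not the patented implementation: audio is represented as a sequence of per-frame energy values, silence is detected with a fixed threshold, and `recognize` is a hypothetical stand-in for the speech recognition engine.

```python
def stream_results(frames, recognize, silence_frames=3, threshold=0.1):
    """Yield one result each time `silence_frames` quiet frames follow speech.

    frames    -- iterable of per-frame energy values (an assumed representation)
    recognize -- callable turning a list of speech frames into a result
    """
    segment, quiet = [], 0
    for energy in frames:
        if energy < threshold:
            quiet += 1
            if segment and quiet >= silence_frames:
                # Emitted before any later audio has been read.
                yield recognize(segment)
                segment = []
        else:
            quiet = 0
            segment.append(energy)
    if segment:
        # Trailing speech that was not followed by enough silence.
        yield recognize(segment)


consumed = []


def audio():
    """Two bursts of speech, each followed by three silent frames."""
    for energy in [0.9, 0.8, 0.0, 0.0, 0.0, 0.7, 0.6, 0.5, 0.0, 0.0, 0.0]:
        consumed.append(energy)
        yield energy


# The stub "recognizer" just reports how many speech frames it received.
results = stream_results(audio(), recognize=len)
first = next(results)
consumed_at_first = len(consumed)  # frames read when the first result arrived
rest = list(results)
```

Because the generator is lazy, the first result is available after only part of the audio has been consumed, which mirrors returning a result prior to recognizing all of the input.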

14. An apparatus, comprising:
- a processor;
  
  an application program interface (API) operative to provide an interface for building a speech recognition request comprising audio input and parameters including a uniform resource identifier (URI) link to a grammar, a grammar weight, and a specified duration of the audio input;
  
  a speech recognition (SR) component operative on the processor to receive the speech recognition request, to convert a first portion of the audio input to a first recognition result associated with the speech recognition request, to return the first recognition result after silence is observed for the specified duration of the audio input, to convert a second portion of the audio input to a second recognition result associated with the speech recognition request, and to return the second recognition result, wherein the first and second recognition results are returned as a hypertext protocol (HTTP) responses comprising an extensible markup language (XML) document formatted in a format that includes a status flag indicating an overall success or failure of the recognition result.
- View Dependent Claims (15, 16, 17, 18, 19)
- - 15. The apparatus of claim 14, wherein the speech recognition request is a streamed speech recognition request.
  - 16. The apparatus of claim 14, wherein the speech recognition request comprises at least one of:
    - an in-line grammar;
      
      an in-line audio input;
      
      a URI link to an audio input;
      
      a timeout;
      
      a finalize timeout;
      
      a confidence level;
      
      a sensitivity level;
      
      a speed level;
      
      an accuracy level;
      
      a speaker parameter; and
      
      a recognizer specific parameter.
  - 17. The apparatus of claim 14, wherein the API comprises a wrapper for building a speech recognition request.
  - 18. The apparatus of claim 14, wherein the SR component is operative to at least one of:
    - receive the speech recognition request as a streamed request; and
      
      return the recognition results as a streamed result.
  - 19. The apparatus of claim 16, wherein at least one of the in-line grammar or the grammar referred to by the URI link includes a reference to another grammar, and wherein the SR component is operative to retrieve said another grammar.
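
Claim 19 describes a grammar that refers to another grammar, which the SR component then retrieves. A minimal sketch of that retrieval follows, assuming a hypothetical dictionary-based grammar representation with `rules` and `refs` keys and an in-memory store in place of real fetching by URI; none of these names come from the source.

```python
# Hypothetical in-memory store standing in for grammars fetched by URI.
GRAMMARS = {
    "http://g.example.com/trip.gram": {
        "rules": ["fly to <city>"],
        "refs": ["http://g.example.com/city.gram"],
    },
    "http://g.example.com/city.gram": {
        "rules": ["boston", "austin"],
        "refs": [],
    },
}


def resolve(uri, fetch=GRAMMARS.__getitem__, seen=None):
    """Return the rules of the grammar at `uri` and of every grammar it references."""
    seen = set() if seen is None else seen
    if uri in seen:
        # Guard against circular references between grammars.
        return []
    seen.add(uri)
    grammar = fetch(uri)
    rules = list(grammar["rules"])
    for ref in grammar["refs"]:
        rules.extend(resolve(ref, fetch, seen))
    return rules


all_rules = resolve("http://g.example.com/trip.gram")
```

Passing `fetch` as a parameter keeps the traversal independent of how grammars are actually retrieved, so an HTTP-based fetcher could be substituted without changing the resolution logic.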

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Chambers, Robert L.; Bodell, Michael; Luong, Daphne; Wong, Annie; Gozali, Faustinus K.; Ho, Andrew; Philander, Rod; Anderson, Corby
Primary Examiner(s): Armstrong, Angela A
Application Number: US12/488,161
Publication Number: US 20100324910A1
Time in Patent Office: 2,797 Days
Field of Search: 704/270.1
US Class Current: 1/1
CPC Class Codes:
G06F 40/143   Markup, e.g. Standard Gener...
G10L 15/22   Procedures used during a sp...
G10L 15/26   Speech to text systems G10L...
G10L 15/30   Distributed recognition, e....
G10L 2015/221   Announcement of recognition...
