System and method of providing speech processing in user interface

US 9,177,551 B2
Filed: 05/28/2008
Issued: 11/03/2015
Est. Priority Date: 01/22/2008
Status: Active Grant

Abstract

Disclosed are systems, methods and computer-readable media for enabling speech processing in a user interface of a device. The method includes: receiving an indication of a field in a user interface of a device, the indication also signaling that speech will follow; receiving the speech from the user at the device, the speech being associated with the field; transmitting the speech as a request to a public, common network node that receives and processes speech; processing the transmitted speech and returning text associated with the speech to the device; and inserting the text into the field. Upon a second indication from the user, the system processes the text in the field as programmed by the user interface. The present disclosure provides a speech mashup application for a user interface of a mobile or desktop device that does not require expensive speech processing technologies.
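
The flow described in the abstract can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: `SpeechForm`, `begin_speech`, `end_speech`, `submit` and `fake_recognizer` are hypothetical names that do not appear in the patent, and the recognizer is a local stand-in callable rather than the public network node.

```python
from typing import Callable, Dict, Iterable, Optional


class SpeechForm:
    """Hypothetical form model: one recognizer call per field-specific utterance."""

    def __init__(self, fields: Iterable[str],
                 recognize: Callable[[bytes], str],
                 on_submit: Callable[[Dict[str, str]], None]) -> None:
        self.values: Dict[str, str] = {name: "" for name in fields}
        self._recognize = recognize   # stand-in for the remote speech recognizer
        self._on_submit = on_submit   # what the UI is programmed to do with the text
        self._active: Optional[str] = None

    def begin_speech(self, field: str) -> None:
        """First indication: the user selects a field, and speech will follow."""
        if field not in self.values:
            raise KeyError(field)
        self._active = field

    def end_speech(self, audio: bytes) -> str:
        """Send the captured speech for recognition and insert the returned text."""
        if self._active is None:
            raise RuntimeError("no field was selected before speech arrived")
        text = self._recognize(audio)
        self.values[self._active] = text
        self._active = None
        return text

    def submit(self) -> None:
        """Second indication: process the field text as the UI is programmed to."""
        self._on_submit(dict(self.values))


def fake_recognizer(audio: bytes) -> str:
    # Assumption for this sketch: the audio bytes already hold the transcript.
    return audio.decode("utf-8")
```

A caller would invoke `begin_speech("city")` when the user touches the field, `end_speech(audio)` once the utterance is captured, and `submit()` on the second indication.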

23 Claims

1. A method comprising:
- receiving, via touch provided on a touch screen of a device, an indication associated with a specific field displayed in a user interface on the touch screen, the indication signaling that speech, which is associated with the specific field, will follow;
- receiving the speech via the device and generating speech data based on the speech;
- generating, by the device, a request for speech recognition, wherein the request comprises:
  - (1) an application identifier identifying a speech recognizer on a public network node;
  - (2) a location parameter specific to a current location of the device, the device being associated with a speaker of the speech; and
  - (3) a grammar parameter associated with a home location of the speaker of the speech, the grammar parameter identifying a particular grammar;
- transmitting the speech data and the request to the public network node for speech recognition using the speech recognizer;
- receiving, at the device, text associated with the speech data from the speech recognizer; and
- inserting the text into the specific field.
  - 2. The method of claim 1, further comprising, upon a second indication from a user, processing the text in the specific field as programmed by the user interface.
  - 3. The method of claim 1, further comprising receiving an instruction with the text that causes the user interface to process the text without further user input.
  - 4. The method of claim 3, wherein the instruction is only received upon recognition from the speech recognizer exceeding a threshold.
  - 5. The method of claim 1, further comprising, after receiving the speech, receiving a second indication from a user that the speech intended for the specific field has ceased.
  - 6. The method of claim 2, wherein processing the text in the specific field is performed as though the user typed the text in the specific field.
  - 7. The method of claim 1, wherein transmitting the speech data and the request to the public network node is performed using one of a representational state transfer protocol, a simple object access protocol, and a web-based protocol.
  - 8. The method of claim 4, wherein the public network node determines the particular grammar for the speech recognizer to use in recognizing the speech based on the location parameter and the grammar parameter.
  - 9. The method of claim 1, wherein the application identifier is only released to registered users.
  - 10. The method of claim 9, wherein the grammar parameter controls a compilation of a plurality of grammars.
  - 11. The method of claim 9, wherein the control string controls one of: coding, a byte order, a sampling rate and n-best results.
  - 12. The method of claim 10, wherein a compile grammar string comprises a pointer to a network location of the particular grammar for the speech recognizer to use in recognizing the speech data.
  - 13. The method of claim 1, further comprising presenting an action button associated with the text in the specific field only when a confidence level from the speech recognizer is below a threshold.
  - 14. The method of claim 1, wherein when the speech recognizer returns multiple possible interpretations of the speech data, inserting each possible interpretation into a separate text field with an indication instructing a user to select which text field to process.
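
Claims 3, 4, 13 and 14 describe how the user interface reacts to the recognizer's confidence and to multiple interpretations. The sketch below is one possible reading, not the patented implementation: `Hypothesis`, `UiPlan`, `plan_ui`, the 0 to 1 confidence scale and the single shared `threshold` are all assumptions made for illustration, since the claims do not fix a scale or say whether claims 4 and 13 use the same threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    confidence: float  # assumed to lie in [0, 1]; the claims do not fix a scale


@dataclass
class UiPlan:
    field_texts: List[str]     # one entry per text field to populate
    auto_process: bool         # process without further user input (claims 3 and 4)
    show_action_button: bool   # let the user confirm the text (claim 13)
    ask_user_to_choose: bool   # several candidate fields are shown (claim 14)


def plan_ui(hypotheses: List[Hypothesis], threshold: float = 0.8) -> UiPlan:
    """Decide how to present recognition results (hypothetical helper)."""
    if not hypotheses:
        raise ValueError("recognizer returned no hypotheses")
    if len(hypotheses) > 1:
        # Multiple interpretations: one field each, and the user picks one.
        return UiPlan([h.text for h in hypotheses], False, False, True)
    best = hypotheses[0]
    return UiPlan(
        field_texts=[best.text],
        auto_process=best.confidence > threshold,
        show_action_button=best.confidence < threshold,
        ask_user_to_choose=False,
    )
```

With these assumptions, a single high-confidence result is processed immediately, a single low-confidence result shows an action button, and several results are laid out in separate fields for the user to choose from.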

15. A device comprising:
- a touch screen;
- a processor; and
- a computer-readable medium storing instructions which, when executed by the processor, cause the processor to perform operations comprising:
  - receiving, via touch provided on the touch screen, an indication associated with a specific field displayed in a user interface on the touch screen, the indication signaling that speech, which is associated with the specific field, will follow;
  - receiving the speech via the device and generating speech data based on the speech;
  - generating, by the device, a request for speech recognition, wherein the request comprises:
    - (1) an application identifier identifying a speech recognizer on a public network node;
    - (2) a location parameter specific to a current location of the device, the device being associated with a speaker of the speech; and
    - (3) a grammar parameter associated with a home location of the speaker of the speech, the grammar parameter identifying a particular grammar;
  - transmitting the speech data and the request to the public network node for speech recognition using the speech recognizer;
  - receiving, at the device, text associated with the speech data from the speech recognizer; and
  - inserting the text into the specific field.
  - 16. The device of claim 15, the computer-readable storage medium having additional instructions stored which result in operations comprising:
    - upon receiving a second indication from a user, processing the text in the specific field as programmed by the user interface.
  - 17. The device of claim 15, the computer-readable storage medium having additional instructions stored which result in operations comprising:
    - processing the text without further user input.
  - 18. The device of claim 15, wherein the indication is only received with the text upon speech recognition exceeding a recognition threshold.
  - 19. The device of claim 15, wherein the request is transmitted in a hypertext transfer protocol.
  - 20. The device of claim 15, wherein the application identifier is only released to registered users.
  - 21. The device of claim 15, the computer-readable storage medium having additional instructions stored which result in operations comprising:
    - presenting an action button associated with the text inserted into the specific field only when a confidence level from the speech recognizer is below a threshold.
  - 22. The device of claim 15, the computer-readable storage medium having additional instructions stored which result in operations comprising:
    - presenting two possible interpretations in separate text fields when the speech recognizer returns multiple possible interpretations of the speech data and presenting an indication instructing a user to select which text field to process.
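
The request recited in claim 15, sent over HTTP as in claim 19, could be assembled along the following lines. This is a minimal sketch under assumptions: the query parameter names `appId`, `location` and `grammar`, the `audio/wav` content type, and the helpers `build_recognition_request` and `query_parameters` are hypothetical and are not taken from the patent. The code only constructs the request object and does not send anything.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request


def build_recognition_request(base_url: str, app_id: str, location: str,
                              grammar: str, audio: bytes,
                              content_type: str = "audio/wav") -> Request:
    """Bundle the three claimed parameters with the speech data (hypothetical)."""
    query = urlencode({
        "appId": app_id,       # identifies the speech recognizer to use
        "location": location,  # current location of the device
        "grammar": grammar,    # grammar tied to the speaker's home location
    })
    return Request(
        url=f"{base_url}?{query}",
        data=audio,  # speech data travels in the request body
        headers={"Content-Type": content_type},
        method="POST",
    )


def query_parameters(request: Request) -> Dict[str, str]:
    """Return the request's query parameters, e.g. for logging or inspection."""
    parsed = parse_qs(urlsplit(request.full_url).query)
    return {name: values[0] for name, values in parsed.items()}
```

Passing the resulting object to `urllib.request.urlopen` would transmit it; the response format is not specified in the claims, so parsing it is left out here.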

23. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- receiving, via touch provided on a touch screen of a device, an indication associated with a specific field displayed in a user interface on the touch screen, the indication signaling that speech, which is associated with the specific field, will follow;
- receiving the speech via the device and generating speech data based on the speech;
- generating, by the device, a request for speech recognition, wherein the request comprises:
  - (1) an application identifier identifying a speech recognizer on a public network node;
  - (2) a location parameter specific to a current location of the device, the device being associated with a speaker of the speech; and
  - (3) a grammar parameter associated with a home location of the speaker of the speech, the grammar parameter identifying a particular grammar;
- transmitting the speech data and the request to the public network node for speech recognition using the speech recognizer;
- receiving, at the device, text associated with the speech data from the speech recognizer; and
- inserting the text into the specific field.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Wilpon, Jay; Di Fabbrizio, Giuseppe; Stern, Benjamin J.
Primary Examiner(s): Serrou, Abdelali
Application Number: US12/128,345
Publication Number: US 20090187410A1
Time in Patent Office: 2,715 Days
Field of Search: 704/251, 704/235, 704/75, 704/270.1, 704/E15.041, 704/275
US Class Current: 1/1
CPC Class Codes:
- G06F 3/0416   Control or interface arrang...
- G06F 3/162   Interface to dedicated audi...
- G10L 15/22   Procedures used during a sp...
- G10L 15/26   Speech to text systems G10L...
- G10L 15/30   Distributed recognition, e....
