Visual confirmation for a recognized voice-initiated action

US 9,575,720 B2
Filed: 12/17/2013
Issued: 02/21/2017
Est. Priority Date: 07/31/2013
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Techniques described herein provide a computing device configured to provide an indication that the computing device has recognized a voice-initiated action. In one example, a method is provided for outputting, by a computing device and for display, a speech recognition graphical user interface (GUI) having at least one element in a first visual format. The method further includes receiving, by the computing device, audio data and determining, by the computing device, a voice-initiated action based on the audio data. The method also includes outputting, while receiving additional audio data and prior to executing a voice-initiated action based on the audio data, and for display, an updated speech recognition GUI in which the at least one element is displayed in a second visual format, different from the first visual format, to indicate that the voice-initiated action has been identified.
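
The ordering described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the specification: the `SpeechGui` class, the `ACTION_ICONS` table, and the word-by-word loop standing in for streaming audio are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical trigger words and icon names; the patent does not prescribe these.
ACTION_ICONS = {
    "navigate": "compass_arrow",
    "play": "play_button",
    "call": "telephone_button",
}


@dataclass
class SpeechGui:
    """Stand-in for the speech recognition GUI; records what it displays."""
    element: str = "microphone"  # first visual format
    events: list = field(default_factory=list)

    def show(self, element: str) -> None:
        self.element = element
        self.events.append(f"show:{element}")


def handle_voice_command(words, gui):
    """Consume words one at a time, as if audio were still arriving."""
    action = None
    additional = []
    for word in words:
        if action is None:
            candidate = word.lower()
            if candidate in ACTION_ICONS:
                action = candidate
                # Second visual format, shown before the action is executed
                # and while later words are still being received.
                gui.show(ACTION_ICONS[action])
        else:
            additional.append(word)
    if action is None:
        return None
    # Execution happens only after the GUI update and the remaining audio.
    gui.events.append(f"execute:{action}")
    return action, " ".join(additional)


gui = SpeechGui()
result = handle_voice_command(["navigate", "to", "the", "store"], gui)
```

In this sketch the `events` log records the display update ahead of the execution step, which is the ordering the abstract describes. When no trigger word is found, the element stays in its first visual format.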

49 Citations

20 Claims

1. A method comprising:
- outputting, by a first application executing at a computing device and for display, a speech recognition graphical user interface (GUI) having at least one non-textual element in a first visual format;
  
  receiving, by the first application executing at the computing device, first audio data of a voice command that indicates one or more words of the voice command;
  
  determining, by the first application executing at the computing device, based on the one or more words of the voice command, a voice-initiated action indicated by the first audio data of the voice command, wherein the voice-initiated action is a particular voice-initiated action from a plurality of voice-initiated actions and the voice-initiated action is associated with a second application that is different than the first application;
  
  responsive to determining the voice-initiated action indicated by the first audio data of the voice command, and while receiving second audio data of the voice command that indicates one or more additional words of the voice command, and prior to executing the second application to perform the voice command, outputting, by the first application executing at the computing device, for display, an updated speech recognition GUI in which the at least one non-textual element, from the speech recognition GUI, transitions from being displayed in the first visual format to being displayed in a second visual format, different from the first visual format, indicating that the voice-initiated action is the particular voice-initiated action from the plurality of voice-initiated actions that has been determined from the first audio data of the voice command, wherein:
  
  the first visual format of the at least one non-textual element is a first image representative of a speech recognition mode of the first application,
  
  the second visual format of the at least one non-textual element is a second image that replaces the first image and corresponds to the voice-initiated action from the plurality of voice-initiated actions, and
  
  the second image is different from other images corresponding to one or more other voice-initiated actions from the plurality of voice-initiated actions; and
  
  after outputting the updated speech recognition GUI and after receiving the second audio data of the voice command, executing, by the computing device, based on the first audio data and the second audio data, the second application that performs the voice-initiated action indicated by the voice command.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
  - 2. The method of claim 1, further comprising:
    - determining, by the computing device, based on the first audio data and the second audio data, a transcription comprising the one or more words of the voice command and the one or more additional words of the voice command,
      
      wherein outputting the updated speech recognition GUI comprises outputting at least a portion of the transcription.
  - 3. The method of claim 2, wherein outputting the updated speech recognition GUI further comprises outputting the one or more words of the voice command and refraining from outputting the one or more additional words of the voice command.
  - 4. The method of claim 1, wherein the second visual format is further different from the first visual format in at least one of color, font, size, highlighting, style, or position.
  - 5. The method of claim 1, wherein outputting the updated speech recognition GUI comprises outputting the first image representative of the speech recognition mode with an animation that morphs into the second image in response to determining the voice-initiated action based on the first audio data.
  - 6. The method of claim 1, further comprising responsive to determining the voice-initiated action based on the first audio data, performing, by the computing device, based on the second audio data, the voice-initiated action.
  - 7. The method of claim 6, wherein the voice-initiated action is performed in response to receiving, by the computing device, an indication confirming that the voice-initiated action is correct.
  - 8. The method of claim 1, wherein determining the voice-initiated action further comprises determining the voice-initiated action based at least partially on a comparison of at least one of the one or more words of the voice command to a preconfigured set of actions.
  - 9. The method of claim 1, wherein determining the voice-initiated action further comprises:
    - identifying, by the computing device, at least one verb in the one or more words of the voice command; and
      
      comparing the at least one verb to one or more verbs from a set of verbs, each verb in the set of verbs corresponding to at least one action from the plurality of voice-initiated actions.
  - 10. The method of claim 1, wherein determining the voice-initiated action further comprises:
    - determining, by the computing device, a context based at least part on data from the computing device; and
      
      determining, by the computing device, based at least partially on the context and the first audio data, the voice-initiated action.
  - 11. The method of claim 1, further comprising:
    - responsive to receiving an indication of a cancellation input, outputting, by the computing device, the at least one non-textual element for display in the first visual format.
  - 12. The method of claim 1, wherein the first image representative of a speech recognition mode of the first application comprises a microphone.
  - 13. The method of claim 1, wherein the second image is selected from a group consisting of:
    - a compass arrow associated with a navigation feature of the second application, a play button associated with a media output feature of the second application, a pause button associated with the media output feature of the second application, a stop button associated with the media output feature of the second application, a telephone button associated with a telephone feature of the second application, and a search engine icon associated with a search feature of the second application.
  - 14. The method of claim 1, wherein:
    - the at least one non-textual element is displayed within a particular region of a display while being output for display in the first visual format; and
      
      the at least one non-textual element is displayed within the particular region of the display while being output for display in the second visual format.
  - 15. The method of claim 1, wherein outputting the updated speech recognition GUI comprises:
    - prior to outputting the at least one non-textual element in the second visual format, outputting, by the first application executing at the computing device, for display, an animation of the at least one non-textual element transitioning from the first visual format to the second visual format.
  - 16. The method of claim 1, wherein the first audio data is associated with command speech from a user of the computing device and the second audio data is associated with non-command speech from the user.
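
The verb comparison of claims 8 and 9, the image selection of claims 12 and 13, and the cancellation behavior of claim 11 can be sketched together as follows. This is only an illustrative reading: the `VERB_TO_ACTION` entries and action names are hypothetical, and the sketch treats membership in the verb set as "identifying a verb", whereas a real system would presumably rely on a speech recognizer and language analysis.

```python
# Hypothetical verb set; each verb corresponds to one action (claim 9).
VERB_TO_ACTION = {
    "navigate": "navigation",
    "drive": "navigation",
    "play": "media_play",
    "pause": "media_pause",
    "stop": "media_stop",
    "call": "telephone",
    "search": "search",
}

# Second images, named after the group listed in claim 13.
ACTION_TO_IMAGE = {
    "navigation": "compass arrow",
    "media_play": "play button",
    "media_pause": "pause button",
    "media_stop": "stop button",
    "telephone": "telephone button",
    "search": "search engine icon",
}

FIRST_IMAGE = "microphone"  # claim 12


def determine_action(words):
    """Return the action for the first word found in the verb set, else None."""
    for word in words:
        action = VERB_TO_ACTION.get(word.lower().strip(".,!?"))
        if action is not None:
            return action
    return None


def image_for(words, cancelled=False):
    """Pick the image to display; a cancellation restores the first image (claim 11)."""
    if cancelled:
        return FIRST_IMAGE
    action = determine_action(words)
    return ACTION_TO_IMAGE[action] if action else FIRST_IMAGE
```

Under these assumptions, `image_for(["drive", "home"])` selects the compass arrow, and passing `cancelled=True` returns the element to the microphone image.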

17. A computing device, comprising:
- a display device;
  
  one or more processors; and
  
  a memory that stores instructions associated with a first application that when executed cause the one or more processors to:
  
  output, for display at the display device, a speech recognition graphical user interface (GUI) having at least one non-textual element in a first visual format;
  
  receive first audio data of a voice command that indicates one or more words of the voice command;
  
  determine, based on the one or more words of the voice command, a voice-initiated action indicated by the first audio data of the voice command, wherein the voice-initiated action is a particular voice-initiated action from a plurality of voice-initiated actions and the voice-initiated action is associated with a second application that is different than the first application;
  
  responsive to determining the voice-initiated action indicated by the first audio data of the voice command, and while receiving second audio data of the voice command that indicates one or more additional words of the voice command, and prior to executing the second application to perform the voice command, output, for display at the display device, an updated speech recognition GUI in which the at least one non-textual element, from the speech recognition GUI, transitions from being displayed in the first visual format to being displayed in a second visual format, different from the first visual format, indicating that the voice-initiated action is the particular voice-initiated action from the plurality of voice-initiated action that has been determined from the first audio data of the voice command, wherein:
  
  the first visual format of the at least one non-textual element is a first image representative of a speech recognition mode of the first application,
  
  the second visual format of the at least one non-textual element is a second image that replaces the first image and corresponds to the voice-initiated action from the plurality of voice-initiated actions, and
  
  the second image is different from other images corresponding to one or more other voice-initiated actions from the plurality of voice-initiated actions; and
  
  after outputting the updated speech recognition GUI and after receiving the second audio data of the voice command, execute, based on the first audio data and the second audio data, the second application that performs the voice-initiated action indicated by the voice command.
- Dependent claims (18)
  - 18. The computing device of claim 17, wherein the instructions associated with the first application, when executed, further cause the one or more processors to:
    - determine, based on the first audio data and the second audio data, a transcription comprising the one or more words of the voice command and the one or more additional words of the voice command; and
      
      output, for display, the updated speech recognition GUI by at least outputting at least a portion of the transcription that excludes the one or more words.

19. A non-transitory computer-readable storage medium encoded with instructions associated with a first application that, when executed, cause one or more processors of a computing device to:
- output, for display at the display device, a speech recognition graphical user interface (GUI) having at least one non-textual element in a first visual format;
  
  receive first audio data of a voice command that indicates one or more words of the voice command;
  
  determine, based on the one or more words of the voice command, a voice-initiated action indicated by the first audio data of the voice command, wherein the voice-initiated action is a particular voice-initiated action from a plurality of voice-initiated actions and the voice-initiated action is associated with a second application that is different than the first application;
  
  responsive to determining the voice-initiated action indicated by the first audio data of the voice command, and while receiving second audio data of the voice command that indicates one or more additional words of the voice command, and prior to executing the second application to perform the voice command, output, for display at the display device, an updated speech recognition GUI in which the at least one non-textual element, from the speech recognition GUI, transitions from being displayed in the first visual format to being displayed in a second visual format, different from the first visual format, indicating that the voice-initiated action is the particular voice-initiated action from the plurality of voice-initiated action that has been determined from the first audio data of the voice command, wherein:
  
  the first visual format of the at least one non-textual element is a first image representative of a speech recognition mode of the first application,
  
  the second visual format of the at least one non-textual element is a second image that replaces the first image and corresponds to the voice-initiated action from the plurality of voice-initiated actions, and
  
  the second image is different from other images corresponding to one or more other voice-initiated actions from the plurality of voice-initiated actions; and
  
  after outputting the updated speech recognition GUI and after receiving the second audio data of the voice command, execute, based on the first audio data and the second audio data, a second application that performs the voice-initiated action indicated by the voice command.
- Dependent claims (20)
  - 20. The non-transitory computer-readable storage medium of claim 19, wherein the instructions, when executed, further cause the one or more processors of the computing device to:
    - determine, based on the first audio data and the second audio data, a transcription comprising the one or more words of the voice command and the one or more additional words of the voice command; and
      
      output, for display, the updated speech recognition GUI by at least outputting at least a portion of the transcription that excludes the one or more words.
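
Claims 18 and 20 recite outputting a portion of the transcription that excludes the one or more words of the command. One possible reading is sketched below. The function name, its parameters, and the assumption that the command words come first in the transcription are illustrative and not drawn from the patent.

```python
def displayed_transcription(command_words, additional_words, exclude_command=True):
    """Return the portion of the transcription to output in the updated GUI."""
    # The transcription covers both the command words and the additional words.
    transcription = list(command_words) + list(additional_words)
    if exclude_command:
        # Drop the leading command words and show only what follows them.
        shown = transcription[len(command_words):]
    else:
        shown = transcription
    return " ".join(shown)
```

For example, with command words `["navigate"]` and additional words `["to", "the", "store"]`, the default call yields only the additional words, while `exclude_command=False` yields the full transcription.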

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Faaborg, Alexander; Ng, Peter
Primary Examiner(s): Ng, Amy
Assistant Examiner(s): Kim, Sang H
Application Number: US14/109,660
Publication Number: US 20150040012A1
Time in Patent Office: 1,162 Days
Field of Search: 715/728, 715/716, 704/275, 704/E15.04, 704/E15.001, 704/270
US Class Current: 1/1

CPC Class Codes
- G01C 21/3608   using speech input, e.g. us...
- G06F 3/04817   using icons graphical or vi...
- G06F 3/167   Audio in a user interface, ...
- G10L 15/1815   Semantic context, e.g. disa...
- G10L 15/22   Procedures used during a sp...
- G10L 2015/223   Execution procedure of a sp...
- G10L 2015/228   of application context
