Systems and methods for providing a voice agent user interface
First Claim
1. A computing device, comprising:
at least one storage device configured to store a plurality of application programs, the plurality of application programs comprising a first application program that provides access to a web-based service; and
at least one processor programmed to implement at least one voice agent, wherein the at least one voice agent is configured to:
receive voice input that specifies at least one action to be performed without explicitly identifying an application program to perform the at least one action;
process the voice input using an automatic speech recognition (ASR) engine to generate recognized speech comprising text;
process, using a natural language understanding (NLU) engine, the text of the recognized speech to:
determine a meaning of the text of the recognized speech;
determine based, at least in part, on the determined meaning of the text of the recognized speech, the at least one action specified in the voice input; and
identify based, at least in part, on the determined meaning of the text of the recognized speech, which of the plurality of application programs are configured to at least partially perform the at least one action specified in the voice input; and
display a plurality of selectable visual representations corresponding to the plurality of application programs identified by the NLU engine, the plurality of selectable visual representations including a first selectable visual representation corresponding to the first application program, wherein the first selectable visual representation, when selected, causes focus of the computing device to be directed to the first application program.
Abstract
Some embodiments provide techniques performed by at least one voice agent. The techniques include receiving voice input; identifying at least one application program as relating to the received voice input; and displaying at least one selectable visual representation that, when selected, causes focus of the computing device to be directed to the at least one application program identified as relating to the received voice input.
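The flow summarized above (voice input, recognition, identification of related application programs, and selectable representations that direct focus) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in and not part of the patent: the App, Representation, and Device types, the ACTION_KEYWORDS table used in place of an NLU engine, and the recognize_speech stub used in place of an ASR engine.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class App:
    """Hypothetical installed application program."""
    name: str
    actions: FrozenSet[str]  # actions the app can at least partially perform


@dataclass
class Representation:
    """Hypothetical selectable visual representation of an app."""
    label: str
    on_select: Callable[[], str]


# Assumption: a keyword table stands in for a real NLU engine.
ACTION_KEYWORDS = {
    "play_music": ("play", "listen"),
    "send_message": ("text", "message", "send"),
}


def recognize_speech(audio: bytes) -> str:
    """Stand-in for an ASR engine: treats the input as UTF-8 text."""
    return audio.decode("utf-8")


def determine_action(text: str) -> Optional[str]:
    """Map recognized text to an action; no app is named by the user."""
    words = text.lower().split()
    for action, keywords in ACTION_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return action
    return None


def identify_apps(action: str, installed: List[App]) -> List[App]:
    """Select the apps able to at least partially perform the action."""
    return [app for app in installed if action in app.actions]


class Device:
    """Hypothetical computing device hosting the voice agent."""

    def __init__(self, apps: List[App]) -> None:
        self.apps = apps
        self.focused: Optional[str] = None

    def _focus(self, app: App) -> str:
        # Selecting a representation directs device focus to its app.
        self.focused = app.name
        return app.name

    def handle_voice_input(self, audio: bytes) -> List[Representation]:
        text = recognize_speech(audio)
        action = determine_action(text)
        if action is None:
            return []
        return [
            Representation(app.name, lambda app=app: self._focus(app))
            for app in identify_apps(action, self.apps)
        ]


device = Device([
    App("StreamTunes", frozenset({"play_music"})),
    App("LocalPlayer", frozenset({"play_music"})),
    App("Chat", frozenset({"send_message"})),
])
representations = device.handle_voice_input(b"play some jazz")
representations[0].on_select()  # focus moves to the selected app
```

In this sketch the request names no application program, so both music-capable apps are offered and the user's selection decides which one receives focus.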
27 Claims
7. A method performed by at least one voice agent, the method comprising:
receiving voice input that specifies at least one action to be performed without explicitly identifying an application program to perform the at least one action;
processing the voice input using an automatic speech recognition (ASR) engine to generate recognized speech comprising text;
processing, using a natural language understanding (NLU) engine, the text of the recognized speech to:
determine a meaning of the text of the recognized speech;
determine based, at least in part, on the determined meaning of the text of the recognized speech, the at least one action specified in the voice input; and
identify based, at least in part, on the determined meaning of the text of the recognized speech, which of a plurality of application programs are configured to at least partially perform the at least one action specified in the voice input; and
displaying a plurality of selectable visual representations corresponding to the plurality of application programs identified by the NLU engine, the plurality of selectable visual representations including a first selectable visual representation corresponding to the first application program, wherein the first selectable visual representation, when selected, causes focus of the computing device to be directed to the first application program.
13. At least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by at least one computing device, cause the at least one computing device to implement at least one voice agent that performs a method comprising:
receiving voice input that specifies at least one action to be performed without explicitly identifying an application program to perform the at least one action;
processing the voice input using an automatic speech recognition (ASR) engine to generate recognized speech comprising text;
processing, using a natural language understanding (NLU) engine, the text of the recognized speech to:
determine a meaning of the text of the recognized speech;
determine based, at least in part, on the determined meaning of the text of the recognized speech, the at least one action specified in the voice input; and
identify based, at least in part, on the determined meaning of the text of the recognized speech, which of a plurality of application programs are configured to at least partially perform the at least one action specified in the voice input; and
displaying a plurality of selectable visual representations corresponding to the plurality of application programs identified by the NLU engine, the plurality of selectable visual representations including a first selectable visual representation corresponding to the first application program, wherein the first selectable visual representation, when selected, causes focus of the computing device to be directed to the first application program.
19. A computing device, comprising:
at least one processor programmed to implement at least one voice agent, wherein the at least one voice agent is configured to:
receive voice input that specifies at least one action to be performed without explicitly identifying an application program to perform the at least one action;
process the voice input using an automatic speech recognition (ASR) engine to generate recognized speech comprising text;
process, using a natural language understanding (NLU) engine, the text of the recognized speech to:
determine a meaning of the text of the recognized speech;
determine based, at least in part, on the determined meaning of the text of the recognized speech, the at least one action specified in the voice input; and
identify based, at least in part, on the determined meaning of the text of the recognized speech, which of at least one application program is configured to at least partially perform the at least one action specified in the voice input; and
display at least one selectable visual representation that, when selected, causes focus of the computing device to be directed to the selected application program of the at least one application program identified by the NLU engine as relating to the received voice input.
22. A method performed by at least one voice agent, the method comprising:
receiving voice input that specifies at least one action to be performed without explicitly identifying an application program to perform the at least one action;
processing the voice input using an automatic speech recognition (ASR) engine to generate recognized speech comprising text;
processing, using a natural language understanding (NLU) engine, the text of the recognized speech to:
determine a meaning of the text of the recognized speech;
determine based, at least in part, on the determined meaning of the text of the recognized speech, the at least one action specified in the voice input; and
identify based, at least in part, on the determined meaning of the text of the recognized speech, which of a plurality of application programs is configured to at least partially perform the at least one action specified in the voice input; and
displaying at least one selectable visual representation that, when selected, causes focus of a computing device programmed to implement the at least one voice agent to be directed to the at least one application program identified by the NLU engine as relating to the received voice input.
25. At least one non-transitory computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to implement at least one voice agent that performs a method comprising:
receiving voice input that specifies at least one action to be performed without explicitly identifying an application program to perform the at least one action;
processing the voice input using an automatic speech recognition (ASR) engine to generate recognized speech comprising text;
processing, using a natural language understanding (NLU) engine, the text of the recognized speech to:
determine a meaning of the text of the recognized speech;
determine based, at least in part, on the determined meaning of the text of the recognized speech, the at least one action specified in the voice input; and
identify based, at least in part, on the determined meaning of the text of the recognized speech, which of a plurality of application programs is configured to at least partially perform the at least one action specified in the voice input; and
displaying at least one selectable visual representation that, when selected, causes focus of a computing device programmed to implement the at least one voice agent to be directed to one of the plurality of application programs identified by the NLU engine as relating to the received voice input.
Specification