Systems and methods for providing a voice agent user interface

US 10,276,157 B2
Filed: 10/01/2012
Issued: 04/30/2019
Est. Priority Date: 10/01/2012
Status: Active Grant

3 Assignments

0 Petitions

Abstract

Some embodiments provide techniques performed by at least one voice agent. The techniques include receiving voice input; identifying at least one application program as relating to the received voice input; and displaying at least one selectable visual representation that, when selected, causes focus of the computing device to be directed to the at least one application program identified as relating to the received voice input.
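The last step of the abstract, a selectable visual representation that directs focus to an application when selected, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: `FocusManager`, `SelectableRepresentation`, `display_representations`, and the application names are all hypothetical, and the step that relates the voice input to applications is taken as already done.

```python
from dataclasses import dataclass
from typing import List, Optional


class FocusManager:
    """Hypothetical stand-in for whatever tracks which application has focus."""

    def __init__(self) -> None:
        self.focused_app: Optional[str] = None

    def direct_focus_to(self, app_name: str) -> None:
        self.focused_app = app_name


@dataclass
class SelectableRepresentation:
    """A selectable visual representation (for example, an icon) for one app."""

    app_name: str
    focus_manager: FocusManager

    def select(self) -> None:
        # Selecting the representation directs device focus to its application.
        self.focus_manager.direct_focus_to(self.app_name)


def display_representations(
    app_names: List[str], focus_manager: FocusManager
) -> List[SelectableRepresentation]:
    """Build one selectable representation per identified application."""
    return [SelectableRepresentation(name, focus_manager) for name in app_names]


# Assumed result of the identification step: two apps relate to the voice input.
focus = FocusManager()
shown = display_representations(["MusicApp", "WebBrowser"], focus)

# Simulate the user tapping the second representation.
shown[1].select()
print(focus.focused_app)
```

Nothing receives focus until the user selects a representation, which mirrors the "when selected" wording of the abstract.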

38 Citations

27 Claims

1. A computing device, comprising:
- at least one storage device configured to store a plurality of application programs, the plurality of application programs comprising a first application program that provides access to a web-based service; and
  
  at least one processor programmed to implement at least one voice agent, wherein the at least one voice agent is configured to:
  
  receive voice input that specifies at least one action to be performed without explicitly identifying an application program to perform the at least one action;
  
  process the voice input using an automatic speech recognition (ASR) engine to generate recognized speech comprising text;
  
  process, using a natural language understanding (NLU) engine, the text of the recognized speech to:
  
  determine a meaning of the text of the recognized speech;
  
  determine based, at least in part, on the determined meaning of the text of the recognized speech, the at least one action specified in the voice input; and
  
  identify based, at least in part, on the determined meaning of the text of the recognized speech, which of the plurality of application programs are configured to at least partially perform the at least one action specified in the voice input; and
  
  display a plurality of selectable visual representations corresponding to the plurality of application programs identified by the NLU engine, the plurality of selectable visual representations including a first selectable visual representation corresponding to the first application program, wherein the first selectable visual representation, when selected, causes focus of the computing device to be directed to the first application program.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The computing device of claim 1, wherein the first application program is a web browser application program.
  - 3. The computing device of claim 1, wherein the voice input specifies the web-based service.
  - 4. The computing device of claim 1, wherein the web-based service is accessible by a plurality of users each having an account with the web-based service, wherein the plurality of users includes a first user of the computing device, wherein the first user has a first account with the web-based service, and wherein the first application program provides access to the web-based service, at least in part, by using information associated with the first user's first account.
  - 5. The computing device of claim 1, wherein the first application program is dedicated to providing access to the web-based service.
  - 6. The computing device of claim 1, wherein the first selectable visual representation comprises a selectable icon associated with the first application program, wherein the selectable icon is configured to be selected in response to being tapped and/or clicked by a user.
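The processing chain recited in claim 1 (ASR to text, NLU to an action, then identification of the application programs able to perform that action) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the ASR engine is a placeholder that decodes UTF-8 bytes, the NLU engine is a small keyword table that folds "meaning" and "action" into one lookup, and the `App` records, action names, and function names are hypothetical. A real system would use trained speech and language models.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class App:
    """Hypothetical record of an installed application and the actions it supports."""

    name: str
    actions: FrozenSet[str]


def recognize_speech(audio: bytes) -> str:
    """Placeholder ASR engine: a real one would decode audio into text."""
    # Assumption: the "audio" is plain UTF-8 bytes so that the sketch can run.
    return audio.decode("utf-8")


# Hypothetical keyword table standing in for an NLU model.
ACTION_KEYWORDS: Dict[str, str] = {
    "play": "play_media",
    "listen": "play_media",
    "directions": "navigate",
    "navigate": "navigate",
}


def determine_action(text: str) -> Optional[str]:
    """Toy NLU step: map the first known keyword in the text to an action."""
    for word in text.lower().split():
        action = ACTION_KEYWORDS.get(word.strip(".,!?"))
        if action is not None:
            return action
    return None


def identify_apps(action: str, installed: List[App]) -> List[App]:
    """Return the apps that can at least partially perform the action."""
    return [app for app in installed if action in app.actions]


def handle_voice_input(audio: bytes, installed: List[App]) -> List[str]:
    """Run the chain and return the names of the apps to offer to the user."""
    text = recognize_speech(audio)
    action = determine_action(text)
    if action is None:
        return []
    return [app.name for app in identify_apps(action, installed)]


INSTALLED = [
    App("MusicApp", frozenset({"play_media"})),
    App("WebBrowser", frozenset({"play_media", "navigate"})),
    App("Maps", frozenset({"navigate"})),
]

# The request names an action but no application program.
print(handle_voice_input(b"Play some jazz", INSTALLED))
```

In this sketch the request matches two applications, so both names would be offered as selectable representations rather than one being launched automatically.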

7. A method performed by at least one voice agent, the method comprising:
- receiving voice input that specifies at least one action to be performed without explicitly identifying an application program to perform the at least one action;
  
  processing the voice input using an automatic speech recognition (ASR) engine to generate recognized speech comprising text;
  
  processing, using a natural language understanding (NLU) engine, the text of the recognized speech to:
  
  determine a meaning of the text of the recognized speech;
  
  determine based, at least in part, on the determined meaning of the text of the recognized speech, the at least one action specified in the voice input; and
  
  identify based, at least in part, on the determined meaning of the text of the recognized speech, which of a plurality of application programs are configured to at least partially perform the at least one action specified in the voice input; and
  
  displaying a plurality of selectable visual representations corresponding to the plurality of application programs identified by the NLU engine, the plurality of selectable visual representations including a first selectable visual representation corresponding to the first application program, wherein the first selectable visual representation, when selected, causes focus of the computing device to be directed to the first application program.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The method of claim 7, wherein the first application program is a web browser application program.
  - 9. The method of claim 7, wherein the voice input specifies a web-based service.
  - 10. The method of claim 9, wherein the web-based service is accessible by a plurality of users each having an account with the web-based service, wherein the plurality of users includes a first user of the computing device, wherein the first user has a first account with the web-based service, and wherein the first application program provides access to the service, at least in part, by using information associated with the first user's first account.
  - 11. The method of claim 7, wherein the first application program is dedicated to providing access to a web-based service.
  - 12. The method of claim 7, wherein the first selectable visual representation comprises a selectable icon associated with the first application program, wherein the selectable icon is configured to be selected in response to being tapped and/or clicked by a user.

13. At least one non-transitory computer-readable storage medium storing processor executable instructions that, when executed by at least one computing device, cause the at least one computing device to implement at least one voice agent that performs a method comprising:
- receiving voice input that specifies at least one action to be performed without explicitly identifying an application program to perform the at least one action;
  
  processing the voice input using an automatic speech recognition (ASR) engine to generate recognized speech comprising text;
  
  processing, using a natural language understanding (NLU) engine, the text of the recognized speech to:
  
  determine a meaning of the text of the recognized speech;
  
  determine based, at least in part, on the determined meaning of the text of the recognized speech, the at least one action specified in the voice input; and
  
  identify based, at least in part, on the determined meaning of the text of the recognized speech, which of a plurality of application programs are configured to at least partially perform the at least one action specified in the voice input; and
  
  displaying a plurality of selectable visual representations corresponding to the plurality of application programs identified by the NLU engine, the plurality of selectable visual representations including a first selectable visual representation corresponding to the first application program, wherein the first selectable visual representation, when selected, causes focus of the computing device to be directed to the first application program.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The at least one non-transitory computer-readable storage medium of claim 13, wherein the first application program is a web-browser application program.
  - 15. The at least one non-transitory computer-readable storage medium of claim 13, wherein the voice input specifies a web-based service.
  - 16. The at least one non-transitory computer-readable storage medium of claim 15, wherein the web-based service is accessible by a plurality of users each having an account with the web-based service, wherein the plurality of users includes a first user of the computing device, wherein the first user has a first account with the web-based service, and wherein the first application program provides access to the service, at least in part, by using information associated with the first user's first account.
  - 17. The at least one non-transitory computer-readable storage medium of claim 13, wherein the first application program is dedicated to providing access to a web-based service.
  - 18. The at least one non-transitory computer-readable storage medium of claim 13, wherein the first selectable visual representation comprises a selectable icon associated with the first application program, wherein the selectable icon is configured to be selected in response to being tapped and/or clicked by a user.

19. A computing device, comprising:
- at least one processor programmed to implement at least one voice agent, wherein the at least one voice agent is configured to:
  
  receive voice input that specifies at least one action to be performed without explicitly identifying an application program to perform the at least one action;
  
  process the voice input using an automatic speech recognition (ASR) engine to generate recognized speech comprising text;
  
  process, using a natural language understanding (NLU) engine, the text of the recognized speech to:
  
  determine a meaning of the text of the recognized speech;
  
  determine based, at least in part, on the determined meaning of the text of the recognized speech, the at least one action specified in the voice input; and
  
  identify based, at least in part, on the determined meaning of the text of the recognized speech, which of at least one application program is configured to at least partially perform the at least one action specified in the voice input; and
  
  display at least one selectable visual representation that, when selected, causes focus of the computing device to be directed to the selected application program of the at least one application program identified by the NLU engine as relating to the received voice input.
- Dependent claims (20, 21):
  - 20. The computing device of claim 19, wherein the at least one application program comprises a first application that provides access to a web-based service for streaming video content and/or streaming audio content.
  - 21. The computing device of claim 19, wherein the at least one application program comprises a plurality of application programs, and wherein the at least one voice agent is configured to display a plurality of selectable visual representations, each one of the plurality of selectable visual representations corresponding to one of the plurality of application programs.

22. A method performed by at least one voice agent, the method comprising:
- receiving voice input that specifies at least one action to be performed without explicitly identifying an application program to perform the at least one action;
  
  processing the voice input using an automatic speech recognition (ASR) engine to generate recognized speech comprising text;
  
  processing, using a natural language understanding (NLU) engine, the text of the recognized speech to:
  
  determine a meaning of the text of the recognized speech;
  
  determine based, at least in part, on the determined meaning of the text of the recognized speech, the at least one action specified in the voice input; and
  
  identify based, at least in part, on the determined meaning of the text of the recognized speech, which of a plurality of application programs is configured to at least partially perform the at least one action specified in the voice input; and
  
  displaying at least one selectable visual representation that, when selected, causes focus of a computing device programmed to implement the at least one voice agent to be directed to the at least one application program identified by the NLU engine as relating to the received voice input.
- Dependent claims (23, 24):
  - 23. The method of claim 22, wherein the at least one application program comprises a first application that provides access to a web-based service for streaming video content and/or streaming audio content.
  - 24. The method of claim 22, wherein the at least one application program comprises a plurality of application programs, and wherein the at least one voice agent is configured to display a plurality of selectable visual representations, each one of the plurality of selectable visual representations corresponding to one of the plurality of application programs.

25. At least one non-transitory computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to implement at least one voice agent that performs a method comprising:
- receiving voice input that specifies at least one action to be performed without explicitly identifying an application program to perform the at least one action;
  
  processing the voice input using an automatic speech recognition (ASR) engine to generate recognized speech comprising text;
  
  processing, using a natural language understanding (NLU) engine, the text of the recognized speech to:
  
  determine a meaning of the text of the recognized speech;
  
  determine based, at least in part, on the determined meaning of the text of the recognized speech, the at least one action specified in the voice input; and
  
  identify based, at least in part, on the determined meaning of the text of the recognized speech, which of a plurality of application programs is configured to at least partially perform the at least one action specified in the voice input; and
  
  displaying at least one selectable visual representation that, when selected, causes focus of a computing device programmed to implement the at least one voice agent to be directed to one of the plurality of application programs identified by the NLU engine as relating to the received voice input.
- Dependent claims (26, 27):
  - 26. The at least one non-transitory computer-readable storage medium of claim 25, wherein the plurality of application programs comprises a first application that provides access to a web-based service for streaming video content and/or streaming audio content.
  - 27. The at least one non-transitory computer-readable storage medium of claim 25, wherein the at least one voice agent is configured to display a plurality of selectable visual representations, each one of the plurality of selectable visual representations corresponding to one of the plurality of application programs.

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Lynch, Timothy; Brown, Sean P.; Attayadmawittaya, Paweena; Cabaco, Tiago Goncalves; Chen, Victor Shine
Primary Examiner(s)
Tzeng, Feng-Tzer

Application Number

US13/632,344
Publication Number

US 20140095173A1
Time in Patent Office

2,402 Days
Field of Search

704/246, 704/251, 704/270, 704/275
US Class Current
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 15/00   Speech recognition G10L17/0...

G10L 15/04   Segmentation; Word boundary...

G10L 15/08   Speech classification or se...

G10L 15/18   using natural language mode...

G10L 15/19   Grammatical context, e.g. d...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 17/22   Interactive procedures; Man...

G10L 2015/228   of application context
