Device voice control for selecting a displayed affordance
First Claim
1. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions for voice control of displayed content, which when executed by one or more processors of an electronic device, cause the electronic device to:
receive a first spoken user input;
obtain a first text string based on the first spoken user input;
derive a representation of a first user intent based on the first text string, wherein the first user intent is derived based on a degree of match between the first text string and one or more words associated with a first predefined domain;
determine whether a task associated with one or more displayed affordances may be identified based on the representation of the first user intent based on the first text string;
in accordance with a determination that a task may be identified based on the representation of the first user intent based on the first text string, perform the task associated with the one or more displayed affordances; and
in accordance with a determination that a task may not be identified based on the representation of the first user intent:
highlight one or more of the displayed affordances;
receive a second spoken user input corresponding to an affordance of the one or more affordances;
obtain a second text string based on the second spoken user input;
derive a representation of a second user intent based on the second text string, wherein the second user intent is derived based on a degree of match between the second text string and one or more words associated with a second predefined domain;
determine whether a task may be identified based on the representation of the second user intent based on the second text string; and
in accordance with a determination that a task may be identified based on the representation of the second user intent based on the second text string, select the affordance of the one or more affordances.
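The claim derives each user intent from "a degree of match" between a text string and words associated with a predefined domain, but it does not fix a particular scoring rule. The following is a minimal sketch of one possible reading, assuming the degree of match is the fraction of tokens in the text string that appear in a domain's word list. The domain names, word lists, threshold, and function names are all hypothetical and are not taken from the patent.

```python
import re

# Hypothetical domain vocabularies; the claim does not enumerate any.
DOMAIN_WORDS = {
    "media_playback": {"play", "pause", "resume", "stop", "movie", "song"},
    "affordance_selection": {"select", "open", "choose", "first", "second", "third"},
}


def tokenize(text):
    """Lowercase the text string and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def degree_of_match(text, domain_words):
    """Fraction of tokens in the text string that belong to the domain (assumed metric)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in domain_words)
    return hits / len(tokens)


def derive_intent(text, domain, threshold=0.5):
    """Build a simple intent representation; the 0.5 threshold is an assumption."""
    score = degree_of_match(text, DOMAIN_WORDS[domain])
    return {"domain": domain, "text": text, "score": score, "matched": score >= threshold}
```

Under these assumptions, `derive_intent("select the second", "affordance_selection")` scores two of three tokens as domain words and is treated as a match, while an utterance with no domain words scores zero.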
1 Assignment
0 Petitions
Abstract
Systems and processes for device voice control are provided. An example process includes, at an electronic device, receiving a spoken user input and interpreting the spoken user input to derive a representation of user intent. The process further includes determining whether a task may be identified based on the representation of user intent. In accordance with a determination that a task may be identified based on the representation of user intent, the task is performed, and in accordance with a determination that a task may not be identified based on the representation of user intent, the spoken user input is disambiguated.
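The perform-or-disambiguate branch described in the abstract can be sketched as a small two-turn routine. This is an illustration only: it follows the first claim in treating disambiguation as highlighting the displayed affordances and waiting for a second spoken input, and it assumes that a task is identified when exactly one affordance label appears in the recognized text. The function names, the event tuples, and the label-matching rule are hypothetical.

```python
def identify_task(text, affordances):
    """Return the single affordance whose label occurs in the text, else None.

    Substring matching on labels is an assumed stand-in for intent derivation.
    """
    lowered = text.lower()
    matches = [label for label in affordances if label.lower() in lowered]
    return matches[0] if len(matches) == 1 else None


def voice_control(utterances, affordances):
    """Process up to two recognized text strings and return the resulting events."""
    events = []
    task = identify_task(utterances[0], affordances)
    if task is not None:
        # A task was identified from the first input: perform it.
        events.append(("perform", task))
        return events
    # No task identified: highlight the displayed affordances to disambiguate.
    events.append(("highlight", list(affordances)))
    if len(utterances) > 1:
        choice = identify_task(utterances[1], affordances)
        if choice is not None:
            # The second input identifies an affordance: select it.
            events.append(("select", choice))
    return events
```

For example, with affordances `["Inbox", "Settings"]`, the single utterance "open settings" yields a `perform` event, whereas "play something" followed by "Inbox" yields a `highlight` event and then a `select` event.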
3587 Citations
42 Claims
1. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions for voice control of displayed content, which when executed by one or more processors of an electronic device, cause the electronic device to:
receive a first spoken user input;
obtain a first text string based on the first spoken user input;
derive a representation of a first user intent based on the first text string, wherein the first user intent is derived based on a degree of match between the first text string and one or more words associated with a first predefined domain;
determine whether a task associated with one or more displayed affordances may be identified based on the representation of the first user intent based on the first text string;
in accordance with a determination that a task may be identified based on the representation of the first user intent based on the first text string, perform the task associated with the one or more displayed affordances; and
in accordance with a determination that a task may not be identified based on the representation of the first user intent:
highlight one or more of the displayed affordances;
receive a second spoken user input corresponding to an affordance of the one or more affordances;
obtain a second text string based on the second spoken user input;
derive a representation of a second user intent based on the second text string, wherein the second user intent is derived based on a degree of match between the second text string and one or more words associated with a second predefined domain;
determine whether a task may be identified based on the representation of the second user intent based on the second text string; and
in accordance with a determination that a task may be identified based on the representation of the second user intent based on the second text string, select the affordance of the one or more affordances.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
15. A method for voice control of displayed content, comprising:
at an electronic device:
receiving a first spoken user input;
obtaining a first text string based on the first spoken user input;
deriving a representation of a first user intent based on the first text string, wherein the first user intent is derived based on a degree of match between the first text string and one or more words associated with a first predefined domain;
determining whether a task associated with one or more displayed affordances may be identified based on the representation of the first user intent based on the first text string;
in accordance with a determination that a task may be identified based on the representation of the first user intent based on the first text string, performing the task associated with the one or more displayed affordances; and
in accordance with a determination that a task may not be identified based on the representation of the first user intent:
highlighting one or more of the displayed affordances;
receiving a second spoken user input corresponding to an affordance of the one or more affordances;
obtaining a second text string based on the second spoken user input;
derive a representation of a second user intent based on the second text string, wherein the second user intent is derived based on a degree of match between the second text string and one or more words associated with a second predefined domain;
determining whether a task may be identified based on the representation of the second user intent based on the second text string; and
in accordance with a determination that a task may be identified based on the representation of the second user intent based on the second text string, selecting the affordance of the one or more affordances.
- View Dependent Claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
29. An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for voice control of displayed content, the instructions for:
receiving a first spoken user input;
obtaining a first text string based on the first spoken user input;
deriving a representation of a first user intent based on the first text string, wherein the first user intent is derived based on a degree of match between the first text string and one or more words associated with a first predefined domain;
determining whether a task associated with one or more displayed affordances may be identified based on the representation of the first user intent based on the first text string;
in accordance with a determination that a task may be identified based on the representation of the first user intent based on the first text string, performing the task associated with the one or more displayed affordances; and
in accordance with a determination that a task may not be identified based on the representation of the first user intent:
highlighting one or more of the displayed affordances;
receiving a second spoken user input corresponding to an affordance of the one or more affordances;
obtaining a second text string based on the second spoken user input;
derive a representation of a second user intent based on the second text string, wherein the second user intent is derived based on a degree of match between the second text string and one or more words associated with a second predefined domain;
determining whether a task may be identified based on the representation of the second user intent based on the second text string; and
in accordance with a determination that a task may be identified based on the representation of the second user intent based on the second text string, selecting the affordance of the one or more affordances.
- View Dependent Claims (30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42)
Specification