Using visual cues to disambiguate speech inputs
Abstract
Embodiments related to recognizing speech inputs are disclosed. One disclosed embodiment provides a method for recognizing a speech input including receiving depth information of a physical space from a depth camera, determining an identity of a user in the physical space based on the depth information, receiving audio information from one or more microphones, and determining a speech input from the audio input. If the speech input comprises an ambiguous term, the ambiguous term in the speech input is compared to one or more of depth image data received from the depth image sensor and digital content consumption information for the user to identify an unambiguous term corresponding to the ambiguous term. After identifying the unambiguous term, an action is taken on the computing device based on the speech input and the unambiguous term.
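As an illustration only, the disambiguation step described in the abstract might be sketched in Python as follows. The names UserProfile, find_contact_candidates, and resolve_ambiguous_term are hypothetical and do not come from the patent. The sketch assumes the user has already been identified from camera data and that the contact list was fetched from the remote service beforehand. Matching a spoken name against whole words in contact names is an assumed heuristic, not the claimed method.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UserProfile:
    """Hypothetical per-user record, keyed by the identity found from camera data."""
    user_id: str
    # Contact names assumed to have been fetched earlier from a social network service.
    contacts: List[str] = field(default_factory=list)


def find_contact_candidates(term: str, profile: UserProfile) -> List[str]:
    """Return contacts whose name contains `term` as a whole word, ignoring case."""
    word = term.lower()
    return [name for name in profile.contacts if word in name.lower().split()]


def resolve_ambiguous_term(term: str, profile: UserProfile) -> Optional[str]:
    """Return the single matching contact, or None if zero or several contacts match."""
    candidates = find_contact_candidates(term, profile)
    return candidates[0] if len(candidates) == 1 else None


profile = UserProfile("user-1", contacts=["Sam Lee", "Dana Cruz"])
print(resolve_ambiguous_term("sam", profile))  # prints: Sam Lee
```

Returning None when several contacts match leaves room for a further cue, such as a gesture, to break the tie before any action is taken.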
19 Claims
1. On a computing device, a method for recognizing a speech input, the method comprising:
receiving image information of a physical space from a one or more cameras;
determining an identity of a user in the physical space based on the image information;
receiving audio information from one or more microphones;
determining a speech input from the audio input;
if the speech input comprises an ambiguous term, then comparing the ambiguous term in the speech input to digital content consumption information for the user to identify an unambiguous term corresponding to the ambiguous term, the digital content consumption information comprising social network information obtained from a remote service, the social network information including contacts from a social network, and wherein identifying the unambiguous term comprises identifying another user from the social network information; and
after identifying the unambiguous term, taking an action on the computing device based on the speech input and the unambiguous term.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. On a computing device, a method for recognizing speech of a user, comprising:
receiving depth information of a physical space from a depth camera;
identifying one or more gestures performed by the user based on the depth information;
receiving audio information from one or more microphones;
determining a speech input from the audio input;
if the speech input comprises an ambiguous term, then utilizing one or more of the one or more gestures and social network information obtained from a remote service to identify an unambiguous term corresponding to the ambiguous term, the social network information including contacts from a social network, and wherein identifying the unambiguous term comprises identifying another user from the social network information; and
after identifying the unambiguous term, taking an action on the computing device based on the speech input and the unambiguous term.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A storage device comprising instructions executable by a logic subsystem to:
receive depth information of a physical space from a depth camera;
determine an identity of a user in the physical space based on the depth information;
identify one or more gestures performed by the user based on the depth information;
receive audio information from one or more microphones;
determine a speech input from the audio input;
if the speech input comprises an ambiguous term, then utilize one or more of digital content consumption information for the user and the one or more gestures to identify an unambiguous term corresponding to the ambiguous term, the digital content consumption information including social network information obtained from a remote service, the social network information including contacts from a social network, and wherein identifying the unambiguous term comprises identifying another user from the social network information; and
after identifying the unambiguous term, take an action on the computing device based on the speech input and the unambiguous term.
Dependent claims: 18, 19
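Claims 10 and 17 allow gestures and social network information to be used together. A rough Python sketch of one way that combination could work is given below. The function resolve_with_gesture and its pointed_at parameter are hypothetical, and the sketch assumes an upstream gesture recognizer has already mapped a pointing gesture to the name of a contact shown on screen. None of this is taken from the patent's specification.

```python
from typing import List, Optional


def resolve_with_gesture(
    term: str,
    contacts: List[str],
    pointed_at: Optional[str] = None,
) -> Optional[str]:
    """Resolve an ambiguous spoken name using contacts, then a gesture as tie-breaker.

    `pointed_at` is an assumed output of a gesture recognizer: the contact name
    the user pointed toward, or None if no pointing gesture was detected.
    """
    word = term.lower()
    # Contacts whose name contains the spoken term as a whole word.
    candidates = [name for name in contacts if word in name.lower().split()]
    if len(candidates) == 1:
        return candidates[0]
    # Several (or zero) matches: accept the gesture only if it picks a candidate.
    if pointed_at in candidates:
        return pointed_at
    return None


contacts = ["Sam Lee", "Sam Ortiz", "Dana Cruz"]
print(resolve_with_gesture("sam", contacts, pointed_at="Sam Ortiz"))  # prints: Sam Ortiz
```

In this sketch the gesture is consulted only when the contact list alone cannot settle the term, and a gesture that points at a non-matching contact is ignored rather than trusted.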
Specification