Method and apparatus for using image data to aid voice recognition
First Claim
1. A computer-implemented method comprising:
obtaining one or more images that are generated by one or more cameras of a mobile device;
analyzing one or more of the images;
identifying one or more features of an environment in which the mobile device is operating based on the analysis of one or more of the images;
determining, based on the identified one or more features, a trigger threshold against which respective values of voice commands are compared, each value of a voice command indicating a likelihood that received audio data corresponds to the voice command;
after determining the trigger threshold against which respective values of voice commands are compared, receiving particular audio data;
determining that a value of a particular voice command associated with the particular audio data satisfies the trigger threshold; and
in response to determining that the value of the particular voice command associated with the particular audio data satisfies the trigger threshold, performing the particular voice command.
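The claimed sequence (image features first, then a trigger threshold, then comparing per-command likelihood values against it) can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: the feature labels, offset values, base threshold, and function names are invented, "satisfies" is assumed to mean "greater than or equal to", and the per-command values are assumed to come from some speech recognizer that is out of scope here.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical threshold offsets keyed by image-derived environment features.
# Feature labels and offset values are illustrative assumptions, not values
# taken from the patent.
FEATURE_OFFSETS: Dict[str, float] = {
    "crowd": 0.15,               # several people in view: demand more confidence
    "vehicle_interior": 0.10,    # road noise likely: demand more confidence
    "user_facing_device": -0.10, # user looking at the device: accept lower scores
}
BASE_THRESHOLD = 0.60  # hypothetical default trigger threshold


def determine_trigger_threshold(features: Iterable[str],
                                base: float = BASE_THRESHOLD) -> float:
    """Add the offset of each identified feature to the base, clamped to [0, 1]."""
    threshold = base + sum(FEATURE_OFFSETS.get(f, 0.0) for f in features)
    return min(1.0, max(0.0, threshold))


def select_command(command_values: Dict[str, float],
                   threshold: float) -> Optional[str]:
    """Return the highest-valued command if its value satisfies (>=) the threshold."""
    if not command_values:
        return None
    best = max(command_values, key=command_values.__getitem__)
    return best if command_values[best] >= threshold else None


def handle_audio(features: Iterable[str],
                 command_values: Dict[str, float],
                 actions: Dict[str, Callable[[], str]]) -> Optional[str]:
    """Fix the threshold from image features first, then check the audio's command values."""
    threshold = determine_trigger_threshold(features)
    command = select_command(command_values, threshold)
    return actions[command]() if command is not None else None


# Usage: the same command values trigger in one environment but not another.
ACTIONS = {"call_home": lambda: "calling home", "play_music": lambda: "playing"}
VALUES = {"call_home": 0.70, "play_music": 0.20}
print(handle_audio(["crowd"], VALUES, ACTIONS))               # None (0.70 < 0.75)
print(handle_audio(["user_facing_device"], VALUES, ACTIONS))  # calling home
```

Raising the threshold when the camera sees a crowd trades missed commands for fewer false triggers from bystanders' speech; lowering it when the user faces the device does the reverse.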
Abstract
A device performs a method for using image data to aid voice recognition. The method includes the device capturing image data of a vicinity of the device and adjusting, based on the image data, a set of parameters for voice recognition performed by the device. The set of parameters includes, but is not limited to: a trigger threshold of a trigger for voice recognition; a set of beamforming parameters; a database for voice recognition; and/or an algorithm for voice recognition, wherein the algorithm can include using noise suppression or using acoustic beamforming.
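The abstract names four kinds of parameter that image data may adjust. The sketch below is a hypothetical illustration, not the patent's implementation: it maps invented image features (a detected face direction, a head count, a vehicle flag) onto each category, and uses delay-and-sum, one standard acoustic beamforming technique, to show what a beamforming parameter such as a steering direction does. All field names, threshold values, database names, and the microphone geometry are assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

SPEED_OF_SOUND_M_S = 343.0  # approximate, in air at room temperature


@dataclass
class ImageFeatures:
    """Hypothetical output of the image-analysis step."""
    face_azimuth_deg: Optional[float]  # direction of a detected face; None if none
    people_count: int
    is_vehicle: bool


@dataclass
class VoiceRecognitionParams:
    trigger_threshold: float
    steer_azimuth_deg: Optional[float]  # beamforming parameter
    database: str                       # hypothetical model/database identifier
    noise_suppression: bool
    use_beamforming: bool


def adjust_parameters(f: ImageFeatures) -> VoiceRecognitionParams:
    """Map image features to each parameter category listed in the abstract."""
    crowded = f.people_count > 1
    return VoiceRecognitionParams(
        trigger_threshold=0.75 if crowded else 0.60,
        steer_azimuth_deg=f.face_azimuth_deg,
        database="in_vehicle" if f.is_vehicle else "general",
        noise_suppression=crowded or f.is_vehicle,
        use_beamforming=f.face_azimuth_deg is not None,
    )


def delay_and_sum(channels: Sequence[Sequence[float]], mic_spacing_m: float,
                  azimuth_deg: float, sample_rate_hz: int) -> List[float]:
    """Steer a uniform linear array toward azimuth_deg (0 = broadside).

    Assumes a source at positive azimuth reaches mic 0 first, so mic m lags
    by m * tau seconds; each channel is advanced by its whole-sample lag
    before averaging. Samples shifted past either end are dropped.
    """
    n = len(channels[0])
    tau = mic_spacing_m * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND_M_S
    out = [0.0] * n
    for m, ch in enumerate(channels):
        shift = round(m * tau * sample_rate_hz)
        for i in range(n):
            j = i + shift
            if 0 <= j < n:
                out[i] += ch[j]
    return [x / len(channels) for x in out]


# Usage: a face detected at 90 degrees among three people.
params = adjust_parameters(ImageFeatures(face_azimuth_deg=90.0, people_count=3,
                                         is_vehicle=False))
# Two mics 4.2875 cm apart at 16 kHz: a source at 90 degrees lags 2 samples at mic 1.
mic0 = [0.0] * 32
mic0[10] = 1.0
mic1 = [0.0] * 32
mic1[12] = 1.0
if params.use_beamforming and params.steer_azimuth_deg is not None:
    y = delay_and_sum([mic0, mic1], 0.042875, params.steer_azimuth_deg, 16000)
    print(max(y))  # 1.0: the two impulses add coherently
```

Steered toward the face, the two delayed copies line up and reinforce each other; steered to broadside instead, each copy contributes only half the amplitude at separate instants.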
23 Claims
1. A computer-implemented method comprising:
obtaining one or more images that are generated by one or more cameras of a mobile device;
analyzing one or more of the images;
identifying one or more features of an environment in which the mobile device is operating based on the analysis of one or more of the images;
determining, based on the identified one or more features, a trigger threshold against which respective values of voice commands are compared, each value of a voice command indicating a likelihood that received audio data corresponds to the voice command;
after determining the trigger threshold against which respective values of voice commands are compared, receiving particular audio data;
determining that a value of a particular voice command associated with the particular audio data satisfies the trigger threshold; and
in response to determining that the value of the particular voice command associated with the particular audio data satisfies the trigger threshold, performing the particular voice command.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 23.
9. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
obtaining one or more images that are generated by one or more cameras of a mobile device;
analyzing one or more of the images;
identifying one or more features of an environment in which the mobile device is operating based on the analysis of one or more of the images;
determining, based on the identified one or more features, a trigger threshold against which respective values of voice commands are compared, each value of a voice command indicating a likelihood that received audio data corresponds to the voice command;
after determining the trigger threshold against which respective values of voice commands are compared, receiving particular audio data;
determining that a value of a particular voice command associated with the particular audio data satisfies the trigger threshold; and
in response to determining that the value of the particular voice command associated with the particular audio data satisfies the trigger threshold, performing the particular voice command.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
17. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
obtaining one or more images that are generated by one or more cameras of a mobile device;
analyzing one or more of the images;
identifying one or more features of an environment in which the mobile device is operating based on the analysis of one or more of the images;
determining, based on the identified one or more features, a trigger threshold against which respective values of voice commands are compared, each value of a voice command indicating a likelihood that received audio data corresponds to the voice command;
after determining the trigger threshold against which respective values of voice commands are compared, receiving particular audio data;
determining that a value of a particular voice command associated with the particular audio data satisfies the trigger threshold; and
in response to determining that the value of the particular voice command associated with the particular audio data satisfies the trigger threshold, performing the particular voice command.
Dependent claims: 18, 19, 20, 21, 22.
Specification