Speech and computer vision-based control
Abstract
The present disclosure relates to a method for controlling a digital photography system. The method includes obtaining, by a device, image data and audio data. The method also includes identifying one or more objects in the image data and obtaining a transcription of the audio data. The method also includes controlling a future operation of the device based at least on the one or more objects identified in the image data, and the transcription of the audio data.
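As a rough illustration of how a transcription and a set of identified objects might be combined, the sketch below picks out which detected objects an utterance refers to. It is a minimal sketch under stated assumptions, not the method of the disclosure: the object labels and the transcription are assumed to come from an upstream object detector and speech recognizer (not shown), matching is simple word overlap, and the function name `resolve_referred_objects` is hypothetical.

```python
from __future__ import annotations

import re


def resolve_referred_objects(transcription: str, detected_labels: list[str]) -> list[str]:
    """Return the detected labels that the transcription mentions, in detection order."""
    # Hypothetical helper: plain word matching stands in for real language understanding.
    words = set(re.findall(r"[a-z]+", transcription.lower()))
    referred: list[str] = []
    for label in detected_labels:
        label_words = label.lower().split()
        # A label matches when every word of the label appears in the utterance.
        if label_words and all(w in words for w in label_words) and label not in referred:
            referred.append(label)
    return referred


labels = ["dog", "bicycle", "tree"]  # assumed detector output for the first scene
print(resolve_referred_objects("Take more pictures of the dog", labels))  # ['dog']
```

A real system would likely need synonym handling and disambiguation between several instances of the same object class; the word-overlap check only shows where the image data and the transcription meet.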
20 Claims
1. A computer-implemented method comprising:
obtaining, by a computing device operable to capture images, (i) image data that describes a first scene and (ii) audio data;
identifying, by the computing device, one or more objects in the first scene based on the image data;
obtaining, by the computing device, a transcription of a human speech utterance described by the audio data, wherein the human speech utterance refers to at least a first object of the one or more objects included in the first scene;
identifying, by the computing device based at least in part on the transcription and based at least in part on the image data, at least the first object that is referred to by the human speech utterance;
defining, by the computing device, a new rule that specifies an image capture behavior of the computing device in response to future instances of identification of the first object in future image data that is different than the current image data; and
controlling, by the computing device, a future operation of the computing device to comply with the new rule.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
obtaining, by the one or more computers, (i) image data that describes a first scene and (ii) audio data;
identifying one or more objects in the first scene based on the image data;
obtaining a transcription of a human speech utterance described by the audio data, wherein the human speech utterance refers to at least a first object of the one or more objects included in the first scene;
identifying, based at least in part on the transcription and based at least in part on the image data, at least the first object that is referred to by the human speech utterance;
defining a new rule that specifies an image capture behavior of at least one of the one or more computers in response to future instances of identification of the first object in future image data that is different than the current image data; and
controlling a future operation of the at least one of the one or more computers to comply with the new rule.
Dependent claims: 13, 14, 15, 16, 17, 18, 19
20. A non-transitory, computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
obtaining (i) image data that describes a first scene and (ii) audio data;
identifying one or more objects in the first scene based on the image data;
obtaining a transcription of a human speech utterance described by the audio data, wherein the human speech utterance refers to at least a first object of the one or more objects included in the first scene;
identifying, based at least in part on the transcription and based at least in part on the image data, at least the first object that is referred to by the human speech utterance;
defining a new rule that specifies an image capture behavior of the one or more computers in response to future instances of identification of the first object in future image data that is different than the current image data; and
controlling a future operation of the one or more computers to comply with the new rule.
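The last two steps shared by the independent claims (defining a rule keyed on the referred object, then controlling future operation to comply with it) can be pictured with a small sketch. Everything here is an assumption made for illustration: the `CaptureRule` type, the `define_rule` and `should_capture` functions, the keyword-based intent check, and the choice that a suppress rule overrides a capture rule are all hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureRule:
    target_label: str  # object the rule is keyed on
    capture: bool      # True: capture when the object is seen; False: suppress capture


def define_rule(referred_label: str, transcription: str) -> CaptureRule:
    """Build a rule from the referred object and the utterance (toy keyword intent check)."""
    text = transcription.lower()
    negative = any(phrase in text for phrase in ("don't", "do not", "stop", "never"))
    return CaptureRule(target_label=referred_label, capture=not negative)


def should_capture(rules, future_labels, default=False):
    """Apply stored rules to the labels identified in a later frame."""
    decision = default
    for rule in rules:
        if rule.target_label in future_labels:
            if not rule.capture:
                return False  # assumed policy: a suppress rule wins over any capture rule
            decision = True
    return decision


rules = [
    define_rule("dog", "Take more pictures of the dog"),
    define_rule("whiteboard", "Do not photograph the whiteboard"),
]
print(should_capture(rules, ["dog", "tree"]))        # True
print(should_capture(rules, ["dog", "whiteboard"]))  # False
print(should_capture(rules, ["tree"]))               # False
```

The rules persist across frames, which is the point of the sketch: the utterance is heard once, while the resulting behavior applies to later image data in which the same object is identified again.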
Specification