Speech and computer vision-based control

US 9,769,367 B2
Filed: 02/19/2016
Issued: 09/19/2017
Est. Priority Date: 08/07/2015
Status: Active Grant

Abstract

The present disclosure relates to a method for controlling a digital photography system. The method includes obtaining, by a device, image data and audio data. The method also includes identifying one or more objects in the image data and obtaining a transcription of the audio data. The method also includes controlling a future operation of the device based at least on the one or more objects identified in the image data, and the transcription of the audio data.

20 Claims

1. A computer-implemented method comprising:
- obtaining, by a computing device operable to capture images, (i) image data that describes a first scene and (ii) audio data;
  
  identifying, by the computing device, one or more objects in the first scene based on the image data;
  
  obtaining, by the computing device, a transcription of a human speech utterance described by the audio data, wherein the human speech utterance refers to at least a first object of the one or more objects included in the first scene;
  
  identifying, by the computing device based at least in part on the transcription and based at least in part on the image data, at least the first object that is referred to by the human speech utterance;
  
  defining, by the computing device, a new rule that specifies an image capture behavior of the computing device in response to future instances of identification of the first object in future image data that is different than the current image data; and
  
  controlling, by the computing device, a future operation of the computing device to comply with the new rule.
  - 2. The method of claim 1, wherein controlling the future operation of the computing device comprises determining whether to store the future image data.
  - 3. The method of claim 1, wherein controlling the future operation of the computing device comprises determining whether to automatically upload future generated image data to cloud storage.
  - 4. The method of claim 1, wherein identifying, by the computing device, one or more objects in the first scene comprises at least one of identifying a person using face detection, identifying a gesture performed by a person in the image, or detecting an action performed by a person in the image.
  - 5. The method of claim 1, further comprising generating, by the computing device, the image data and the audio data.
  - 6. The method of claim 1, wherein the computing device is a camera.
  - 7. The method of claim 1, wherein the transcription of the audio data is obtained using automated speech recognition.
  - 8. The method of claim 1, wherein the one or more objects in the first scene is identified using computer vision.
  - 9. The computer-implemented method of claim 1, wherein:
    - the human speech utterance further describes the image capture behavior of the computing device in response to future instances of identification of the first object; and
      
      the method further comprises determining, by the computing device, the requested image capture behavior based at least in part on the transcription.
  - 10. The computer-implemented method of claim 9, wherein:
    - the human speech utterance requests the computing device not capture imagery in response to future instances of identification of the first object in the future image data; and
      
      defining, by the computing device, the new rule comprises defining, by the computing device, the new rule that specifies that the computing device does not capture imagery in response to future instances of identification of the first object in the future image data.
  - 11. The computer-implemented method of claim 1, wherein:
    - the human speech utterance self-references a speaker of the human speech utterance; and
      
      identifying, by the computing device based at least in part on the transcription and based at least in part on the image data, at least the first object that is referred to by the human speech utterance comprises identifying, by the computing device based at least in part on the image data, the speaker of the human speech utterance.

12. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  
  obtaining, by the one or more computers, (i) image data that describes a first scene and (ii) audio data;
  
  identifying one or more objects in the first scene based on the image data;
  
  obtaining a transcription of a human speech utterance described by the audio data, wherein the human speech utterance refers to at least a first object of the one or more objects included in the first scene;
  
  identifying, based at least in part on the transcription and based at least in part on the image data, at least the first object that is referred to by the human speech utterance;
  
  defining a new rule that specifies an image capture behavior of at least one of the one or more computers in response to future instances of identification of the first object in future image data that is different than the current image data; and
  
  controlling a future operation of the at least one of the one or more computers to comply with the new rule.
  - 13. The system of claim 12, wherein controlling a future operation of the at least one of the one or more computers comprises determining whether to store the future image data.
  - 14. The system of claim 12, wherein controlling a future operation of the at least one of the one or more computers comprises determining whether to automatically upload future generated image data to cloud storage.
  - 15. The system of claim 12, wherein identifying one or more objects in the first scene comprises at least one of identifying a person using face detection, identifying a gesture performed by a person in the first scene, or detecting an action performed by a person in the first scene.
  - 16. The system of claim 12, further comprising generating, by the one or more computers, the image data and the audio data.
  - 17. The system of claim 12, wherein the one or more computers comprise a camera.
  - 18. The system of claim 12, wherein the transcription of the audio data is obtained using automated speech recognition.
  - 19. The system of claim 12, wherein the one or more objects in the first scene is identified using computer vision.

20. A non-transitory, computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- obtaining (i) image data that describes a first scene and (ii) audio data;
  
  identifying one or more objects in the first scene based on the image data;
  
  obtaining a transcription of a human speech utterance described by the audio data, wherein the human speech utterance refers to at least a first object of the one or more objects included in the first scene;
  
  identifying, based at least in part on the transcription and based at least in part on the image data, at least the first object that is referred to by the human speech utterance;
  
  defining a new rule that specifies an image capture behavior of the one or more computers in response to future instances of identification of the first object in future image data that is different than the current image data; and
  
  controlling a future operation of the one or more computers to comply with the new rule.
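
The steps recited in claim 1 (and mirrored in claims 12 and 20) can be illustrated with a minimal sketch. It is not the patented implementation: every name here (`DetectedObject`, `Rule`, `CameraController`, `resolve_referent`, `parse_behavior`) is hypothetical, the keyword matching is a stand-in for real language understanding, and the sketch assumes that a vision component has already labeled the objects in the scene and flagged which one is the speaker, and that a speech recognizer has already produced the transcription.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class CaptureBehavior(Enum):
    CAPTURE = "capture"
    DO_NOT_CAPTURE = "do_not_capture"


@dataclass(frozen=True)
class DetectedObject:
    # Label assumed to come from a vision component (e.g. face detection).
    label: str
    # Assumed flag: the vision component believes this object is speaking.
    is_speaker: bool = False


@dataclass(frozen=True)
class Rule:
    target_label: str
    behavior: CaptureBehavior


def _words(transcription: str) -> set:
    # Lowercase word tokens; apostrophes are kept so "don't" stays one token.
    return set(re.findall(r"[a-z']+", transcription.lower()))


def resolve_referent(transcription: str,
                     objects: List[DetectedObject]) -> Optional[DetectedObject]:
    """Pick the scene object the utterance refers to (toy keyword matching)."""
    words = _words(transcription)
    # Self-reference (compare claim 11): map "me"/"my"/"i" to the speaker.
    if words & {"me", "my", "i"}:
        for obj in objects:
            if obj.is_speaker:
                return obj
    for obj in objects:
        if obj.label.lower() in words:
            return obj
    return None


def parse_behavior(transcription: str) -> CaptureBehavior:
    """Infer the requested capture behavior (compare claims 9 and 10)."""
    text = transcription.lower()
    negations = ("don't", "do not", "stop", "never")
    if any(n in text for n in negations):
        return CaptureBehavior.DO_NOT_CAPTURE
    return CaptureBehavior.CAPTURE


@dataclass
class CameraController:
    rules: Dict[str, Rule] = field(default_factory=dict)

    def handle_utterance(self, transcription: str,
                         objects: List[DetectedObject]) -> Optional[Rule]:
        """Define a new rule from an utterance about an object in the scene."""
        referent = resolve_referent(transcription, objects)
        if referent is None:
            return None
        rule = Rule(referent.label, parse_behavior(transcription))
        self.rules[referent.label] = rule
        return rule

    def should_capture(self, future_objects: List[DetectedObject]) -> bool:
        """Apply stored rules to objects identified in later image data."""
        for obj in future_objects:
            rule = self.rules.get(obj.label)
            if rule and rule.behavior is CaptureBehavior.DO_NOT_CAPTURE:
                return False
        return True


if __name__ == "__main__":
    camera = CameraController()
    scene = [DetectedObject("alice", is_speaker=True), DetectedObject("dog")]
    camera.handle_utterance("Please don't take pictures of me.", scene)
    print(camera.should_capture([DetectedObject("alice")]))  # False
    print(camera.should_capture([DetectedObject("dog")]))    # True
```

Keying the rules by object label keeps the later check a simple lookup. A real device would presumably key on a more robust identity, such as a face embedding, and could extend `should_capture` to the storage and upload decisions of claims 2 and 3.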


Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Rifkin, Ryan M.; Ramage, Daniel
Primary Examiner(s): FLOHRE, JASON A
Application Number: US15/048,360
Publication Number: US 20170041523A1
Time in Patent Office: 578 Days
CPC Class Codes

G06F 3/00   Input arrangements for tran...

G06F 3/017   Gesture based interaction, ...

G06V 40/28   Recognition of hand or arm ...

G06V 40/63   by static guides

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 2015/223   Execution procedure of a sp...

H04N 23/611   where the recognised object...

H04N 23/66   Remote control of cameras o...
