Annotating images based on multi-modal sensor data
Abstract
Imaging data or other data captured using a camera may be classified based on data captured using another sensor that is calibrated with the camera and operates in a different modality. Where a digital camera configured to capture visual images is calibrated with another sensor such as a thermal camera, a radiographic camera or an ultraviolet camera, and such sensors capture data simultaneously from a scene, the respectively captured data may be processed to detect one or more objects therein. A probability that data depicts one or more objects of interest may be enhanced based on data captured from calibrated sensors operating in different modalities. Where an object of interest is detected to a sufficient degree of confidence, annotated data from which the object was detected may be used to train one or more classifiers to recognize the object, or similar objects, or for any other purpose.
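The abstract's central idea is that agreement between calibrated sensors operating in different modalities can raise the confidence that a region depicts an object of interest, and that sufficiently confident regions can be turned into annotations. The following sketch illustrates one way such a confidence boost could work; it is not the patented method. It assumes both detectors already report boxes in a shared, calibrated pixel frame, and the names (Detection, fuse), thresholds and combination rule are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    box: tuple          # (x_min, y_min, x_max, y_max) in a shared, calibrated frame
    confidence: float   # detector confidence in [0, 1]

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def fuse(visual_dets, thermal_dets, iou_min=0.3, annotate_at=0.9):
    """Pair visual and thermal detections that overlap in the shared frame,
    combine their confidences, and keep pairs above the annotation threshold."""
    annotations = []
    for v in visual_dets:
        for t in thermal_dets:
            if iou(v.box, t.box) >= iou_min:
                # Treat the two modalities as independent evidence for the object.
                combined = 1.0 - (1.0 - v.confidence) * (1.0 - t.confidence)
                if combined >= annotate_at:
                    annotations.append({"box": v.box, "confidence": combined})
    return annotations

# Example: a moderately confident visual detection confirmed by a thermal detection.
print(fuse([Detection((10, 10, 50, 60), 0.7)], [Detection((12, 11, 52, 58), 0.8)]))
```

Under this illustrative rule, a visual detection at 0.7 confidence confirmed by an overlapping thermal detection at 0.8 yields a combined confidence of 0.94, enough to emit an annotation given the assumed 0.9 threshold.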
Claims
1. An aerial vehicle comprising:
a plurality of propulsion motors, wherein each of the propulsion motors comprises a propeller and a drive shaft, and wherein each of the propulsion motors is configured to rotate the propeller about an axis defined by the drive shaft;
a digital camera configured to capture one or more visual images;
a thermal camera configured to capture one or more thermal images, wherein the digital camera and the thermal camera are calibrated and aligned with fields of view that overlap at least in part; and
a control system having at least one computer processor, wherein the control system is in communication with each of the digital camera, the thermal camera and the plurality of propulsion motors, and wherein the at least one computer processor is configured to execute one or more instructions for performing a method comprising:
initiating a first operation of at least one of the plurality of propulsion motors;
during the first operation, capturing a first plurality of visual images by the digital camera; and
capturing a second plurality of thermal images by the thermal camera;
receiving information regarding at least one visual attribute and at least one thermal attribute of an object;
detecting the at least one visual attribute of the object within a first portion of a first one of the first plurality of visual images;
detecting the at least one thermal attribute of the object within a second portion of a second one of the second plurality of thermal images;
determining that the first portion of the first one of the first plurality of visual images corresponds to the second portion of the second one of the second plurality of thermal images;
generating an annotation of the first one of the first plurality of visual images based at least in part on at least one of the first portion of the first one of the first plurality of visual images or the second portion of the second one of the second plurality of thermal images;
storing the annotation in association with at least the first one of the first plurality of visual images;
providing at least the first one of the first plurality of visual images to a classifier as a training input;
providing at least the annotation to the classifier as a training output;
training the classifier using at least the training input and the training output;
capturing at least a second plurality of visual images by the digital camera;
providing at least one of the second plurality of visual images to the classifier as an input;
receiving an output from the classifier; and
identifying a portion of the at least one of the second plurality of visual images depicting the object based at least in part on the output.
Dependent claims: 2, 3, 4, 5.
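Claim 1 recites annotating visual images from corresponding thermal detections and then using each image and its annotation as a training input and training output for a classifier. The sketch below illustrates only that training step, with a toy logistic-regression classifier over annotated image crops; the synthetic frames, the crops_from_annotations helper and the training routine are illustrative stand-ins, not the claimed implementation, which would more plausibly use a modern detection network.

```python
import numpy as np

rng = np.random.default_rng(0)

def crops_from_annotations(frames, annotations, size=16):
    """Cut annotated regions out of each frame as positive examples and random
    regions of the same size as negatives (negatives may occasionally overlap a
    positive; this is only a sketch)."""
    X, y = [], []
    for frame, boxes in zip(frames, annotations):
        for (x0, y0) in boxes:
            X.append(frame[y0:y0 + size, x0:x0 + size].ravel()); y.append(1.0)
        rx = rng.integers(0, frame.shape[1] - size)
        ry = rng.integers(0, frame.shape[0] - size)
        X.append(frame[ry:ry + size, rx:rx + size].ravel()); y.append(0.0)
    return np.array(X), np.array(y)

def train_logistic(X, y, steps=500, lr=0.1):
    """Minimal logistic-regression classifier trained by gradient descent."""
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Synthetic stand-in data: 64x64 "visual frames", each with one annotated 16x16 region.
frames = [rng.random((64, 64)) for _ in range(8)]
annotations = [[(20, 20)] for _ in frames]
for f, boxes in zip(frames, annotations):          # make annotated regions brighter
    for (x0, y0) in boxes:
        f[y0:y0 + 16, x0:x0 + 16] += 1.0

X, y = crops_from_annotations(frames, annotations)
w, b = train_logistic(X, y)
print("training accuracy:", (((X @ w + b) > 0).astype(float) == y).mean())
```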
6. A method comprising:
capturing first data from a scene by a first sensor operating in a first modality;
capturing second data from the scene by at least a second sensor operating in a second modality, wherein the second sensor is calibrated with the first sensor, and wherein a first field of view of the first sensor overlaps with a second field of view of the second sensor at least in part, wherein one of the first data or the second data comprises visual imaging data, and wherein one of the first data or the second data does not include visual imaging data;
detecting at least a first attribute of an object of a type in a first portion of a first representation of at least some of the first data, wherein the first representation is generated based at least in part on at least some of the first data captured at a first time;
identifying at least a second portion of a second representation of at least some of the second data, wherein the second portion of the second representation corresponds to at least the first portion of the first representation;
providing the at least some of the second data as a second input to a second object detection algorithm, wherein the second object detection algorithm is configured to detect an object of the type within data of the second modality;
receiving a second output from the second object detection algorithm; and
detecting at least a second attribute of an object of the type in the second portion of the second representation of the second data based at least in part on the second output, wherein the second attribute is one of an edge, a contour, an outline, a color, a texture, a silhouette or a shape of an object of the type;
generating at least one annotation of an object of the type based at least in part on at least one of the first portion of the first representation or the second portion of the second representation;
storing at least one annotation in association with at least some of the second data;
capturing third data by a third sensor operating in at least one of the first modality or the second modality;
providing at least some of the third data to a classifier as an input, wherein the classifier is trained to detect an object of the type within data of at least one of the first modality or the second modality based at least in part on at least one of the first portion of the first representation or the second portion of the second representation as a training input and the at least one annotation as a training output;
receiving an output from the classifier; and
detecting at least a portion of an object of the type within a third representation of the third data based at least in part on the output received from the classifier.
Dependent claims: 7, 8, 9, 10, 11.
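Claim 6, like claims 1 and 12, turns on identifying the portion of the second representation that corresponds to the portion of the first representation where an attribute was detected. The claims do not specify how the calibration between the two sensors is expressed; the sketch below assumes it can be approximated by a planar homography between the two sensors' pixel frames, and the matrix values and function name (map_box) are invented for illustration.

```python
import numpy as np

def map_box(box, H):
    """Map an axis-aligned box from the first sensor's pixel frame into the
    second sensor's pixel frame using a 3x3 homography H, then re-box it."""
    x0, y0, x1, y1 = box
    corners = np.array([[x0, y0, 1], [x1, y0, 1], [x1, y1, 1], [x0, y1, 1]], dtype=float)
    mapped = (H @ corners.T).T
    mapped = mapped[:, :2] / mapped[:, 2:3]       # perspective divide
    xs, ys = mapped[:, 0], mapped[:, 1]
    return (xs.min(), ys.min(), xs.max(), ys.max())

# Assumed calibration: the second sensor sees a half-resolution, slightly shifted
# view of the same scene (pure scale plus translation, for illustration only).
H_first_to_second = np.array([[0.5, 0.0, -8.0],
                              [0.0, 0.5, -6.0],
                              [0.0, 0.0,  1.0]])

first_box = (120, 80, 200, 160)                   # region where the first attribute was detected
print(map_box(first_box, H_first_to_second))     # corresponding region to inspect in the second modality
```

The mapped box is what would then be handed to the second object detection algorithm recited in the claim, restricted to data of the second modality.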
12. A method comprising:
affixing a sensing system to at least a portion of an unmanned aerial vehicle, wherein the unmanned aerial vehicle comprises a first sensor, wherein the sensing system comprises a second sensor, wherein the second sensor is calibrated with the first sensor, and wherein a first field of view of the first sensor overlaps with a second field of view of the second sensor;
causing the unmanned aerial vehicle to engage in at least one flight operation;
capturing first data from a scene by the first sensor operating in a first modality, wherein the first data is captured with the unmanned aerial vehicle engaged in the at least one flight operation;
capturing second data from the scene by at least the second sensor operating in a second modality, wherein the second data is captured with the unmanned aerial vehicle engaged in the at least one flight operation;
detecting at least a first attribute of an object of a type in a first portion of a first representation of at least some of the first data, wherein the first representation is generated based at least in part on at least some of the first data captured at a first time;
identifying at least a second portion of a second representation of at least some of the second data, wherein the second portion of the second representation corresponds to at least the first portion of the first representation;
in response to identifying at least the second portion of the second representation, generating at least one annotation of an object of the type based at least in part on the first portion of the first representation; and
storing at least one annotation in association with at least some of the second data, wherein one of the first data or the second data comprises visual imaging data, and wherein one of the first data or the second data does not include visual imaging data; and
after capturing the second data from the scene, terminating the at least one flight operation; and
removing the sensing system from at least the portion of the unmanned aerial vehicle.
Dependent claims: 13, 14, 15.
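Claim 12 requires storing the annotation in association with at least some of the captured data. One simple, illustrative way to keep that association is a JSON-lines sidecar file keyed by frame identifier, as sketched below; the record fields, file naming and store_annotation helper are assumptions for illustration, not part of the claim.

```python
import json
from datetime import datetime, timezone

def store_annotation(sidecar_path, frame_id, box, label, modality):
    """Append one annotation record to a JSON-lines sidecar file kept alongside
    the captured imagery, so each record stays associated with its frame."""
    record = {
        "frame_id": frame_id,                       # e.g. filename or frame index from the flight log
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "modality": modality,                       # e.g. "visual" or "thermal"
        "label": label,
        "box": box,                                 # (x_min, y_min, x_max, y_max) in frame pixels
    }
    with open(sidecar_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

store_annotation("flight_0001_annotations.jsonl", "visual_000123.png",
                 (52, 34, 92, 74), "object_of_interest", "visual")
```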
16. A method comprising:
affixing a secondary sensing system to at least one surface of an aerial vehicle, wherein the aerial vehicle comprises a first sensor operating in a first modality, wherein the secondary sensing system comprises a second sensor operating in a second modality, and wherein a first field of view of the first sensor and a second field of view of the second sensor overlap at least in part with the secondary sensing system affixed to the at least one surface of the aerial vehicle;
initiating at least a first flight operation of the aerial vehicle;
capturing, by the first sensor, first data during the first flight operation;
capturing, by the second sensor, second data during the first flight operation;
detecting at least a first attribute of a first object within a first portion of a first representation of the first data, wherein the first attribute relates to the first modality;
identifying a second portion of a second representation of the second data corresponding to the first portion of the first representation of the first data;
determining that the second portion of the second representation depicts at least a second attribute of the first object, wherein the second attribute relates to the second modality;
generating an annotation of at least one of the first data or the second data based at least in part on the first portion or the second portion;
training at least one classifier to recognize one or more objects based at least in part on the at least one of the first data or the second data and the annotation;
removing the secondary sensing system from the at least one surface of the aerial vehicle;
initiating at least a second flight operation of the aerial vehicle;
capturing, by the first sensor, third data during the second flight operation;
providing at least some of the third data to the at least one trained classifier as an input;
receiving at least one output from the at least one trained classifier; and
detecting at least a third attribute of a second object within at least a third portion of a third representation of the third data based at least in part on the at least one output.
Dependent claims: 17, 18, 19, 20.
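Claim 16 ends with an inference pass: after the secondary sensing system is removed, data captured by the remaining sensor on a later flight is provided to the trained classifier, and objects are detected from its output. The sketch below shows a minimal sliding-window version of such a pass, reusing the kind of (w, b) weights produced by the toy training sketch under claim 1; the stand-in weights and frame here are random, for illustration only.

```python
import numpy as np

def detect(frame, w, b, size=16, stride=8, threshold=0.5):
    """Slide a window over a single-modality frame captured after the secondary
    sensor was removed, score each crop with the trained classifier (w, b), and
    return the windows whose score clears the threshold."""
    hits = []
    for y0 in range(0, frame.shape[0] - size + 1, stride):
        for x0 in range(0, frame.shape[1] - size + 1, stride):
            crop = frame[y0:y0 + size, x0:x0 + size].ravel()
            score = 1.0 / (1.0 + np.exp(-(crop @ w + b)))
            if score >= threshold:
                hits.append(((x0, y0, x0 + size, y0 + size), float(score)))
    return hits

# Stand-in weights: in practice these would come from the training step sketched under claim 1.
rng = np.random.default_rng(1)
w, b = rng.normal(size=16 * 16) * 0.01, 0.0
frame = rng.random((64, 64))
print(detect(frame, w, b)[:3])   # a few (box, score) candidates from the new flight's imagery
```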
Specification