SYSTEMS AND METHODS FOR FEW-SHOT TRANSFER LEARNING

First Claim
1. A method for training a controller to control a robotic system in a target domain, the method comprising:
receiving a neural network of an original controller for controlling the robotic system based on a plurality of origin data samples from an origin domain and corresponding labels in a label space, the neural network of the original controller comprising a plurality of encoder parameters and a plurality of classifier parameters, the neural network being trained to:
map an input data sample from the origin domain to a feature vector in a feature space in accordance with the encoder parameters; and
assign a label of the label space to the input data sample based on the feature vector in accordance with the classifier parameters;
updating the encoder parameters to minimize a dissimilarity, in the feature space, between:
a plurality of origin feature vectors computed from the origin data samples; and
a plurality of target feature vectors computed from a plurality of target data samples from the target domain, the target data samples having a smaller cardinality than the origin data samples; and
updating the controller with the updated encoder parameters to control the robotic system in the target domain.
Abstract
A method for training a controller to control a robotic system includes: receiving a neural network of an original controller for the robotic system based on origin data samples from an origin domain and labels in a label space, the neural network including encoder and classifier parameters, the neural network being trained to: map an input data sample from the origin domain to a feature vector in a feature space using the encoder parameters; and assign a label of the label space to the input data sample using the feature vector based on the classifier parameters; updating the encoder parameters to minimize a dissimilarity, in the feature space, between: origin feature vectors computed from the origin data samples; and target feature vectors computed from target data samples from a target domain; and updating the controller with the updated encoder parameters to control the robotic system in the target domain.
27 Claims
 1. A method for training a controller to control a robotic system in a target domain, the method comprising:
receiving a neural network of an original controller for controlling the robotic system based on a plurality of origin data samples from an origin domain and corresponding labels in a label space, the neural network of the original controller comprising a plurality of encoder parameters and a plurality of classifier parameters, the neural network being trained to: map an input data sample from the origin domain to a feature vector in a feature space in accordance with the encoder parameters; and assign a label of the label space to the input data sample based on the feature vector in accordance with the classifier parameters; updating the encoder parameters to minimize a dissimilarity, in the feature space, between: a plurality of origin feature vectors computed from the origin data samples; and a plurality of target feature vectors computed from a plurality of target data samples from the target domain, the target data samples having a smaller cardinality than the origin data samples; and updating the controller with the updated encoder parameters to control the robotic system in the target domain.  View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
 12. A system for training a controller to control a robotic system in a target domain, the system comprising:
a processor; and non-volatile memory storing instructions that, when executed by the processor, cause the processor to: receive a neural network of an original controller for controlling the robotic system based on a plurality of origin data samples from an origin domain and corresponding labels in a label space, the neural network of the original controller comprising a plurality of encoder parameters and a plurality of classifier parameters, the neural network being trained to: map an input data sample from the origin domain to a feature vector in a feature space in accordance with the encoder parameters; and assign a label of the label space to the input data sample based on the feature vector in accordance with the classifier parameters; update the encoder parameters to minimize a dissimilarity between: a plurality of origin feature vectors computed from the origin data samples; and a plurality of target feature vectors computed from a plurality of target data samples from the target domain, the target data samples having a smaller cardinality than the origin data samples; and update the controller with the updated encoder parameters to control the robotic system in the target domain.  View Dependent Claims (13, 14, 15, 16, 17, 18, 19)
 20. A non-transitory computer readable medium having instructions stored thereon that, when executed by a processor, cause the processor to:
receive a neural network of an original controller for controlling a robotic system based on a plurality of origin data samples from an origin domain and corresponding labels in a label space, the neural network of the original controller comprising a plurality of encoder parameters and a plurality of classifier parameters, the neural network being trained to: map an input data sample from the origin domain to a feature vector in a feature space in accordance with the encoder parameters; and assign a label of the label space to the input data sample based on the feature vector in accordance with the classifier parameters; update the encoder parameters to minimize a dissimilarity between: a plurality of origin feature vectors computed from the origin data samples; and a plurality of target feature vectors computed from a plurality of target data samples from a target domain, the target data samples having a smaller cardinality than the origin data samples; and update the controller with the updated encoder parameters to control a robotic system in the target domain.  View Dependent Claims (21, 22, 23, 24, 25, 26, 27)
Specification
This application claims the benefit of U.S. Provisional Patent Application No. 62/752,166, “SYSTEM AND METHOD FOR FEW-SHOT TRANSFER LEARNING,” filed in the United States Patent and Trademark Office on Oct. 29, 2018, the entire disclosure of which is incorporated by reference herein.
Aspects of embodiments of the present invention relate to the field of machine learning.
Developments in machine learning, such as deep learning, have led to algorithms with high performance in a wide range of applications. However, these techniques typically depend on the availability of very large labeled datasets to train the algorithms. In some scenarios, large datasets are not available for training, such as when data labeling and annotation are expensive, or when, due to drifts in the data distribution, the training and deployment datasets have different distributions (e.g., the labeled data that is available for training is very different from the data seen in the real world).
Some approaches to addressing the problem of labeled data scarcity include transfer learning and domain adaptation (the terms are sometimes used interchangeably), which are closely related paradigms used to improve learning speed and model generalization. These approaches overcome labeled data scarcity in a target domain of interest by transferring knowledge effectively from a related source domain where labeled data is available.
Aspects of embodiments of the present invention relate to systems and methods for transfer learning between two domains. Knowledge transfer may be used to overcome labeled data scarcity in one domain by adapting a model trained on a different, but related, domain. Some aspects of embodiments of the present invention relate to learning a domain-agnostic intermediate embedding of the data samples (e.g., mapping the data samples into a feature space), such as learning an embedding using unsupervised domain adaptation (UDA) by minimizing a discrepancy between the distributions of the source and target domains in the embedding space. In more detail, in some embodiments of the present invention, the discrepancy is calculated using a sliced Wasserstein distance (SWD) between the distributions in the embedding space (or in feature space). Some aspects of embodiments of the present invention relate to computing pseudo-labels for the selected unlabeled samples in the target domain in order to align the corresponding classes in the embedding space.
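The sliced Wasserstein distance mentioned above admits a compact Monte Carlo approximation: project both feature distributions onto random one-dimensional slices and average the one-dimensional Wasserstein distances along each slice, which reduce to comparisons of sorted samples. The following NumPy sketch illustrates the idea; the function name, the number of projections, and the use of the squared (Wasserstein-2) form are illustrative assumptions rather than details taken from the claims.

```python
import numpy as np

def sliced_wasserstein_distance(origin_feats, target_feats, num_projections=50, rng=None):
    """Approximate the sliced Wasserstein distance between two sets of
    feature vectors by averaging 1-D Wasserstein-2 distances along random
    projection directions (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    dim = origin_feats.shape[1]
    # Sample random unit vectors (slices) on the hypersphere.
    directions = rng.normal(size=(num_projections, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Project both distributions onto each direction.
    proj_origin = np.sort(origin_feats @ directions.T, axis=0)  # (N, P)
    proj_target = np.sort(target_feats @ directions.T, axis=0)  # (M, P)
    # Compare the two 1-D distributions on a common quantile grid,
    # which handles differing sample counts N and M.
    qs = np.linspace(0.0, 1.0, max(proj_origin.shape[0], proj_target.shape[0]))
    total = 0.0
    for k in range(num_projections):
        qo = np.quantile(proj_origin[:, k], qs)
        qt = np.quantile(proj_target[:, k], qs)
        total += np.mean((qo - qt) ** 2)
    return total / num_projections
```

On identical inputs the distance is zero, and it grows as the two feature distributions drift apart, which is the property the embedding-alignment objective exploits.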
According to one embodiment of the present invention, a method for training a controller to control a robotic system in a target domain includes: receiving a neural network of an original controller for controlling the robotic system based on a plurality of origin data samples from an origin domain and corresponding labels in a label space, the neural network of the original controller including a plurality of encoder parameters and a plurality of classifier parameters, the neural network being trained to: map an input data sample from the origin domain to a feature vector in a feature space in accordance with the encoder parameters; and assign a label of the label space to the input data sample based on the feature vector in accordance with the classifier parameters; updating the encoder parameters to minimize a dissimilarity, in the feature space, between: a plurality of origin feature vectors computed from the origin data samples; and a plurality of target feature vectors computed from a plurality of target data samples from the target domain, the target data samples having a smaller cardinality than the origin data samples; and updating the controller with the updated encoder parameters to control the robotic system in the target domain.
The dissimilarity may be computed in accordance with a sliced Wasserstein distance between the origin feature vectors in the feature space and the target feature vectors in the feature space.
The updating the encoder parameters may include iteratively computing a plurality of intermediate encoder parameters, each iteration including: computing the origin feature vectors in the feature space; computing the target feature vectors in the feature space in accordance with the intermediate encoder parameters; computing the dissimilarity between the origin feature vectors and the target feature vectors; updating the intermediate encoder parameters to reduce the dissimilarity between the origin feature vectors and the target feature vectors; determining whether the dissimilarity is minimized; in response to determining that the dissimilarity is not minimized, proceeding with another iteration with the updated intermediate encoder parameters as the intermediate encoder parameters; and in response to determining that the dissimilarity is minimized, outputting the intermediate encoder parameters as the updated encoder parameters.
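The iteration described above can be sketched in a few lines. This is a hedged illustration, not the claimed method: the encoder is simplified to a linear map, the dissimilarity is the squared distance between feature means (standing in for the sliced Wasserstein distance), and the step-size rule and tolerance are assumptions.

```python
import numpy as np

def adapt_encoder(origin_x, target_x, W, lr=None, tol=1e-8, max_iter=1000):
    """Iteratively compute intermediate encoder parameters W until the
    dissimilarity between origin and target features is minimized.

    Sketch only: features = x @ W (linear encoder), and the dissimilarity
    is the squared distance between feature means."""
    mu_origin = (origin_x @ W).mean(axis=0)  # origin features, fixed encoder
    m = target_x.mean(axis=0)                # assumed nonzero target mean
    if lr is None:
        lr = 0.25 / float(m @ m)             # conservative step for this quadratic
    dissimilarity = float("inf")
    for _ in range(max_iter):
        diff = m @ W - mu_origin             # equals mean(target_x @ W) - mu_origin
        dissimilarity = float(diff @ diff)
        if dissimilarity < tol:              # stopping condition: minimized
            break
        # Gradient of ||m @ W - mu_origin||^2 with respect to W.
        W = W - lr * 2.0 * np.outer(m, diff)
    return W, dissimilarity
```

Each pass computes both feature sets, measures the dissimilarity, takes a gradient step on the intermediate parameters, and stops once the dissimilarity falls below the tolerance, mirroring the iteration in the text.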
The dissimilarity may be computed in accordance with a sliced Wasserstein distance between the origin feature vectors in the feature space and the target feature vectors in the feature space.
The computing the origin feature vectors may be performed by an origin encoder.
The computing the origin feature vectors may be performed in accordance with the intermediate encoder parameters.
The target data samples may include a plurality of target samples and a plurality of corresponding target labels.
The target data samples may include a plurality of unlabeled target samples.
The updating the encoder parameters may include iteratively computing a plurality of intermediate encoder parameters, each iteration including: computing the origin feature vectors in the feature space in accordance with the intermediate encoder parameters; computing the target feature vectors in the feature space in accordance with the intermediate encoder parameters; computing predicted labels for the target feature vectors in accordance with the classifier parameters, each of the predicted labels being associated with a confidence; defining a plurality of pseudo-labels corresponding to the predicted labels having confidences exceeding a threshold; updating the intermediate encoder parameters based on at least one of: minimizing a dissimilarity between the origin feature vectors and the target feature vectors; and minimizing a classification loss of the origin data samples; determining whether a stopping condition has been met, wherein the stopping condition may include at least one of: a dissimilarity between the origin feature vectors and the target feature vectors; and a saturation of a number of the pseudo-labels between iterations; in response to determining that the stopping condition has not been met, proceeding with another iteration with the updated intermediate encoder parameters as the intermediate encoder parameters; and in response to determining that the stopping condition is met, outputting the intermediate encoder parameters as the updated encoder parameters.
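The confidence-thresholded pseudo-labeling step can be illustrated as follows. This sketch is assumption-laden: it uses a nearest-class-mean classifier with a softmax confidence in place of the trained classifier, and the threshold value is arbitrary; only the pattern of keeping high-confidence predictions as pseudo-labels reflects the text above.

```python
import numpy as np

def assign_pseudo_labels(target_feats, class_means, threshold=0.7):
    """Predict labels for unlabeled target feature vectors and keep only
    high-confidence predictions as pseudo-labels.

    Returns the kept labels and the indices of the kept samples."""
    # Distance of each target feature vector to each class mean.
    dists = np.linalg.norm(
        target_feats[:, None, :] - class_means[None, :, :], axis=2)
    # Softmax over negative distances serves as an illustrative confidence.
    logits = -dists
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    keep = confidences > threshold       # only confident predictions survive
    return predictions[keep], np.flatnonzero(keep)
```

Samples that sit ambiguously between classes receive low confidence and are excluded, so only reliably classified target samples contribute pseudo-labels for class alignment.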
The updating the intermediate encoder parameters may alternate between: the minimizing the dissimilarity between the origin feature vectors and the target feature vectors; and the minimizing the classification loss of the origin data samples.
The neural network may include a convolutional neural network, a recurrent neural network, a capsule network, or combinations thereof.
According to one embodiment of the present invention, a system for training a controller to control a robotic system in a target domain includes: a processor; and memory storing instructions that, when executed by the processor, cause the processor to: receive a neural network of an original controller for controlling the robotic system based on a plurality of origin data samples from an origin domain and corresponding labels in a label space, the neural network of the original controller may include a plurality of encoder parameters and a plurality of classifier parameters, the neural network being trained to: map an input data sample from the origin domain to a feature vector in a feature space in accordance with the encoder parameters; and assign a label of the label space to the input data sample based on the feature vector in accordance with the classifier parameters; update the encoder parameters to minimize a dissimilarity between: a plurality of origin feature vectors computed from the origin data samples; and a plurality of target feature vectors computed from a plurality of target data samples from the target domain, the target data samples having a smaller cardinality than the origin data samples; and update the controller with the updated encoder parameters to control the robotic system in the target domain.
The dissimilarity may be computed in accordance with a sliced Wasserstein distance between the origin feature vectors in the feature space and the target feature vectors in the feature space.
The instructions that cause the processor to update the encoder parameters may include instructions that, when executed by the processor cause the processor to iteratively compute a plurality of intermediate encoder parameters, each iteration including: computing the origin feature vectors in the feature space; computing the target feature vectors in the feature space in accordance with the intermediate encoder parameters; computing the dissimilarity between the origin feature vectors and the target feature vectors; updating the intermediate encoder parameters to reduce the dissimilarity between the origin feature vectors and the target feature vectors; determining whether the dissimilarity is minimized; in response to determining that the dissimilarity is not minimized, proceeding with another iteration with the updated intermediate encoder parameters as the intermediate encoder parameters; and in response to determining that the dissimilarity is minimized, outputting the intermediate encoder parameters as the updated encoder parameters.
The dissimilarity may be computed in accordance with a sliced Wasserstein distance between the origin feature vectors in the feature space and the target feature vectors in the feature space.
The origin feature vectors may be computed in accordance with the encoder parameters.
The origin feature vectors may be computed in accordance with the intermediate encoder parameters.
The target data samples may include a plurality of target samples and a plurality of corresponding target labels.
The target data samples may include a plurality of unlabeled target samples.
The instructions that cause the processor to update the encoder parameters may include instructions that, when executed by the processor, cause the processor to compute the updated encoder parameters by iteratively computing a plurality of intermediate encoder parameters, each iteration including: computing the origin feature vectors in the feature space in accordance with the intermediate encoder parameters; computing the target feature vectors in the feature space in accordance with the intermediate encoder parameters; computing predicted labels for the target feature vectors in accordance with the classifier parameters, each of the predicted labels being associated with a confidence; defining a plurality of pseudo-labels corresponding to the predicted labels having confidences exceeding a threshold; updating the intermediate encoder parameters based on at least one of: minimizing a dissimilarity between the origin feature vectors and the target feature vectors; and minimizing a classification loss of the origin data samples; determining whether a stopping condition has been met, wherein the stopping condition may include at least one of: a dissimilarity between the origin feature vectors and the target feature vectors; and a saturation of a number of the pseudo-labels between iterations; in response to determining that the stopping condition has not been met, proceeding with another iteration with the updated intermediate encoder parameters as the intermediate encoder parameters; and in response to determining that the stopping condition is met, outputting the intermediate encoder parameters as the updated encoder parameters.
The updating the intermediate encoder parameters may alternate between: the minimizing the dissimilarity between the origin feature vectors and the target feature vectors; and the minimizing a classification loss of the origin data samples.
The neural network may include a convolutional neural network, a recurrent neural network, a capsule network, or combinations thereof.
According to one embodiment of the present invention, a non-transitory computer readable medium has instructions stored thereon that, when executed by a processor, cause the processor to: receive a neural network of an original controller for controlling a robotic system based on a plurality of origin data samples from an origin domain and corresponding labels in a label space, the neural network of the original controller comprising a plurality of encoder parameters and a plurality of classifier parameters, the neural network being trained to: map an input data sample from the origin domain to a feature vector in a feature space in accordance with the encoder parameters; and assign a label of the label space to the input data sample based on the feature vector in accordance with the classifier parameters; update the encoder parameters to minimize a dissimilarity between: a plurality of origin feature vectors computed from the origin data samples; and a plurality of target feature vectors computed from a plurality of target data samples from a target domain, the target data samples having a smaller cardinality than the origin data samples; and update the controller with the updated encoder parameters to control a robotic system in the target domain.
The dissimilarity may be computed in accordance with a sliced Wasserstein distance between the origin feature vectors in the feature space and the target feature vectors in the feature space.
The instructions that cause the processor to update the encoder parameters may include instructions that, when executed by the processor cause the processor to iteratively compute a plurality of intermediate encoder parameters, each iteration including: computing the origin feature vectors in the feature space; computing the target feature vectors in the feature space in accordance with the intermediate encoder parameters; computing the dissimilarity between the origin feature vectors and the target feature vectors; updating the intermediate encoder parameters to reduce the dissimilarity between the origin feature vectors and the target feature vectors; determining whether the dissimilarity is minimized; in response to determining that the dissimilarity is not minimized, proceeding with another iteration with the updated intermediate encoder parameters as the intermediate encoder parameters; and in response to determining that the dissimilarity is minimized, outputting the intermediate encoder parameters as the updated encoder parameters.
The dissimilarity may be computed in accordance with a sliced Wasserstein distance between the origin feature vectors in the feature space and the target feature vectors in the feature space.
The origin feature vectors may be computed in accordance with the encoder parameters.
The origin feature vectors may be computed in accordance with the intermediate encoder parameters.
The target data samples may include a plurality of target samples and a plurality of corresponding target labels.
The target data samples may include a plurality of unlabeled target samples.
The instructions that cause the processor to update the encoder parameters may include instructions that, when executed by the processor, cause the processor to compute the updated encoder parameters by iteratively computing a plurality of intermediate encoder parameters, each iteration including: computing the origin feature vectors in the feature space in accordance with the intermediate encoder parameters; computing the target feature vectors in the feature space in accordance with the intermediate encoder parameters; computing predicted labels for the target feature vectors using the classifier parameters, each of the predicted labels being associated with a confidence; defining a plurality of pseudo-labels corresponding to the predicted labels having confidences exceeding a threshold; updating the intermediate encoder parameters based on at least one of: minimizing a dissimilarity between the origin feature vectors and the target feature vectors; and minimizing a classification loss of the origin data samples; determining whether a stopping condition has been met, wherein the stopping condition may include at least one of: a dissimilarity between the origin feature vectors and the target feature vectors; and a saturation of a number of the pseudo-labels between iterations; in response to determining that the stopping condition has not been met, proceeding with another iteration with the updated intermediate encoder parameters as the intermediate encoder parameters; and in response to determining that the stopping condition is met, outputting the intermediate encoder parameters as the updated encoder parameters.
The updating the intermediate encoder parameters may alternate between: the minimizing the dissimilarity between the origin feature vectors and the target feature vectors; and the minimizing the classification loss of the origin data samples.
The neural network may include a convolutional neural network, a recurrent neural network, a capsule network, or combinations thereof.
The accompanying drawings, together with the specification, illustrate exemplary embodiments of the present invention, and, together with the description, serve to explain the principles of the present invention.
In the following detailed description, only certain exemplary embodiments of the present invention are shown and described, by way of illustration. As those skilled in the art would recognize, the invention may be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Like reference numerals designate like elements throughout the specification.
Aspects of embodiments of the present invention relate to systems and methods for leveraging previously learned models (e.g., models trained based on prior knowledge from one domain, which may be referred to herein as an “origin” domain or a “source” domain _{S}) to learn new tasks (e.g., adapting the models based on new data from a new or different domain, which may be referred to herein as a “target” domain _{T}). Some aspects of embodiments of the present invention relate to systems and methods for learning the new tasks based on a small number (e.g., on the order of tens) of samples from the target domain. One aspect of embodiments of the present invention relates to a method for transfer learning that leverages an origin or a source dataset with many labeled samples (e.g., a synthetic dataset where labels are readily available at essentially no additional cost) that was used to learn a model to perform a task (such as object classification, robotic manipulation, or autonomous navigation) and modifies the model to perform the task on a new target dataset with only a few labeled samples (e.g., a real-world dataset with a handful of labels from costly ground truth data such as manually labeled data). One aspect of embodiments of the present invention relates to generating pseudo-labels in circumstances where the samples from the new or different domain are unlabeled.
According to some aspects of embodiments of the present invention, the system includes two modules, namely: 1) Machine Learning Module A 10A, which is a fully trained machine learning module (using many labeled samples from the origin or source domain), and 2) Machine Learning Module B 10B, which is required to learn a task that is different from, but related to, the task of Module A 10A, but with only a few labeled samples or a few unlabeled samples from the target domain. As one example, to be described in more detail below, Machine Learning Module A 10A may be trained to recognize digits in images of handwritten numbers (the origin or source domain), and Machine Learning Module B 10B may be required to recognize digits in images of printed street numbers (the target domain) through an update or retraining of Module A 10A through a few examples from the target domain (e.g., a few images of street numbers). Note that, while the inputs differ, the outputs of these two classifications are the same; that is, both Machine Learning Module A 10A and Machine Learning Module B 10B output classifications of the input images as representing one of the digits from 0 to 9.
Aspects of embodiments of the present invention may be applied in a variety of contexts, such as where learning from a few samples is beneficial for efficient machine learning of an autonomous system that can be widely used under various environmental conditions or different sensor modalities. Examples of potential applications include, but are not limited to, autonomous driving (e.g., training a controller for a selfdriving vehicle to operate in one locality, and applying transfer learning to update the controller to operate a selfdriving vehicle in a different locality having different weather, different traffic patterns, and/or different traffic laws); Intelligence, Surveillance and Reconnaissance (ISR); and robotic manipulation.
As one concrete example, some embodiments of the present invention may be applied to a robotic manipulation system that is configured to reach and grab different objects.
The robotic system is required to first detect and localize an object, and then reach for it. Such a robotic system is trained before deployment to grab simple objects (e.g., regular, rectangular objects). As shown in
On the other hand, in the deployment environment (or a target domain), the robotic arm system may be required to detect objects with a more complex appearance (e.g., soft bags, children's toys, shoes, and the like). As shown in
Accordingly, some aspects of embodiments of the present invention relate to systems and methods for reconfiguring a previously trained model (e.g., of the robotic arm system) to learn a modified or new task (grabbing objects that were never seen during the initial training process).
As shown in
As such, some aspects of embodiments of the present invention relate to using a relatively small collection of deployment data (e.g., on the order of tens of samples) to update the previously trained ML Module A 10A to generate an ML Module B 10B capable of accurately performing tasks (e.g., classifying observed conditions to compute a behavior) in both the first domain and the second domain.
As shown in
Accordingly, some aspects of embodiments of the present invention relate to systems and methods for learning a shared encoder ψ that is applicable to both the original domain (predeployment or “origin” or “source” domain _{S}) and the deployment domain (or “target” domain _{T}). In some embodiments of the present invention, different encoders ϕ and ψ are trained for the origin or source domain _{S }and the target domain _{T}. As discussed in more detail below, according to some embodiments, this is achieved by minimizing a distance between the target and the source (or “origin”) distributions in the latent (or feature) space (or embedding space), while concurrently training a classifier network ρ 260 using the source (or origin) domain data X_{S}—in other words, minimizing the distance between the origin feature vectors ϕ(X_{S}) and the target feature vectors ψ(X_{T}). In some embodiments, this distance is a sliced-Wasserstein distance (SWD) (see, e.g., Kolouri, Soheil, Yang Zou, and Gustavo K. Rohde. “Sliced Wasserstein kernels for probability distributions.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.), as discussed in more detail below.
As shown in
As noted above, an encoder module ϕ 140 provides a parametric mapping from samples X to a latent space (or feature space) 𝒵, ϕ: 𝒳→𝒵. In some embodiments of the present invention, the encoder module is implemented using a neural network. In various embodiments, the neural network is a convolutional neural network, a recurrent neural network, a hybrid of a convolutional neural network and a recurrent neural network, a capsule network, etc. Also, as noted above, a linear classifier ρ 160 maps values (or feature vectors) from the latent space (or feature space) 𝒵 to the labels Y in label space 𝒴, ρ: 𝒵→𝒴. The composition of ϕ and ρ defines a function that maps samples, X, to the labels, Y, ρ(ϕ(⋅)): 𝒳→𝒴. In some embodiments, the functions ϕ and ρ are trained (e.g., end-to-end trained) using backpropagation (see, e.g., Hagan, M. T. and Menhaj, M. B., 1994. “Training feedforward networks with the Marquardt algorithm.” IEEE Transactions on Neural Networks, 5(6), pp. 989–993 and LeCun, Yann, et al. “Backpropagation applied to handwritten zip code recognition.” Neural Computation 1.4 (1989): 541–551.). For example, the training process computes a plurality of encoder parameters configuring the behavior of the encoder ϕ, and a plurality of classifier parameters configuring the behavior of the classifier ρ. However, embodiments of the present invention are not limited thereto, and other techniques such as evolutionary algorithms may be used instead. The encoder module ϕ 140 can be viewed as capturing the nonlinearities in the sample space by extracting useful features from the dataset X, such that the mapping between the latent (or feature) space and the label space can be modeled as being linear, thereby enabling use of a linear classifier ρ: 𝒵→𝒴 160. These trained modules are shown, for example, in
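The composition ρ(ϕ(⋅)) described above can be sketched in a few lines of NumPy. The one-hidden-layer encoder, the tanh nonlinearity, and all weight shapes are illustrative assumptions; the point is only that a nonlinear encoder maps samples to feature vectors and a linear classifier maps feature vectors to labels.

```python
import numpy as np

def make_encoder(w1, w2):
    """phi: sample space -> feature space, a small nonlinear mapping
    parameterized by the encoder parameters w1 and w2 (illustrative)."""
    def phi(x):
        return np.tanh(x @ w1) @ w2   # feature vectors in the latent space
    return phi

def make_classifier(w):
    """rho: feature space -> label space, a linear classifier
    parameterized by the classifier parameters w."""
    def rho(z):
        return (z @ w).argmax(axis=1)  # predicted label index per sample
    return rho

def predict(phi, rho, x):
    """The composition rho(phi(.)) maps samples directly to labels."""
    return rho(phi(x))
```

Because the classifier is linear, adapting the model to a new domain can focus on the encoder parameters alone, which is exactly what the update step in the text does.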
During deployment, the trained model is expected to map the newly observed data X_{T}=[x_{1}^{T}, . . . , x_{M}^{T}]∈R^{t×M }to class labels Y_{T}∈R^{K×M}. However, the distribution of the newly observed data X_{T }sampled from a new domain _{T }(a second domain or target domain or Domain B) may be somewhat different from the domain _{S }of the training data X_{S}, Y_{S}, and, therefore, the previously learned mapping ϕ: X→Z may not provide sensible feature extraction from the target domain _{T }(e.g., applying ϕ to values X_{T }from _{T }might not lead to sensible inputs to ρ for computing labels Y_{T }for X_{T}). In addition, in some embodiments, the model training system may not have access to a large pool of labeled data from the new domain (e.g., the number of samples or cardinality M of the target training data is much smaller than the number of samples or cardinality N of the source or “origin” training data: M<<N). Accordingly, aspects of embodiments of the present invention relate to automatically adapting or updating the trained models (e.g., updating the encoder parameters), in operation 320, to the newly observed data from the new domain by considering a few labeled samples X_{T}, Y_{T }(e.g., tens of samples).
According to one embodiment of the present invention, the second encoder ψ 240 of the module B 10B, described above with respect to
Therefore, some aspects of embodiments of the present invention relate to automatically learning, in operation 320, the encoding parameters of an encoding function ϕ (e.g., learning the weights of a neural network) that maps samples X_{S }from the original, pre-deployment domain _{S }(e.g., source domain or origin domain or Domain A) and an encoder ψ that maps samples X_{T }from the new, post-deployment domain _{T }(e.g., target domain or Domain B) to the same latent space (or feature space). In various embodiments, ϕ and ψ are the same or different encoders. If the distance between the distribution of the training data ϕ(X_{S}) (or origin feature vectors) and the distribution of the observed data (or target feature vectors) ψ(X_{T}) in the latent space (or feature space) is small, then the same classifier ρ can be used to classify samples from both domains (samples from _{S }and _{T}). In particular, the parameters of the encoder module B ψ may be calculated in accordance with Equation 1:

argmin_{ψ }D(p(ϕ(X_{S})),p(ψ(X_{T})))+λΣ_{k }D(p(ϕ(X_{S})|C_{k}),p(ψ(X_{T})|C_{k}))  (Equation 1)
In other words, ψ is computed by minimizing the loss function provided as input to argmin_{ψ}, where D is a dissimilarity measure between distributions. The first term, D(p(ϕ(X_{S})),p(ψ(X_{T}))), enforces the probability distribution (p(⋅)) of all projected data points p(ψ(X_{T})) to match that of the training samples p(ϕ(X_{S})), where no class information is used. The second term, Σ_{k }D(p(ϕ(X_{S})|C_{k}),p(ψ(X_{T})|C_{k})), enforces the class-specific distribution of the few labeled samples, p(ψ(X_{T})|C_{k}), to match the distribution of the corresponding class C_{k }in the training set, p(ϕ(X_{S})|C_{k}), and λ is a regularization parameter. Note that the first term carries no class information C and hence is an unsupervised loss function, while the second term does include class information and therefore is a supervised loss function. As such, in circumstances where ϕ and ψ share parameters, the encoder ψ is learned (e.g., the encoder parameters are calculated or learned) using data points from both domains (samples X_{S }and X_{T }from the source (or “origin”) and target domains, respectively), and the classifier is concurrently learned (e.g., the classifier parameters are calculated or learned) using labeled data Y_{S }from the source (pre-deployment or origin) domain _{S }and labeled data Y_{T }from the target domain _{T}.
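The two-term objective described above can be sketched in code. The following is a minimal, illustrative NumPy sketch, not the claimed implementation: it treats the feature space as one-dimensional for brevity (higher-dimensional features would be handled by slicing, as described below) and uses a quantile-based one-dimensional Wasserstein-2 distance as the dissimilarity measure D; all function names are hypothetical.

```python
import numpy as np

def w2_1d(a, b, n_quantiles=50):
    # 1-D Wasserstein-2 between two sample sets of (possibly different)
    # sizes, approximated by comparing empirical quantile functions.
    qs = np.linspace(0.0, 1.0, n_quantiles)
    return float(np.mean((np.quantile(a, qs) - np.quantile(b, qs)) ** 2))

def adaptation_loss(phi_xs, ys, psi_xt, yt, num_classes, lam=1.0):
    # First (unsupervised) term: match the overall feature distributions.
    loss = w2_1d(phi_xs, psi_xt)
    # Second (supervised) term: match each class-conditional distribution
    # of the few labeled target samples to the corresponding source class.
    for k in range(num_classes):
        loss += lam * w2_1d(phi_xs[ys == k], psi_xt[yt == k])
    return loss
```

In a full implementation, `phi_xs` and `psi_xt` would be the outputs of the encoders ϕ and ψ, and the loss would be minimized with respect to the encoder parameters by backpropagation.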
In some embodiments of the present invention, the dissimilarity measure D is a sliced-Wasserstein distance. In the related art, the Kullback-Leibler (KL) divergence and related distance measures such as the Jensen-Shannon divergence have been used as measures of dissimilarity. However, such measures generally perform poorly when the distributions are supported on non-overlapping, low-dimensional manifolds.
Accordingly, some aspects of embodiments of the present invention relate to the use of a sliced-Wasserstein distance as a metric, which provides a more robust alternative to the metrics used in the related art. The idea behind the sliced-Wasserstein distance is to slice the high-dimensional distributions into their one-dimensional marginal distributions and measure the cumulative distance between the corresponding marginal distributions.
The sliced-Wasserstein distance between M samples

{ϕ(x_{i}^{s})∈^{ƒ}˜p_{S}}_{i=1}^{M }

representing the source (or origin) distribution p_{S }and

{ψ(x_{i}^{t})∈^{ƒ}˜p_{T}}_{i=1}^{M }

representing the target distribution p_{T }is approximated as:

SW_{2}^{2}(p_{S},p_{T})≈(1/L)Σ_{l=1}^{L}Σ_{i=1}^{M}|θ_{l}·ϕ(x_{s_{l}[i]}^{s})−θ_{l}·ψ(x_{t_{l}[i]}^{t})|^{2}

for θ_{l}∈S^{ƒ−1 }as random samples from the unit ƒ-dimensional ball, and where s_{l}[i] and t_{l}[i] are the sorted indices of the projections {θ_{l}·ϕ(x_{i}^{s})}_{i=1}^{M }and {θ_{l}·ψ(x_{i}^{t})}_{i=1}^{M }for the source (or “origin”) and target domains, respectively.
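The sorted-projection approximation above can be illustrated with a short Monte-Carlo sketch, assuming NumPy, equal sample counts M in both domains, and pre-computed feature matrices of shape (M, f); the function name is hypothetical.

```python
import numpy as np

def sliced_wasserstein2(xs, xt, num_projections=50, seed=0):
    """Monte-Carlo approximation of SW_2^2 between two (M, f) sample sets
    with equal sample counts M, following the sorted-projection formula."""
    rng = np.random.default_rng(seed)
    m, f = xs.shape
    total = 0.0
    for _ in range(num_projections):
        # theta_l: a random direction on the unit (f-1)-sphere.
        theta = rng.normal(size=f)
        theta /= np.linalg.norm(theta)
        # Project both sample sets onto theta_l and sort; sorting realizes
        # the optimal one-dimensional coupling between the projections.
        ps = np.sort(xs @ theta)
        pt = np.sort(xt @ theta)
        total += np.sum((ps - pt) ** 2)
    return total / num_projections
```

Because each slice only requires a projection and a sort, the approximation is cheap to compute and differentiable almost everywhere, which is what makes it practical as a training loss.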
Accordingly, Equation 1 above may be rewritten to replace the generic dissimilarity measure D with the sliced-Wasserstein distance (SW_{2}^{2}) to yield Equation 2:

argmin_{ψ }SW_{2}^{2}(p(ϕ(X_{S})),p(ψ(X_{T})))+λΣ_{k }SW_{2}^{2}(p(ϕ(X_{S})|C_{k}),p(ψ(X_{T})|C_{k}))  (Equation 2)
where the sliced-Wasserstein distance between two m-dimensional distributions p and q is defined in Equation 3 as:

SW_{2}^{2}(p,q)=∫_{S^{m−1}}∫_{R}|R_{P}(t,θ)−R_{Q}(t,θ)|^{2}dtdθ

where S^{m−1 }is the unit sphere in the m-dimensional latent space and R_{P}(t,θ) is the cumulative distribution of the marginal distribution R_{p}(⋅,θ), defined in Equation 4 as:

R_{P}(t,θ)=∫_{−∞}^{t}R_{p}(τ,θ)dτ,∀θ∈S^{m−1 }

and R_{Q}(⋅,θ) is defined similarly to R_{P}(⋅,θ), and the marginal distribution R_{p}(⋅,θ) (and, similarly, R_{q}(⋅,θ)) is defined in Equation 5 as:

R_{p}(t,θ)=∫_{X}p(x)δ(t−x·θ)dx,∀θ∈S^{m−1},∀t∈R
In some embodiments, when the actual distributions of p and q are not available, the discrete approximations of Equations 3, 4, and 5 are used based on observed samples from the distributions. For example, when only samples from the distributions are available, the p-Wasserstein distance can be approximated as the ℓ_{p }distance between the sorted samples (see, e.g., Hagan, M. T. and Menhaj, M. B., 1994. “Training feedforward networks with the Marquardt algorithm.” IEEE Transactions on Neural Networks, 5(6), pp. 989–993 and Kolouri, S.; Martin, C. E.; and Rohde, G. K. 2018. “Sliced-Wasserstein Autoencoder: An embarrassingly simple generative model.” arXiv preprint arXiv:1804.01947.).
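The sorted-samples approximation of the one-dimensional p-Wasserstein distance mentioned above can be written in a few lines; this is an illustrative sketch for equal-size sample sets, not a definitive implementation.

```python
import numpy as np

def p_wasserstein_1d(a, b, p=2):
    # For equal-size one-dimensional sample sets, the p-Wasserstein
    # distance reduces to the l_p distance between the sorted samples.
    diffs = np.abs(np.sort(np.asarray(a, dtype=float)) -
                   np.sort(np.asarray(b, dtype=float)))
    return float(np.mean(diffs ** p) ** (1.0 / p))
```

For example, two sample sets containing the same values in different orders have distance zero, since sorting aligns identical values.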
As one example of an application of embodiments of the present invention, a model initially trained to recognize digits in images of handwritten digits is updated to recognize digits in images of printed digits (house numbers) based on a small sample of printed digits.
In more detail, in one embodiment, the Modified National Institute of Standards and Technology (MNIST) database (see, e.g., Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. “Gradient-based learning applied to document recognition.” Proceedings of the IEEE, 86(11):2278–2324, November 1998.) of handwritten digits labeled with the digit represented in the image (ground truth labels C), represented as (x_{n}^{S},y_{n}^{S}), is used to train a model (e.g., a deep neural network) mapping from the samples X_{S }to labels Y_{S}, where, as discussed above, the model may be viewed as the composition of an encoder ϕ and a linear classifier ρ (ρ∘ϕ: X→Y). The encoder ϕ represents a first portion of the model mapping inputs X to values Z in latent space (or feature space) (ϕ: X→Z) and the linear classifier ρ represents a mapping of the values (or feature vectors) Z from latent space (or feature space) to labels Y in label space.
The different shapes in the plots in latent (or feature) space reflect the different classes C_{k }(in this case, the ten classes representing the digits 0 through 9). As seen in
To recognize the printed numbers of the deployment or target domain, the encoder ψ is updated or retrained to match the labeled and unlabeled distributions of the Target domain to that of the Source (or Origin) domain, based on labeled (x_{n}^{T}, y_{n}^{T}) and unlabeled samples X′_{T }from the Street View House Numbers (SVHN) dataset (see, e.g., Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, A. Y. Ng. “Reading Digits in Natural Images with Unsupervised Feature Learning.” NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011.).
As another example,
Electro-optical (EO) images are a commonly used form of visual data in computer vision and machine learning. Many autonomous systems rely on algorithms that process and learn from EO data captured by, for example, digital cameras configured to detect light in the visible, infrared, and/or ultraviolet spectra. Deep Convolutional Neural Networks (CNNs) have been applied to classification and detection algorithms with human-level performance. However, some applications (such as continuous environmental monitoring and Earth-resource mapping) require imaging under conditions where EO imaging is not feasible, such as at night or in inclement weather.
In contrast, synthetic aperture radar (SAR) imaging provides such a capability: it produces high-resolution images using radar signals, which can propagate through clouds and inclement weather and which do not depend on the presence of other sources of electromagnetic radiation (e.g., the sun). However, training CNNs in the SAR domain can be challenging. Training CNNs in the EO domain leverages the availability of huge labeled datasets, which may be available through crowdsourced labeling platforms such as Amazon Mechanical Turk and publicly available datasets such as ImageNet. In the SAR domain, however, such labeled datasets may be more difficult to obtain because, for example: preparing devices for collecting SAR datasets is much more expensive compared to EO datasets; SAR images are often classified, making public access to SAR data heavily regulated and limited; labeling SAR images requires trained experts, as opposed to the ability of lay people to label natural EO domain images; and continuous collection of SAR data makes older labeled data unrepresentative of the current data distribution.
Accordingly, some embodiments of the present invention relate to transferring knowledge from a model trained using EO imagery to generate a model capable of making classifications based on SAR data. In particular, embodiments of the present invention relate to training an encoder ψ (e.g., learning encoder parameters) so that input samples from the domain of aerial SAR images are mapped into the feature space with substantially the same distribution as input samples from the domain of aerial EO images mapped into the feature space by the encoder ϕ. By doing so, the same, previously trained classifier ρ may be repurposed for use with SAR images.
For the target domain, aerial SAR images of the South African Exclusive Economic Zone were preprocessed into 21-by-21-pixel subimages. (See, e.g., Schwegmann, C.; Kleynhans, W.; Salmon, B.; Mdakane, L.; and Meyer, R. 2016. “Very deep learning for ship discrimination in synthetic aperture radar imagery.” In IEEE International Geo. and Remote Sensing Symposium, 104–107.). Accordingly, the binary ship detection problem was whether each instance contained a “ship” (positive data points) or “no-ship” (negative data points). Experts analyzed the subimages to manually label 1,596 positive data points (subimages containing ships) and 3,192 negative data points (subimages not containing ships).
In this example, to solve the problem of automatically classifying the SAR data using a trained model, an initial model was trained using an initial (source or origin) dataset including 4,000 color (e.g., RGB) images of ships extracted from satellite imagery of the San Francisco Bay area, captured by a constellation of satellites operated by Planet Labs Inc. Each of the images of the dataset was already labeled as “ship” or “no-ship.” The initial model included an encoder ϕ and the classifier ρ, which classified the aerial electro-optical images as C_{1}: part of a ship or C_{2}: part of the background (e.g., water). In more detail, in one embodiment, a deep convolutional neural network (CNN) was trained, where the encoder portion ϕ corresponded to four layers of filters and the classifier portion ρ corresponded to two layers. The deep CNN was trained using a loss function in accordance with Equation 1 above.
Accordingly, embodiments of the present invention allow for transfer learning, enabling models (e.g., deep neural networks) trained in one domain to be applied to perform tasks in a different, but related, target domain using only a few labeled examples from the target domain (few-shot learning).
In some circumstances, labels are not available for the samples in the target domain. Therefore, some aspects of embodiments of the present invention relate to an unsupervised (e.g., automatic) technique for updating the model trained based on the source (or origin) domain to perform classification tasks on samples from a target domain.
More precisely, in this example, the source (or origin) domain _{S }includes pairs (X_{S}, Y_{S}) with N labeled data points, where X_{S}=[x_{1}^{s}, . . . , x_{N}^{s}] denotes the samples and Y_{S}=[y_{1}^{s}, . . . , y_{N}^{s}] contains the corresponding labels. Note that the label y_{n}^{s }identifies the membership of the corresponding sample x_{n}^{s }in one or more of the K classes (e.g., the digits 0 through 9 in the classification task of digit recognition). It is also assumed that the samples X_{S }are independent and identically distributed (i.i.d.) from the source (or origin) joint probability distribution ((x_{i}^{s},y_{i}^{s})˜p(x^{s},y)). The source (or origin) marginal distribution over x^{s }is denoted by p_{S}. The related target domain _{T }has M unlabeled data points X_{T}=[x_{1}^{t}, . . . , x_{M}^{t}] (in some embodiments, it is assumed that M<<N). The same set of labels applies to the target domain, and it is assumed that the samples from the target domain are drawn from the target marginal distribution (x_{i}^{t}˜p_{T}). It is also assumed that a distributional discrepancy exists between the two domains: p_{S}≠p_{T}.
As discussed above, it is assumed that, given a large enough number N of source (or origin) samples X_{S }and their corresponding labels Y_{S}, a parametric function can be computed (or “learned”) to map from the samples to the labels (ƒ_{θ}: X→Y, where θ denotes the parameters of the function). For example, in the case where the function ƒ_{θ }is implemented as a deep neural network, the parameters θ may correspond to the learned weights of the connections between the layers of the neural network. In this case, the parameters θ can be learned by minimizing the empirical risk, θ=argmin_{θ}Σ_{i }L(ƒ_{θ}(x_{i}^{s}), y_{i}^{s}), with respect to an appropriate loss function L, such as cross-entropy loss (in other words, choosing parameters to minimize the difference between the ground truth labels Y and the output of the classification function ƒ_{θ}).
Furthermore, as noted above, this function can be considered as the composition of an encoder function ψ_{v }and a classifier function ρ_{w}, where v and w correspond to the learned parameters of ψ and ρ, respectively. The encoder function ψ_{v }may correspond to the initial stages of the neural network, while the classifier function ρ_{w }may correspond to the later stages of the neural network. In one embodiment, the same encoder function ψ_{v }takes inputs from both the source domain (or origin domain) _{S }and the target domain _{T }and maps those inputs to feature vectors in a shared embedding space (or feature space) and is therefore a “shared” encoder (ψ_{v}: X→Z). As before, the classifier ρ maps from the embedding space to the label space (ρ: Z→Y).
Merely minimizing the term D(p(ψ(X_{S})), p(ψ(X′_{T}))) would not be sufficient to learn an appropriate encoding function ψ because doing so does not guarantee semantic consistency between the source domain (or origin domain) _{S }and the target domain _{T}. Taking the specific example shown in
In the previous examples, labels Y_{T }were available for the few examples from the target domain _{T}, which allowed calculation of the term p(ψ(X_{T})|C_{k}) in the loss function. However, in some circumstances, the data samples from the target domain are unlabeled (no corresponding labels Y_{T }are available for the target domain samples X_{T}), and therefore this term cannot be calculated directly.
Accordingly, some aspects of embodiments of the present invention relate to an unsupervised domain adaptation (UDA) algorithm which computes a surrogate of the objective by using confident pseudo-labels of the target data that are obtained using the source classifier (or origin classifier) ρ. Generally, in some embodiments, the trained model is iteratively updated based on the unlabeled target domain data by computing pseudo-labels Y′_{T }for a portion of the unlabeled target domain data X′_{T}. To calculate the pseudo-labels Y′_{T}, the linear classifier ρ is applied to the embeddings of the target data samples X′_{T }in the latent space (the target feature vectors ψ(X′_{T})) to compute predicted class labels for the unlabeled data. These class labels may be associated with confidence levels; as such, the classes predicted with high confidence (or high probability) are assigned as the pseudo-labels Y′_{T}. This pseudo-labeled portion of the unlabeled target domain data is then used to minimize the distance between the conditional distributions (of the feature vectors) in the latent space (or feature space). As a result, as more learning iterations are performed, the number of target data points X′_{T }with correct (or high confidence) pseudo-labels Y′_{T }grows, progressively enforcing the distributions to align conditionally.
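The confidence-based pseudo-label selection described above might be sketched as follows, assuming the classifier outputs per-class probabilities for each unlabeled target sample; the function name and threshold value are hypothetical.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.95):
    """Given classifier output probabilities (rows sum to 1) for the
    unlabeled target samples, keep only high-confidence predictions.
    Returns the indices of the kept samples and their pseudo-labels."""
    confidence = probs.max(axis=1)
    predicted = probs.argmax(axis=1)
    keep = confidence >= threshold
    return np.flatnonzero(keep), predicted[keep]
```

Only the retained samples would then contribute to the class-conditional terms of the adaptation loss; samples below the threshold are simply ignored until a later iteration reclassifies them with higher confidence.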
In some embodiments using DRCN for the initial step, DRCN is used both to classify the source (or origin) domain data _{S}=X_{S}, Y_{S }and also to reconstruct the unlabeled target domain data _{T}=X_{T}. For both criteria to be met, the model training system automatically computes a shared encoder ψ to map both the source (or origin) and target data to the same latent space or feature embedding space or feature space. To accomplish this, the DRCN uses a source label prediction pipeline and a target reconstruction pipeline. For both pipelines, a feature extractor or encoder ψ is shared. To optimize the DRCN network, the pipelines are trained in an alternating, epoch-by-epoch fashion. In one example embodiment, the feature extractor has a structure as follows: 100 3×3 filters, a 2×2 max-pooling layer, 150 3×3 filters, a 2×2 max-pooling layer, 200 3×3 filters, and two 1,024-neuron, fully-connected layers. Dropout layers, with a rate of 50%, were used after the fully-connected layers. The classifier is a softmax layer, and a decoder, with an inverse structure of the feature extractor, completes an autoencoder. The control penalty λ was set to λ=0.5 to give equal weighting to the classification and reconstruction losses. An Adam optimizer was used for all DRCN training, with optimal learning rates found to be in the range of [0.5×10^{−4}, 3×10^{−4}].
Referring to
In operation 1026, the model training system computes updated intermediate encoder parameters (e.g., weights of the connections in the neural network) for the encoder ψ using the assigned pseudo-labels. In more detail, the assigned pseudo-labels enable the model training system to compute the SWD conditioned on those pseudo-labels (e.g., to compute D(p(ψ(X_{S})|C_{k}), p(ψ(X′_{T})|C_{k})) for at least some members of X′_{T}), and therefore the updated intermediate encoder parameters can be computed in order to reduce or minimize the dissimilarity between the source and target embeddings (or origin and target feature vectors) ψ(X_{S}) and ψ(X′_{T}) in the latent space (or feature space). In some embodiments, the training procedure alternates optimization between a classification loss for the source (or origin) data X_{S }(e.g., minimizing the number of misclassified instances of the source (or origin) data X_{S}, where ρ(ψ(x_{n}^{s}))≠y_{n}^{s}), and a pseudo-supervised SWD loss between the embedded source and target data distributions (or the distributions of the origin and target feature vectors) ψ(X_{S}) and ψ(X′_{T}). Alternating optimization allows the discrepancy between the source (or origin) and target distributions to be reduced in a meaningful way during the SWD training steps. In some circumstances, simultaneous optimization of both losses results in slow to no reduction in the SWD.
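The alternating schedule described above can be illustrated with a toy sketch that alternates blocks of gradient steps on two objectives sharing one (here scalar) parameter; the functions, step counts, and learning rate are hypothetical stand-ins for the classification and SWD losses.

```python
def alternate_optimize(theta, clf_grad, swd_grad, iterations=50,
                       clf_steps=25, swd_steps=12, lr=0.1):
    """Alternate a block of classifier-loss gradient steps with a block
    of SWD-loss gradient steps on shared encoder parameters theta."""
    for _ in range(iterations):
        for _ in range(clf_steps):
            theta -= lr * clf_grad(theta)
        for _ in range(swd_steps):
            theta -= lr * swd_grad(theta)
    return theta

# Toy quadratic objectives standing in for the two losses:
clf_grad = lambda t: 2.0 * (t - 1.0)  # classification loss, minimum at t = 1
swd_grad = lambda t: 2.0 * (t - 2.0)  # SWD loss, minimum at t = 2
```

With conflicting objectives, the parameter settles into a cycle between the two minimizers rather than collapsing to either one, which mirrors how the block sizes trade off classification accuracy against distribution matching.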
In operation 1028, the current intermediate encoder parameters are evaluated to determine whether stopping conditions have been met. If the stopping conditions have not been met, then the process iterates by returning to operation 1022 and calculating new predicted labels and confidences based on the updated intermediate encoder parameters of the encoder ψ. If the stopping conditions have been met (described in more detail below), then the process terminates, outputting the updated intermediate encoder parameters as the updated encoder parameters of the updated encoder ψ.
After one iteration of computing classifications, adding high-confidence samples, and updating the parameters of encoder ψ based on minimizing the SWD loss to compute a new encoder ψ_{1}, the source (or origin) samples X_{S }(their feature vectors) remain well-clustered as ψ_{1}(X_{S}) in feature space, and some of the target samples X′_{T }(their feature vectors) have shifted positions in feature space as ψ_{1}(X′_{T}), where some of the samples are assigned pseudo-labels (different shapes) in accordance with the confidence.
As seen in
In some embodiments of the present invention, the stopping conditions are derived from two metrics: the SWD loss and the number of pseudo-labeled target data points. As seen in
The number of pseudo-labels saturates because all easily separable target data points have moved in the shared embedding space to match the corresponding source domain (or origin domain) embeddings. If trained longer, more pseudo-labels may be assigned. However, these final pseudo-labeled points are generally less accurate and can reduce, rather than increase, performance.
Effective training also depends on the balance of the number of optimization steps for each objective in a training iteration. For example, in one training iteration, one hundred sequential SWD optimization steps (which is easily met for the MNIST dataset with a batch size of five hundred) will cause catastrophic knowledge loss for the source (or origin) classifier. Conversely, only a few SWD optimization steps per training iteration will not improve the SWD loss. In various experimental runs, ten to fifteen SWD optimization steps and twenty to thirty classifier optimization steps per training iteration resulted in effective training. Effective training can be verified by monitoring the SWD loss at each training step to ensure that it is decreasing. Assuming appropriate learning rates, an increase in SWD loss at the start of training implies that there are too many SWD optimization steps per training iteration. On the other hand, when there are not enough SWD optimization steps in a row, then the loss will remain approximately constant.
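The monitoring heuristic described above might be expressed as a small diagnostic over the recorded per-step SWD losses; the function name and tolerance are hypothetical.

```python
def diagnose_swd_trend(swd_losses, tol=1e-3):
    """Heuristic from the discussion above: a rising SWD loss early in
    training suggests too many SWD steps per iteration; a flat curve
    suggests too few; a decreasing curve indicates effective training."""
    delta = swd_losses[-1] - swd_losses[0]
    if delta < -tol:
        return "decreasing"
    if delta > tol:
        return "increasing"
    return "flat"
```

In practice this check would be applied to a sliding window of recent SWD losses so that the step-count balance can be adjusted during training.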
As a concrete example, some embodiments of the present invention were implemented using the aforementioned MNIST and SVHN datasets along with a dataset collected from a United States Postal Service (USPS) post office (see Hull, Jonathan J. “A database for handwritten text recognition research.” IEEE Transactions on Pattern Analysis and Machine Intelligence 16.5 (1994): 550–554.).
In particular, the MNIST, USPS, and SVHN datasets have been used as a benchmark for domain adaptation. These datasets are all 10-class digit classification datasets, where MNIST and USPS are collections of handwritten digits and SVHN is a collection of real-world digit images. These three datasets can define six domain adaptation problems (MNIST→USPS, USPS→MNIST, MNIST→SVHN, SVHN→MNIST, USPS→SVHN, and SVHN→USPS). Following related work, for the cases of MNIST→USPS and USPS→MNIST, some experiments involving embodiments of the present invention used 2,000 randomly selected images from MNIST and 1,800 images from USPS. In the remaining cases, full datasets were used in the experiments discussed below. In these experiments, the images of the datasets were scaled to 32×32 pixels, with an additional conversion to grayscale for the SVHN dataset.
In some embodiments of the present invention, data augmentation is used to create additional training data by applying reasonable transformations to input data in an effort to improve generalization (see, e.g., Simard, P. Y.; Steinkraus, D.; and Platt, J. C. 2003. “Best practices for convolutional neural networks applied to visual document analysis.” In Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings, 958–963). In some embodiments of the present invention, these transformations include geometric transformations (e.g., translation, rotation, skew, and zoom) and noise (e.g., Gaussian noise, binomial noise, and inverted pixels). As shown in, e.g., Ghifary, M.; Kleijn, W. B.; Zhang, M.; Balduzzi, D.; and Li, W. 2016. “Deep reconstruction-classification networks for unsupervised domain adaptation.” In European Conference on Computer Vision, 597–613. Springer, when these transformations are applied to appropriate inputs, they greatly improve performance.
In unsupervised domain adaptation problems, there is an assumed domain shift between the source (or origin) and target domains. When the input samples are images, the visual nature of the samples allows for an intuitive understanding of which transformations cause the domain shift, and thereby allows augmentation of the source (or origin) domain data to reduce that shift before training, creating an easier optimization problem. For example, many images in the SVHN dataset contain rotated, skewed, or slightly shifted digits. Additionally, many digits are blurry and unfocused. Intuitively, to transfer knowledge from the MNIST dataset, which has well-resolved, aligned digits, the MNIST-to-SVHN domain shift can be reduced by augmenting the source (or origin) training data with rotated, skewed, shifted, and noisy versions of the original MNIST training images.
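As an illustration of the augmentation just described, a toy sketch that applies a random translation and additive Gaussian noise to a grayscale digit image might look as follows (a wrap-around shift via np.roll is used for brevity instead of a true translation; the function name and parameter values are hypothetical).

```python
import numpy as np

def augment_digit(img, max_shift=2, noise_sigma=0.05, seed=None):
    """Apply a random translation plus additive Gaussian noise to a
    grayscale digit image with pixel values in [0, 1]."""
    rng = np.random.default_rng(seed)
    # Random translation by up to max_shift pixels in each direction
    # (np.roll wraps around at the borders; adequate for a sketch).
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    # Additive Gaussian noise, clipped back to the valid pixel range.
    noisy = shifted + rng.normal(0.0, noise_sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)
```

A fuller pipeline would also include rotation, skew, zoom, binomial noise, and pixel inversion, applied with randomly sampled parameters per training image.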
Accordingly, aspects of embodiments of the present invention relate to systems and methods for adapting a model trained on a source (or origin) domain _{S }to function in another, related target domain _{T }using a relatively small number of samples from the target domain. Some aspects of embodiments of the present invention relate to the use of a sliced-Wasserstein distance for adapting the model trained on the source (or origin) domain data _{S}. In some embodiments, the few samples from the target domain _{T }are labeled. In some embodiments, when the few samples from the target domain _{T }are unlabeled, pseudo-labels are calculated for the unlabeled target domain samples in order to perform the adaptation.
Computing Systems
An exemplary computer system 1200 in accordance with an embodiment is shown in
The exemplary computer system 1200 may include an address/data bus 1210 that is configured to communicate information. Additionally, one or more data processing units, such as a processor 1220, are coupled with the address/data bus 1210. The processor 1220 is configured to process information and instructions. In an embodiment, the processor 1220 is a microprocessor. Alternatively, the processor 1220 may be a different type of processor, such as a parallel processor or a field programmable gate array.
The exemplary computer system 1200 is configured to utilize one or more data storage units. The exemplary computer system 1200 may include a volatile memory unit 1230 (e.g., random access memory (“RAM”), static RAM, dynamic RAM, etc.) coupled with the address/data bus 1210, wherein the volatile memory unit 1230 is configured to store information and instructions for the processor 1220. The exemplary computer system 1200 further may include a nonvolatile memory unit 1240 (e.g., readonly memory (“ROM”), programmable ROM (“PROM”), erasable programmable ROM (“EPROM”), electrically erasable programmable ROM (“EEPROM”), flash memory, etc.) coupled with the address/data bus 1210, wherein the nonvolatile memory unit 1240 is configured to store static information and instructions for the processor 1220. Alternatively, the exemplary computer system 1200 may execute instructions retrieved from an online data storage unit, such as in “cloud” computing. In an embodiment, the exemplary computer system 1200 also may include one or more interfaces, such as an interface 1250, coupled with the address/data bus 1210. The one or more interfaces are configured to enable the exemplary computer system 1200 to interface with other electronic devices and computer systems. The communication interfaces implemented by the one or more interfaces may include wireline (e.g., serial cables, modems, network adaptors, etc.) and/or wireless (e.g., wireless modems, wireless network adaptors, etc.) communication technology.
In one embodiment, the exemplary computer system 1200 may include an input device 1260 coupled with the address/data bus 1210, wherein the input device 1260 is configured to communicate information and command selections to the processor 1220. In accordance with one embodiment, the input device 1260 is an alphanumeric input device, such as a keyboard, that may include alphanumeric and/or function keys. Alternatively, the input device 1260 may be an input device other than an alphanumeric input device. In an embodiment, the exemplary computer system 1200 may include a cursor control device 1270 coupled with the address/data bus 1210, wherein the cursor control device 1270 is configured to communicate user input information and/or command selections to the processor 1220. In an embodiment, the cursor control device 1270 is implemented utilizing a device such as a mouse, a trackball, a trackpad, an optical tracking device, or a touchscreen. The foregoing notwithstanding, in an embodiment, the cursor control device 1270 is directed and/or activated via input from the input device 1260, such as in response to the use of special keys and key sequence commands associated with the input device 1260. In an alternative embodiment, the cursor control device 1270 is configured to be directed or guided by voice commands.
In an embodiment, the exemplary computer system 1200 further may include one or more optional computer usable data storage devices, such as a storage device 1280, coupled with the address/data bus 1210. The storage device 1280 is configured to store information and/or computer executable instructions. In one embodiment, as shown in
The exemplary computer system 1200 is presented herein as an exemplary computing environment in accordance with an embodiment. However, the exemplary computer system 1200 is not strictly limited to being a computer system. For example, an embodiment provides that the exemplary computer system 1200 represents a type of data processing analysis that may be used in accordance with various embodiments described herein. Moreover, other computing systems may also be implemented. Indeed, the spirit and scope of the present technology is not limited to any single data processing environment. Thus, in an embodiment, one or more operations of various embodiments of the present technology are controlled or implemented utilizing computerexecutable instructions, such as program modules, being executed by a computer. In one exemplary implementation, such program modules include routines, programs, objects, components, and/or data structures that are configured to perform particular tasks or implement particular abstract data types. In addition, an embodiment provides that one or more aspects of the present technology are implemented by utilizing one or more distributed computing environments, such as where tasks are performed by remote processing devices that are linked through a communications network, or such as where various program modules are located in both local and remote computerstorage media including memorystorage devices.
While the present invention has been described in connection with certain exemplary embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims, and equivalents thereof.