Modular memoization, tracking and train-data management of feature extraction
First Claim
1. A method comprising using at least one hardware processor for, in a feature extraction step of a machine learning analysis:
receiving at least one electronic document representing a dependency graph comprising feature extractors at each graph node and directed edges corresponding to computational dependencies of the feature extractors, wherein each of the feature extractors is configured to reduce data to be informative and non-redundant, by deriving vector values or matrix values from the data;
for at least some of said feature extractors:
i) determining extractor defining data, the extractor defining data comprising extractor data and computational dependencies of said graph node in said dependency graph, wherein the extractor data are selected from the group consisting of: an extractor class, an extractor parameter, a cohort class, a cohort parameter, and a cohort index list;
ii) computing a node lookup key based on said extractor defining data, wherein, when the feature extractor is context-insensitive, the node lookup key is for an entire cohort, wherein a context-insensitive feature extractor is a feature extractor which computes an identical value for a same sample in the sub-cohort and in the entire cohort;
iii) when the node lookup key is not associated with node features that are stored on a non-transitory computer-readable storage medium: computing new node features, storing said new node features on said non-transitory computer-readable storage medium, and associating said node lookup key with said new node features;
iv) when the feature extractor is context-sensitive, or when the feature extraction step is directed to the entire cohort: retrieving said node features or said new node features from said non-transitory computer-readable storage medium, wherein a context-sensitive feature extractor is a feature extractor which computes a different value for a same sample in the sub-cohort and in the entire cohort;
v) when the feature extractor is context-insensitive and the feature extraction step is directed to a sub-cohort: deriving a feature of the sub-cohort from the node features or the new node features of the entire cohort without recomputing the feature for the sub-cohort; and
sending at least some of said node features or new node features as an output set of said dependency graph, thereby accelerating the feature extraction step of the machine learning analysis.
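The cohort handling in steps ii) to v) can be illustrated with a minimal Python sketch. It is not the patented implementation: the in-memory `store` dictionary stands in for the storage medium, and `extract`, `square`, `center`, and `compute_calls` are hypothetical names introduced only for this example. A context-insensitive extractor is keyed and computed on the entire cohort and then sliced for a sub-cohort, while a context-sensitive extractor is keyed on the cohort actually requested.

```python
import json

store = {}          # hypothetical stand-in for the storage medium: key -> {sample_id: feature}
compute_calls = []  # records every real computation, to show when stored features are reused


def square(samples, power=2):
    """Context-insensitive: each sample's value does not depend on the rest of the cohort."""
    return {i: x ** power for i, x in samples.items()}


def center(samples):
    """Context-sensitive: subtracting the cohort mean depends on which samples are present."""
    mean = sum(samples.values()) / len(samples)
    return {i: x - mean for i, x in samples.items()}


def extract(name, fn, params, data, cohort_ids, full_cohort_ids, context_sensitive):
    # Step ii: a context-insensitive extractor is keyed on the entire cohort.
    key_ids = cohort_ids if context_sensitive else full_cohort_ids
    key = (name, json.dumps(params, sort_keys=True), tuple(sorted(key_ids)))

    # Step iii: compute and store only when the key has no stored features.
    if key not in store:
        compute_calls.append((name, len(key_ids)))
        store[key] = fn({i: data[i] for i in key_ids}, **params)
    features = store[key]

    # Step iv: context-sensitive, or the request targets the entire cohort.
    if context_sensitive or set(cohort_ids) == set(full_cohort_ids):
        return dict(features)

    # Step v: derive the sub-cohort features by selection, without recomputing.
    return {i: features[i] for i in cohort_ids}
```

Under these assumptions, requesting `square` for a sub-cohort triggers one computation over the entire cohort, and a later request for the entire cohort (or any other sub-cohort) is served from `store`. Requesting `center` for a sub-cohort and for the entire cohort triggers two separate computations, because the values differ between the two contexts.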
Abstract
There is provided, in accordance with some embodiments, a method for receiving electronic documents representing a dependency graph comprising feature extractors at each graph node and directed edges corresponding to computational dependencies of the feature extractors. For at least some feature extractors, extractor defining data, comprising extractor data and computational dependencies of the graph node in the dependency graph, are determined, and a node lookup key based on the extractor defining data is computed. When the node lookup key is associated with a stored set of output feature values, the stored set is assigned as the output values of the feature extractor. When the node lookup key is not associated with a stored set of output feature values, a new set of output feature values is computed, stored, and associated with the node lookup key. The resulting set of output feature values is sent as an output feature set.
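One way to picture the lookup-key mechanism in the abstract is the following minimal Python sketch. It rests on several assumptions that are not taken from the patent text: the key is a SHA-256 hash of a JSON description, the extractor's function name stands in for the extractor class, an in-memory `cache` dictionary stands in for persistent storage, and cohort indices are assumed to identify samples in a fixed dataset. The names `node_key`, `evaluate`, `scale`, and `total` are hypothetical. Because each key folds in the keys of the node's dependencies, changing an upstream parameter changes every downstream key.

```python
import hashlib
import json

cache = {}  # hypothetical in-memory stand-in for stored output feature values


def scale(raw, factor):
    """Example extractor without dependencies."""
    return [x * factor for x in raw]


def total(raw, scaled):
    """Example extractor that depends on the output of `scale`."""
    return [sum(scaled)]


def node_key(graph, node, cohort_ids):
    """Hash the node's extractor data together with the keys of its dependencies."""
    spec = graph[node]
    payload = json.dumps(
        {
            "class": spec["fn"].__name__,
            "params": spec["params"],
            "cohort": list(cohort_ids),
            "deps": [node_key(graph, dep, cohort_ids) for dep in spec["deps"]],
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def evaluate(graph, node, data, cohort_ids, computed):
    """Return the node's features, computing them only when no stored set matches the key."""
    key = node_key(graph, node, cohort_ids)
    if key in cache:
        return cache[key]
    spec = graph[node]
    inputs = [evaluate(graph, dep, data, cohort_ids, computed) for dep in spec["deps"]]
    raw = [data[i] for i in cohort_ids]
    computed.append(node)  # trace of the nodes that were actually computed
    value = spec["fn"](raw, *inputs, **spec["params"])
    cache[key] = value
    return value
```

With a two-node graph in which `total` depends on `scaled`, a first call to `evaluate` computes both nodes, a repeated call computes nothing, and changing the `factor` parameter of `scaled` causes both nodes to be recomputed under new keys.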
14 Citations
3 Claims
1. A method comprising using at least one hardware processor for, in a feature extraction step of a machine learning analysis:
receiving at least one electronic document representing a dependency graph comprising feature extractors at each graph node and directed edges corresponding to computational dependencies of the feature extractors, wherein each of the feature extractors is configured to reduce data to be informative and non-redundant, by deriving vector values or matrix values from the data;
for at least some of said feature extractors:
i) determining extractor defining data, the extractor defining data comprising extractor data and computational dependencies of said graph node in said dependency graph, wherein the extractor data are selected from the group consisting of: an extractor class, an extractor parameter, a cohort class, a cohort parameter, and a cohort index list;
ii) computing a node lookup key based on said extractor defining data, wherein, when the feature extractor is context-insensitive, the node lookup key is for an entire cohort, wherein a context-insensitive feature extractor is a feature extractor which computes an identical value for a same sample in the sub-cohort and in the entire cohort;
iii) when the node lookup key is not associated with node features that are stored on a non-transitory computer-readable storage medium: computing new node features, storing said new node features on said non-transitory computer-readable storage medium, and associating said node lookup key with said new node features;
iv) when the feature extractor is context-sensitive, or when the feature extraction step is directed to the entire cohort: retrieving said node features or said new node features from said non-transitory computer-readable storage medium, wherein a context-sensitive feature extractor is a feature extractor which computes a different value for a same sample in the sub-cohort and in the entire cohort;
v) when the feature extractor is context-insensitive and the feature extraction step is directed to a sub-cohort: deriving a feature of the sub-cohort from the node features or the new node features of the entire cohort without recomputing the feature for the sub-cohort; and
sending at least some of said node features or new node features as an output set of said dependency graph, thereby accelerating the feature extraction step of the machine learning analysis.
2. A computer program product comprising a non-transitory computer-readable storage medium having program code embodied therewith, the program code being executable by at least one hardware processor to, in a feature extraction step of a machine learning analysis:
receive at least one electronic document representing a dependency graph comprising feature extractors at each graph node and directed edges corresponding to computational dependencies of the feature extractors, wherein each of the feature extractors is configured to reduce data to be informative and non-redundant, by deriving vector values or matrix values from the data;
for at least some of said feature extractors:
i) determine extractor defining data, the extractor defining data comprising extractor data and computational dependencies of said graph node in said dependency graph, wherein the extractor data are selected from the group consisting of: an extractor class, an extractor parameter, a cohort class, a cohort parameter, and a cohort index list;
ii) compute a node lookup key based on said extractor defining data, wherein, when the feature extractor is context-insensitive, the node lookup key is for an entire cohort, wherein a context-insensitive feature extractor is a feature extractor which computes an identical value for a same sample in the sub-cohort and in the entire cohort;
iii) when the node lookup key is not associated with node features that are stored on a non-transitory computer-readable storage medium: computing new node features, storing said new node features on said non-transitory computer-readable storage medium, and associating said node lookup key with said new node features;
iv) when the feature extractor is context-sensitive, or when the feature extraction step is directed to the entire cohort: retrieve said node features or said new node features from said non-transitory computer-readable storage medium, wherein a context-sensitive feature extractor is a feature extractor which computes a different value for a same sample in the sub-cohort and in the entire cohort;
v) when the feature extractor is context-insensitive and the feature extraction step is directed to a sub-cohort: deriving a feature of the sub-cohort from the node features or the new node features of the entire cohort without recomputing the feature for the sub-cohort; and
send at least some of said node features or new node features as an output set of said dependency graph, thereby accelerating the feature extraction step of the machine learning analysis.
3. A computerized system, comprising:
(a) a non-transitory computer-readable storage medium having stored thereon program code for, in a feature extraction step of a machine learning analysis:
receiving at least one electronic document representing a dependency graph comprising feature extractors at each graph node and directed edges corresponding to computational dependencies of the feature extractors, wherein each of the feature extractors is configured to reduce data to be informative and non-redundant, by deriving vector values or matrix values from the data;
for at least some of said feature extractors:
(i) determining extractor defining data, the extractor defining data comprising extractor data and computational dependencies of said graph node in said dependency graph, wherein the extractor data are selected from the group consisting of: an extractor class, an extractor parameter, a cohort class, a cohort parameter, and a cohort index list;
(ii) computing a node lookup key based on said extractor defining data, wherein, when the feature extractor is context-insensitive, the node lookup key is for an entire cohort, wherein a context-insensitive feature extractor is a feature extractor which computes an identical value for a same sample in the sub-cohort and in the entire cohort;
(iii) when the node lookup key is not associated with node features that are stored on a non-transitory computer-readable storage medium: computing new node features, storing said new node features on said non-transitory computer-readable storage medium, and associating said node lookup key with said new node features;
(iv) when the feature extractor is context-sensitive, or when the feature extraction step is directed to the entire cohort: retrieving said node features or said new node features from said non-transitory computer-readable storage medium, wherein a context-sensitive feature extractor is a feature extractor which computes a different value for a same sample in the sub-cohort and in the entire cohort;
(v) when the feature extractor is context-insensitive and the feature extraction step is directed to a sub-cohort: deriving a feature of the sub-cohort from the node features or the new node features of the entire cohort without recomputing the feature for the sub-cohort; and
sending at least some of said node features or new node features as an output set of said dependency graph, and
(b) at least one hardware processor configured to execute said program code, thereby accelerating the feature extraction step of the machine learning analysis.
Specification