Modular memoization, tracking and train-data management of feature extraction

US 10,572,822 B2
Filed: 07/21/2016
Issued: 02/25/2020
Est. Priority Date: 07/21/2016
Status: Active Grant

1 Assignment

0 Petitions

Abstract

There is provided, in accordance with some embodiments, a method for receiving electronic documents representing a dependency graph comprising feature extractors at each graph node and directed edges corresponding to computational dependencies of the feature extractors. For at least some feature extractors, extractor defining data, comprising extractor data and computational dependencies of the graph node in the dependency graph, are determined, and a node lookup key based on the extractor defining data is computed. When the node lookup key is associated with a stored set of output feature values, the stored set is assigned as the output values of the feature extractor. When the node lookup key is not associated with a stored set of output feature values, a new set of output feature values is computed, stored, and associated with the node lookup key. At least one set of output feature values is sent as an output feature set.
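The lookup-and-reuse loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `Node`, `node_key` and `run_graph` are hypothetical names, a Python dict stands in for the non-transitory storage medium, and the node lookup key is assumed to be a SHA-256 hash of a JSON encoding of the extractor class, its parameters, a cohort identifier and the keys of the node's dependencies.

```python
import hashlib
import json
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List, Sequence


@dataclass
class Node:
    """Hypothetical graph node: one feature extractor plus the nodes it depends on."""
    extractor_class: str          # identifies the extractor; part of the key
    params: Dict[str, Any]        # extractor parameters; part of the key
    deps: List[str]               # upstream nodes whose outputs this node consumes
    fn: Callable[..., Any]        # fn(data, *dep_outputs, **params) -> features


def node_key(node: Node, cohort_id: str, dep_keys: Sequence[str]) -> str:
    """Hash the extractor-defining data: class, parameters, cohort and dependency keys."""
    payload = json.dumps(
        {"cls": node.extractor_class, "params": node.params,
         "cohort": cohort_id, "deps": list(dep_keys)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_graph(nodes: Dict[str, Node], data: Any, cohort_id: str,
              store: Dict[str, Any], stats: Dict[str, int]) -> Dict[str, Any]:
    """Evaluate every node in dependency order, reusing stored features on a key hit."""
    order = TopologicalSorter({name: n.deps for name, n in nodes.items()}).static_order()
    keys: Dict[str, str] = {}
    outputs: Dict[str, Any] = {}
    for name in order:
        node = nodes[name]
        # Folding in upstream keys means any upstream change also changes this key.
        key = node_key(node, cohort_id, [keys[d] for d in node.deps])
        keys[name] = key
        if key in store:
            stats["reused"] += 1
        else:
            # Miss: compute new features, store them and associate them with the key.
            store[key] = node.fn(data, *[outputs[d] for d in node.deps], **node.params)
            stats["computed"] += 1
        outputs[name] = store[key]
    return outputs


# Usage: a two-node graph where "age_x2" depends on "age".
data = [[30, 1], [40, 0], [50, 1]]
nodes = {
    "age": Node("Column", {"col": 0}, [], lambda d, col: [row[col] for row in d]),
    "age_x2": Node("Scale", {"factor": 2}, ["age"],
                   lambda d, a, factor: [v * factor for v in a]),
}
store: Dict[str, Any] = {}
stats = {"computed": 0, "reused": 0}
run_graph(nodes, data, "all", store, stats)  # first run computes both nodes
run_graph(nodes, data, "all", store, stats)  # second run finds both keys in the store
```

Because each key includes its dependencies' keys, changing a parameter of an upstream extractor invalidates every node downstream of it, while untouched branches of the graph keep hitting the store.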

14 Citations

3 Claims

1. A method comprising using at least one hardware processor for, in a feature extraction step of a machine learning analysis:
- receiving at least one electronic document representing a dependency graph comprising feature extractors at each graph node and directed edges corresponding to computational dependencies of the feature extractors, wherein each of the feature extractors is configured to reduce data to be informative and non-redundant, by deriving vector values or matrix values from the data;
  
  for at least some of said feature extractors:
  
  i) determining extractor defining data, the extractor defining data comprising:
  
  extractor data and computational dependencies of said graph node in said dependency graph, wherein the extractor data are selected from the group consisting of:
  
  an extractor class, an extractor parameter, a cohort class, a cohort parameter, and a cohort index list;
  
  ii) computing a node lookup key based on said extractor defining data, wherein, when the feature extractor is context-insensitive,
  
  the node lookup key is for an entire cohort, wherein a context-insensitive feature extractor is a feature extractor which computes an identical value for a same sample in the sub-cohort and in the entire cohort;
  
  iii) when the node lookup key is not associated with node features that are stored on a non-transitory computer-readable storage medium:
  
  computing new node features, storing said new node features on said non-transitory computer-readable storage medium, and associating said node lookup key with said new node features;
  
  iv) when the feature extractor is context-sensitive, or when the feature extraction step is directed to the entire cohort:
  
  retrieving said node features or said new node features from said non-transitory computer-readable storage medium, wherein a context-sensitive feature extractor is a feature extractor which computes a different value for a same sample in the sub-cohort and in the entire cohort;
  
  v) when the feature extractor is context-insensitive and the feature extraction step is directed to a sub-cohort:
  
  deriving a feature of the sub-cohort from the node features or the new node features of the entire cohort without recomputing the feature for the sub-cohort; and
  
  sending at least some of said node features or new node features as an output set of said dependency graph, thereby accelerating the feature extraction step of the machine learning analysis.

2. A computer program product comprising a non-transitory computer-readable storage medium having program code embodied therewith, the program code being executable by at least one hardware processor to, in a feature extraction step of a machine learning analysis:
- receive at least one electronic document representing a dependency graph comprising feature extractors at each graph node and directed edges corresponding to computational dependencies of the feature extractors, wherein each of the feature extractors is configured to reduce data to be informative and non-redundant, by deriving vector values or matrix values from the data;
  
  for at least some of said feature extractors:
  
  i) determine extractor defining data, the extractor defining data comprising extractor data and computational dependencies of said graph node in said dependency graph, wherein the extractor data are selected from the group consisting of:
  
  an extractor class, an extractor parameter, a cohort class, a cohort parameter, and a cohort index list;
  
  ii) compute a node lookup key based on said extractor defining data, wherein, when the feature extractor is context-insensitive,
  
  the node lookup key is for an entire cohort, wherein a context-insensitive feature extractor is a feature extractor which computes an identical value for a same sample in the sub-cohort and in the entire cohort;
  
  iii) when the node lookup key is not associated with node features that are stored on a non-transitory computer-readable storage medium:
  
  compute new node features, store said new node features on said non-transitory computer-readable storage medium, and associate said node lookup key with said new node features;
  
  iv) when the feature extractor is context-sensitive, or when the feature extraction step is directed to the entire cohort:
  
  retrieve said node features or said new node features from said non-transitory computer-readable storage medium, wherein a context-sensitive feature extractor is a feature extractor which computes a different value for a same sample in the sub-cohort and in the entire cohort;
  
  v) when the feature extractor is context-insensitive and the feature extraction step is directed to a sub-cohort:
  
  derive a feature of the sub-cohort from the node features or the new node features of the entire cohort without recomputing the feature for the sub-cohort; and
  
  send at least some of said node features or new node features as an output set of said dependency graph, thereby accelerating the feature extraction step of the machine learning analysis.

3. A computerized system, comprising:
- (a) a non-transitory computer-readable storage medium having stored thereon program code for, in a feature extraction step of a machine learning analysis:
  
  receiving at least one electronic document representing a dependency graph comprising feature extractors at each graph node and directed edges corresponding to computational dependencies of the feature extractors, wherein each of the feature extractors is configured to reduce data to be informative and non-redundant, by deriving vector values or matrix values from the data;
  
  for at least some of said feature extractors:
  
  (i) determining extractor defining data, the extractor defining data comprising extractor data and computational dependencies of said graph node in said dependency graph, wherein the extractor data are selected from the group consisting of:
  
  an extractor class, an extractor parameter, a cohort class, a cohort parameter, and a cohort index list;
  
  (ii) computing a node lookup key based on said extractor defining data, wherein, when the feature extractor is context-insensitive,
  
  the node lookup key is for an entire cohort, wherein a context-insensitive feature extractor is a feature extractor which computes an identical value for a same sample in the sub-cohort and in the entire cohort;
  
  (iii) when the node lookup key is not associated with node features that are stored on a non-transitory computer-readable storage medium:
  
  computing new node features, storing said new node features on said non-transitory computer-readable storage medium, and associating said node lookup key with said new node features;
  
  (iv) when the feature extractor is context-sensitive, or when the feature extraction step is directed to the entire cohort:
  
  retrieving said node features or said new node features from said non-transitory computer-readable storage medium, wherein a context-sensitive feature extractor is a feature extractor which computes a different value for a same sample in the sub-cohort and in the entire cohort;
  
  (v) when the feature extractor is context-insensitive and the feature extraction step is directed to a sub-cohort:
  
  deriving a feature of the sub-cohort from the node features or the new node features of the entire cohort without recomputing the feature for the sub-cohort; and
  
  sending at least some of said node features or new node features as an output set of said dependency graph, and (b) at least one hardware processor configured to execute said program code, thereby accelerating the feature extraction step of the machine learning analysis.
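The cohort handling in steps ii), iv) and v) of the claims can be illustrated with a small sketch. It is hedged and hypothetical: `lookup_key`, `extract`, `decade` and `zscore` are illustrative names, a dict stands in for the storage medium, the key omits dependency keys for brevity, and `sub_cohort=None` is assumed to mean "the entire cohort". A context-insensitive extractor (`decade`) is always keyed and computed on the entire cohort, and a sub-cohort is derived by selecting rows; a context-sensitive extractor (`zscore`, whose output depends on the cohort's mean and spread) is keyed and computed per requested cohort.

```python
import hashlib
import json
from statistics import mean, pstdev
from typing import Callable, Dict, List, Optional, Sequence

Store = Dict[str, List[float]]  # stand-in for persistent storage (assumption)


def lookup_key(extractor: str, params: dict,
               cohort_index_list: Optional[Sequence[int]]) -> str:
    """Key over the extractor-defining data; None stands for the entire cohort."""
    cohort = "ENTIRE" if cohort_index_list is None else list(cohort_index_list)
    payload = {"extractor": extractor, "params": params, "cohort": cohort}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def extract(extractor: str, fn: Callable[..., List[float]], params: dict,
            context_sensitive: bool, values: List[float],
            sub_cohort: Optional[Sequence[int]], store: Store) -> List[float]:
    """Return features for the entire cohort (sub_cohort=None) or for the given row indices."""
    if context_sensitive:
        # The result depends on which samples are present, so the cohort is part of the key.
        key = lookup_key(extractor, params, sub_cohort)
        if key not in store:
            rows = values if sub_cohort is None else [values[i] for i in sub_cohort]
            store[key] = fn(rows, **params)
        return store[key]
    # Context-insensitive: key the entire cohort and compute it at most once.
    key = lookup_key(extractor, params, None)
    if key not in store:
        store[key] = fn(values, **params)
    full = store[key]
    # A sub-cohort is derived by selecting its rows from the entire-cohort result.
    return full if sub_cohort is None else [full[i] for i in sub_cohort]


def decade(xs: List[float]) -> List[float]:
    """Context-insensitive: a sample's value ignores the rest of the cohort."""
    return [x // 10 for x in xs]


def zscore(xs: List[float]) -> List[float]:
    """Context-sensitive: each value depends on the cohort's mean and spread."""
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / sd for x in xs]


store: Store = {}
ages = [30, 40, 50, 60]
print(extract("decade", decade, {}, False, ages, [0, 2], store))  # [3, 5]
print(extract("zscore", zscore, {}, True, ages, [0, 2], store))   # [-1.0, 1.0]
```

Slicing the entire-cohort z-scores would give about [-1.34, 0.45] for the same two samples, which is why the context-sensitive branch must key and compute per sub-cohort rather than reuse the entire-cohort result.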

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Aharonov, Ranit; Goldschmidt, Yaara; Ozery-Flato, Michal; Yanover, Chen
Primary Examiner(s)
Fan, Shiow-Jy

Application Number
US15/215,588
Publication Number
US 20180025092A1
Time in Patent Office
1,314 Days
Field of Search
707999101, 707E17127, 707E17131, 707999001, 7079991, 707999102, 707744, 706 62, 706 15, 706 25, 706 45, 706 46, 706 47, 706 56, 706925
CPC Class Codes

G06F 16/81   Indexing, e.g. XML tags; Da...

G06F 16/9024   Graphs; Linked lists G06F16...

G06N 20/00   Machine learning

G06N 5/01   Dynamic search techniques; ...