Failure feedback system for enhancing machine learning accuracy by synthetic data generation

US 10,692,019 B2
Filed: 08/12/2019
Issued: 06/23/2020
Est. Priority Date: 07/06/2018
Status: Active Grant

Abstract

An exemplary system, method, and computer-accessible medium can include, for example, (a) receiving a dataset(s), (b) determining if a misclassification(s) is generated during a training of a model(s) on the dataset(s), (c) generating a synthetic dataset(s) based on the misclassification(s), and (d) determining if the misclassification(s) is generated during the training of the model(s) on the synthetic dataset(s). The dataset(s) can include a plurality of data types. The misclassification(s) can be determined by determining if one of the data types is misclassified. The dataset(s) can include an identification of each of the data types in the dataset(s).
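
The train, check, regenerate cycle described in the abstract can be sketched in Python. This is an illustrative sketch only, not the patented implementation: the names `feedback_loop`, `train_min_support`, and `duplicate_missed`, the `max_rounds` cap, and the toy one-dimensional data model are all assumptions made for the example and do not appear in the patent.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Set, Tuple

Sample = Tuple[float, Hashable]        # (feature, data-type label); hypothetical shape
Predictor = Callable[[float], Hashable]


def feedback_loop(
    dataset: Sequence[Sample],
    train: Callable[[Sequence[Sample]], Predictor],
    synthesize: Callable[[Set[Hashable], Sequence[Sample]], List[Sample]],
    max_rounds: int = 10,              # safety cap; an assumption, not in the abstract
) -> Tuple[Predictor, List[Sample], int]:
    """Train, find misclassified data types, regenerate data, and repeat."""
    data = list(dataset)               # (a) receive the dataset
    predict = train(data)
    for rounds in range(max_rounds):
        # (b)/(d) which data types does the trained model still get wrong?
        missed = {label for x, label in data if predict(x) != label}
        if not missed:
            return predict, data, rounds
        data = synthesize(missed, data)    # (c) synthetic dataset from the misses
        predict = train(data)              # retrain on the synthetic dataset
    return predict, data, max_rounds


def train_min_support(data: Sequence[Sample], min_support: int = 3) -> Predictor:
    """Toy model: nearest centroid over labels seen at least `min_support` times."""
    groups = defaultdict(list)
    for x, label in data:
        groups[label].append(x)
    centroids = {label: sum(xs) / len(xs)
                 for label, xs in groups.items() if len(xs) >= min_support}

    def predict(x: float) -> Hashable:
        return min(centroids, key=lambda label: abs(x - centroids[label]))

    return predict


def duplicate_missed(missed: Set[Hashable], data: Sequence[Sample]) -> List[Sample]:
    """Toy generator: keep all samples and add one copy of each missed-type sample."""
    return list(data) + [(x, label) for x, label in data if label in missed]
```

With the seed data `[(0.0, "a"), (1.0, "a"), (2.0, "a"), (10.0, "b")]`, the toy model ignores type `"b"` until it has at least three samples, so `feedback_loop(seed, train_min_support, duplicate_missed)` performs two synthesis rounds before no misclassification remains.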

14 Claims

1. A non-transitory computer-accessible medium having stored thereon computer-executable instructions, wherein, when a computer arrangement executes the instructions, the computer arrangement is configured to perform procedures comprising:
- (a) receiving at least one dataset, wherein the at least one dataset includes a plurality of data types;
  
  (b) determining if at least one misclassification is generated during a training of at least one model on the at least one dataset by determining if one of the data types is misclassified;
  
  (c) assigning a classification score to each of the data types after the training of the at least one model;
  
  (d) generating at least one synthetic dataset based on the at least one misclassification;
  
  (e) determining if the at least one misclassification is generated during the training of the at least one model on the at least one synthetic dataset based on the assigned classification score being below a particular threshold; and
  
  (f) iterating procedures (d) and (e) until the at least one misclassification is no longer determined during the training of the at least one model.
  - 2. The computer-accessible medium of claim 1, wherein the at least one dataset includes one of (i) only real data, (ii) only synthetic data, or (iii) a combination of real data and synthetic data.
  - 3. The computer-accessible medium of claim 1, wherein the at least one dataset includes an identification of each of the data types in the at least one dataset.
  - 4. The computer-accessible medium of claim 1, wherein the at least one synthetic dataset includes more data samples of a selected one of the data types than the at least one dataset, wherein the selected one of the data types is determined based on the at least one misclassification.
  - 5. The computer-accessible medium of claim 1, wherein the at least one model is a machine learning procedure.
  - 6. The computer-accessible medium of claim 5, wherein the machine learning procedure is a supervised machine learning procedure.
  - 7. The computer-accessible medium of claim 1, wherein the computer arrangement is configured to generate the at least one synthetic dataset after a particular number of misclassifications has been determined.
  - 8. The computer-accessible medium of claim 1, wherein the at least one synthetic dataset includes non-misclassified data from the at least one dataset.
  - 9. The computer-accessible medium of claim 1, wherein the computer arrangement is configured to generate the at least one synthetic dataset after a statistical significance has been achieved based on the at least one misclassification.
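
Steps (c) and (e) of claim 1, together with the count-based trigger of claim 7, can be illustrated as follows. The claims do not define how the classification score is computed, so this sketch assumes per-type accuracy (the fraction of samples of a data type that the model labels correctly). The function names and the example labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def classification_scores(labels: Sequence[str],
                          predictions: Sequence[str]) -> Dict[str, float]:
    """Assumed score: per-data-type accuracy, between 0.0 and 1.0."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, predicted in zip(labels, predictions):
        total[truth] += 1
        correct[truth] += int(truth == predicted)
    return {data_type: correct[data_type] / total[data_type] for data_type in total}


def types_below_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Data types whose score falls below the threshold (step (e))."""
    return sorted(t for t, score in scores.items() if score < threshold)


def enough_misclassifications(labels: Sequence[str],
                              predictions: Sequence[str],
                              min_count: int) -> bool:
    """Claim 7 style trigger: generate only after `min_count` misses were seen."""
    misses = sum(1 for truth, predicted in zip(labels, predictions)
                 if truth != predicted)
    return misses >= min_count
```

For labels `["id", "id", "id", "check", "check"]` and predictions `["id", "id", "check", "check", "check"]`, type `"id"` scores 2/3 and `"check"` scores 1.0, so a threshold of 0.9 flags only `"id"`.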

10. A non-transitory computer-accessible medium having stored thereon computer-executable instructions, wherein, when a computer arrangement executes the instructions, the computer arrangement is configured to perform procedures comprising:
- (a) receiving at least one dataset including an identification of a plurality of data types in the at least one dataset;
  
  (b) determining if at least one misclassification of at least one particular data type of the data types is generated during a training of at least one model on the at least one dataset;
  
  (c) assign a classification score to each of the data types;
  
  (d) generating at least one synthetic dataset based on the misclassified at least one particular data type, wherein the at least one synthetic dataset includes more of the at least one particular data type than the at least one dataset;
  
  (e) determining if the at least one misclassification is generated during the training of the at least one model on the at least one synthetic dataset based on the assigned classification score being below a particular threshold;
  
  (f) iterating procedures (d) and (e) until the at least one misclassification is no longer determined during the training of the at least one model.
  - 11. The computer-accessible medium of claim 10, wherein the at least one dataset includes one of (i) only real data, (ii) only synthetic data, or (iii) a combination of real data and synthetic data.
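
Step (d) of claim 10 requires that the synthetic dataset contain more of the misclassified data type than the original dataset. One possible way to do that, shown purely as a sketch, is to fit a simple distribution to the existing samples of that type and draw extra samples from it. The Gaussian fit, the function name `synthesize_more_of`, and the `(feature, label)` sample shape are assumptions; the claims do not specify a generation technique.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Sample = Tuple[float, str]   # (feature, data-type label); hypothetical shape


def synthesize_more_of(dataset: Sequence[Sample], target_type: str,
                       extra: int, seed: int = 0) -> List[Sample]:
    """Return the original samples plus `extra` generated samples of `target_type`."""
    values = [x for x, label in dataset if label == target_type]
    if not values:
        raise ValueError(f"no samples of type {target_type!r} to learn from")
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)      # 0.0 when only one sample exists
    rng = random.Random(seed)              # seeded so runs are repeatable
    generated = [(rng.gauss(mu, sigma), target_type) for _ in range(extra)]
    # Original samples, including the non-misclassified types, are kept (cf. claim 8).
    return list(dataset) + generated
```

Calling `synthesize_more_of(data, "b", extra=3)` on a dataset with two `"b"` samples yields a dataset with five, while every other data type is left unchanged.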

12. A method, comprising:
- (a) receiving at least one dataset, wherein the at least one dataset includes a plurality of data types;
  
  (b) determining if at least one misclassification is generated during a training of at least one model on the at least one dataset by determining if one of the data types is misclassified;
  
  (c) assigning a classification score to each of the data types after the training of the at least one model;
  
  (d) sending a request for at least one synthetic dataset based on the misclassification;
  
  (e) receiving the at least one synthetic dataset;
  
  (f) determining if the at least one misclassification is generated during the training of the at least one model on the at least one synthetic dataset based on the assigned classification score being below a particular threshold; and
  
  (g) using a computer hardware arrangement, iterating procedures (d)-(f) until the at least one misclassification is no longer determined during the training of the at least one model.
  - 13. The method of claim 12, wherein request includes a data request for additional data related to a particular one of the data types.
  - 14. The method of claim 13, wherein the at least one dataset includes one of (i) only real data, (ii) only synthetic data, or (iii) a combination of real data and synthetic data.
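
Claim 12 differs from claim 1 in that the synthetic dataset is requested and then received, rather than generated in place. A minimal sketch of such an exchange is shown below. The message fields (`data_type`, `num_samples`), the JSON encoding, and the `generate` callback standing in for a separate data-generation service are all hypothetical; the claims only say that a request is sent and a synthetic dataset is received.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SyntheticDataRequest:
    """Hypothetical request for more data of one data type (cf. claim 13)."""
    data_type: str
    num_samples: int


def encode_request(request: SyntheticDataRequest) -> str:
    """Step (d): serialize the request so it can be sent to a generator."""
    return json.dumps(asdict(request), sort_keys=True)


def handle_request(payload: str,
                   generate: Callable[[str, int], List[float]]) -> List[Tuple[float, str]]:
    """Generator side: decode the request and return labeled synthetic samples."""
    request = SyntheticDataRequest(**json.loads(payload))
    features = generate(request.data_type, request.num_samples)
    return [(x, request.data_type) for x in features]
```

The requesting side would call `encode_request`, pass the payload to whatever transport is in use, and treat the list returned by `handle_request` as the received synthetic dataset of step (e).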

Current Assignee: Capital One Services LLC (Capital One Financial Corporation)
Original Assignee: Capital One Services LLC (Capital One Financial Corporation)
Inventors: Goodsitt, Jeremy; Truong, Anh; Farivar, Reza; Abad, Fardin Abdi Taghi; Watson, Mark; Pham, Vincent; Walters, Austin
Primary Examiner(s): Chen, Alan
Application Number: US16/537,921
Publication Number: US 20200111019A1
Time in Patent Office: 316 Days
Field of Search: None
US Class Current
CPC Class Codes

G06F 18/214   Generating training pattern...

G06F 18/217   Validation; Performance eva...

G06F 18/24   Classification techniques

G06N 20/00   Machine learning

G06N 20/10   using kernel methods, e.g. ...

G06N 20/20   Ensemble learning

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/047   Probabilistic or stochastic...

G06N 3/082   modifying the architecture,...

G06N 3/088   Non-supervised learning, e....

G06N 5/01   Dynamic search techniques; ...
