System and method for synthesizing data

US 10,423,890 B1
Filed: 12/11/2014
Issued: 09/24/2019
Est. Priority Date: 12/12/2013
Status: Active Grant

5 Assignments

0 Petitions

Abstract

A single data record is identified from a first set of data. The first set of data comprises a first plurality of data records, each of the data records including multiple items of data describing an entity. Using pattern recognition, the single data record is processed to identify a group of records from within the first set that have corresponding characteristics equivalent to the single data record. A score for each of the records in the first set is determined. The score describes how similar each of the records in the first set is to the single data record.
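The abstract and claims do not say which pattern-recognition or probability-estimation technique is used. The Python sketch below is one possible reading of claim steps (b) through (d), built on assumptions that are not from the patent. Records are dictionaries of categorical fields. "Equivalent variables" means an exact match on a chosen subset of hypothetical `match_fields`. The probability estimation of step (c) is stood in for by a Laplace-smoothed naive Bayes estimate of P(target | record), computed from the remaining hypothetical `score_fields`. The optimization constraints of step (c) are not modeled. All function and parameter names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Dict[str, str]


def split_target_control(
    records: List[Record], seed: Record, match_fields: List[str]
) -> Tuple[List[Record], List[Record]]:
    """Step (b) stand-in (assumed): an exact match on match_fields puts a record in the target group."""
    target: List[Record] = []
    control: List[Record] = []
    for r in records:
        if all(r[f] == seed[f] for f in match_fields):
            target.append(r)
        else:
            control.append(r)
    return target, control


def score_records(
    records: List[Record],
    target: List[Record],
    control: List[Record],
    score_fields: List[str],
    alpha: float = 1.0,
) -> List[float]:
    """Step (c) stand-in (assumed): smoothed naive Bayes estimate of P(target | record)."""
    prior_t = (len(target) + alpha) / (len(records) + 2 * alpha)
    counts_t = {f: Counter(r[f] for r in target) for f in score_fields}
    counts_c = {f: Counter(r[f] for r in control) for f in score_fields}
    n_levels = {f: len({r[f] for r in records}) for f in score_fields}

    scores = []
    for r in records:
        p_t, p_c = prior_t, 1.0 - prior_t
        for f in score_fields:
            k = n_levels[f]
            # Laplace smoothing keeps an unseen value from zeroing out a likelihood.
            p_t *= (counts_t[f][r[f]] + alpha) / (len(target) + alpha * k)
            p_c *= (counts_c[f][r[f]] + alpha) / (len(control) + alpha * k)
        scores.append(p_t / (p_t + p_c))
    return scores


def above_threshold(scores: List[float], threshold: float) -> List[int]:
    """Step (d): indices of records whose score exceeds the predetermined threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


records = [
    {"region": "N", "age_band": "30s", "plan": "A", "channel": "web"},
    {"region": "N", "age_band": "30s", "plan": "A", "channel": "web"},
    {"region": "N", "age_band": "30s", "plan": "B", "channel": "phone"},
    {"region": "S", "age_band": "40s", "plan": "A", "channel": "web"},
    {"region": "S", "age_band": "50s", "plan": "B", "channel": "phone"},
    {"region": "S", "age_band": "30s", "plan": "B", "channel": "phone"},
]
seed = records[0]
target, control = split_target_control(records, seed, ["region", "age_band"])
scores = score_records(records, target, control, ["plan", "channel"])
print(above_threshold(scores, 0.5))  # [0, 1, 3]
```

Record 3 falls in the control group, yet it scores above the threshold because it resembles the target group on the scoring fields. This is the kind of look-alike record a donor pool would need. The seed record also scores itself, so a caller would exclude it before choosing donors.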

9 Claims

1. A computer-implemented method comprising:
- (a) identifying from a first set of data comprising a first plurality of data records, each of the data records including multiple fields each of which stores a variable describing an entity, a single data record, at least one of the variables being associated with personal information;
  
  (b) using pattern recognition, processing the single data record to identify a group of records from within the first set that have corresponding variables equivalent to the variables in the single data record, wherein the identified group of records comprises a target set of variables, the target set of variables comprising variables equivalent to the variables in the single data record and the group of records from the first set that are not identified comprises a control set of variables, the control set of variables comprising variables not equivalent to the variables in the single data record;
  
  (c) processing the target set of variables and the control set of variables, using probability estimation and optimization constraints, to determine a score for each of the records in the first set that describes a comparison of each of the records in the first set to the single data record;
  
  (d) identifying the records associated with each of the scores above a predetermined threshold; and
  
  (e) replacing the data that is a representative of the personal information and is associated with the single data record with data associated with the records identified in step (d) field by field under constraints of maintaining a correlation matrix of the multiple fields to maintain statistical characteristics of the first set of data and remove the personal information; and
  
  (f) building a predictive model based on at least the data associated with the records identified in step (d).
- 2. The computer-implemented method of claim 1, further comprising:
    - (f) receiving an original set of data comprising an original plurality of data records, each data record including multiple fields each of which stores a variable describing an entity;
      
      (g) identifying any data record in the original plurality of data records comprising a corresponding variable that is a number of standard deviations from a mean of values for that same variable in the original plurality of data records;
      
      (h) removing from the original set of data all records identified in step (g) to generate a first set of data records comprising a subset of the original plurality of data records.
- 3. The computer-implemented method of claim 1, further comprising:
    - (i) identifying a second single data record from the first set; and
      
      (j) performing steps (b) through (e) on the second single data record.
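Dependent claims 2, 5 and 8 add an outlier filter that produces the "first set of data". The claims do not fix the number of standard deviations. The Python sketch below treats it as a hypothetical parameter `k` and flags a record when any listed numeric field lies more than `k` population standard deviations from that field's mean. Reading "a number of standard deviations" as "more than `k`" is an assumption, and the function and field names are illustrative.

```python
import statistics
from typing import Dict, List

Record = Dict[str, float]


def remove_outliers(original: List[Record], fields: List[str], k: float = 3.0) -> List[Record]:
    """Claim 2, steps (g)-(h), assuming 'a number of standard deviations'
    means more than k population standard deviations from the mean."""
    # Step (g): per-field mean and population standard deviation over the original set.
    stats = {}
    for f in fields:
        values = [r[f] for r in original]
        stats[f] = (statistics.mean(values), statistics.pstdev(values))

    def is_outlier(r: Record) -> bool:
        # A zero standard deviation means the field is constant, so nothing is flagged on it.
        return any(sd > 0 and abs(r[f] - mu) > k * sd for f, (mu, sd) in stats.items())

    # Step (h): the records that survive form the "first set of data".
    return [r for r in original if not is_outlier(r)]


original = [{"age": 30 + i, "cost": 10.0} for i in range(9)] + [{"age": 39, "cost": 100.0}]
first_set = remove_outliers(original, ["age", "cost"], k=2.0)
print(len(first_set))  # 9
```

In this toy data the single cost of 100.0 lies exactly three standard deviations above the mean (mean 19.0, standard deviation 27.0). With `k=2.0` it is removed. With any `k` of 3 or more it would be kept.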

4. A system comprising:
- memory operable to store at least one program;
  
  at least one processor communicatively coupled to the memory, in which the at least one program, when executed by the at least one processor, causes the at least one processor to perform a method comprising:
  
  (a) identifying from a first set of data comprising a first plurality of data records, each of the data records including multiple fields each of which stores a variable describing an entity, a single data record, at least one of the variables being associated with personal information;
  
  (b) using pattern recognition, processing the single data record to identify a group of records from within the first set that have corresponding variables equivalent to the variables in the single data record, wherein the identified group of records comprises a target set of variables, the target set of variables comprising variables equivalent to the variables in the single data record and the group of records from the first set that are not identified comprises a control set of variables, the control set of variables comprising variables not equivalent to the variables in the single data record;
  
  (c) processing the target set of variables and the control set of variables, using probability estimation and optimization constraints, to determine a score for each of the records in the first set that describes a comparison of each of the records in the first set to the single data record;
  
  (d) identifying the records associated with each of the scores above a predetermined threshold; and
  
  (e) replacing the data that is a representative of the personal information and is associated with the single data record with data associated with the records identified in step (d) field by field under constraints of maintaining a correlation matrix of the multiple fields to maintain statistical characteristics of the first set of data and remove the personal information; and
  
  (f) building a predictive model based on at least the data associated with the records identified in step (d).
- 5. The system of claim 4, the method further comprising:
    - (f) receiving an original set of data comprising an original plurality of data records, each data record including multiple fields each of which stores a variable describing an entity;
      
      (g) identifying any data record in the original plurality of data records comprising a corresponding variable that is a number of standard deviations from a mean of values for that same variable in the original plurality of data records;
      
      (h) removing from the original set of data all records identified in step (g) to generate a first set of data records comprising a subset of the original plurality of data records.
- 6. The system of claim 4, the method further comprising:
    - (i) identifying a second single data record from the first set; and
      
      (j) performing steps (b) through (e) on the second single data record.
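Step (e), recited in claims 1, 4 and 7, replaces the personal fields of the single record "field by field" with data from the high-scoring records, while maintaining the correlation matrix of the fields. The patent text here does not say how that constraint is enforced. The Python sketch below is one assumed interpretation. It greedily picks, for each hypothetical `personal_fields` entry in turn, the donor value that moves the Pearson correlation matrix of all listed numeric fields the least. This is a soft minimization rather than a hard constraint, and a tolerance check could be layered on top. The names are illustrative, and `donor_idxs` is assumed to hold the step (d) records with the seed itself excluded.

```python
from typing import Dict, List, Sequence

Record = Dict[str, float]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def corr_matrix(records: List[Record], fields: List[str]) -> List[List[float]]:
    cols = {f: [r[f] for r in records] for f in fields}
    return [[pearson(cols[a], cols[b]) for b in fields] for a in fields]


def drift(m1: List[List[float]], m2: List[List[float]]) -> float:
    """Largest absolute difference between two correlation matrices."""
    return max(abs(a - b) for row1, row2 in zip(m1, m2) for a, b in zip(row1, row2))


def replace_personal_fields(
    records: List[Record],
    seed_idx: int,
    donor_idxs: List[int],
    fields: List[str],
    personal_fields: List[str],
) -> List[Record]:
    """Step (e) sketch (assumed): overwrite each personal field of the seed record with
    the donor value whose substitution changes the correlation matrix the least."""
    if not donor_idxs:
        raise ValueError("at least one donor record is required")
    baseline = corr_matrix(records, fields)
    out = [dict(r) for r in records]  # work on a copy; the input stays untouched
    for f in personal_fields:  # field by field
        best_value, best_drift = None, float("inf")
        for d in donor_idxs:
            out[seed_idx][f] = records[d][f]
            current = drift(corr_matrix(out, fields), baseline)
            if current < best_drift:
                best_value, best_drift = records[d][f], current
        out[seed_idx][f] = best_value
    return out


records = [
    {"age": 34, "income": 52.0, "visits": 3},
    {"age": 36, "income": 55.0, "visits": 4},
    {"age": 33, "income": 50.0, "visits": 2},
    {"age": 58, "income": 80.0, "visits": 9},
    {"age": 45, "income": 61.0, "visits": 6},
    {"age": 29, "income": 41.0, "visits": 1},
]
synthetic = replace_personal_fields(
    records, 0, [1, 2], ["age", "income", "visits"], ["age", "income"]
)
print(synthetic[0])
```

Because later fields are chosen after earlier substitutions, the result depends on the order of `personal_fields`. A joint search over all fields would avoid that dependence, at a higher cost.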

7. A non-transitory computer readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, perform a method comprising:
- (a) identifying from a first set of data comprising a first plurality of data records, each of the data records including multiple fields each of which stores a variable describing an entity, a single data record, at least one of the variables being associated with personal information;
  
  (b) using pattern recognition, processing the single data record to identify a group of records from within the first set that have corresponding variables equivalent to the variables in the single data record, wherein the identified group of records comprises a target set of variables, the target set of variables comprising variables equivalent to the variables in the single data record and the group of records from the first set that are not identified comprises a control set of variables, the control set of variables comprising variables not equivalent to the variables in the single data record;
  
  (c) processing the target set of variables and the control set of variables, using probability estimation and optimization constraints, to determine a score for each of the records in the first set that describes a comparison of each of the records in the first set to the single data record;
  
  (d) identifying the records associated with each of the scores above a predetermined threshold; and
  
  (e) replacing the data that is a representative of the personal information and is associated with the single data record with data associated with the records identified in step (d) field by field under constraints of maintaining a correlation matrix of the multiple fields to maintain statistical characteristics of the first set of data and remove the personal information; and
  
  (f) building a predictive model based on at least the data associated with the records identified in step (d).
- 8. The non-transitory computer readable storage medium of claim 7, the method further comprising:
    - (f) receiving an original set of data comprising an original plurality of data records, each data record including multiple fields each of which stores a variable describing an entity;
      
      (g) identifying any data record in the original plurality of data records comprising a corresponding variable that is a number of standard deviations from a mean of values for that same variable in the original plurality of data records;
      
      (h) removing from the original set of data all records identified in step (g) to generate a first set of data records comprising a subset of the original plurality of data records.
- 9. The non-transitory computer readable storage medium of claim 7, the method further comprising:
    - (i) identifying a second single data record from the first set; and
      
      (j) performing steps (b) through (e) on the second single data record.
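Claims 3, 6 and 9 repeat steps (b) through (e) for a second single data record. Step (f) of the independent claims builds a predictive model from the resulting data but does not name a model family. The Python sketch below extends the repetition to every record in turn through a hypothetical `synthesize_one` callback, which stands for steps (b)-(e) on one seed record. It then uses a one-variable ordinary least squares fit purely as a stand-in predictive model. Neither choice comes from the patent.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
# Hypothetical callback: takes the working records and a seed index, returns updated records.
SynthesizeOne = Callable[[List[Record], int], List[Record]]


def synthesize_all(records: List[Record], synthesize_one: SynthesizeOne) -> List[Record]:
    """Claims 3, 6 and 9, extended: run the per-record steps (b)-(e) on each record in turn."""
    out = [dict(r) for r in records]
    for seed_idx in range(len(out)):
        out = synthesize_one(out, seed_idx)
    return out


def fit_linear_model(records: List[Record], x_field: str, y_field: str) -> Tuple[float, float]:
    """Step (f) stand-in (assumed): ordinary least squares fit of y = a + b * x."""
    xs = [r[x_field] for r in records]
    ys = [r[y_field] for r in records]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


visited: List[int] = []


def identity_step(rs: List[Record], i: int) -> List[Record]:
    visited.append(i)  # placeholder for the real steps (b)-(e)
    return rs


data = [{"x": float(x), "y": 2.0 * x + 1.0} for x in range(4)]
synthetic = synthesize_all(data, identity_step)
print(visited)                                # [0, 1, 2, 3]
print(fit_linear_model(synthetic, "x", "y"))  # (1.0, 2.0)
```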

Current Assignee
Cigna Intellectual Property, Inc. (CIGNA Corporation)
Original Assignee
Cigna Intellectual Property, Inc. (CIGNA Corporation)
Inventors
Fogarty, David; Lin, Jing
Primary Examiner(s)
Chang, Li Wu

Application Number

US14/567,432
Time in Patent Office

1,748 Days
Field of Search

706 12
CPC Class Codes

G06N 20/00 Machine learning

G06N 5/047 Pattern matching networks; ...
