DATA CLUSTERING SYSTEM AND METHOD

US 20150134660A1
Filed: 11/14/2013
Published: 05/14/2015
Est. Priority Date: 11/14/2013
Status: Abandoned Application

First Claim

1. A non-transitory computer-readable medium storing program code, the program code executable by a processor of a computing system to cause the computing system to:

identify a first dataset comprising n data samples;

identify b data samples of the n data samples of the first dataset, wherein b is less than n;

create a first plurality of datasets, each of the first plurality of datasets comprising m data samples, where m is greater than b, and wherein each of the m data samples of each of the first plurality of datasets is selected from the b data samples;

identify c data samples of the n data samples of the first dataset, wherein c is less than n, and wherein the c data samples are not identical to the b data samples;

create a second plurality of datasets, each of the second plurality of datasets comprising p data samples, where p is greater than c, and wherein each of the p data samples of each of the second plurality of datasets is selected from the c data samples;

for each of the b data samples, identify a cluster based on the first plurality of datasets; and

for each of the c data samples, identify a cluster based on the second plurality of datasets.

Abstract

A system includes identification of a first dataset comprising n data samples, identification of b data samples of the n data samples of the first dataset, wherein b is less than n, creation of a first plurality of datasets, each of the first plurality of datasets comprising m data samples, where m is greater than b, and wherein each of the m data samples of each of the first plurality of datasets is selected from the b data samples, identification of c data samples of the n data samples of the first dataset, wherein c is less than n, and wherein the c data samples are not identical to the b data samples, creation of a second plurality of datasets, each of the second plurality of datasets comprising p data samples, where p is greater than c, and wherein each of the p data samples of each of the second plurality of datasets is selected from the c data samples, identification, for each of the b data samples, of a cluster based on the first plurality of datasets, and identification, for each of the c data samples, of a cluster based on the second plurality of datasets.
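The resampling step in the abstract can be sketched as follows. This is a minimal illustration, not the applicants' implementation: the function name `make_resampled_datasets`, its parameters, and the use of uniform random draws are assumptions (random selection appears only in the dependent claims). Because m is greater than b, each created dataset necessarily repeats some of the b samples, so the sketch draws with replacement.

```python
import random


def make_resampled_datasets(samples, subset_size, resample_size, num_datasets, seed=0):
    """Pick b = subset_size samples out of n, then build num_datasets datasets
    of m = resample_size samples, each drawn from those b samples.

    All names here are hypothetical; the source only fixes b < n and m > b.
    """
    if not subset_size < len(samples):
        raise ValueError("subset size (b) must be less than n")
    if not resample_size > subset_size:
        raise ValueError("resample size (m) must be greater than b")
    rng = random.Random(seed)
    # b distinct positions of the first dataset, chosen without replacement.
    subset = rng.sample(samples, subset_size)
    # m > b forces repeats, so each dataset is drawn with replacement.
    datasets = [rng.choices(subset, k=resample_size) for _ in range(num_datasets)]
    return subset, datasets


# Usage: n = 100, b = c = 10, m = p = 25, four datasets per plurality.
data = list(range(100))
subset_b, first_plurality = make_resampled_datasets(data, 10, 25, 4, seed=1)
# A different seed is assumed to give a different subset; a real system
# would check that the c samples are not identical to the b samples.
subset_c, second_plurality = make_resampled_datasets(data, 10, 25, 4, seed=2)
```

Calling the function twice with different subsets mirrors the "first plurality" and "second plurality" of datasets; each plurality is then clustered separately.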

18 Claims

1. A non-transitory computer-readable medium storing program code, the program code executable by a processor of a computing system to cause the computing system to:
- identify a first dataset comprising n data samples;
  
  identify b data samples of the n data samples of the first dataset, wherein b is less than n;
  
  create a first plurality of datasets, each of the first plurality of datasets comprising m data samples, where m is greater than b, and wherein each of the m data samples of each of the first plurality of datasets is selected from the b data samples;
  
  identify c data samples of the n data samples of the first dataset, wherein c is less than n, and wherein the c data samples are not identical to the b data samples;
  
  create a second plurality of datasets, each of the second plurality of datasets comprising p data samples, where p is greater than c, and wherein each of the p data samples of each of the second plurality of datasets is selected from the c data samples;
  
  for each of the b data samples, identify a cluster based on the first plurality of datasets; and
  
  for each of the c data samples, identify a cluster based on the second plurality of datasets.
  - 2. A non-transitory computer-readable medium storing program code according to claim 1, wherein identification of a cluster for each of the b data samples based on the first plurality of datasets comprises:
    - identification of a cluster of each of the m data samples of a first one of the first plurality of datasets; and
      
      identification of a cluster of each of the m data samples of a second one of the first plurality of datasets.
  - 3. A non-transitory computer-readable medium storing program code according to claim 2, wherein identification of a cluster of each of the m data samples of the first one of the first plurality of datasets comprises:
    - for each unique data sample of the first one of the first plurality of datasets, determination of a first number of occurrences of the unique data sample in the first one of the first plurality of datasets; and
      
      identification of a cluster of each of the m data samples of the first one of the first plurality of datasets based on the unique data samples of the first one of the first plurality of datasets and the first numbers of occurrences, and wherein identification of a cluster of each of the m data samples of the second one of the first plurality of datasets comprises:
      
      for each unique data sample of the second one of the first plurality of datasets, determination of a second number of occurrences of the unique data sample in the second one of the first plurality of datasets; and
      
      identification of a cluster of each of the m data samples of the second one of the first plurality of datasets based on the unique data samples of the second one of the first plurality of datasets and the second numbers of occurrences.
  - 4. A non-transitory computer-readable medium storing program code according to claim 1, wherein identification of a cluster for each of the b data samples comprises:
    - for each unique data sample of the first one of the first plurality of datasets, determination of a first number of occurrences of the unique data sample in the first one of the first plurality of datasets;
      
      for each unique data sample of the second one of the first plurality of datasets, determination of a second number of occurrences of the unique data sample in the second one of the first plurality of datasets; and
      
      identification of a cluster for each of the b data samples based on the unique data samples of the first one of the first plurality of datasets, the first numbers of occurrences, the unique data samples of the second one of the first plurality of datasets, and the second numbers of occurrences.
  - 5. A non-transitory computer-readable medium storing program code according to claim 1, wherein each of the m data samples of each of the first plurality of datasets is randomly selected from the b data samples, and wherein each of the m data samples of each of the second plurality of datasets is randomly selected from the c data samples.
  - 6. A non-transitory computer-readable medium storing program code according to claim 1, wherein b is equal to c and wherein m is equal to p.
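Claims 3 and 4 cluster a resampled dataset from its unique data samples and their numbers of occurrences. One way to read this is as weighted clustering over the distinct values. The sketch below assumes one-dimensional numeric samples and uses a weighted form of Lloyd's k-means; the claims do not name a clustering algorithm, so that choice, the function names, and the initialization rule are all assumptions made for illustration.

```python
from collections import Counter


def weighted_kmeans_1d(values, weights, k, iterations=20):
    """Lloyd's algorithm on distinct 1-D values, each weighted by its count.

    Swapped-in technique: the source does not specify k-means.
    """
    ordered = sorted(values)
    # Hypothetical initialization: spread centers across the sorted values.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    assignment = {}
    for _ in range(iterations):
        # Assign each distinct value to its nearest center.
        assignment = {v: min(range(k), key=lambda j: abs(v - centers[j])) for v in values}
        # Move each center to the count-weighted mean of its members.
        for j in range(k):
            members = [v for v in values if assignment[v] == j]
            total = sum(weights[v] for v in members)
            if total > 0:
                centers[j] = sum(v * weights[v] for v in members) / total
    return assignment, centers


def cluster_resampled_dataset(dataset, k):
    """Cluster the m samples of one dataset via its unique samples and counts."""
    counts = Counter(dataset)  # unique sample -> number of occurrences
    assignment, centers = weighted_kmeans_1d(list(counts), counts, k)
    # Each of the m samples inherits the cluster of its unique value.
    return [assignment[x] for x in dataset], centers


labels, centers = cluster_resampled_dataset(
    [1.0, 1.0, 1.2, 0.9, 10.0, 10.0, 10.5, 9.8, 1.0], k=2
)
# labels == [0, 0, 0, 0, 1, 1, 1, 1, 0]
```

Working on the distinct values rather than all m samples keeps the clustering cost tied to at most b points per dataset, while the counts preserve the effect of the repeated draws.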

7. A computing system comprising:
- a memory storing processor-executable program code; and
  
  a processor to execute the processor-executable program code in order to cause the computing system to:
  
  identify a first dataset comprising n data samples;
  
  identify b data samples of the n data samples of the first dataset, wherein b is less than n;
  
  create a first plurality of datasets, each of the first plurality of datasets comprising m data samples, where m is greater than b, and wherein each of the m data samples of each of the first plurality of datasets is selected from the b data samples;
  
  identify c data samples of the n data samples of the first dataset, wherein c is less than n, and wherein the c data samples are not identical to the b data samples;
  
  create a second plurality of datasets, each of the second plurality of datasets comprising p data samples, where p is greater than c, and wherein each of the p data samples of each of the second plurality of datasets is selected from the c data samples;
  
  for each of the b data samples, identify a cluster based on the first plurality of datasets; and
  
  for each of the c data samples, identify a cluster based on the second plurality of datasets.
  - 8. A computing system according to claim 7, wherein identification of a cluster for each of the b data samples based on the first plurality of datasets comprises:
    - identification of a cluster of each of the m data samples of a first one of the first plurality of datasets; and
      
      identification of a cluster of each of the m data samples of a second one of the first plurality of datasets.
  - 9. A computing system according to claim 8, wherein identification of a cluster of each of the m data samples of the first one of the first plurality of datasets comprises:
    - for each unique data sample of the first one of the first plurality of datasets, determination of a first number of occurrences of the unique data sample in the first one of the first plurality of datasets; and
      
      identification of a cluster of each of the m data samples of the first one of the first plurality of datasets based on the unique data samples of the first one of the first plurality of datasets and the first numbers of occurrences, and wherein identification of a cluster of each of the m data samples of the second one of the first plurality of datasets comprises:
      
      for each unique data sample of the second one of the first plurality of datasets, determination of a second number of occurrences of the unique data sample in the second one of the first plurality of datasets; and
      
      identification of a cluster of each of the m data samples of the second one of the first plurality of datasets based on the unique data samples of the second one of the first plurality of datasets and the second numbers of occurrences.
  - 10. A computing system according to claim 7, wherein identification of a cluster for each of the b data samples comprises:
    - for each unique data sample of the first one of the first plurality of datasets, determination of a first number of occurrences of the unique data sample in the first one of the first plurality of datasets;
      
      for each unique data sample of the second one of the first plurality of datasets, determination of a second number of occurrences of the unique data sample in the second one of the first plurality of datasets; and
      
      identification of a cluster for each of the b data samples based on the unique data samples of the first one of the first plurality of datasets, the first numbers of occurrences, the unique data samples of the second one of the first plurality of datasets, and the second numbers of occurrences.
  - 11. A computing system according to claim 7, wherein each of the m data samples of each of the first plurality of datasets is randomly selected from the b data samples, and wherein each of the m data samples of each of the second plurality of datasets is randomly selected from the c data samples.
  - 12. A computing system according to claim 7, wherein b is equal to c and wherein m is equal to p.

13. A computer-implemented method, comprising:
- identifying a first dataset comprising n data samples;
  
  identifying b data samples of the n data samples of the first dataset, wherein b is less than n;
  
  creating a first plurality of datasets, each of the first plurality of datasets comprising m data samples, where m is greater than b, and wherein each of the m data samples of each of the first plurality of datasets is selected from the b data samples;
  
  identifying c data samples of the n data samples of the first dataset, wherein c is less than n, and wherein the c data samples are not identical to the b data samples;
  
  creating a second plurality of datasets, each of the second plurality of datasets comprising p data samples, where p is greater than c, and wherein each of the p data samples of each of the second plurality of datasets is selected from the c data samples;
  
  for each of the b data samples, identifying a cluster based on the first plurality of datasets; and
  
  for each of the c data samples, identifying a cluster based on the second plurality of datasets.
  - 14. A computer-implemented method according to claim 13, wherein identifying a cluster for each of the b data samples based on the first plurality of datasets comprises:
    - identifying a cluster of each of the m data samples of a first one of the first plurality of datasets; and
      
      identifying a cluster of each of the m data samples of a second one of the first plurality of datasets.
  - 15. A computer-implemented method according to claim 14, wherein identifying a cluster of each of the m data samples of the first one of the first plurality of datasets comprises:
    - for each unique data sample of the first one of the first plurality of datasets, determining a first number of occurrences of the unique data sample in the first one of the first plurality of datasets; and
      
      identifying a cluster of each of the m data samples of the first one of the first plurality of datasets based on the unique data samples of the first one of the first plurality of datasets and the first numbers of occurrences, and wherein identifying a cluster of each of the m data samples of the second one of the first plurality of datasets comprises:
      
      for each unique data sample of the second one of the first plurality of datasets, determining a second number of occurrences of the unique data sample in the second one of the first plurality of datasets; and
      
      identifying a cluster of each of the m data samples of the second one of the first plurality of datasets based on the unique data samples of the second one of the first plurality of datasets and the second numbers of occurrences.
  - 16. A computer-implemented method according to claim 13, wherein identifying a cluster for each of the b data samples comprises:
    - for each unique data sample of the first one of the first plurality of datasets, determining a first number of occurrences of the unique data sample in the first one of the first plurality of datasets;
      
      for each unique data sample of the second one of the first plurality of datasets, determining a second number of occurrences of the unique data sample in the second one of the first plurality of datasets; and
      
      identifying a cluster for each of the b data samples based on the unique data samples of the first one of the first plurality of datasets, the first numbers of occurrences, the unique data samples of the second one of the first plurality of datasets, and the second numbers of occurrences.
  - 17. A computer-implemented method according to claim 13, wherein each of the m data samples of each of the first plurality of datasets is randomly selected from the b data samples, and wherein each of the m data samples of each of the second plurality of datasets is randomly selected from the c data samples.
  - 18. A computer-implemented method according to claim 13, wherein b is equal to c and wherein m is equal to p.
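Claim 14 identifies clusters within each dataset of a plurality, and claim 13 then requires one cluster per base sample. The claims do not say how the per-dataset results are combined. One possible approach, shown here purely as an assumption, is co-association voting: two base samples are grouped when they share a label in more than half of the datasets that contain both. The name `consensus_clusters`, the `threshold` parameter, and the union-find grouping are hypothetical, and samples are assumed to be hashable and sortable.

```python
from itertools import combinations


def consensus_clusters(base_samples, labelled_datasets, threshold=0.5):
    """Merge per-dataset clusterings into one cluster id per base sample.

    labelled_datasets is a list of (dataset, labels) pairs, where labels[i]
    is the cluster of dataset[i] within that dataset only.
    """
    together = {}  # pair -> datasets in which both samples share a label
    seen = {}      # pair -> datasets in which both samples appear
    for dataset, labels in labelled_datasets:
        label_of = dict(zip(dataset, labels))  # unique sample -> label
        for x, y in combinations(sorted(label_of), 2):
            seen[(x, y)] = seen.get((x, y), 0) + 1
            if label_of[x] == label_of[y]:
                together[(x, y)] = together.get((x, y), 0) + 1

    parent = {s: s for s in base_samples}  # union-find forest

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    # Link every pair that co-occurs in a cluster often enough.
    for (x, y), n_seen in seen.items():
        if together.get((x, y), 0) / n_seen > threshold:
            parent[find(x)] = find(y)

    roots = {}
    return {s: roots.setdefault(find(s), len(roots)) for s in base_samples}


result = consensus_clusters(
    [1, 2, 3, 4],
    [
        ([1, 1, 2, 3, 4, 4], [0, 0, 0, 1, 1, 1]),
        ([2, 2, 3, 3, 1, 4], [1, 1, 0, 0, 1, 0]),
        ([1, 2, 2, 3, 4, 4], [0, 0, 0, 0, 1, 1]),
    ],
)
# result == {1: 0, 2: 0, 3: 1, 4: 1}
```

Voting on pairs avoids having to align cluster labels across datasets, since label 0 in one dataset need not correspond to label 0 in another.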

Current Assignee: General Electric Company
Original Assignee: General Electric Company
Inventors: Yan, Weizhong; Gilder, Mark Richard; Brahmakshatriya, Umang Gopalbhai
Application Number: US14/080,096
Publication Number: US 20150134660A1
Field of Search
US Class Current: 707/737
CPC Class Codes: G06F 16/285 (Clustering or classification)
