Data mining technique with maintenance of ancestry counts

US 10,268,953 B1
Filed: 01/13/2015
Issued: 04/23/2019
Est. Priority Date: 01/28/2014
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Roughly described, a computer-implemented evolutionary data mining system includes a memory storing a candidate gene database in which each candidate individual has a respective fitness estimate; a gene pool processor which tests individuals from the candidate gene pool on training data and updates the fitness estimates associated with the individuals in dependence upon the tests; and a gene harvesting module for deploying selected individuals from the gene pool, wherein the gene pool processor includes a competition module which selects individuals for discarding in dependence upon their updated fitness estimates. The system also maintains, for each candidate individual, an ancestry count indicating the number of procreation events in its ancestry, and may use this count to adjust the competition among the individuals, to adjust the selection of individuals for further procreation, and/or for other purposes.
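
The abstract does not say how an ancestry count is kept up to date as individuals procreate. The sketch below is a minimal illustration under stated assumptions, not the patented implementation: conditions are encoded as hypothetical `(feature, low, high)` ranges, and the names `Individual` and `procreate` are illustrative. The rule for the child's count (one more than the largest parent count) is also an assumption; the claims only say the count indicates the number of procreation events in an individual's ancestry, and summing the parents' counts would be an equally plausible reading.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical condition encoding: (feature name, low, high), true when low <= value <= high.
Condition = Tuple[str, float, float]


@dataclass
class Individual:
    conditions: List[Condition]   # the rule's conditions
    output: str                   # proposed output when all conditions hold
    ancestry_count: int = 0       # procreation events in this individual's ancestry
    fitness: float = 0.0          # overall fitness estimate from testing


def procreate(parents: List[Individual], rng: random.Random) -> Individual:
    """Form a new individual by copying conditions and an output from its parents."""
    pool = [c for p in parents for c in p.conditions]
    # Copy a random, non-empty subset of the parents' conditions.
    conditions = rng.sample(pool, rng.randint(1, len(pool)))
    # Copy one parent's output.
    output = rng.choice(parents).output
    # Assumption: the child's count is one more than the largest parent count.
    ancestry_count = 1 + max(p.ancestry_count for p in parents)
    return Individual(conditions, output, ancestry_count)
```

Under this convention, a child of parents with ancestry counts 0 and 3 receives a count of 4.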

28 Claims

1. A data mining system, for use with a data mining training database containing a plurality of data samples, comprising:
- a computer system having a memory having a candidate gene database identifying a pool of candidate individuals, each of the candidate individuals identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, and a gene pool processor which:
  
  performs a procreation step of forming new individuals in the pool of candidate individuals at least in part by copying into each subject new individual at least one member of the group consisting of:
  
  a condition in an individual in a set of at least one parent individual corresponding to the subject new individual, and an output in an individual in the set of parent individuals corresponding to the subject new individual;
  
  tests each individual in a testing subset of at least one of the candidate individuals, each of the tests applying the conditions of the respective individual to a respective subset of the data samples in the training database to propose a result, each individual in the testing subset being tested on at least one data sample and at least one of the individuals in the testing subset being tested on more than one data sample;
  
  calculates an overall fitness estimate for each of the individuals in the testing subset, in dependence upon the results proposed by the respective individual when the conditions of the respective individual were applied to the respective subset of the data samples; and
  
  stores, in association with each of the candidate individuals in the testing subset, a respective ancestry count indicating a respective number of procreation events in the ancestry of the individual, the gene pool processor further including a competition module which (i) adjusts respective overall fitness estimates of the individuals in dependence upon their respective ancestry counts and (ii) selects individuals for discarding in dependence upon comparisons among their respective overall fitness estimates, the computer system further having a gene harvesting module providing for deployment selected ones of the remaining individuals from the pool of candidate individuals, wherein the computer system comprises a server and a collection of at least one client device, and wherein in testing each individual in a testing subset of at least one of the candidate individuals:
  
  the server delegates to the at least one client device the testing of the individuals in the testing subset; and
  
  the server receives tested individuals from the at least one client device, a first subset of at least one of the received tested individuals being different from all of the individuals previously delegated by the server, each tested individual being received in association with an indication of its performance during testing by the at least one client device and at least the tested individuals in the first subset also being received in association with an indication of its ancestry count.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The system of claim 1, wherein the competition module groups individuals into one or more of a plurality of testing experience level groups in dependence upon their testing experience levels, and selects individuals for discarding further in dependence upon their respective testing experience level groups.
  - 3. The system of claim 1, wherein the competition module adjusts the respective overall fitness estimates in dependence upon their respective ancestry counts by handicapping their respective overall fitness estimates in dependence upon their respective ancestry counts.
  - 4. The system of claim 3, wherein the handicap applied to each given one of the individuals by the competition module varies non-decreasingly as a function of the ancestry count of the given individual.
  - 5. The system of claim 1, wherein in the procreation step the gene pool processor randomly selects the parent individuals for the subject new individual using a random selection weighted in dependence upon individuals' ancestry counts.
  - 6. The system of claim 1, wherein in providing for deployment of selected ones of the remaining individuals from the pool of candidate individuals, the gene harvesting module provides for deployment individuals from the pool of candidate individuals selected in dependence upon comparisons among their respective ancestry counts.
  - 7. The system of claim 1, wherein in the testing of each individual in the testing subset of the at least one of the candidate individuals:
    - the server delegates to the at least one client device the testing of the individuals in the testing subset, including indicating, to each client device receiving an individual for testing, the individual's ancestry count; and
      
      the server receives tested individuals from the at least one client device.
  - 8. The system of claim 7, wherein in receiving tested individuals from the at least one client device, at least one of the tested individuals is received in association with an indication of its ancestry count.
  - 9. The system of claim 1, wherein the computer system further comprises a module which provides an API for external retrieval of the ancestry counts.
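
Claims 1, 3 and 4 describe a competition module that handicaps each individual's overall fitness estimate by its ancestry count, with a handicap that does not decrease as the count grows, and then discards individuals by comparing the adjusted estimates. A minimal sketch, assuming a hypothetical linear handicap and a simple keep-the-top-N survival rule (`Candidate`, `linear_handicap`, `compete` and the `rate` of 0.01 are all illustrative):

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    name: str
    fitness: float        # overall fitness estimate from testing
    ancestry_count: int   # procreation events in the ancestry


def linear_handicap(ancestry_count: int, rate: float = 0.01) -> float:
    """Hypothetical handicap that never decreases as the ancestry count grows."""
    return rate * ancestry_count


def compete(pool: List[Candidate], keep: int,
            handicap: Callable[[int], float] = linear_handicap) -> List[Candidate]:
    """Rank by handicapped fitness and discard all but the best `keep` individuals."""
    def adjusted(c: Candidate) -> float:
        return c.fitness - handicap(c.ancestry_count)

    ranked = sorted(pool, key=adjusted, reverse=True)
    return ranked[:keep]  # survivors; the rest are discarded
```

With `A` at 0.80 (count 0), `B` at 0.85 (count 10) and `C` at 0.70 (count 0), `B` has the highest raw estimate, but its handicapped score of 0.75 falls below `A`'s 0.80. This sketch applies the handicap only inside the comparison; an implementation could instead write the adjusted values back to the stored estimates.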

10. A client computer system for a data mining system, for use with a data mining training database containing a plurality of data samples, comprising:
- a processing subsystem, a memory having a candidate gene database identifying a client-centric pool of candidate individuals, each of the candidate individuals identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, and a client gene pool processor which:
  
  performs a procreation step of forming new individuals in the client-centric pool of candidate individuals at least in part by copying into each subject new individual at least one member of the group consisting of:
  
  a condition in an individual in a set of at least one parent individual corresponding to the subject new individual, and an output in an individual in the set of parent individuals corresponding to the subject new individual;
  
  tests each individual in a testing subset of at least one of the candidate individuals in the client-centric pool of candidate individuals, each of the tests applying the conditions of the respective individual to a respective subset of the data samples in the training database to propose a result, each individual in the testing subset being tested on at least one data sample and at least one of the individuals in the testing subset being tested on more than one data sample;
  
  calculates a client-centric overall fitness estimate for each of the individuals in the testing subset, in dependence upon the results proposed by the respective individual when the conditions of the respective individual were applied to the respective subset of the data samples; and
  
  stores, in association with each of the candidate individuals in the testing subset, a respective ancestry count indicating a respective number of procreation events in the ancestry of the individual, the client computer system further including a competition module which (i) adjusts respective client-centric overall fitness estimates of the individuals in dependence upon their respective ancestry counts and (ii) selects individuals for discarding in dependence upon comparisons among their respective client-centric overall fitness estimates; and
  
  the client computer system further including a gene harvesting module which forwards to a central server infrastructure for potential deployment or further testing, selected ones of the remaining individuals from the client-centric pool of candidate individuals, wherein the data mining system comprises a server and a collection of at least one client device, and wherein in testing each individual in a testing subset of at least one of the candidate individuals:
  
  the server delegates to the at least one client device the testing of the individuals in the testing subset; and
  
  the server receives tested individuals from the at least one client device, a first subset of at least one of the received tested individuals being different from all of the individuals previously delegated by the server, each tested individual being received in association with an indication of its performance during testing by the at least one client device and at least the tested individuals in the first subset also being received in association with an indication of its ancestry count.
- Dependent Claims (11, 12, 13, 14)
  - 11. The system of claim 10, wherein the competition module groups individuals into one or more of a plurality of testing experience level groups in dependence upon their testing experience levels, and selects individuals for discarding further in dependence upon their respective testing experience level groups.
  - 12. The system of claim 10, wherein the competition module adjusts the respective client-centric overall fitness estimates in dependence upon their respective ancestry counts by handicapping their respective client-centric overall fitness estimates in dependence upon their respective ancestry counts.
  - 13. The system of claim 10, wherein in the procreation step the gene pool processor randomly selects the parent individuals for the subject new individual using a random selection weighted to favor individuals having lower ancestry count over individuals having higher ancestry count.
  - 14. The system of claim 10, wherein in forwarding selected individuals to a central server infrastructure for potential deployment or further testing, the gene harvesting module selects individuals in dependence upon comparisons among their respective ancestry counts.
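
Claim 13 has the client gene pool processor choose parents at random, weighted to favor individuals with lower ancestry counts. One way to sketch this, assuming a hypothetical weight of 1 / (1 + ancestry count) and distinct parents (sampling without replacement); `select_parents` is an illustrative name:

```python
import random
from typing import List, Sequence


def select_parents(ancestry_counts: Sequence[int], n_parents: int,
                   rng: random.Random) -> List[int]:
    """Pick distinct parent indices, favoring lower ancestry counts.

    The weight 1 / (1 + ancestry count) is a hypothetical choice; any weighting
    that decreases with the count would favor less-bred individuals.
    """
    remaining = list(range(len(ancestry_counts)))
    chosen: List[int] = []
    for _ in range(min(n_parents, len(remaining))):
        weights = [1.0 / (1 + ancestry_counts[i]) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)  # sample without replacement
    return chosen
```

With counts 0 and 9 the weights are 1 and 0.1, so a single draw picks the first individual with probability 1 / 1.1, or about 91%.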

15. A computer implemented data mining method, for use with a data mining training database containing a plurality of data samples, and for use further with a memory having a candidate gene database identifying a pool of candidate individuals, each of the candidate individuals identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, the method comprising:
- performing a procreation step of forming new individuals in the pool of candidate individuals at least in part by copying into each subject new individual at least one member of the group consisting of:
  
  a condition in an individual in a set of at least one parent individual corresponding to the subject new individual, and an output in an individual in the set of parent individuals corresponding to the subject new individual;
  
  testing each individual in a testing subset of at least one of the candidate individuals, each of the tests applying the conditions of the respective individual to a respective subset of the data samples in the training database to propose a result, each individual in the testing subset being tested on at least one data sample and at least one of the individuals in the testing subset being tested on more than one data sample;
  
  calculating an overall fitness estimate for each of the individuals in the testing subset, in dependence upon the results proposed by the respective individual when the conditions of the respective individual were applied to the respective subset of the data samples;
  
  storing, in association with each of the candidate individuals in the testing subset, a respective ancestry count indicating a respective number of procreation events in the ancestry of the individual;
  
  adjusting respective overall fitness estimates of the individuals in dependence upon their respective ancestry counts and selecting individuals for discarding in dependence upon comparisons among their respective overall fitness estimates;
  
  harvesting for deployment selected ones of the remaining individuals from the pool of candidate individuals;
  
  delegating, by a server and to at least one client device, testing of individuals in a testing subset of at least one of the candidate individuals; and
  
  receiving, by the server, tested individuals from the at least one client device, a first subset of at least one of the received tested individuals being different from all of the individuals previously delegated by the server, each tested individual being received in association with an indication of its performance during testing by the at least one client device and at least the tested individuals in the first subset also being received in association with an indication of its ancestry count.
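
The testing and fitness-estimation steps of claim 15 leave open how a proposed result is scored. A minimal sketch, assuming the individual is a rule that proposes its output only when all of its hypothetical `(feature, low, high)` conditions hold, scores 1 for a correct proposal and 0 otherwise, and keeps a running mean so that testing on further samples refines the same estimate. `Rule`, `fires` and `evaluate_rule` are illustrative names, and ignoring samples on which the rule does not fire is a design assumption:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Condition = Tuple[str, float, float]   # hypothetical: (feature, low, high)
Sample = Tuple[Dict[str, float], str]  # (feature values, actual outcome)


@dataclass
class Rule:
    conditions: List[Condition]
    output: str
    fitness: float = 0.0   # running overall fitness estimate
    experience: int = 0    # samples on which the rule has proposed a result


def fires(rule: Rule, features: Dict[str, float]) -> bool:
    """True when every condition holds for the sample's features."""
    return all(name in features and lo <= features[name] <= hi
               for name, lo, hi in rule.conditions)


def evaluate_rule(rule: Rule, samples: List[Sample]) -> None:
    """Apply the rule to each sample and fold hits and misses into its fitness."""
    for features, actual in samples:
        if not fires(rule, features):
            continue  # assumption: no proposal, so the sample is not scored
        score = 1.0 if rule.output == actual else 0.0
        rule.experience += 1
        # Incremental mean, so later testing rounds refine the same estimate.
        rule.fitness += (score - rule.fitness) / rule.experience
```

Because the estimate is an incremental mean, the `experience` field could also serve as a simple measure of the testing experience level that claims 2 and 11 use to group individuals.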

16. A data mining method implemented on a client computer system in a client/server environment, for use with a data mining training database containing a plurality of data samples, for use further with a memory having a candidate gene database identifying a client-centric pool of candidate individuals, each of the candidate individuals identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, the method comprising:
- performing a procreation step of forming new individuals in the client-centric pool of candidate individuals at least in part by copying into each subject new individual at least one member of the group consisting of:
  
  a condition in an individual in a set of at least one parent individual corresponding to the subject new individual, and an output in an individual in the set of parent individuals corresponding to the subject new individual;
  
  testing each individual in a testing subset of at least one of the candidate individuals in the client-centric pool of candidate individuals, each of the tests applying the conditions of the respective individual to a respective subset of the data samples in the training database to propose a result, each individual in the testing subset being tested on at least one data sample and at least one of the individuals in the testing subset being tested on more than one data sample;
  
  calculating a client-centric overall fitness estimate for each of the individuals in the testing subset, in dependence upon the results proposed by the respective individual when the conditions of the respective individual were applied to the respective subset of the data samples;
  
  storing, in association with each of the candidate individuals in the testing subset, a respective ancestry count indicating a respective number of procreation events in the ancestry of the individual;
  
  adjusting respective client-centric overall fitness estimates of the individuals in dependence upon their respective ancestry counts and selecting individuals for discarding in dependence upon comparisons among their respective client-centric overall fitness estimates;
  
  forwarding to a central server infrastructure for potential deployment or further testing, selected ones of the remaining individuals from the client-centric pool of candidate individuals;
  
  delegating, by a server and to at least one client device, testing of individuals in a testing subset of at least one of the candidate individuals; and
  
  receiving, by the server, tested individuals from the at least one client device, a first subset of at least one of the received tested individuals being different from all of the individuals previously delegated by the server, each tested individual being received in association with an indication of its performance during testing by the at least one client device and at least the tested individuals in the first subset also being received in association with an indication of its ancestry count.
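
Claim 16 has the client forward selected individuals to the central server, and the server-side clauses require each returned individual to carry an indication of its performance and, for individuals the server did not delegate, its ancestry count. A sketch of such a message, assuming JSON as the wire format and hypothetical field names (`id`, `fitness`, `experience`, `ancestry_count` and so on); selecting by fitness is also an assumption, since claim 14 contemplates selection in dependence upon ancestry counts:

```python
import json
from typing import Any, Dict, List


def build_forward_message(client_id: str, pool: List[Dict[str, Any]],
                          top_k: int) -> str:
    """Serialize the client's best individuals for the central server.

    Each entry carries its performance (fitness, experience) and its ancestry
    count, since the server may not have seen the individual before.
    """
    best = sorted(pool, key=lambda ind: ind["fitness"], reverse=True)[:top_k]
    fields = ("id", "conditions", "output", "fitness", "experience", "ancestry_count")
    return json.dumps({
        "client_id": client_id,
        "individuals": [{f: ind[f] for f in fields} for ind in best],
    })
```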

17. A computer readable medium, for use with a data mining training database containing a plurality of data samples, and for use further with a memory having a candidate gene database identifying a pool of candidate individuals, each of the candidate individuals identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, the medium having stored thereon in a non-transitory manner a plurality of code portions which, when executed by a computer system performs data mining steps of:
- performing a procreation step of forming new individuals in the pool of candidate individuals at least in part by copying into each subject new individual at least one member of the group consisting of:
  
  a condition in an individual in a set of at least one parent individual corresponding to the subject new individual, and an output in an individual in the set of parent individuals corresponding to the subject new individual;
  
  testing each individual in a testing subset of at least one of the candidate individuals, each of the tests applying the conditions of the respective individual to a respective subset of the data samples in the training database to propose a result, each individual in the testing subset being tested on at least one data sample and at least one of the individuals in the testing subset being tested on more than one data sample;
  
  calculating an overall fitness estimate for each of the individuals in the testing subset, in dependence upon the results proposed by the respective individual when the conditions of the respective individual were applied to the respective subset of the data samples;
  
  storing, in association with each of the candidate individuals in the testing subset, a respective ancestry count indicating a respective number of procreation events in the ancestry of the individual;
  
  adjusting respective overall fitness estimates of the individuals in dependence upon their respective ancestry counts and selecting individuals for discarding in dependence upon comparisons among their respective overall fitness estimates;
  
  harvesting for deployment selected ones of the remaining individuals from the pool of candidate individuals;
  
  delegating, by a server and to at least one client device, testing of individuals in a testing subset of at least one of the candidate individuals; and
  
  receiving, by the server, tested individuals from the at least one client device, a first subset of at least one of the received tested individuals being different from all of the individuals previously delegated by the server, each tested individual being received in association with an indication of its performance during testing by the at least one client device and at least the tested individuals in the first subset also being received in association with an indication of its ancestry count.

18. A computer readable medium, for use in a client/server environment with a data mining training database containing a plurality of data samples, and for use further with a memory having a candidate gene database identifying a client-centric pool of candidate individuals, each of the candidate individuals identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, the medium having stored thereon in a non-transitory manner a plurality of code portions which, when executed by a client computer system in the client/server environment, performs data mining steps of:
- performing a procreation step of forming new individuals in the client-centric pool of candidate individuals at least in part by copying into each subject new individual at least one member of the group consisting of:
  
  a condition in an individual in a set of at least one parent individual corresponding to the subject new individual, and an output in an individual in the set of parent individuals corresponding to the subject new individual;
  
  testing each individual in a testing subset of at least one of the candidate individuals in the client-centric pool of candidate individuals, each of the tests applying the conditions of the respective individual to a respective subset of the data samples in the training database to propose a result, each individual in the testing subset being tested on at least one data sample and at least one of the individuals in the testing subset being tested on more than one data sample;
  
  calculating a client-centric overall fitness estimate for each of the individuals in the testing subset, in dependence upon the results proposed by the respective individual when the conditions of the respective individual were applied to the respective subset of the data samples;
  
  storing, in association with each of the candidate individuals in the testing subset, a respective ancestry count indicating a respective number of procreation events in the ancestry of the individual;
  
  adjusting respective client-centric overall fitness estimates of the individuals in dependence upon their respective ancestry counts and selecting individuals for discarding in dependence upon comparisons among their respective client-centric overall fitness estimates;
  
  forwarding to a central server infrastructure for potential deployment or further testing, selected ones of the remaining individuals from the client-centric pool of candidate individuals;
  
  delegating, by a server and to at least one client device, testing of individuals in a testing subset of at least one of the candidate individuals; and
  
  receiving, by the server, tested individuals from the at least one client device, a first subset of at least one of the received tested individuals being different from all of the individuals previously delegated by the server, each tested individual being received in association with an indication of its performance during testing by the at least one client device and at least the tested individuals in the first subset also being received in association with an indication of its ancestry count.

19. A data mining system, for use with a data mining training database containing a plurality of data samples, comprising:
- memory means for storing a candidate gene database identifying a pool of candidate individuals, each of the candidate individuals identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, procreation means for forming new individuals in the pool of candidate individuals at least in part by copying into each subject new individual at least one member of the group consisting of:
  
  a condition in an individual in a set of at least one parent individual corresponding to the subject new individual, and an output in an individual in the set of parent individuals corresponding to the subject new individual;
  
  testing means for testing each individual in a testing subset of at least one of the candidate individuals, each of the tests applying the conditions of the respective individual to a respective subset of the data samples in the training database to propose a result, each individual in the testing subset being tested on at least one data sample and at least one of the individuals in the testing subset being tested on more than one data sample;
  
  evaluating means for calculating an overall fitness estimate for each of the individuals in the testing subset, in dependence upon the results proposed by the respective individual when the conditions of the respective individual were applied to the respective subset of the data samples;
  
  storing means for storing, in association with each of the candidate individuals in the testing subset, a respective ancestry count indicating a respective number of procreation events in the ancestry of the individual;
  
  discarding means for (i) adjusting respective overall fitness estimates of the individuals in dependence upon their respective ancestry counts and (ii) selecting individuals for discarding in dependence upon comparisons among their respective overall fitness estimates; and
  
  harvesting means for providing for deployment selected ones of the remaining individuals from the pool of candidate individuals, wherein the data mining system comprises a server and a collection of at least one client device, and wherein in testing each individual in a testing subset of at least one of the candidate individuals:
  
  the server delegates to the at least one client device the testing of the individuals in the testing subset; and
  
  the server receives tested individuals from the at least one client device, a first subset of at least one of the received tested individuals being different from all of the individuals previously delegated by the server, each tested individual being received in association with an indication of its performance during testing by the at least one client device and at least the tested individuals in the first subset also being received in association with an indication of its ancestry count.
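
On the server side, the claims distinguish returned individuals that the server delegated from a first subset that differs from anything it delegated (for example, individuals procreated on a client), and require the latter to arrive with an ancestry count. A minimal sketch of that merge, with illustrative names (`merge_results`, and dictionary fields matching a hypothetical message format); replacing the stored fitness with the client's report is a simplifying assumption, and a real system might instead combine it with earlier estimates weighted by experience:

```python
from typing import Any, Dict, List, Set, Tuple


def merge_results(delegated_ids: Set[str], pool: Dict[str, Dict[str, Any]],
                  received: List[Dict[str, Any]]) -> Tuple[List[str], List[str]]:
    """Fold individuals returned by a client into the server's pool.

    Returns (ids of previously delegated individuals, ids of new individuals).
    """
    updated: List[str] = []
    new: List[str] = []
    for ind in received:
        if ind["id"] in delegated_ids:
            # Previously delegated: record the performance reported by the client.
            pool[ind["id"]].update(fitness=ind["fitness"], experience=ind["experience"])
            updated.append(ind["id"])
        else:
            # Not delegated by this server (e.g. procreated on the client):
            # it must arrive with its ancestry count.
            if "ancestry_count" not in ind:
                raise ValueError(f"new individual {ind['id']!r} lacks an ancestry count")
            pool[ind["id"]] = dict(ind)
            new.append(ind["id"])
    return updated, new
```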

20. A client computer system for use in a client/server data mining environment, for use with a data mining training database containing a plurality of data samples, comprising:
- memory means for storing a candidate gene database identifying a client-centric pool of candidate individuals, each of the candidate individuals identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, the client computer system comprising:
  
  procreation means for forming new individuals in the client-centric pool of candidate individuals at least in part by copying into each subject new individual at least one member of the group consisting of:
  
  a condition in an individual in a set of at least one parent individual corresponding to the subject new individual, and an output in an individual in the set of parent individuals corresponding to the subject new individual;
  
  testing means for testing each individual in a testing subset of at least one of the candidate individuals in the client-centric pool of candidate individuals, each of the tests applying the conditions of the respective individual to a respective subset of the data samples in the training database to propose a result, each individual in the testing subset being tested on at least one data sample and at least one of the individuals in the testing subset being tested on more than one data sample;
  
  evaluating means for calculating a client-centric overall fitness estimate for each of the individuals in the testing subset, in dependence upon the results proposed by the respective individual when the conditions of the respective individual were applied to the respective subset of the data samples;
  
  storing means for storing, in association with each of the candidate individuals in the testing subset, a respective ancestry count indicating a respective number of procreation events in the ancestry of the individual;
  
  discarding means for (i) adjusting respective client-centric overall fitness estimates of the individuals in dependence upon their respective ancestry counts and (ii) selecting individuals for discarding in dependence upon comparisons among their respective client-centric overall fitness estimates; and
  
  harvesting means for forwarding to a central server infrastructure for potential deployment or further testing, selected ones of the remaining individuals from the client-centric pool of candidate individuals, wherein the client/server data mining environment comprises a server and a collection of at least one client device, and wherein in testing each individual in a testing subset of at least one of the candidate individuals:
  
  the server delegates to the at least one client device the testing of the individuals in the testing subset; and
  
  the server receives tested individuals from the at least one client device, a first subset of at least one of the received tested individuals being different from all of the individuals previously delegated by the server, each tested individual being received in association with an indication of its performance during testing by the at least one client device and at least the tested individuals in the first subset also being received in association with an indication of its ancestry count.

21. A data mining system, for use with a data mining training database containing a plurality of data samples, comprising:
- a computer system having a memory having a candidate gene database identifying a pool of candidate individuals, each of the candidate individuals identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, and a gene pool processor which:
  
  performs a procreation step of forming new individuals in the pool of candidate individuals at least in part by copying into each subject new individual at least one member of the group consisting of:
  
  a condition in an individual in a set of at least one parent individual corresponding to the subject new individual, and an output in an individual in the set of parent individuals corresponding to the subject new individual;
  
  tests each individual in a testing subset of at least one of the candidate individuals, each of the tests applying the conditions of the respective individual to a respective subset of the data samples in the training database to propose a result, each individual in the testing subset being tested on at least one data sample and at least one of the individuals in the testing subset being tested on more than one data sample;
  
  calculates an overall fitness estimate for each of the individuals in the testing subset, in dependence upon the results proposed by the respective individual when the conditions of the respective individual were applied to the respective subset of the data samples; and
  
  stores, in association with each of the candidate individuals in the testing subset, a respective ancestry count indicating a respective number of procreation events in the ancestry of the individual, the gene pool processor further including a competition module which selects individuals for discarding in dependence upon comparisons among their respective overall fitness estimates, the computer system further having a gene harvesting module providing for deployment selected ones of the remaining individuals from the pool of candidate individuals, wherein, in the procreation step, the gene pool processor randomly selects the parent individuals for the subject new individual using a random selection weighted in dependence upon individuals' ancestry counts, wherein the computer system comprises a server and a collection of at least one client device, and wherein in testing each individual in a testing subset of at least one of the candidate individuals:
  
  the server delegates to the at least one client device the testing of the individuals in the testing subset; and
  
  the server receives tested individuals from the at least one client device, a first subset of at least one of the received tested individuals being different from all of the individuals previously delegated by the server, each tested individual being received in association with an indication of its performance during testing by the at least one client device and at least the tested individuals in the first subset also being received in association with an indication of its ancestry count.
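Claim 21 recites a procreation step in which parents are chosen at random with weights that depend on ancestry counts. The sketch below assumes a weighting of 1 / (1 + ancestry count), which favors individuals with fewer procreation events in their lineage. The claim does not say which direction the weighting should lean, so `default_weight` is a hypothetical, swappable choice. The sketch also assumes, again hypothetically, that a child's ancestry count is one more than the larger of its parents' counts, and that crossover copies a random subset of the parents' conditions and one parent's output.

```python
import random
from dataclasses import dataclass


@dataclass
class Individual:
    conditions: list          # hypothetical: list of (feature_name, threshold) pairs
    output: int
    ancestry_count: int = 0   # procreation events in this individual's ancestry


def default_weight(ind):
    # Assumption: individuals with fewer procreation events in their ancestry are
    # more likely to be chosen. The claim only requires that the weight depend on it.
    return 1.0 / (1 + ind.ancestry_count)


def select_parents(pool, k=2, weight=default_weight, rng=random):
    """Draw k distinct parents at random, weighted by a function of ancestry count."""
    candidates = list(pool)
    parents = []
    for _ in range(min(k, len(candidates))):
        weights = [weight(ind) for ind in candidates]
        idx = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        parents.append(candidates.pop(idx))   # remove so a parent is not drawn twice
    return parents


def procreate(parents, rng=random):
    """Form a new individual by copying conditions and an output from its parents."""
    inherited = [c for p in parents for c in p.conditions]
    n = min(len(inherited), max(1, len(inherited) // len(parents)))
    conditions = rng.sample(inherited, n)
    output = rng.choice(parents).output
    # Assumption: one new procreation event on top of the deepest parental lineage.
    ancestry = 1 + max(p.ancestry_count for p in parents)
    return Individual(conditions, output, ancestry)


rng = random.Random(0)
pool = [
    Individual([("x", 0.5)], 1, ancestry_count=0),
    Individual([("y", 0.3)], 0, ancestry_count=4),
    Individual([("x", 0.2), ("y", 0.9)], 1, ancestry_count=1),
]
mom, dad = select_parents(pool, k=2, rng=rng)
child = procreate([mom, dad], rng=rng)
```

Passing a different `weight` function is enough to test other ancestry-dependent selection policies without touching the procreation logic.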
22. The system of claim 21, wherein the competition module:
- selects individuals for discarding further in dependence upon their respective ancestry counts; and
  
  when selecting individuals for discarding, handicaps their respective overall fitness estimates in dependence upon their respective ancestry counts.
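The handicapping in claim 22 can be sketched as a penalty subtracted from each individual's overall fitness estimate before the pool is ranked for discarding. The linear form, the `penalty` coefficient, the function names, and the choice to penalize larger ancestry counts are assumptions. The claim only requires that the handicap depend on the ancestry count.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    name: str
    fitness_estimate: float
    ancestry_count: int


def handicapped_fitness(ind, penalty=0.01):
    # Assumption: a linear handicap that grows with the ancestry count.
    return ind.fitness_estimate - penalty * ind.ancestry_count


def select_for_discard(pool, n_discard, penalty=0.01):
    """Return the n_discard individuals with the lowest handicapped fitness."""
    ranked = sorted(pool, key=lambda ind: handicapped_fitness(ind, penalty))
    return ranked[:n_discard]


pool = [
    Individual("a", 0.80, 0),
    Individual("b", 0.82, 10),
    Individual("c", 0.60, 1),
    Individual("d", 0.90, 2),
]
discarded = select_for_discard(pool, n_discard=2)
survivors = [ind for ind in pool if ind not in discarded]
```

Without the handicap, `c` and `a` would be discarded. With it, `b` drops from 0.82 to 0.72 and is discarded in place of `a`, even though its raw estimate is higher.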

23. A client computer system for a data mining system, for use with a data mining training database containing a plurality of data samples, comprising:
- a processing subsystem, a memory having a candidate gene database identifying a client-centric pool of candidate individuals, each of the candidate individuals identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, and a client gene pool processor which:
  
  performs a procreation step of forming new individuals in the client-centric pool of candidate individuals at least in part by copying into each subject new individual at least one member of the group consisting of:
  
  a condition in an individual in a set of at least one parent individual corresponding to the subject new individual, and an output in an individual in the set of parent individuals corresponding to the subject new individual;
  
  tests each individual in a testing subset of at least one of the candidate individuals in the client-centric pool of candidate individuals, each of the tests applying the conditions of the respective individual to a respective subset of the data samples in the training database to propose a result, each individual in the testing subset being tested on at least one data sample and at least one of the individuals in the testing subset being tested on more than one data sample;
  
  calculates a client-centric overall fitness estimate for each of the individuals in the testing subset, in dependence upon the results proposed by the respective individual when the conditions of the respective individual were applied to the respective subset of the data samples; and
  
  stores, in association with each of the candidate individuals in the testing subset, a respective ancestry count indicating a respective number of procreation events in the ancestry of the individual, the client computer system further including a competition module which selects individuals for discarding in dependence upon comparisons among their respective client-centric overall fitness estimates; and
  
  the client computer system further including a gene harvesting module which forwards to a central server infrastructure for potential deployment or further testing, selected ones of the remaining individuals from the client-centric pool of candidate individuals, wherein, in the procreation step, the client gene pool processor randomly selects the parent individuals for the subject new individual using a random selection weighted in dependence upon individuals' ancestry counts, wherein the data mining system comprises a server and a collection of at least one client device, and wherein in testing each individual in a testing subset of at least one of the candidate individuals:
  
  the server delegates to the at least one client device the testing of the individuals in the testing subset; and
  
  the server receives tested individuals from the at least one client device, a first subset of at least one of the received tested individuals being different from all of the individuals previously delegated by the server, each tested individual being received in association with an indication of its performance during testing by the at least one client device and at least the tested individuals in the first subset also being received in association with an indication of its ancestry count.
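On the client side (claim 23), the gene harvesting module forwards selected survivors to the server together with an indication of their performance and their ancestry count. The sketch below assumes a JSON message format, an `ident` field, and a harvesting rule based on two hypothetical thresholds (`min_samples`, `min_fitness`). The claims specify neither the wire format nor the selection criterion.

```python
import json
from dataclasses import dataclass


@dataclass
class Individual:
    ident: str                # hypothetical unique identifier
    conditions: list          # hypothetical: list of (feature_name, threshold) pairs
    output: int
    fitness_estimate: float   # client-centric overall fitness estimate
    samples_tested: int
    ancestry_count: int


def harvest(pool, min_samples=100, min_fitness=0.7):
    """Pick survivors worth forwarding (both thresholds are illustrative assumptions)."""
    return [
        ind for ind in pool
        if ind.samples_tested >= min_samples and ind.fitness_estimate >= min_fitness
    ]


def to_message(ind):
    """Serialize an individual with its performance indication and ancestry count."""
    return json.dumps({
        "id": ind.ident,
        "conditions": ind.conditions,   # tuples become JSON arrays
        "output": ind.output,
        "performance": {
            "fitness_estimate": ind.fitness_estimate,
            "samples_tested": ind.samples_tested,
        },
        "ancestry_count": ind.ancestry_count,
    })


pool = [
    Individual("r1", [("x", 0.5)], 1, 0.81, 250, 3),
    Individual("r2", [("y", 0.2)], 0, 0.55, 300, 1),   # fitness too low
    Individual("r3", [("x", 0.9)], 1, 0.95, 20, 7),    # too few samples tested
]
batch = [to_message(ind) for ind in harvest(pool)]
```

Sending the sample count along with the estimate lets the server judge how much testing stands behind each forwarded individual.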
24. The system of claim 23, wherein the competition module:
- selects individuals for discarding further in dependence upon their respective ancestry counts; and
  
  when selecting individuals for discarding, handicaps their respective client-centric overall fitness estimates in dependence upon their respective ancestry counts.

25. A computer implemented data mining method, for use with a data mining training database containing a plurality of data samples, and for use further with a memory having a candidate gene database identifying a pool of candidate individuals, each of the candidate individuals identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, the method comprising:
- performing a procreation step of forming new individuals in the pool of candidate individuals at least in part by copying into each subject new individual at least one member of the group consisting of:
  
  a condition in an individual in a set of at least one parent individual corresponding to the subject new individual, and an output in an individual in the set of parent individuals corresponding to the subject new individual;
  
  testing each individual in a testing subset of at least one of the candidate individuals, each of the tests applying the conditions of the respective individual to a respective subset of the data samples in the training database to propose a result, each individual in the testing subset being tested on at least one data sample and at least one of the individuals in the testing subset being tested on more than one data sample;
  
  calculating an overall fitness estimate for each of the individuals in the testing subset, in dependence upon the results proposed by the respective individual when the conditions of the respective individual were applied to the respective subset of the data samples;
  
  storing, in association with each of the candidate individuals in the testing subset, a respective ancestry count indicating a respective number of procreation events in the ancestry of the individual;
  
  selecting individuals for discarding in dependence upon comparisons among their respective overall fitness estimates; and
  
  harvesting for deployment selected ones of the remaining individuals from the pool of candidate individuals, wherein the procreation step includes randomly selecting the parent individuals for the subject new individual using a random selection weighted in dependence upon individuals' ancestry counts, wherein the method further includes delegating, by a server and to at least one client device, testing of individuals in a testing subset of at least one of the candidate individuals, and wherein the method further includes receiving, by the server, tested individuals from the at least one client device, a first subset of at least one of the received tested individuals being different from all of the individuals previously delegated by the server, each tested individual being received in association with an indication of its performance during testing by the at least one client device and at least the tested individuals in the first subset also being received in association with an indication of its ancestry count.
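The server-side bookkeeping in claim 25 (delegating testing to clients, then distinguishing received individuals that were never delegated) can be sketched as follows. The `Server` class, the dictionary record layout with `id`, `performance`, and `ancestry_count` keys, and the round-robin delegation policy are assumptions for illustration. The `receive` method returns the claim's "first subset": received individuals that differ from every previously delegated one, each of which must arrive with an ancestry count.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    delegated_ids: set = field(default_factory=set)  # every id ever handed out
    pool: dict = field(default_factory=dict)         # id -> latest received record

    def delegate(self, testing_subset, client_ids):
        """Split the testing subset across clients (round-robin is an assumed policy)."""
        batches = {c: [] for c in client_ids}
        for i, ind in enumerate(testing_subset):
            batches[client_ids[i % len(client_ids)]].append(ind)
            self.delegated_ids.add(ind["id"])
        return batches

    def receive(self, tested):
        """Accept tested individuals and return the ones never delegated before.

        Every record must carry a performance indication. Records for individuals
        the server never delegated (e.g. offspring procreated on a client) must
        also carry an ancestry count. All records are validated before any is stored.
        """
        for rec in tested:
            if "performance" not in rec:
                raise ValueError(f"{rec['id']}: missing performance indication")
            if rec["id"] not in self.delegated_ids and "ancestry_count" not in rec:
                raise ValueError(f"{rec['id']}: new individual lacks an ancestry count")
        first_subset = [rec for rec in tested if rec["id"] not in self.delegated_ids]
        for rec in tested:
            self.pool[rec["id"]] = rec
        return first_subset


server = Server()
batches = server.delegate([{"id": "a"}, {"id": "b"}, {"id": "c"}], ["client-1", "client-2"])
returned = [
    {"id": "a", "performance": 0.74},
    {"id": "z", "performance": 0.81, "ancestry_count": 5},  # procreated on a client
]
first_subset = server.receive(returned)
```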
26. The computer implemented data mining method of claim 25, wherein the selecting of individuals for discarding selects individuals for discarding further in dependence upon their respective ancestry counts, such that when selecting individuals for discarding, their respective overall fitness estimates are handicapped in dependence upon their respective ancestry counts.

27. A data mining method implemented on a client computer system in a client/server environment, for use with a data mining training database containing a plurality of data samples, for use further with a memory having a candidate gene database identifying a client-centric pool of candidate individuals, each of the candidate individuals identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, the method comprising:
- performing a procreation step of forming new individuals in the client-centric pool of candidate individuals at least in part by copying into each subject new individual at least one member of the group consisting of:
  
  a condition in an individual in a set of at least one parent individual corresponding to the subject new individual, and an output in an individual in the set of parent individuals corresponding to the subject new individual;
  
  testing each individual in a testing subset of at least one of the candidate individuals in the client-centric pool of candidate individuals, each of the tests applying the conditions of the respective individual to a respective subset of the data samples in the training database to propose a result, each individual in the testing subset being tested on at least one data sample and at least one of the individuals in the testing subset being tested on more than one data sample;
  
  calculating a client-centric overall fitness estimate for each of the individuals in the testing subset, in dependence upon the results proposed by the respective individual when the conditions of the respective individual were applied to the respective subset of the data samples;
  
  storing, in association with each of the candidate individuals in the testing subset, a respective ancestry count indicating a respective number of procreation events in the ancestry of the individual;
  
  selecting individuals for discarding in dependence upon comparisons among their respective client-centric overall fitness estimates; and
  
  forwarding to a central server infrastructure for potential deployment or further testing, selected ones of the remaining individuals from the client-centric pool of candidate individuals, wherein the procreation step further includes randomly selecting the parent individuals for the subject new individual using a random selection weighted in dependence upon individuals' ancestry counts, wherein the client/server environment comprises a server and a collection of at least one client device, and wherein in testing each individual in a testing subset of at least one of the candidate individuals:
  
  the server delegates to the at least one client device the testing of the individuals in the testing subset; and
  
  the server receives tested individuals from the at least one client device, a first subset of at least one of the received tested individuals being different from all of the individuals previously delegated by the server, each tested individual being received in association with an indication of its performance during testing by the at least one client device and at least the tested individuals in the first subset also being received in association with an indication of its ancestry count.
28. The data mining method of claim 27, wherein the selecting of individuals for discarding selects individuals for discarding further in dependence upon their respective ancestry counts, such that when selecting individuals for discarding, their respective client-centric overall fitness estimates are handicapped in dependence upon their respective ancestry counts.


Current Assignee: Cognizant Technology Solutions U.S. Corporation (Cognizant Technology Solutions Corp.)
Original Assignee: Cognizant Technology Solutions U.S. Corporation (Cognizant Technology Solutions Corp.)
Inventors: Fink, Daniel E.; Shahrzad, Hormoz
Primary Examiner(s): Chaki, Kakali
Assistant Examiner(s): Zidanic, Michael
Application Number: US14/595,991
Time in Patent Office: 1,561 Days
Field of Search: None
CPC Class Codes:
G06N 3/126 Evolutionary algorithms, e....
G06N 5/025 Extracting rules from data

Assignments: 2
Petitions: 0
Claims: 28
