Data balancing method based on pseudo negative sample and method for improving data classification performance

  • CN 109,272,056 B
  • Filed: 10/30/2018
  • Issued: 09/21/2021
  • Est. Priority Date: 10/30/2018
  • Status: Active Grant
First Claim

1. A data balancing method based on pseudo-negative samples, applied to bioinformatics, characterized by comprising the following steps:

  • Step 1:

    separating the positive and negative samples of a biological information data set to be processed, to obtain a positive sample set and a negative sample set;

    calculating the Pearson correlation coefficient between each negative sample in the negative sample set and all positive samples in the positive sample set, to obtain a set of Pearson correlation coefficients for the negative samples;

    Step 2:

    initializing the pseudo-negative sample set to the empty set, and initializing the selected sample set to the negative sample set;

    Step 3:

    traversing the selected sample set one negative sample at a time, and calculating the weights of all negative samples in the negative sample set using the maximum relevance-minimum redundancy (mRMR) method, to obtain a weight set;

    selecting the maximum weight from the weight set, adding the negative sample corresponding to the maximum weight to the pseudo-negative sample set, and at the same time removing that negative sample from the selected sample set;

    Step 4:

    repeating step 3 until the pseudo-negative sample set has been selected, wherein the number of samples in the finally selected pseudo-negative sample set is 10% to 100% of the number of samples in the positive sample set;

    Step 5:

    merging the selected pseudo-negative sample set into the positive sample set to form a new positive sample set, and at the same time removing the selected pseudo-negative sample set from the negative sample set to form a new negative sample set;

    wherein, in step 1, the Pearson correlation coefficient of each negative sample is represented by the average of its Pearson correlation coefficients with all positive samples, the calculation formula being as follows:
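
As an illustration only, steps 1 to 5 of the claim can be sketched in Python with NumPy. The listing above does not reproduce the claim's formulas, so several details here are assumptions rather than the patented method: the weight is taken as relevance minus redundancy, where relevance is the mean Pearson correlation with the positive samples and redundancy is the mean Pearson correlation with the pseudo-negative samples already chosen. The function names, the `ratio` parameter, and the rounding of the target count are likewise hypothetical.

```python
import numpy as np


def mean_pearson_to_positives(negatives, positives):
    """Step 1: mean Pearson correlation of each negative sample with all positives."""
    n_neg = negatives.shape[0]
    # np.corrcoef treats each row as one variable, so stacking the samples as
    # rows gives sample-to-sample correlations across the feature values.
    # Note: a sample with constant feature values would produce NaN here.
    corr = np.corrcoef(np.vstack([negatives, positives]))
    return corr[:n_neg, n_neg:].mean(axis=1)


def select_pseudo_negatives(negatives, positives, ratio=0.5):
    """Steps 2 to 4: greedily pick indices of pseudo-negative samples."""
    if not 0.1 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0.1, 1.0] (10% to 100% of positives)")
    # Assumed rounding rule; the claim only gives the 10%-100% range.
    target = min(int(round(ratio * positives.shape[0])), negatives.shape[0])

    relevance = mean_pearson_to_positives(negatives, positives)
    neg_corr = np.corrcoef(negatives)  # negative-to-negative correlations

    pseudo = []                                   # step 2: empty pseudo-negative set
    candidates = list(range(negatives.shape[0]))  # step 2: selected set = all negatives

    while len(pseudo) < target:                   # step 4: repeat step 3
        if pseudo:
            # Assumed redundancy term: mean correlation with samples already chosen.
            redundancy = neg_corr[np.ix_(candidates, pseudo)].mean(axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        weights = relevance[candidates] - redundancy  # assumed mRMR-style weight
        best = candidates[int(np.argmax(weights))]    # step 3: maximum weight
        pseudo.append(best)
        candidates.remove(best)
    return pseudo


def rebalance(negatives, positives, ratio=0.5):
    """Step 5: move the pseudo-negative samples into the positive set."""
    pseudo = select_pseudo_negatives(negatives, positives, ratio)
    keep = np.setdiff1d(np.arange(negatives.shape[0]), pseudo)
    new_positives = np.vstack([positives, negatives[pseudo]])
    new_negatives = negatives[keep]
    return new_positives, new_negatives
```

With, say, 4 positive and 10 negative samples and `ratio=0.5`, `rebalance` moves 2 negatives across, giving 6 positives and 8 negatives. Precomputing the negative-to-negative correlation matrix keeps each iteration cheap, but it needs memory quadratic in the number of negative samples, so a large data set may call for computing the redundancy term on demand instead.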
