Method for eliminating outliers in pharmaceutical test data based on the Grubbs rule and the MATLAB language

  • CN 102,436,542 A
  • Filed: 09/22/2011
  • Published: 05/02/2012
  • Est. Priority Date: 09/22/2011
  • Status: Active Application
First Claim

1. A method for eliminating outliers in pharmaceutical test data based on the Grubbs rule and the MATLAB language, characterized in that it is implemented by the following steps:

  • (1) Write the program, as follows:

    Define the function [Xnew, del, index] = Grubbs(X, alpha, tail), where X, alpha, and tail are the input variables and Xnew, del, and index are the output variables. The variables have the following meanings:

    X is the matrix of input raw data, i.e. the input vector of test values; it holds at most 100 data points, the data may be arranged as a row or as a column, and the processed output has the same structure as the input;

    alpha is the significance level, one of 0.01, 0.05, or 0.1, with default 0.05;

    tail is the tail value, which selects a one-sided or a two-sided test: -1 selects the one-sided low-value test, 1 selects the one-sided high-value test, and 0 selects the two-sided test; the default is 0;

    Xnew is the final vector, free of outliers;

    del holds the deleted outliers;

    index holds the original sequence numbers of the deleted outliers;

    If one input argument is supplied, the significance level defaults to 0.05 and the tail value defaults to 0, i.e. a two-sided test;

    If two input arguments are supplied, the tail value defaults to 0, i.e. a two-sided test;

    If the supplied significance level is not one of 0.01, 0.05, or 0.1, the program prompts the user to enter one of the permitted significance levels (0.01, 0.05, or 0.1) and returns;

    If the supplied tail value is not one of 1, -1, or 0, the program prompts the user to enter a valid tail value (-1, 0, or 1) and returns;
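The default and validation rules above can be sketched as follows. This is a minimal Python illustration, not the MATLAB program of the claim; the function name `check_arguments` and the use of exceptions in place of "prompt and return" are assumptions made for the example.

```python
ALLOWED_ALPHA = (0.01, 0.05, 0.1)   # permitted significance levels
ALLOWED_TAIL = (-1, 0, 1)           # low-value, two-sided, high-value

def check_arguments(alpha=0.05, tail=0):
    """Apply the defaults (alpha = 0.05, tail = 0) and validate both inputs."""
    if alpha not in ALLOWED_ALPHA:
        # Assumption: an exception stands in for "prompt and return".
        raise ValueError("Enter a significance level of 0.01, 0.05 or 0.1")
    if tail not in ALLOWED_TAIL:
        raise ValueError("Enter a tail value of -1, 0 or 1")
    return alpha, tail

print(check_arguments())        # only X supplied: (0.05, 0)
print(check_arguments(0.01))    # X and alpha supplied: (0.01, 0)
```

Raising an exception keeps the sketch short; a closer analogue of the claim would print the prompt text and return early.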

    Unify the data as a column vector:

    [p, q] = size(X), where p and q are the numbers of rows and columns of the matrix X;

    If q equals 1, then [n, m] = size(X) assigns the numbers of rows and columns of X to the variables n and m;

    If p equals 1, the matrix X is transposed, X = X', so that, for example, a matrix of 1 row and 20 columns becomes a matrix of 20 rows and 1 column, and [n, m] = size(X) then assigns the numbers of rows and columns of X to n and m;

    The number of replicate determinations cannot exceed 100; if it does, the data are beyond the scope of the program and the function returns;
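A minimal Python sketch of the orientation and size checks follows. The name `to_column`, the list-of-rows representation of X, and the error raised for input that is neither a single row nor a single column are assumptions; the claim does not say what happens in that last case.

```python
def to_column(X):
    """X is a list of rows. Return (values, was_row) with values as a flat list."""
    p, q = len(X), len(X[0])            # numbers of rows and columns
    if q == 1:                          # already a column vector
        values, was_row = [row[0] for row in X], False
    elif p == 1:                        # row vector: "transpose" it
        values, was_row = list(X[0]), True
    else:
        # Assumption: the claim only covers single-row or single-column input.
        raise ValueError("X must be a single row or a single column")
    if len(values) > 100:
        raise ValueError("More than 100 replicate determinations: beyond the scope of this program")
    return values, was_row

print(to_column([[10.2, 9.8, 10.0]]))   # ([10.2, 9.8, 10.0], True)
```

The `was_row` flag plays the role of p in the claim: it records whether the results must be turned back into a row at the end.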

    Construct an n × 2 sample matrix:

    N = 1:1:n, so that N is the sequence 1, 2, 3, ..., n; transpose N into a column vector, N = N';

    [Xsort, N] = sort(X) sorts the values in the matrix X in ascending order and names the sorted matrix Xsort;

    Xorigin = [N, Xsort] generates the "original" matrix Xorigin, in which the sample numbers are reordered together with the sorted values;

    Xsort2 = Xorigin, so that Xsort2 is identical to Xorigin;

    index = zeros(n, 1) generates a zero matrix of n rows and 1 column;
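The sorting step pairs each value with its original sequence number so that removed values can be traced back later. A minimal Python sketch, with `sort_with_numbers` as a hypothetical helper name and tuples standing in for the rows of Xorigin:

```python
def sort_with_numbers(values):
    """Return [(original_number, value), ...] in ascending order of value."""
    numbered = list(enumerate(values, start=1))       # 1-based, like N = 1:n
    return sorted(numbered, key=lambda pair: pair[1])  # stable sort by value

x = [10.2, 9.8, 10.0, 12.5]
xorigin = sort_with_numbers(x)
index = [0] * len(x)    # placeholder for the numbers of removed outliers
print(xorigin)          # [(2, 9.8), (3, 10.0), (1, 10.2), (4, 12.5)]
```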

    Critical-value table lookup:

    The table follows the national standard "Statistical interpretation of data - Detection and treatment of outliers in the normal sample" (GB-T4883-2008);

    Gtab = [   1   0.9     0.95    0.975   0.99    0.995
               2   0       0       0       0       0
               3   1.148   1.153   1.155   1.155   1.155
               4   1.425   1.463   1.481   1.492   1.496
               5   1.602   1.672   1.715   1.749   1.764
               6   1.729   1.822   1.887   1.944   1.973
               7   1.828   1.938   2.020   2.097   2.139
               8   1.909   2.032   2.126   2.221   2.274
               (90 rows omitted)
              99   3.014   3.204   3.380   3.597   3.750
             100   3.017   3.207   3.383   3.600   3.754 ];

    In the critical-value table Gtab, the first column is the sequence number;

    0.9, 0.95, 0.975, 0.99, and 0.995 in the first row are the significance levels, and rows 3 to 100 give the critical values for each significance level at sample sizes 3 to 100;

    90 rows are omitted in the middle; the full data can be found in GB-T4883-2008;
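A lookup over this table might look like the Python sketch below. It contains only the rows quoted in the claim. The rule for choosing a column (1 - alpha for a one-sided test, 1 - alpha/2 for a two-sided test) is not stated in the claim; it is an assumption based on the usual Grubbs convention, and `critical_value` is a hypothetical name.

```python
# Column headings of Gtab as quoted in the claim.
LEVELS = (0.9, 0.95, 0.975, 0.99, 0.995)

# Only the rows quoted in the claim; rows for n = 9..98 are elided there.
GTAB = {
    3: (1.148, 1.153, 1.155, 1.155, 1.155),
    4: (1.425, 1.463, 1.481, 1.492, 1.496),
    5: (1.602, 1.672, 1.715, 1.749, 1.764),
    6: (1.729, 1.822, 1.887, 1.944, 1.973),
    7: (1.828, 1.938, 2.020, 2.097, 2.139),
    8: (1.909, 2.032, 2.126, 2.221, 2.274),
    99: (3.014, 3.204, 3.380, 3.597, 3.750),
    100: (3.017, 3.207, 3.383, 3.600, 3.754),
}

def critical_value(n, alpha, tail):
    """Look up the critical value Gs for sample size n."""
    if n not in GTAB:
        raise KeyError(f"row n={n} is not reproduced here; see GB-T4883-2008")
    # Assumed mapping: one-sided uses 1 - alpha, two-sided uses 1 - alpha/2.
    level = 1 - alpha if tail != 0 else 1 - alpha / 2
    # Pick the nearest column to avoid floating-point equality tests.
    col = min(range(len(LEVELS)), key=lambda j: abs(LEVELS[j] - level))
    return GTAB[n][col]

print(critical_value(5, 0.05, 1))   # 1.672 (column 0.95)
print(critical_value(5, 0.05, 0))   # 1.715 (column 0.975)
```

Under this assumed mapping the three permitted significance levels use exactly the five columns of the table, which is consistent with the headings quoted in the claim.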

    If the tail value is -1, the low-value test is carried out: the first value is taken as the minimum and the statistic Gi is computed by the formula for "G'n" in the national standard "Statistical interpretation of data - Detection and treatment of outliers in the normal sample" (GB-T4883-2008); Gi is then compared with the corresponding critical value (Gs) in the table; if Gi/Gs > 1, the minimum is rejected, giving a new data set, and the next round of comparison is carried out on the new data;

    Otherwise nothing is rejected;

    If the tail value is 1, the high-value test is carried out: the last value is taken as the maximum and the statistic Gi is computed by the formula for "Gn" in the national standard "Statistical interpretation of data - Detection and treatment of outliers in the normal sample" (GB-T4883-2008); Gi is then compared with the corresponding critical value (Gs) in the table; if Gi/Gs > 1, the maximum is rejected, giving a new data set, and the next round of comparison is carried out on the new data;

    Otherwise nothing is rejected;
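For reference, the two Grubbs statistics are commonly written as below. This is the usual textbook form, given here on the assumption that it matches the formulas the claim cites from GB-T4883-2008; x_(1) and x_(n) denote the smallest and largest of the n sorted values.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}

G_n = \frac{x_{(n)} - \bar{x}}{s} \quad \text{(high-value test)},
\qquad
G_n' = \frac{\bar{x} - x_{(1)}}{s} \quad \text{(low-value test)}
```

The suspect value is rejected when the statistic exceeds the tabulated critical value, which is the condition Gi/Gs > 1 used in the claim.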

    If the tail value is 0, the two-sided test is carried out: first find the minimum and the maximum, then compute the statistic Dmax (the maximum minus the mean) and the statistic Dmin (the mean minus the minimum) and compare the two;

    If Dmax/Dmin > 1, proceed as described above for a tail value of 1 (the high-value test);

    Otherwise proceed as described above for a tail value of -1 (the low-value test); if any data are rejected, a new round of comparison and rejection is carried out;
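One round of this decision can be sketched in Python as follows. The helper name `grubbs_round` is hypothetical, the statistic uses the sample standard deviation (an assumption consistent with the usual Grubbs formulas), and the data are assumed not to be all identical, so that the standard deviation is non-zero.

```python
from statistics import mean, stdev

def grubbs_round(sorted_values, tail):
    """Return (side, G) for one round: which end is suspect and its statistic."""
    m, s = mean(sorted_values), stdev(sorted_values)
    d_min = m - sorted_values[0]       # mean minus minimum
    d_max = sorted_values[-1] - m      # maximum minus mean
    if tail == -1:
        side = "low"
    elif tail == 1:
        side = "high"
    else:
        # Two-sided: test whichever extreme lies further from the mean.
        side = "high" if d_max > d_min else "low"
    return side, (d_max if side == "high" else d_min) / s

side, g = grubbs_round([9.8, 10.0, 10.2, 12.5], tail=0)
print(side, round(g, 3))
```

The returned statistic is then compared with the critical value for the current sample size; the value is rejected only if the statistic is larger.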

    Finally, the elements of the newly generated Xnew are put back into the original, unsorted, single-column order, so that the data remaining after the test appear in the new matrix in the order in which they were originally entered;

    The sequence numbers are kept in step, and the entries equal to 0 are removed from the outlier-number vector index. If no elements remain after removal, the set of deleted outliers is empty, i.e. no outlier was deleted, and the set of numbers of deleted outliers is also empty;

    The message "The data set contains no outliers" is displayed and the function returns;

    If elements remain after the zeros are removed, the numbers are sorted in ascending order, and b = length(index), where b is the total number of data points deleted;

    for c = 1:b, i.e. c takes the values 1, 2, ..., b in turn: on each pass, find the row of Xsort2 whose first-column sequence number matches the c-th deleted number; the second-column value of that row of Xsort2 is the deleted outlier;

    End of loop;

    Finally, if p equals 1, i.e. the original vector was a row vector, the newly generated matrix is transposed, as are the matrix of deleted outliers and the matrix of their sequence numbers;
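The clean-up of the outputs can be sketched in Python as follows. The function name `restore` is hypothetical, and the final transposition back to a row vector is omitted because a flat Python list has no orientation.

```python
def restore(values, index_with_zeros):
    """Build (xnew, deleted, index) from the original data and the index vector."""
    index = sorted(i for i in index_with_zeros if i != 0)   # drop placeholder zeros
    if not index:
        print("The data set contains no outliers")
    removed = set(index)
    deleted = [values[i - 1] for i in index]                 # outliers by number
    xnew = [v for i, v in enumerate(values, start=1) if i not in removed]
    return xnew, deleted, index

# Values 1 and 4 were marked as outliers; zeros are unused slots.
print(restore([10.2, 9.8, 10.0, 12.5], [0, 4, 0, 1]))
# ([9.8, 10.0], [10.2, 12.5], [1, 4])
```

Because the retained values are read from the original list rather than the sorted one, they come out in their original input order without a separate re-sorting step.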

    (2) Iteratively reject the outliers of a single group of data: using the program written in step 1, carry out the following steps on a computer:

    1. First enter the raw data, the significance level, and the tail value, i.e. X, alpha, and tail, and call the function [Xnew, del, index] = Grubbs(X, alpha, tail);

    The input data are test data from the same group;

    At most 100 replicate determinations are allowed; otherwise the prompt "The number of replicate determinations exceeds 100, which is beyond the scope of this program" is returned;

    The significance level is one of 0.01, 0.05, or 0.1, with default 0.05; the program of step 1 checks whether the significance level takes one of these three values; if it does, processing continues with the next step; otherwise the user is prompted to enter one of the permitted significance levels;

    The tail value is one of -1, 1, or 0, with default 0; one-sided and two-sided tests use different tail values: -1 is the one-sided low-value test, 1 is the one-sided high-value test, and 0 is the two-sided test;

    The program of step 1 checks whether the tail value takes one of the three values -1, 0, or 1; if it does, processing continues with the next step; otherwise the user is prompted to enter a valid tail value;

    2. Unify the data structure: if the input data are a column vector, they are left unchanged; if they are a row vector, they are transposed into a column vector; for example, a matrix of 1 row and 20 columns becomes a matrix of 20 rows and 1 column;

    3. Construct a matrix containing the original sequence numbers and the original test data, sorted in ascending order of the raw data; the original sequence numbers are reordered along with the test data, and the new matrix is kept for later use;

    4. Check the tail value that was entered;

    If the tail value is -1, the lower-side low-value test is carried out: first find the minimum (the first value) and compute the statistic Gi, whose value is that of "G'n" in the national standard "Statistical interpretation of data - Detection and treatment of outliers in the normal sample" (GB-T4883-2008); compare Gi with the corresponding critical value (Gs) in the table; if Gi/Gs > 1, the minimum is rejected, giving a new data set, and the next round of comparison is carried out on the new data; otherwise nothing is rejected;

    If the tail value is 1, the upper-side high-value test is carried out: first find the maximum (the last value) and compute the statistic Gi, whose value is that of "Gn" in the national standard "Statistical interpretation of data - Detection and treatment of outliers in the normal sample" (GB-T4883-2008); compare Gi with the corresponding critical value (Gs) in the table; if Gi/Gs > 1, the maximum is rejected, giving a new data set, and the next round of comparison is carried out on the new data; otherwise nothing is rejected;

    Note that G'n and Gn are computed by different formulas; see GB-T4883-2008 for details;

    If the tail value is 0, the two-sided test is carried out: first find the minimum and the maximum, then compute the statistics Dmax and Dmin and compare the two; if Dmax/Dmin > 1, proceed as described above for a tail value of 1; otherwise proceed as described above for a tail value of -1;

    If any data are rejected, a new round of comparison and rejection is carried out;

    5. Tidy the data structure: restore the elements of the new matrix to their original order;

    If the original vector was a row vector, the result is restored to a row vector;

    6. Provide the output variables, comprising the new data, the outliers, and the sequence numbers of the outliers; if the raw data meet the outlier rejection condition, the new data obtained after rejection can be used for further processing;

    If none of the data meet the outlier rejection condition, the data set contains no outliers, and the raw data can be used directly for further processing;
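The whole single-group procedure can be sketched compactly in Python. This is an illustration under stated assumptions, not the claimed MATLAB program: `grubbs` and `CRIT_0975` are hypothetical names, the critical values are only the 0.975-column entries quoted in the claim for n = 3 to 8 (taken here as the two-sided test at a significance level of 0.05), and the loop stops below three values or at zero standard deviation, which the claim does not specify.

```python
from statistics import mean, stdev

# 0.975 column of Gtab for the rows quoted in the claim (n = 3..8 only).
CRIT_0975 = {3: 1.155, 4: 1.481, 5: 1.715, 6: 1.887, 7: 2.020, 8: 2.126}

def grubbs(values, critical, tail=0):
    """Return (xnew, deleted, index); index holds 1-based original numbers."""
    data = sorted(enumerate(values, start=1), key=lambda pair: pair[1])
    removed = []
    while len(data) >= 3:                      # assumption: stop below n = 3
        xs = [v for _, v in data]
        m, s = mean(xs), stdev(xs)
        if s == 0:                             # assumption: identical values
            break
        d_min, d_max = m - xs[0], xs[-1] - m
        high = tail == 1 or (tail == 0 and d_max > d_min)
        g = (d_max if high else d_min) / s
        if g <= critical[len(xs)]:             # Gi/Gs <= 1: nothing rejected
            break
        removed.append(data.pop(-1 if high else 0))
    removed.sort()                             # ascending original numbers
    keep = {num for num, _ in data}
    xnew = [v for num, v in enumerate(values, start=1) if num in keep]
    return xnew, [v for _, v in removed], [num for num, _ in removed]

x = [10.1, 10.2, 9.9, 10.0, 15.0, 10.1, 9.8]
print(grubbs(x, CRIT_0975))
# ([10.1, 10.2, 9.9, 10.0, 10.1, 9.8], [15.0], [5])
```

In this example the value 15.0 is rejected in the first round; in the second round the statistic for the remaining minimum falls below the critical value for n = 6, so the loop ends.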

    (3) Iteratively reject the outliers of multiple groups of data in batch: using the program written in step 1, carry out the following steps:

    1. First check whether the input variables meet the requirements of the function;

    The input variables are as before, except that the raw data change from a single vector to a matrix;

    The requirements on the significance level and the tail value are the same as in step 2; each single group in the raw data matrix may contain at most 100 replicate determinations, but the number of groups is not limited;

    The "rows" and "columns" of the raw data matrix must, however, follow the convention adopted when the program was designed; otherwise the matrix is transposed;

    2. Generate empty cell arrays for the output variables; because the number of values finally rejected may differ from one data group to another, cell arrays are used instead of matrices;

    3. Apply the single-group Grubbs outlier rejection of step 2 to each group of data in turn, assigning each result to the corresponding cell of the cell arrays above;

    4. Provide the output, comprising the cell array of the final outlier-free new data, the cell array of outliers, and the cell array of outlier sequence numbers, thereby obtaining the final pharmaceutical test data.
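The batch step amounts to looping over the groups and collecting results of differing lengths. In the Python sketch below, lists of lists stand in for MATLAB cell arrays. `reject_batch` is a hypothetical name, and `drop_above_100` is only a stand-in for the single-group routine so that the example runs on its own; it is not the Grubbs test.

```python
def reject_batch(groups, single_group):
    """Apply single_group(values) -> (xnew, deleted, index) to every group."""
    xnew_all, del_all, index_all = [], [], []    # analogues of empty cell arrays
    for values in groups:
        xnew, deleted, index = single_group(values)
        xnew_all.append(xnew)
        del_all.append(deleted)
        index_all.append(index)
    return xnew_all, del_all, index_all

def drop_above_100(values):
    """Stand-in single-group routine (NOT the Grubbs test): drops values > 100."""
    index = [i for i, v in enumerate(values, start=1) if v > 100]
    return [v for v in values if v <= 100], [values[i - 1] for i in index], index

groups = [[1.0, 2.0, 500.0], [3.0, 4.0], [900.0, 5.0, 700.0, 6.0]]
xnew_all, del_all, index_all = reject_batch(groups, drop_above_100)
print(index_all)   # [[3], [], [1, 3]]
```

Passing the single-group routine as an argument keeps the batch loop independent of how each group is tested, and the ragged result lists show why a rectangular matrix would not fit the output.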
