Degree of outlier calculation device, and probability density estimation device and forgetful histogram calculation device for use therein
Abstract
The degree of outlier of a single input datum is calculated from the amount by which the learned probability density changes when that datum is taken in, because data that differ greatly in tendency from the probability density function learned so far can be considered to have a high degree of outlier. More specifically, a function of the distance between the probability densities before and after the data input is calculated as the degree of outlier. Accordingly, a probability density estimation device appropriately estimates the probability distribution of generation of unfair data while sequentially reading a large volume of data, and a score calculation device calculates and outputs the degree of outlier of each datum based on the estimated probability distribution.
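As an illustration of the idea in the abstract, the following minimal sketch scores a datum by the distance between the density before and after it is learned. It makes several assumptions that are not taken from the patent text: a single one-dimensional normal density stands in for the learned model, the distance is the squared Hellinger distance (under the convention H^2 = 1 - BC, where BC is the Bhattacharyya coefficient), and the names `update_gaussian`, `hellinger_sq`, `outlier_score` and the discount rate `r` are hypothetical.

```python
import math

def update_gaussian(mean, var, x, r=0.1):
    """Exponentially discounted update of one 1-D normal density (hypothetical helper)."""
    new_mean = (1 - r) * mean + r * x
    new_var = (1 - r) * var + r * (x - new_mean) ** 2
    return new_mean, max(new_var, 1e-9)  # floor keeps the variance positive

def hellinger_sq(m1, v1, m2, v2):
    """Squared Hellinger distance between N(m1, v1) and N(m2, v2), as 1 - BC."""
    s = v1 + v2
    bc = math.sqrt(2 * math.sqrt(v1 * v2) / s) * math.exp(-(m1 - m2) ** 2 / (4 * s))
    return 1.0 - bc

def outlier_score(mean, var, x, r=0.1):
    """Score x by how far the density moves when x is learned; also return the new parameters."""
    new_mean, new_var = update_gaussian(mean, var, x, r)
    return hellinger_sq(mean, var, new_mean, new_var), (new_mean, new_var)

near, _ = outlier_score(0.0, 1.0, 0.1)   # datum consistent with the learned density
far, _ = outlier_score(0.0, 1.0, 10.0)   # datum far from the learned density
print(near < far)
```

A datum that agrees with the current density barely moves it and receives a score near zero, while a distant datum shifts the mean and inflates the variance and so receives a larger score.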
12 Claims
1. A probability density estimation device for an anomalous data detection system adapted to detect anomalous data, said probability density estimation device configured for a degree of outlier calculation device for sequentially calculating a degree of outlier of each data with a data sequence of real vector values as input, said probability density estimation device for, while sequentially reading said data sequence, estimating a probability distribution of generation of the data by using a finite mixture distribution of normal distributions with a weighting parameter, a mean parameter and a variance parameter, said probability density estimation device comprising:
probability calculation means for calculating, based on a value of input data and values of a mean parameter and a variance parameter of each of a finite number of normal distribution densities, a probability of generation of the input data from each normal distribution; and
parameter output means for updating and rewriting the stored parameter values while gradually forgetting past data, according to newly read data based on a probability obtained by the probability calculation means, values of a mean parameter and a variance parameter of each normal distribution and a weighting parameter of each normal distribution, and anomalous data, indicative of fraud, being identified when said probability of generation of the input data deviates from said stored parameter values.
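The two elements of claim 1 can be pictured with a short sketch: one step computes the probability that each normal component generated the input, and the next step rewrites the weights, means and variances while discounting the past. This is only an illustration under stated assumptions, not the claimed implementation: the data are one-dimensional rather than real vectors, the update is a generic exponentially discounted online EM step, and the class name `ForgettingGMM`, its methods, the discount rate `r` and the variance floor are all hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

class ForgettingGMM:
    """1-D normal mixture updated one datum at a time with discount rate r (assumed design)."""

    def __init__(self, means, variances, weights, r=0.05):
        self.means = list(means)
        self.vars = list(variances)
        self.weights = list(weights)
        self.r = r

    def responsibilities(self, x):
        """Probability that each component generated x (the 'probability calculation' step)."""
        joint = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(self.weights, self.means, self.vars)]
        total = sum(joint)
        if total == 0.0:  # numerical underflow far from every component
            return list(self.weights)
        return [j / total for j in joint]

    def update(self, x):
        """Rewrite the stored parameters while discounting the past (the 'parameter' step)."""
        gamma = self.responsibilities(x)
        r = self.r
        new_weights = [(1 - r) * w + r * g for w, g in zip(self.weights, gamma)]
        for k, g in enumerate(gamma):
            step = r * g / new_weights[k]   # effective learning rate of component k
            delta = x - self.means[k]
            self.means[k] += step * delta
            self.vars[k] = max((1 - step) * (self.vars[k] + step * delta * delta), 1e-6)
        self.weights = new_weights
        return gamma

model = ForgettingGMM(means=[0.0, 5.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
for _ in range(50):
    model.update(5.2)
print(model.weights, model.means)
```

Because each update multiplies the old statistics by (1 - r), a datum seen n steps ago contributes with weight proportional to (1 - r)^n, which is one way to realise "gradually forgetting past data".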
3. A degree of outlier calculation device for sequentially calculating a degree of outlier of each data with a data sequence of real vector values as input, said degree of outlier calculation device adapted to detect anomalous data, and comprising:
a probability density estimation device for, while sequentially reading said data sequence, estimating a probability distribution of generation of the data by using a finite mixture of normal distributions with a weighting parameter, a mean parameter and a variance parameter, said probability density estimation device including;
(a) parameter storage means for storing values of a mean parameter and a variance parameter of each of a finite number of normal distribution densities and a weighting parameter of each normal distribution;
(b) probability calculation means for calculating, based on a value of input data and values of a mean parameter and a variance parameter of each of a finite number of normal distribution densities, a probability of generation of the input data from each normal distribution; and
(c) parameter rewriting means for updating and rewriting the stored parameter values while gradually forgetting past data, according to newly read data based on a probability obtained by the probability calculation means, values of a mean parameter and a variance parameter of each normal distribution and a weighting parameter of each normal distribution; and
degree of outlier calculation means for calculating and outputting a degree of outlier of said data by using a parameter of the normal mixture updated by said probability density estimation device and based on a degree of change or a logarithmic loss of a probability distribution estimated from values of the parameters before and after the updating and the input data, and anomalous data, indicative of fraud, being identified when said degree of outlier data deviates from said stored parameter values.
4. A histogram calculation device for a degree of outlier calculation device for sequentially calculating a degree of outlier of each data with discrete value data as input, said degree of outlier calculation device useful for anomalous data detection, said histogram calculation device for calculating a parameter of a histogram with respect to said discrete value data sequentially input, said histogram calculation device comprising:
storage means for storing a parameter value of said histogram; and parameter updating means for reading said parameter value from the storage means and updating past parameter values while gradually forgetting past data based on input data to rewrite the value of said storage means, thereby outputting some of parameter values of said storage means, and anomalous data, indicative of fraud, being identified when said degree of outlier deviates from said stored parameter values.
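A forgetful histogram of the kind recited in claim 4 might be sketched as follows. The update rule shown (multiply every cell probability by 1 - r, then add r to the cell that was observed) is an assumed, common exponential-discounting scheme rather than the rule given in the specification, and the class name `ForgetfulHistogram` and the parameter `r` are hypothetical.

```python
class ForgetfulHistogram:
    """Cell probabilities over n_cells discrete values, discounted at rate r (assumed rule)."""

    def __init__(self, n_cells, r=0.05):
        self.r = r
        self.probs = [1.0 / n_cells] * n_cells  # stored parameter values, initially uniform

    def update(self, cell):
        """Read the stored values, discount the past, and rewrite them for the observed cell."""
        self.probs = [(1 - self.r) * p + (self.r if i == cell else 0.0)
                      for i, p in enumerate(self.probs)]
        return self.probs[cell]  # output only the parameter value of the observed cell

hist = ForgetfulHistogram(n_cells=4, r=0.1)
for _ in range(30):
    hist.update(2)
print(hist.probs)
```

The probabilities always sum to one under this rule, and a cell that stops being observed decays geometrically instead of keeping its historical count.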
5. A degree of outlier calculation device for sequentially calculating a degree of outlier of each data with discrete value data as input, said degree of outlier calculation device useful for anomalous data detection, and comprising:
a histogram calculation device for calculating a parameter of a histogram with respect to said discrete value data sequentially input, said histogram calculation device including; storage means for storing a parameter value of said histogram; and parameter updating means for reading said parameter value from the storage means and updating past parameter values while gradually forgetting past data based on input data to rewrite the value of said storage means, thereby outputting some of parameter values of said storage means; and score calculation means for calculating, based on the output of the histogram calculation device and said input data, a score of the input data with respect to said histogram, thereby outputting the output of the score calculation means as a degree of outlier of said input data, and anomalous data, indicative of fraud, being identified when said degree of outlier deviates from said stored parameter values.
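One way to read the score calculation in claim 5 is as a logarithmic loss: each discrete datum is scored by the negative log of the probability the histogram assigned to its cell before that datum was learned. The sketch below takes that reading as an assumption, since the claim itself only requires a score computed from the histogram output and the input data; the function name `score_stream`, the uniform starting histogram and the discount rate `r` are hypothetical.

```python
import math

def score_stream(cells, n_cells, r=0.05):
    """Return one log-loss score per datum, scoring before each discounted update."""
    probs = [1.0 / n_cells] * n_cells
    scores = []
    for c in cells:
        scores.append(-math.log(probs[c]))  # high when the cell has rarely been seen recently
        probs = [(1 - r) * p + (r if i == c else 0.0) for i, p in enumerate(probs)]
    return scores

stream = [0] * 50 + [3]            # fifty ordinary observations, then a rare value
scores = score_stream(stream, n_cells=4, r=0.1)
print(scores[-2] < scores[-1])
```

Scoring before updating matters: if the datum were learned first, its own contribution would raise the probability of its cell and mask the anomaly.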
6. A degree of outlier calculation device for calculating a degree of outlier with respect to sequentially input data which is described both in a discrete value and a continuous value, said degree of outlier calculation device useful for anomalous data detection, and comprising:
a histogram calculation device for estimating a histogram with respect to a discrete value data part;
a number of probability density estimation devices, the number equal to the number of cells of said histogram, the probability density estimation devices for estimating a probability density with respect to a continuous value data part;
cell determination means for determining to which cell of said histogram said discrete value data part belongs to send the continuous data part to the corresponding one of said probability density estimation devices; and
score calculation means for calculating a score of said input data based on a degree of change or logarithmic loss of a probability distribution estimated from output values of said histogram calculation device and said probability density estimation device and said input data, thereby outputting the output of the score calculation means as a degree of outlier of said input data;
said histogram calculation device including;
storage means for storing a parameter value of said histogram; and
parameter updating means for reading said parameter value from the storage means and updating past parameter values while gradually forgetting past data based on input data to rewrite the value of said storage means, thereby outputting some of parameter values of said storage means; and
said probability density estimation device including;
parameter storage means for storing values of a mean parameter and a variance parameter of each of a finite number of normal distribution densities and a weighting parameter of each normal distribution;
probability calculation means for calculating, based on a value of input data, and values of a mean parameter and a variance parameter of each of a finite number of normal distribution densities, a probability of generation of the input data from each normal distribution; and
parameter rewriting means for updating and rewriting the stored parameter values while gradually forgetting past data, according to newly read data based on a probability obtained by the probability calculation means, values of a mean parameter and a variance parameter of each normal distribution and a weighting parameter of each normal distribution, and anomalous data, indicative of fraud, being identified when said degree of outlier deviates from said stored parameter values.
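The arrangement in claim 6, a histogram over the discrete part with one density estimator per cell for the continuous part, can be sketched as below. Several simplifications are assumptions made for brevity and are not drawn from the patent: each cell uses a single one-dimensional normal density instead of a finite mixture, the score is the logarithmic loss of the joint model, and the names `CellGaussian`, `MixedScorer`, `score_and_update` and the discount rate `r` are hypothetical.

```python
import math

class CellGaussian:
    """Discounted 1-D normal density for one histogram cell (stand-in for a mixture)."""

    def __init__(self, mean=0.0, var=1.0):
        self.mean, self.var = mean, var

    def log_density(self, y):
        return -0.5 * math.log(2 * math.pi * self.var) - (y - self.mean) ** 2 / (2 * self.var)

    def update(self, y, r):
        self.mean = (1 - r) * self.mean + r * y
        self.var = max((1 - r) * self.var + r * (y - self.mean) ** 2, 1e-6)

class MixedScorer:
    """Scores (cell, y) pairs: discrete part by histogram, continuous part by the cell's density."""

    def __init__(self, n_cells, r=0.05):
        self.r = r
        self.cell_probs = [1.0 / n_cells] * n_cells
        self.estimators = [CellGaussian() for _ in range(n_cells)]

    def score_and_update(self, cell, y):
        est = self.estimators[cell]  # cell determination: route y to that cell's estimator
        score = -math.log(self.cell_probs[cell]) - est.log_density(y)  # log loss before learning
        self.cell_probs = [(1 - self.r) * p + (self.r if i == cell else 0.0)
                           for i, p in enumerate(self.cell_probs)]
        est.update(y, self.r)
        return score

scorer = MixedScorer(n_cells=2, r=0.1)
for _ in range(40):
    scorer.score_and_update(0, 1.0)
print(scorer.score_and_update(0, 1.0) < scorer.score_and_update(0, 8.0))
```

Only the estimator of the matching cell is updated, so each cell learns the continuous behaviour that is typical for its own discrete value. Because the continuous term is a log density rather than a log probability, scores can be negative; only their relative size is meaningful here.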
7. A degree of outlier calculation device for calculating a degree of outlier with respect to sequentially input data which is described both in a discrete value and a continuous value, said degree of outlier calculation device useful for anomalous data detection, and comprising:
a histogram calculation device for estimating a histogram with respect to said discrete value data part;
a number of probability density estimation devices, the number equal to the number of cells of said histogram for estimating a probability density with respect to a continuous value data part;
cell determination means for determining to which cell of the histogram said discrete value data part belongs to send the continuous data part to the corresponding one of said probability density estimation devices; and
score calculation means for calculating a score of said input data based on a degree of change or logarithmic loss of a probability distribution estimated from output values of said histogram calculation device and said probability density estimation device and said input data, thereby outputting the output of the score calculation means as a degree of outlier of said input data;
said histogram calculation device including;
storage means for storing a parameter value of said histogram; and
parameter updating means for reading said parameter value from the storage means and updating past parameter values while gradually forgetting past data based on input data to rewrite the value of said storage means, thereby outputting some of parameter values of said storage means; and
said probability density estimation device including;
parameter storage means for storing a value of a parameter indicative of a position of each kernel; and
parameter rewriting means for reading a value of a parameter from the storage means and updating the stored parameter values while gradually forgetting past data, according to newly read data to rewrite the contents of the parameter storage means, and anomalous data, indicative of fraud, being identified when said degree of outlier deviates from said stored parameter values.
8. A probability density estimation method for a degree of outlier calculation device of a data processor for sequentially calculating a degree of outlier of each data with a data sequence of real vector values as input, said degree of outlier calculation device useful for anomalous data detection, said probability density estimation method of, while sequentially reading said data sequence, estimating a probability distribution of generation of the data by using a finite mixture of normal distributions with a weighting parameter, a mean parameter and a variance parameter, the method comprising:
based on values of a mean parameter and a variance parameter of each of a finite number of normal distribution densities read from parameter storage means for storing a value of input data, values of a mean parameter and a variance parameter of each of a finite number of normal distribution densities, and a weighting parameter of each normal distribution, calculating a probability of generation of the input data from each normal distribution; and updating the stored parameter values while gradually forgetting past data, according to newly read data based on a probability obtained by the probability calculation means, values of a mean parameter and a variance parameter of each normal distribution and a weighting parameter of each normal distribution to rewrite data of said parameter storage means, and anomalous data, indicative of fraud, being identified when said probability of generation of the input data deviates from said stored parameter values.
9. A computer-readable medium incorporating a program of instructions executable by a computer for performing a method of sequentially calculating a degree of outlier of each data for anomalous data detection, with a data sequence of real vector values as input, including a probability density estimation for, while sequentially reading said data sequence, estimating a probability distribution of generation of the data by using a finite mixture of normal distributions with a weighting parameter, a mean parameter and a variance parameter, the probability density estimation comprising:
based on values of a mean parameter and a variance parameter of each of a finite number of normal distribution densities read from parameter storage means for storing a value of input data, values of a mean parameter and a variance parameter of each of a finite number of normal distribution densities, and a weighting parameter of each normal distribution, calculating a probability of generation of the input data from each normal distribution; and updating the stored parameter values while gradually forgetting past data, according to newly read data based on a probability obtained by the probability calculation means, values of a mean parameter and a variance parameter of each normal distribution and a weighting parameter of each normal distribution to rewrite data of said parameter storage means;
said method of sequentially calculating a degree of outlier of each data further comprising; calculating and outputting a degree of outlier of said data by using a parameter of the finite mixture distribution updated by said probability density estimation and based on a probability distribution estimated from values of the parameters before and after the updating and the input data, and anomalous data, indicative of fraud, being identified when said degree of outlier deviates from said stored parameter values.
10. A computer-readable medium incorporating a program of instructions executable by a computer for performing a histogram calculation method for calculation of a degree of outlier for sequentially calculating a degree of outlier of each data with discrete value data as input, said calculation of the degree of outlier for detecting anomalous data, said histogram calculation method calculating a parameter of a histogram with respect to said discrete value data sequentially input, comprising:
reading said parameter value from storage means for storing a parameter value of said histogram and updating past parameter values while gradually forgetting past data based on input data to rewrite the value of said storage means; and outputting some of parameter values of said storage means, and anomalous data, indicative of fraud, being identified when said degree of outlier deviates from said stored parameter values.
11. A degree of outlier calculation device of a data processor for sequentially calculating a degree of outlier of each data with discrete value data as input, said degree of outlier calculation device useful for detecting anomalous data, and said degree of outlier calculation device comprising:
a histogram calculation device for calculating a parameter of a histogram with respect to said discrete value data sequentially input, including; storage means for storing a parameter value of said histogram; and parameter updating means for reading said parameter value from the storage means and updating past parameter values while gradually forgetting past data based on input data to rewrite the value of said storage means, thereby outputting some of parameter values of said storage means; and score calculation means for calculating, based on the output of the histogram calculation device and said input data, a score of the input data with respect to said histogram, thereby outputting the score calculation result as a degree of outlier of said input data, and anomalous data, indicative of fraud, being identified when said degree of outlier deviates from said stored parameter values.
12. A degree of outlier calculation method of calculating a degree of outlier by a data processor with respect to sequentially input data which is described both in a discrete value and a continuous value, calculation of the degree of outlier useful for detection of anomalous data, wherein a histogram calculation estimates a histogram with respect to a discrete value data part, said method comprising:
reading said parameter value from storage means for storing a parameter value of said histogram and updating past parameter values while gradually forgetting past data based on input data to rewrite the value of said storage means; and outputting some of parameter values of said storage means, and wherein probability density estimation devices provided as many as the number of cells of said histogram for estimating a probability density with respect to a continuous value data part, said method comprises the steps of; based on values of a mean parameter and a variance parameter of each of a finite number of normal distribution densities read from parameter storage means for storing a value of input data, values of a mean parameter and variance parameter of each of a finite number of normal distribution densities and a weighting parameter of each normal distribution, calculating a probability of generation of the input data from each normal distribution; based on a probability obtained by the probability calculation means, values of a mean parameter and a variance parameter of each normal distribution and a weighting parameter of each normal distribution, updating the stored parameter values while gradually forgetting past data, according to newly read data to rewrite the data of said parameter storage means; determining to which cell of said histogram said discrete value data part belongs to send the continuous data part to the corresponding one of said probability density estimation devices; calculating a score of said input data based on a degree of change or logarithmic loss of a probability distribution estimated from output values of said histogram calculation device and said probability density estimation device and said input data; and outputting the score calculation result as a degree of outlier of said input data, and anomalous data, indicative of fraud, being identified when said degree of outlier deviates from said stored parameter values.
Specification