Providing prevalence information using query data
First Claim
1. A machine-readable non-transitory storage medium having instructions stored thereon for providing prevalence information based on a query, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform the following operations:
retrieving a sequence of target values from a data source;
receiving query data from a communication device;
generating a sequence of query values based on an identifier associated with the communication device;
comparing each value in the sequence of query values and a corresponding value in the sequence of target values to identify a number of matching values, wherein the number of matching values counts the number of times the value is equal to the corresponding value;
upon determining that the number of matching values exceeds a maximum number of matches stored in the data source, updating the maximum number of matches in the data source;
calculating, using a statistical model, a prevalence value and a degree of confidence in the prevalence value based, at least in part, on the maximum number of matching values, the prevalence value being proportional to the number of matching values;
weighting the prevalence value based on the degree of confidence in the prevalence value; and
outputting to the communication device the prevalence value for the query data.
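The comparison and update operations recited above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the claim does not say how query values are derived from the device identifier, so the SHA-256 derivation, the in-memory `store` dictionary standing in for the data source, and the names `query_values`, `count_matches`, and `record_query` are all hypothetical.

```python
import hashlib


def query_values(device_id: str, length: int) -> list[int]:
    """Derive a deterministic sequence of byte values from a device identifier.

    Assumption: SHA-256 is used only as an example derivation.
    """
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    if length > len(digest):
        raise ValueError("this sketch supports at most 32 values")
    return list(digest[:length])


def count_matches(query: list[int], target: list[int]) -> int:
    """Count positions where a query value equals the corresponding target value."""
    return sum(1 for q, t in zip(query, target) if q == t)


def record_query(store: dict, object_key: str, device_id: str) -> int:
    """Compare one device's query values with the stored target sequence and
    raise the stored maximum if this query matched in more positions."""
    record = store[object_key]  # hypothetical layout: {"target": [...], "max_matches": int}
    target = record["target"]
    matches = count_matches(query_values(device_id, len(target)), target)
    if matches > record["max_matches"]:
        record["max_matches"] = matches
    return matches
```

In this sketch only one record per queried object is kept, and each query touches that record once, regardless of how many devices have queried before.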
Abstract
In one example, a data security system may determine the prevalence of a file based on query data for an object (e.g., a file or a hash of a file). An example algorithm may provide a statistically justifiable estimate of the prevalence while storing few data records, and may therefore provide prevalence information in O(1) time complexity (i.e., constant time). Such an algorithm may be applied in near real-time to provide, e.g., an immediate response to a query for the prevalence of a file.
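To illustrate why such an estimate can be produced in constant time, the sketch below computes a prevalence value from a single stored number (the maximum match count) instead of scanning per-device records. The abstract and claims do not disclose the statistical model, so everything here is an assumption made for illustration: the linear `scale` factor (chosen only because the claims state that prevalence is proportional to the number of matches), the placeholder confidence formula, and the function name `estimate_prevalence`.

```python
def estimate_prevalence(max_matches: int, sequence_length: int, scale: float = 1.0) -> dict:
    """Return a prevalence value, a confidence value, and the weighted result.

    Assumptions (not from the source):
    - prevalence is a simple linear function of the stored maximum match count;
    - confidence is the fraction of positions that matched, a placeholder
      standing in for whatever statistical model is actually used.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    if not 0 <= max_matches <= sequence_length:
        raise ValueError("max_matches must lie between 0 and sequence_length")

    prevalence = scale * max_matches            # proportional to the match count
    confidence = max_matches / sequence_length  # placeholder confidence in [0, 1]
    return {
        "prevalence": prevalence,
        "confidence": confidence,
        "weighted": prevalence * confidence,    # prevalence weighted by confidence
    }
```

The function performs a fixed number of arithmetic operations on one stored value, so its cost does not grow with the number of devices that have submitted queries.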
16 Claims
1. A machine-readable non-transitory storage medium having instructions stored thereon for providing prevalence information based on a query, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform the following operations:
retrieving a sequence of target values from a data source;
receiving query data from a communication device;
generating a sequence of query values based on an identifier associated with the communication device;
comparing each value in the sequence of query values and a corresponding value in the sequence of target values to identify a number of matching values, wherein the number of matching values counts the number of times the value is equal to the corresponding value;
upon determining that the number of matching values exceeds a maximum number of matches stored in the data source, updating the maximum number of matches in the data source;
calculating, using a statistical model, a prevalence value and a degree of confidence in the prevalence value based, at least in part, on the maximum number of matching values, the prevalence value being proportional to the number of matching values;
weighting the prevalence value based on the degree of confidence in the prevalence value; and
outputting to the communication device the prevalence value for the query data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. An apparatus for providing prevalence information based on a query, the apparatus comprising:
at least one memory element;
at least one processor coupled to the at least one memory element;
a reputation information server coupled to the at least one processor, wherein the reputation information server is configured to:
retrieve a sequence of target values from a data source;
receive query data from a communication device;
generate a sequence of query values based on an identifier associated with the communication device;
compare each value in the sequence of query values and a corresponding value in the sequence of target values to identify a number of matching values, wherein the number of matching values counts the number of times the value is equal to the corresponding value;
upon determining that the number of matching values exceeds a maximum number of matches stored in the data source, update the maximum number of matches in the data source;
calculate, using a statistical model, a prevalence value and a degree of confidence in the prevalence value based, at least in part, on the maximum number of matching values, the prevalence value being proportional to the number of matching values;
weight the prevalence value based on the degree of confidence in the prevalence value; and
output to the communication device the prevalence value for the query data.
Dependent claims: 13, 14, 15
16. A method for providing prevalence information based on a query, the method comprising:
retrieving a sequence of target values from a data source;
receiving query data from a communication device;
generating a sequence of query values based on an identifier associated with the communication device;
comparing each value in the sequence of query values and a corresponding value in the sequence of target values to identify a number of matching values, wherein the number of matching values counts the number of times the value is equal to the corresponding value;
upon determining that the number of matching values exceeds a maximum number of matches stored in the data source, updating the maximum number of matches in the data source;
calculating, using a statistical model, a prevalence value and a degree of confidence in the prevalence value based, at least in part, on the maximum number of matching values, the prevalence value being proportional to the number of matching values;
weighting the prevalence value based on the degree of confidence in the prevalence value; and
outputting to the communication device the prevalence value for the query data.
Specification