Signature representation of data having high dimensionality
First Claim
1. A method for generating, in a computing device, an m-dimensional signature vector comprising m vector elements, the method comprising:
setting an initial value of each vector element in the m vector elements to zero; and
for each vector element in the m vector elements:
accessing a plurality of key-value pairs sequentially, each key-value pair comprising a respective key, corresponding to one of n unique identifiers, and a non-zero value; and
calculating each vector element based on a summation of a plurality of terms by repeating, sequentially, for each respective key-value pair in the plurality of key-value pairs:
calculating a respective term of the plurality of terms based on the respective key-value pair from the plurality of key-value pairs by:
generating a hash based on the key of the respective key-value pair and an element identifier associated with the vector element being calculated;
generating a pseudo-random number from the generated hash; and
multiplying the pseudo-random number by the value of the respective key-value pair; and
adding the respective term calculated to the vector element being calculated, wherein m<<n.
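Read as a formula, the claimed computation can be summarized roughly as follows. The symbols are notation introduced here for illustration and do not appear in the claim: s_j is the j-th vector element, K is the plurality of key-value pairs (k, v), h is the hash of a key and an element identifier, and R maps a hash to a pseudo-random number.

```latex
s_j \;=\; \sum_{(k,\,v)\,\in\,K} v \cdot R\bigl(h(k, j)\bigr),
\qquad j = 1, \dots, m, \qquad m \ll n
```

Each s_j starts at zero and accumulates one term per key-value pair, in the order the pairs are accessed.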
Abstract
A system and method for generating an m-dimensional signature vector in a computing device is provided. The signature vector may be generated from a plurality of key-value pairs, each comprising a unique identifier and an associated non-zero value. Each element of the m-dimensional signature vector is calculated based on a summation of a plurality of terms. Each of the terms is calculated from a respective key-value pair by generating a seed based on the key of the respective key-value pair and an element identifier associated with the vector element being calculated; generating a pseudo-random number from the generated seed; and multiplying the pseudo-random number by the value of the respective key-value pair, wherein m<<n.
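The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `signature`, the use of SHA-256 as the hash, the `key|element` string encoding, the 64-bit seed, and the uniform range [-1, 1] for the pseudo-random number are all choices made here for the example and are not specified in the text above.

```python
import hashlib
import random


def signature(pairs, m):
    """Return an m-element signature for an iterable of (key, value) pairs.

    Hypothetical helper: hash function, seed width, and random
    distribution are assumptions made for illustration only.
    """
    pairs = list(pairs)
    sig = [0.0] * m  # every vector element starts at zero
    for j in range(m):  # j plays the role of the element identifier
        for key, value in pairs:  # key-value pairs are visited in order
            # Hash the key together with the element identifier.
            digest = hashlib.sha256(f"{key}|{j}".encode("utf-8")).digest()
            seed = int.from_bytes(digest[:8], "big")
            # Draw one pseudo-random number from the seed (range is assumed).
            r = random.Random(seed).uniform(-1.0, 1.0)
            # Multiply by the pair's value and add the term to the element.
            sig[j] += r * value
    return sig


if __name__ == "__main__":
    # Sparse data: only non-zero values are listed, keyed by identifier.
    print(signature([("apple", 3.0), ("pear", 1.5)], m=4))
```

Because the pseudo-random number is regenerated from the key and the element identifier each time it is needed, this sketch never has to store an m-by-n matrix of random numbers, which is what makes the approach workable when n is very large and m<<n.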
27 Claims
1. A method for generating, in a computing device, an m-dimensional signature vector comprising m vector elements, the method comprising:
setting an initial value of each vector element in the m vector elements to zero; and
for each vector element in the m vector elements:
accessing a plurality of key-value pairs sequentially, each key-value pair comprising a respective key, corresponding to one of n unique identifiers, and a non-zero value; and
calculating each vector element based on a summation of a plurality of terms by repeating, sequentially, for each respective key-value pair in the plurality of key-value pairs:
calculating a respective term of the plurality of terms based on the respective key-value pair from the plurality of key-value pairs by:
generating a hash based on the key of the respective key-value pair and an element identifier associated with the vector element being calculated;
generating a pseudo-random number from the generated hash; and
multiplying the pseudo-random number by the value of the respective key-value pair; and
adding the respective term calculated to the vector element being calculated, wherein m<<n.
Dependent claims: 2-13
14. A computing device for generating an m-dimensional signature vector comprising:
a non-transitory computer-readable memory containing instructions; and
a processor for executing instructions, the instructions when executed by the processor configuring the device to provide functionality for:
setting an initial value of each vector element in the m vector elements to zero; and
for each vector element in the m vector elements:
accessing a plurality of key-value pairs sequentially, each key-value pair comprising a respective key, corresponding to one of n unique identifiers, and a non-zero value; and
calculating each vector element based on a summation of a plurality of terms by repeating, sequentially, for each respective key-value pair in the plurality of key-value pairs:
calculating a respective term of the plurality of terms based on the respective key-value pair from the plurality of key-value pairs by:
generating a hash based on the key of the respective key-value pair and an element identifier associated with the vector element being calculated;
generating a pseudo-random number from the generated hash; and
multiplying the pseudo-random number by the value of the respective key-value pair; and
adding the respective term calculated to the vector element being calculated, wherein m<<n.
Dependent claims: 15-26
27. A non-transitory computer readable memory containing instructions for generating an m-dimensional signature vector comprising m vector elements, the instructions which when executed by a processor perform the method of:
setting an initial value of each vector element in the m vector elements to zero; and
for each vector element in the m vector elements:
accessing a plurality of key-value pairs sequentially, each key-value pair comprising a respective key, corresponding to one of n unique identifiers, and a non-zero value; and
calculating each vector element based on a summation of a plurality of terms by repeating, sequentially, for each respective key-value pair in the plurality of key-value pairs:
calculating a respective term of the plurality of terms based on the respective key-value pair from the plurality of key-value pairs by:
generating a hash based on the key of the respective key-value pair and an element identifier associated with the vector element being calculated;
generating a pseudo-random number from the generated hash; and
multiplying the pseudo-random number by the value of the respective key-value pair; and
adding the respective term calculated to the vector element being calculated, wherein m<<n.
Specification