System and method to store video fingerprints on distributed nodes in cloud systems
First Claim
1. A method of processing video fingerprint data in a cloud infrastructure, the method comprising:
- analysing video data;
extracting features from the video data to form the video fingerprint data, the video fingerprint data taking the form of multidimensional vectors;
computing meta data and data points from the multidimensional vectors;
inputting the meta data and the data points from the multidimensional vectors into a distributed index having multiple levels, the distributed index comprising a directing tree and leaf nodes;
storing the meta data relating to the multidimensional vectors in the directing tree;
storing the data points computed from the multidimensional vectors in the leaf nodes;
scaling the distributed index by increasing or decreasing a number of the leaf nodes in the distributed index size depending upon a number of the multidimensional vectors to be stored;
distributing the leaf nodes across at least one client system;
searching for a nearest neighbour of a data point;
inserting a root node of the directing tree into a priority queue;
determining a closest distance node from the priority queue to a query node;
if the closest distance node is a leaf node or a bin, adding the closest distance node to a list of preferred bins;
if the closest distance node is not a leaf node or a bin, determining a distance from the query node to a child node wherein the child node is a child of the closest distance node and adding the child node to the priority queue; and
terminating if a maximum number of bins is reached or the priority queue is empty,wherein the directing tree contains no data points but contains the meta data computed from the multidimensional vectors; and
the leaf nodes contain data points which are apportioned to each respective leaf node in a predetermined manner so that each of the leaf nodes contains data points that are relatively close to other data points in the same leaf node as compared to the data points in other leaf nodes.
1 Assignment
0 Petitions
Accused Products
Abstract
A method to design, implement and create distributed indexes for storing and comparing fingerprints of videos is presented. The method effectively utilizes cloud computing platforms that offer varying amounts of computing resources. The method enables the distributed index to scale to large numbers of data points and the distributed index is robust to failures within the computing resources maintaining the index. The method minimizes the memory required to maintain the distributed index and reduces the I/O operations needed to process operations performed on the index. The method improves the efficiency of the index to process queries.
12 Citations
32 Claims
-
1. A method of processing video fingerprint data in a cloud infrastructure, the method comprising:
-
analysing video data; extracting features from the video data to form the video fingerprint data, the video fingerprint data taking the form of multidimensional vectors; computing meta data and data points from the multidimensional vectors; inputting the meta data and the data points from the multidimensional vectors into a distributed index having multiple levels, the distributed index comprising a directing tree and leaf nodes; storing the meta data relating to the multidimensional vectors in the directing tree; storing the data points computed from the multidimensional vectors in the leaf nodes; scaling the distributed index by increasing or decreasing a number of the leaf nodes in the distributed index size depending upon a number of the multidimensional vectors to be stored; distributing the leaf nodes across at least one client system; searching for a nearest neighbour of a data point; inserting a root node of the directing tree into a priority queue; determining a closest distance node from the priority queue to a query node; if the closest distance node is a leaf node or a bin, adding the closest distance node to a list of preferred bins; if the closest distance node is not a leaf node or a bin, determining a distance from the query node to a child node wherein the child node is a child of the closest distance node and adding the child node to the priority queue; and terminating if a maximum number of bins is reached or the priority queue is empty, wherein the directing tree contains no data points but contains the meta data computed from the multidimensional vectors; and the leaf nodes contain data points which are apportioned to each respective leaf node in a predetermined manner so that each of the leaf nodes contains data points that are relatively close to other data points in the same leaf node as compared to the data points in other leaf nodes. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
-
-
18. A system for processing video fingerprint data in a cloud infrastructure, the system having a video fingerprint processor and a distributed index, wherein:
-
the video fingerprint processor is configured to extract features from video data to create multidimensional vectors; meta data and data points are computed from the multidimensional vectors; the distributed index has multiple levels and is formed of a directing tree and at least one leaf node; the directing tree is configured to store the meta data which relates to the multidimensional vectors; the at least one leaf node is configured to store the data points computed from the multidimensional vectors; the distributed index is scalable and is adapted to be increased or decreased in size dependent upon a number of the multidimensional vectors to be stored; the at least one leaf node is stored on a client system, the distributed index is configured to search for a nearest neighbour of a data point in the multiple leaf nodes; the search for the nearest neighbour of the data point includes inserting a root of the directing tree into a priority queue; the distributed index is configured to determine a closest distance node from the priority queue to a query node; if the closest distance node is one of the multiple leaf nodes, the distributed index is configured to add the closest distance node to a list of preferred bins; if the closest distance node is not one of the multiple leaf nodes, the distributed index is configured to determine a distance from the query node to a child node wherein the child node is a child of the closest distance node and add the child node to the priority queue; and the distributed index is configured to terminate if a maximum number of bins is reached or the priority queue is empty, wherein the directing tree contains no data points but contains the meta data computed from the multidimensional vectors; and the leaf nodes contain data points which are apportioned to each respective leaf node in a predetermined manner so that each of the leaf node contains data points that are relatively close to other data points in the same leaf node as compared to the data points in other leaf nodes. - View Dependent Claims (19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32)
-
Specification