Scalable tree-based search of content descriptors
First Claim
Patent Images
1. A computer-implemented method for searching a collection of content, comprising:
- under control of one or more computer systems configured with executable instructions,receiving a search request corresponding to query content, the query content characterized by at least one query descriptor;
traversing an index tree including a plurality of levels that indexes a set of content descriptors characterizing the collection of content, the traversing including, at one or more of the plurality of levels of the index tree;
determining distances between child nodes of a distinguished node of the index tree and said at least one query descriptor based at least in part on index descriptors of the child nodes, each index descriptor representing a summarized characteristic for each child node;
determining a maximum number of child nodes to select for traversal based at least in part on the level of the distinguished node, the level of the distinguished node having a threshold relative to one or more other levels; and
selecting for traversal, in order of distance from said at least one query descriptor, at most the maximum number of child nodes, the maximum number being determined based at least in part on the level of the distinguished node;
selecting a subset of the content descriptors based at least in part on the traversing of the index tree; and
providing for presentation at least a reference to content in the collection characterized by at least one of the subset of content descriptors.
1 Assignment
0 Petitions
Accused Products
Abstract
Multiple paths of an index tree may be traversed to discover a set of content descriptors that are match candidates for a set of query descriptors. A size of the set of candidate content descriptors may be optimized, for example, to reduce false positive matching errors, query latencies and/or index tree traversal times, at least in part by determining a number of child nodes to traverse based at least in part on current traverse level and/or traverse neighborhood thresholds. Index trees for large content descriptor sets may be built in resource constrained environments with approximation and/or refining build techniques.
-
Citations
26 Claims
-
1. A computer-implemented method for searching a collection of content, comprising:
-
under control of one or more computer systems configured with executable instructions, receiving a search request corresponding to query content, the query content characterized by at least one query descriptor; traversing an index tree including a plurality of levels that indexes a set of content descriptors characterizing the collection of content, the traversing including, at one or more of the plurality of levels of the index tree; determining distances between child nodes of a distinguished node of the index tree and said at least one query descriptor based at least in part on index descriptors of the child nodes, each index descriptor representing a summarized characteristic for each child node; determining a maximum number of child nodes to select for traversal based at least in part on the level of the distinguished node, the level of the distinguished node having a threshold relative to one or more other levels; and selecting for traversal, in order of distance from said at least one query descriptor, at most the maximum number of child nodes, the maximum number being determined based at least in part on the level of the distinguished node; selecting a subset of the content descriptors based at least in part on the traversing of the index tree; and providing for presentation at least a reference to content in the collection characterized by at least one of the subset of content descriptors. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A computer-implemented method for searching a collection of content, comprising:
-
under control of one or more computer systems configured with executable instructions, determining at least one query descriptor based at least in part on query content; and determining a subset of a set of content descriptors characterizing the collection of content, the set of content descriptors being indexed by an index tree including a plurality of nodes at a plurality of node levels by, at least; determining, for a parent node in at least one of the plurality of node levels, a number of child nodes to traverse, the number of child nodes to traverse based at least in part on the node level of the parent node, the node level of the parent node having a threshold relative to one or more other node levels; and traversing at most the number of child nodes nearest in distance to said at least one query descriptor as part of identifying a set of lowest level nodes indexing the subset of the set of content descriptors, the number of child nodes nearest in distance being determined based at least in part on index descriptors of the child nodes, each index descriptor representing a summarized characteristic for each child node. - View Dependent Claims (7, 8, 9, 10, 11, 12)
-
-
13. A computer-implemented method for searching a collection of content, comprising:
-
under control of one or more computer systems configured with executable instructions, determining at least one query descriptor based at least in part on query content; determining a subset of a set of content descriptors indexed by an index tree at least in part by traversing the index tree to identify a set of lowest level nodes indexing the subset, the traversing including; selecting a first nearest child node to said at least one query descriptor for traversal; determining a relative distance of at least a second nearest child node from said at least one query descriptor with respect to the first nearest child node, the distance being determined based at least in part on index descriptors of the first and second nearest child nodes, each index descriptor representing a summarized characteristic for each child node; and selecting said at least the second nearest child node for traversal if the relative distance is less than a traversal neighborhood threshold; and providing for presentation at least a reference to content in the collection associated with at least one of the set of content descriptors. - View Dependent Claims (14, 15, 16)
-
-
17. A computerized system for searching a collection of content, comprising:
-
a data store storing at least; a set of content descriptors characterizing the collection of content; and an index tree indexing the set of content descriptors; and a processor; and a memory device including instructions that, when executed by the processor, cause the processor to; determine at least one query descriptor based at least in part on query content; determine a subset of the set of content descriptors at least in part by traversing the index tree, the traversing including; determining a number of child nodes to select for traversal based at least in part on traversal level; and selecting for traversal at most the number of child nodes nearest in distance to said at least one query descriptor, the number of child nodes nearest in distance being determined based at least in part on index descriptors of the first and second nearest child nodes, each index descriptor representing a summarized characteristic for each child node; and provide for presentation at least a reference to content in the collection characterized by at least one of the subset of the set of content descriptors. - View Dependent Claims (18, 19, 20, 21)
-
-
22. One or more non-transitory computer-readable media having collectively thereon computer-executable instructions that configure one or more computers to collectively, at least:
-
determine at least one query descriptor based at least in part on query content; determine a subset of content descriptors indexed by an index tree at least in part by traversing the index tree, the traversing including; (a) selecting for traversal a first child node nearest to said at least one query descriptor; (b) determining a first distance of a second child node from said at least one query descriptor; and (c) selecting the second child node for traversal if the first distance is less than a traversal neighborhood threshold, the distance being determined based at least in part on an index descriptor of the second child node, the index descriptor representing a summarized characteristic for the second child node; and provide for presentation at least a reference to content characterized by at least one of the subset of content descriptors. - View Dependent Claims (23, 24, 25, 26)
-
Specification