Process for enabling flexible and fast content-based retrieval
First Claim
1. A method for identifying any data object in a set of data objects that matches a query data object within a defined limit, said method comprising the steps of:
- (a) determining;
(i) a set of key objects in the set of data objects;
(ii) a set of relational vectors, such that for each data object in the set of data objects, a relational vector describes at least one type of distance measure between that data object and each key object in the set of key objects; and
(iii) a triangle trie for each different type of distance measure that is defined, each triangle trie having a number of levels that is less than a number of key objects in said set of key objects;
(b) enabling a user to select a query object and to select at least one type of distance measure that will be used to match a data object from the set of data objects to said query object;
(c) determining a query relational vector for said query object, such that said query relational vector describes a distance measure between said query object and each key object for each type of distance measure selected;
(d) for each triangle trie related to a distance measure selected by a user, pruning that triangle trie to produce a potentially matching set of data objects from which any data objects that cannot match said query object within the defined limit have been eliminated, thereby reducing a number of data objects in the potentially matching set that potentially will require direct comparisons with said query object; and
(e) directly comparing data objects from said set of data objects that have not yet been eliminated to identify any data object that matches the query data object within the defined limit.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and system for retrieving database records, based on a close match to a specified query record, using a distance measure to determine the closeness of the match. The method employs a two-stage process to reduce the number of direct comparisons required. In a first stage, a triangle trie for a single distance measure is pruned to reduce a number of potential matches. The result obtained from pruning the triangle trie is further pruned by employing indexed tables generated using a triangle inequality technique. The members of this small set are then directly compared to the query record to identify matches within at least a specified degree of closeness. Multiple triangle tries can be combined to enable threshold searches over a composite measure. Operations, including Min, Max, Sum, and Weight are combined to produce more complex composite distance functions for use in the process.
143 Citations
29 Claims
-
1. A method for identifying any data object in a set of data objects that matches a query data object within a defined limit, said method comprising the steps of:
-
(a) determining;
(i) a set of key objects in the set of data objects;
(ii) a set of relational vectors, such that for each data object in the set of data objects, a relational vector describes at least one type of distance measure between that data object and each key object in the set of key objects; and
(iii) a triangle trie for each different type of distance measure that is defined, each triangle trie having a number of levels that is less than a number of key objects in said set of key objects;
(b) enabling a user to select a query object and to select at least one type of distance measure that will be used to match a data object from the set of data objects to said query object;
(c) determining a query relational vector for said query object, such that said query relational vector describes a distance measure between said query object and each key object for each type of distance measure selected;
(d) for each triangle trie related to a distance measure selected by a user, pruning that triangle trie to produce a potentially matching set of data objects from which any data objects that cannot match said query object within the defined limit have been eliminated, thereby reducing a number of data objects in the potentially matching set that potentially will require direct comparisons with said query object; and
(e) directly comparing data objects from said set of data objects that have not yet been eliminated to identify any data object that matches the query data object within the defined limit. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A method for identifying any database objects that substantially match a specified query object in a computationally efficient manner in which a number of direct comparisons required to identify any database objects that substantially match said query object is reduced, comprising the steps of:
-
(a) identifying a set of key objects in a database having a plurality of database objects;
(b) generating a plurality of relational vectors, such that for each database object, a relational vector describes at least one type of distance measure between the database object and each key object;
(c) generating a triangle trie for each different type of distance measure, each triangle trie having a number of levels that is less than a number of key objects in said set of key objects;
(d) enabling a user to select a query object and at least one type of distance measure that will be used to match a data object to said query object, and to define a degree of closeness with which matches between data objects and said query object are to be evaluated;
(e) determining a query relational vector for said query object for each distance measure, such that a query relational vector describes a distance measure between said query object and each key object for each type of distance measure selected by the user;
(f) for each triangle trie related to a distance measure selected by the user, pruning that triangle trie to eliminate from consideration any database objects that cannot match said query object within at least the degree of closeness defined by the user, thereby creating a subset of data objects in which a number of database objects that potentially will require direct comparisons with said query object has been reduced;
(g) for each data object in said subset of data objects, applying a triangle inequality technique to further reduce the number of data objects that potentially will require direct comparison with said query object; and
(h) directly comparing any database objects not yet eliminated from consideration with said query object to identify the database objects that match said query object within at least the degree of closeness defined by the user.
-
-
12. A method for reducing a number of direct comparisons required to identify any data object in a set of data objects that matches a query data object, said method comprising the steps of:
-
(a) determining a set of key objects and a set of relational vectors, such that for each data object a relational vector describes at least one type of distance measure between that data object and each key object;
(b) generating a triangle trie for each different type of distance measure provided, each triangle trie having a number of levels that is less than a number of key objects in said set of key objects;
(c) generating index tables including data from said set of relational vectors;
(d) enabling a user to select a query object and at least one type of distance measure that will be used to match a data object to said query object;
(e) in response to determining said query object, determining at least one query relational vector, such that a different query relational vector describes a distance measure between said query object and each key object for each type of distance measure selected;
(f) pruning each triangle trie that is related to a distance measure selected by a user to eliminate from further consideration any data objects from said set of data objects that cannot match said query object, thereby reducing a number of data objects that potentially will require direct comparisons with said query object;
(g) using the data included in said index tables and the query relational vectors, applying a triangle inequality technique to further reduce the number of data objects that potentially will require direct comparison with said query object; and
(h) directly comparing to said query object any data object remaining for direct comparison following the preceding two steps.
-
-
13. A method for identifying database records that are close matches to a specified query object in a computationally efficient manner, by reducing a number of direct comparisons required to identify any database record that closely matches said query object, said method comprising the steps of:
-
(a) providing a database comprising a plurality of database records;
(b) defining a first subset of said plurality of database records as key objects, such that the subset of key objects contains less than all of said database records;
(c) defining a plurality of distance measures, each distance measure corresponding to a quantifiable characteristic of a data type stored in said plurality of database records;
(d) generating a plurality of relational vector sets, such that for each different distance measure a relational vector set is generated, each relational vector set comprising a distance measure between each database record and each key object;
(e) generating a plurality of triangle tries, such that for each different distance measure a triangle trie is generated, each triangle trie having less than one level for each key object;
(f) enabling a user to select a query object, to select at least one of said plurality of distance measures for determining a match with a database record, and to define a degree of closeness by at least which identified database records should match said query object;
(g) generating at least one query relational vector set, such that for each different distance measure selected by a user a query relational vector set is generated, each query relational vector set comprising a distance measure between said query object and each key object;
(h) pruning each triangle trie associated with a distance measure selected by a user to eliminate from further consideration any database records that cannot match said query object within at least the degree of closeness defined by a user, to produce a second subset of database records, said second subset comprising any database records not yet eliminated from consideration, thereby reducing a number of database records that potentially could match said query object;
(i) applying a triangle inequality technique to each query relational vector set and each relational vector corresponding to a database record in said second subset of database records to eliminate from further consideration any database records that cannot match said query object within at least the degree of closeness defined by a user, thus generating a third subset of database records, said third subset comprising any database records not yet eliminated from consideration, thereby further reducing the number of data objects that potentially will require direct comparison with said query object; and
(j) directly comparing any database record in said third subset with said query object to identify the database records that match said query object within at least the degree of closeness defined by the user. - View Dependent Claims (14, 15, 16, 17, 18, 19)
-
-
20. An article of manufacture adapted for use with a computing device to enable a user to rapidly identify any data object in a set of data objects that matches a query data object to within at least a degree of closeness selectable by a user, by reducing a number of direct comparisons required to identify any matching data objects, comprising:
-
(a) a memory medium; and
(b) a plurality of machine instructions stored on the memory medium, said plurality of machine instructions, when executed by a processor in a computing device, causing the processor to;
(i) define a key object subset of said data objects;
(ii) determine a set of relational vectors, such that for each data object a relational vector describes at least one type of distance measure between that data object and each key object;
(iii) generate a triangle trie for each different type of distance measure provided, each triangle trie having a number of levels that is less than a number of key objects in said key object subset;
(iv) index said set of relational vectors, such that said set of relational vectors can be accessed rapidly;
(v) enable a user to select a query object, to select at least one type of distance measure that will be used to match a data object to said query object, and to select a degree of closeness by at least which identified data objects should match said query object;
(vi) determine at least one query relational vector for said query object, such that a query relational vector describes a distance measure between said query object and each key object for each type of distance measure selected;
(vii) prune each triangle trie related to a distance measure selected by a user to eliminate from further consideration any data objects from said set of data objects that cannot match said query object within at least said degree of closeness selected by a user, thereby reducing a number of data objects that potentially will require direct comparisons with said query object;
(viii) for each data object in said set of data objects that has not yet been eliminated, retrieving a corresponding relational vector from the index of said set of relational vectors, and employing a triangle inequality technique to further eliminate data objects that cannot match said query object to within at least said degree of closeness selected by a user; and
(ix) directly comparing to said query object any data object not yet eliminated from consideration, thereby identifying any data object in said set of data objects that matches said query data object to within at least said degree of closeness selected by a user. - View Dependent Claims (21, 22, 23)
-
-
24. A system for enabling a user to rapidly identify database records that are close matches to a specified query record in a computationally efficient manner, by reducing a number of direct comparisons required to identify any database record that closely matches said query record, comprising:
-
(a) a memory in which are stored;
(i) a plurality of machine instructions;
(ii) a database comprising a plurality of database records;
(iii) a set of key objects comprising a portion of said plurality of database records;
(iv) an indexed set of relational vectors, such that for each database record a relational vector describes at least one type of distance measure between that data object and each key object; and
(v) a triangle trie for each different type of distance measure provided, each triangle trie having a number of levels that is less than a number of key objects in said set of key objects; and
index tables describing said set of relational vectors;
(b) a display; and
(c) a processor that is coupled to the display and to the memory to access the machine instructions, said processor executing said machine instructions and implementing a plurality of functions, including;
(i) enabling a user to select a query object and to define how closely identified database records should match said query object;
(ii) generating at least one query relational vector set, such that for each different distance measure a query relational vector set is generated, each query relational vector set comprising a distance measure between said query object and each key object;
(iii) pruning each triangle trie to eliminate from further consideration any database records that cannot match said query object at least as closely as defined by a user, thereby reducing a number of database records that potentially could match said query object;
(iv) using the indexed plurality of relational vector sets, the at least one query relational vector set and a triangle inequality technique to eliminate from further consideration any database records not previously eliminated that cannot match said query object, thereby eliminating from further consideration any database records that cannot match said query object at least as closely as defined by a user; and
(v) directly comparing any database records not yet eliminated from consideration with said query object to identify any database records that match said query object at least as closely as defined by a user. - View Dependent Claims (25, 26, 27)
-
-
28. A method for reducing a number of direct comparisons required to identify any data object in a set of data objects that matches a query data object to within at least a specified degree of closeness, said method comprising the steps of:
-
(a) providing a set of data objects comprising a subset of key objects, a plurality of relational vectors such that for each data object, a relational vector describes at least one type of distance measure between that data object and each key object, at least one triangle trie for each different type of distance measure provided, each triangle trie having a number of levels that is less than a number of key objects in said set of key objects;
(b) enabling a user to select a query object, at least one type of distance measure that will be used to match a data object to said query object, and a degree of closeness used to match a data object to said query object;
(c) determining at least one query relational vector, such that a different query relational vector describes a distance measure between said query object and each key object for each type of distance measure selected;
(d) pruning each triangle trie that is related to a distance measure selected by a user, to eliminate from further consideration any data objects from said set of data objects that cannot match said query object within at least the degree of closeness selected by the user, thereby reducing a number of data objects that potentially will require direct comparisons with said query object;
(e) for each data object not yet eliminated, applying a triangle inequality technique to further reduce the number of data objects that potentially will require direct comparison with said query object; and
(f) directly comparing said query object to any data object remaining for direct comparison following the preceding two steps.
-
-
29. A method for reducing a number of direct comparisons required to identify any data object in a set of data objects that matches a query data object within at least a specified degree of closeness, said method comprising the steps of:
-
(a) pruning at least one triangle trie relating to said set of data objects to eliminate from further consideration any data objects from said set of data objects that cannot match said query object to within at least the specified degree of closeness, thereby reducing a number of data objects that potentially will require direct comparisons with said query object;
(b) for each data object not yet eliminated, applying a triangle inequality technique to further reduce the number of data objects that potentially will require direct comparison with said query object; and
(c) directly comparing said query object to any data object not eliminated in the preceding two steps to identify any data object that matches said query data object within at least said specified degree of closeness.
-
Specification