Indexing method and apparatus
Abstract
An indexing apparatus and method are described for use in identifying portions of data in a database for comparison with a query. In an embodiment, the index includes a key which comprises a sequence of phoneme classifications derived from the input query by classifying each of the phonemes in the input query into one of a number of phoneme classes, with the phonemes in each class being defined as those that are confusable with the other phonemes in the same class.
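As an illustration of the classification described in the abstract, the following minimal Python sketch maps phonemes to confusable classes and builds a key from the resulting class labels. The class table, the phoneme symbols, and the names PHONEME_CLASSES, CLASS_OF and classify are assumptions made for this example; the text shown here does not list the actual classes.

```python
# Hypothetical confusable-phoneme classes; the groupings are illustrative only.
PHONEME_CLASSES = {
    "c1": {"p", "b", "t", "d", "k", "g"},  # plosives
    "c2": {"m", "n", "ng"},                # nasals
    "c3": {"s", "z", "f", "v", "sh"},      # fricatives
    "c4": {"iy", "ih", "eh", "ae"},        # front vowels
    "c5": {"aa", "ao", "uw", "uh"},        # back vowels
}

# Invert the table so that each phoneme maps to the label of its class.
CLASS_OF = {
    phoneme: label
    for label, members in PHONEME_CLASSES.items()
    for phoneme in members
}


def classify(phonemes):
    """Return the sequence of class labels for a phoneme sequence."""
    return tuple(CLASS_OF[p] for p in phonemes)


key_a = classify(["b", "ae", "t"])
key_b = classify(["p", "eh", "d"])
```

Under these assumed classes, the two phoneme sequences differ in every position yet produce the same key, which is why a query containing confusable phonemes can still reach the same index entry.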
48 Claims
1. An apparatus for identifying one or more portions of data in a database for comparison with a query input by a user, the query and the portions of data each comprising a sequence of sub-word units, said apparatus comprising:
a memory for storing data defining a plurality of sub-word unit classes, each class comprising sub-word units that are confusable with other sub-word units in the same class;
a memory for storing an index having a plurality of entries, each entry having an associated identifier for identifying the entry and each entry comprising:
a key associated with the entry and which is related to the identifier for the entry in a predetermined manner; and
a number of pointers which point to portions of data in the database which correspond to the key associated with the entry, wherein each key comprises a sequence of sub-word unit classifications which is derived from a corresponding sequence of sub-word units appearing in the database by classifying each of the sub-word units in the sequence into one of the plurality of sub-word unit classes;
means for classifying each of the sub-word units in the input query into one of the plurality of sub-word unit classes and for defining one or more sub-sequences of query sub-word unit classifications;
means for determining a corresponding identifier for an entry in the index for each of the one or more sub-sequences of query sub-word unit classifications;
means for comparing the key associated with each of the determined identifiers determined by said determining means with the corresponding sub-sequence of query sub-word unit classifications; and
means for retrieving one or more pointers from the index in accordance with the output of said comparing means, which one or more pointers identify the one or more portions of data in the database for comparison with the input query.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
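The index structure recited in claim 1 can be pictured with a short Python sketch: each entry holds a key and a list of pointers, the entry identifier is computed from the key in a fixed way, and the stored key is compared with the query sub-sequence before any pointers are returned. The hash function, the table size, the linear-probing collision strategy, and all names (NUM_CLASSES, TABLE_SIZE, identifier_for, Index) are assumptions chosen for illustration; the claim only requires that the key be related to the identifier "in a predetermined manner".

```python
NUM_CLASSES = 5   # hypothetical number of sub-word unit classes
TABLE_SIZE = 64   # hypothetical number of index entries


def identifier_for(key):
    """Predetermined mapping from a key (tuple of class numbers) to an entry identifier."""
    value = 0
    for class_number in key:
        value = value * NUM_CLASSES + class_number
    return value % TABLE_SIZE


class Index:
    def __init__(self):
        # Each used entry is a (key, pointers) pair; unused entries are None.
        self.entries = [None] * TABLE_SIZE

    def add(self, key, pointer):
        """Store a pointer under the given key, probing linearly on collisions."""
        slot = identifier_for(key)
        for _ in range(TABLE_SIZE):
            entry = self.entries[slot]
            if entry is None:
                self.entries[slot] = (key, [pointer])
                return
            if entry[0] == key:
                entry[1].append(pointer)
                return
            slot = (slot + 1) % TABLE_SIZE
        raise RuntimeError("index is full")

    def lookup(self, query_key):
        """Return the pointers whose stored key equals the query sub-sequence."""
        slot = identifier_for(query_key)
        for _ in range(TABLE_SIZE):
            entry = self.entries[slot]
            if entry is None:
                return []                 # no entry for this key
            if entry[0] == query_key:     # compare stored key with the query
                return list(entry[1])
            slot = (slot + 1) % TABLE_SIZE
        return []


index = Index()
index.add((0, 3, 0), 10)
index.add((0, 3, 0), 42)
index.add((0, 0, 0), "a")
index.add((2, 2, 4), "b")   # maps to the same identifier as (0, 0, 0)
```

The key comparison matters because two different keys can map to the same identifier, as the last two entries do in this sketch; comparing the stored key with the query sub-sequence keeps their pointers separate.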
16. An apparatus for identifying one or more portions of data in a database for comparison with a query input by a user, the query and the portions of data each comprising a sequence of features, said apparatus comprising:
a memory for storing data defining a plurality of feature classes, each class comprising features that are confusable with other features in the same class;
a memory for storing an index having a plurality of entries, each entry having an associated identifier for identifying the entry and each entry comprising:
a key associated with the entry and which is related to the identifier for the entry in a predetermined manner; and
a number of pointers which point to portions of data in the database which correspond to the key associated with the entry, wherein each key comprises a sequence of feature classifications which is derived from a corresponding sequence of features appearing in the database by classifying each of the features in the sequence into one of the plurality of feature classes;
means for classifying each of the features in the input query into one of the plurality of feature classes and for defining one or more sub-sequences of query feature classifications;
means for determining a corresponding identifier for an entry in the index for each of the one or more sub-sequences of query feature classifications;
means for comparing the key associated with each of the determined identifiers determined by said determining means with the corresponding sub-sequence of query feature classifications; and
means for retrieving one or more pointers from the index in accordance with the output of said comparing means, which one or more pointers identify the one or more portions of data in the database for comparison with the input query.
17. A method of identifying one or more portions of data in a database for comparison with a query input by a user, the query and the portions of data each comprising a sequence of sub-word units, the method comprising the steps of:
storing data defining a plurality of sub-word unit classes, each class comprising sub-word units that are confusable with other sub-word units in the same class;
storing an index having a plurality of entries, each entry having an associated identifier for identifying the entry, a key associated with the entry and which is related to the identifier for the entry in a predetermined manner, and a number of pointers which point to portions of data in the database which correspond to the key associated with the entry, wherein each key comprises a sequence of sub-word unit classifications which is derived from a corresponding sequence of sub-word units appearing in the database by classifying each of the sub-word units in the sequence into one of the plurality of sub-word unit classes;
classifying each of the sub-word units in the input query into one of the plurality of sub-word unit classes and defining one or more sub-sequences of query sub-word unit classifications;
determining a corresponding identifier for an entry in the index for each of the one or more sub-sequences of query sub-word unit classifications;
comparing the key associated with each of the determined identifiers determined in said determining step with the corresponding sub-sequence of query sub-word unit classifications; and
retrieving one or more pointers from the index in accordance with the output of said comparing step, which one or more pointers identify the one or more portions of data in the database for comparison with the input query.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
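The steps of the method of claim 17 can be sketched end to end in Python. This is only one possible reading under stated assumptions: the class table, the fixed sub-sequence length N, the hash in identifier_for, the use of (document name, offset) pairs as pointers, and the sample database are all hypothetical and are not taken from the claim.

```python
# Hypothetical class numbers for a handful of sub-word units (phonemes).
CLASS_OF = {
    "p": 0, "b": 0, "t": 0, "d": 0, "k": 0, "g": 0,
    "m": 1, "n": 1,
    "s": 2, "z": 2,
    "iy": 3, "ih": 3, "eh": 3, "ae": 3,
    "aa": 4, "ao": 4, "uw": 4,
}
NUM_CLASSES = 5
TABLE_SIZE = 32  # hypothetical number of distinct identifiers
N = 3            # hypothetical sub-sequence length


def classify(units):
    """Classify each sub-word unit into its class."""
    return tuple(CLASS_OF[u] for u in units)


def subsequences(classes, n=N):
    """Define the overlapping length-n sub-sequences of a classification sequence."""
    return [classes[i:i + n] for i in range(len(classes) - n + 1)]


def identifier_for(key):
    """Predetermined mapping from a key to an entry identifier."""
    value = 0
    for class_number in key:
        value = value * NUM_CLASSES + class_number
    return value % TABLE_SIZE


def build_index(database):
    """Store an index: identifier -> list of (key, pointers) entries."""
    index = {}
    for name, units in database.items():
        for offset, key in enumerate(subsequences(classify(units))):
            bucket = index.setdefault(identifier_for(key), [])
            for entry_key, pointers in bucket:
                if entry_key == key:
                    pointers.append((name, offset))
                    break
            else:
                bucket.append((key, [(name, offset)]))
    return index


def candidates(index, query_units):
    """Determine identifiers, compare keys, and retrieve matching pointers."""
    found = []
    for sub in subsequences(classify(query_units)):
        for key, pointers in index.get(identifier_for(sub), []):
            if key == sub:  # comparing step
                for pointer in pointers:
                    if pointer not in found:
                        found.append(pointer)
    return found


DATABASE = {
    "doc1": ["b", "ae", "t", "m", "ae", "n"],
    "doc2": ["s", "iy", "z", "aa", "n"],
}
INDEX = build_index(DATABASE)
hits = candidates(INDEX, ["p", "eh", "d", "m"])
```

In this sketch the query shares no phoneme with the first three units of "doc1", but its class sequence matches, so the returned pointers single out that portion of the database for the later, more detailed comparison with the query.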
32. An apparatus for identifying one or more portions of data in a database for comparison with a query input by a user, the query and the portions of data each comprising a sequence of sub-word units, the apparatus comprising:
a first memory operable to store data defining a plurality of sub-word unit classes, each class comprising sub-word units that are confusable with other sub-word units in the same class;
a second memory operable to store an index having a plurality of entries, each entry having an associated identifier for identifying the entry and each entry comprising:
a key associated with the entry and which is related to the identifier for the entry in a predetermined manner; and
a number of pointers which point to portions of data in the database which correspond to the key for the entry;
wherein each key comprises a sequence of sub-word unit classifications which is derived from a corresponding sequence of sub-word units appearing in the database by classifying each of the sub-word units in the sequence into one of the plurality of sub-word unit classes;
a classifier operable to classify each of the sub-word units in the input query into one of the plurality of sub-word unit classes and to define one or more sub-sequences of query sub-word unit classifications;
a determiner operable to determine a corresponding identifier for an entry in the index for each of the one or more sub-sequences of query sub-word unit classifications;
a comparator operable to compare the key associated with each of the determined identifiers determined by said determiner with the corresponding sub-sequence of query sub-word unit classifications; and
a retriever operable to retrieve one or more pointers from the index in accordance with the output of said comparator, which one or more pointers identify the one or more portions of data in the database for comparison with the input query.
Dependent claims: 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46
47. An apparatus for identifying one or more portions of data in a database for comparison with a query input by a user, the query and the portions of data each comprising a sequence of features, said apparatus comprising:
a first memory operable to store data defining a plurality of feature classes, each class comprising features that are confusable with other features in the same class;
a second memory operable to store an index having a plurality of entries, each entry having an associated identifier for identifying the entry and each entry comprising:
a key associated with the entry and which is related to the identifier for the entry in a predetermined manner; and
a number of pointers which point to portions of data in the database which correspond to the key for the entry, wherein each key comprises a sequence of feature classifications which is derived from a corresponding sequence of features appearing in the database by classifying each of the features in the sequence into one of the plurality of feature classes;
a classifier operable to classify each of the features in the input query into one of the plurality of feature classes and to define one or more sub-sequences of query feature classifications;
a determiner operable to determine a corresponding identifier for an entry in said index for each of said one or more sub-sequences of query feature classifications;
a comparator operable to compare the key associated with each of the determined identifiers determined by said determiner with the corresponding sub-sequence of query feature classifications; and
a retriever operable to retrieve one or more pointers from the index in accordance with the output of said comparator, which one or more pointers identify the one or more portions of data in the database for comparison with the input query.
48. A storage medium storing computer readable program code for executing a method of controlling a processor to identify one or more portions of data in a database for comparison with a query input by a user, the query and the portions of data each comprising a sequence of sub-word units, said program code comprising:
code for storing data defining a plurality of sub-word unit classes, each class comprising sub-word units that are confusable with other sub-word units in the same class;
code for storing an index having a plurality of entries, each entry having an associated identifier for identifying the entry and each entry comprising:
a key associated with the entry and which is related to the identifier for the entry in a predetermined manner, and a number of pointers which point to portions of data in the database which correspond to the key for the entry, wherein each key comprises a sequence of sub-word unit classifications which is derived from a corresponding sequence of sub-word units appearing in the database by classifying each of the sub-word units in the sequence into one of the plurality of sub-word unit classes;
code for classifying each of the sub-word units in the input query into one of the plurality of sub-word unit classes and defining one or more sub-sequences of query sub-word unit classifications;
code for determining a corresponding identifier for an entry in the index for each of the one or more sub-sequences of query sub-word unit classifications;
code for comparing the key associated with each of the determined identifiers determined by said determining code with the corresponding sub-sequence of query sub-word unit classifications; and
code for retrieving one or more pointers from the index in accordance with the output of said comparing code, which one or more pointers identify the one or more portions of data in the database for comparison with the input query.
Specification