Indexing method and apparatus
Abstract
An indexing apparatus and method are described for use in identifying portions of data in a database for comparison with a query. In an embodiment, the index includes keys, each comprising a sequence of phoneme classifications, and the input query is classified in the same way: each phoneme is assigned to one of a number of phoneme classes, the phonemes in each class being those that are confusable with the other phonemes in the same class.
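The classification step can be illustrated with a short Python sketch. The class groupings below (`CONFUSION_CLASSES`) and the helper `classify` are hypothetical; the source does not list its phoneme classes. The sketch only shows why such classes help: two phoneme strings that differ only in confusable phonemes reduce to the same classification sequence.

```python
# Hypothetical confusion classes: phonemes in the same class are assumed to be
# easily confused by a recogniser. These groupings are illustrative only.
CONFUSION_CLASSES = {
    "C1": ["p", "b"],
    "C2": ["t", "d"],
    "C3": ["k", "g"],
    "C4": ["m", "n", "ng"],
    "C5": ["s", "z", "sh"],
    "C6": ["f", "v", "th"],
    "V1": ["iy", "ih", "ey"],
    "V2": ["aa", "ae", "ah"],
}

# Invert the table once so each phoneme can look up its class directly.
PHONEME_TO_CLASS = {
    phoneme: label
    for label, members in CONFUSION_CLASSES.items()
    for phoneme in members
}


def classify(phonemes: list[str]) -> tuple[str, ...]:
    """Replace each phoneme by the label of its (hypothetical) confusion class."""
    return tuple(PHONEME_TO_CLASS[p] for p in phonemes)


# "bat" and a misrecognised "pad" fall into the same class sequence.
print(classify(["b", "ae", "t"]))  # ('C1', 'V2', 'C2')
print(classify(["p", "ae", "d"]))  # ('C1', 'V2', 'C2')
```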
33 Claims
1. An apparatus for identifying one or more portions of data in a database for comparison with a query input by a user, the query and the portions of data each comprising a sequence of sub-word units, the apparatus comprising:
a memory for storing data defining a plurality of sub-word unit classes, each class comprising sub-word units that are confusable with other sub-word units in the same class;
a memory for storing an index having a plurality of entries, each of which comprises:
(i) an identifier for identifying the entry;
(ii) a key associated with the entry and which is related to the identifier for the entry in a predetermined manner; and
(iii) a number of pointers which point to portions of data in the database which correspond to the key for the entry;
wherein each key comprises a sequence of sub-word unit classifications which is derived from a corresponding sequence of sub-word units appearing in the database by classifying each of the sub-word units in the sequence into one of the plurality of sub-word unit classes;
means for classifying each of the sub-word units in the input query into one of the plurality of sub-word unit classes and for defining one or more sub-sequences of query sub-word unit classifications;
means for determining a corresponding identifier for an entry in said index for each of said one or more sub-sequences of query sub-word unit classifications;
means for comparing the key associated with each of the determined identifiers with the corresponding sub-sequence of query sub-word unit classifications; and
means for retrieving one or more pointers from said index in dependence upon the output of said comparing means, which one or more pointers identify said one or more portions of data in the database for comparison with the input query.
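The claim leaves the data structures open. Below is a minimal Python sketch of how the claimed elements might fit together, assuming: hypothetical confusion classes (`PHONEME_TO_CLASS`), keys of three consecutive classifications (`KEY_LEN`), an identifier computed as a CRC-32 of the key modulo the table size, and linear probing to resolve identifier collisions. None of these choices come from the source. The stored key is what lets a lookup reject a slot whose identifier matches but whose key does not.

```python
import zlib
from dataclasses import dataclass, field

# Hypothetical confusion classes (illustrative only): phoneme -> class label.
PHONEME_TO_CLASS = {
    "p": "C1", "b": "C1", "t": "C2", "d": "C2", "k": "C3", "g": "C3",
    "m": "C4", "n": "C4", "s": "C5", "z": "C5",
    "iy": "V1", "ih": "V1", "aa": "V2", "ae": "V2",
}
KEY_LEN = 3  # hypothetical number of classifications per key


def classify(phonemes):
    """Map each phoneme to the label of its confusion class."""
    return tuple(PHONEME_TO_CLASS[p] for p in phonemes)


def windows(labels, n=KEY_LEN):
    """Yield (offset, sub-sequence) for every length-n window of labels."""
    for i in range(len(labels) - n + 1):
        yield i, tuple(labels[i:i + n])


@dataclass
class Entry:
    key: tuple                                    # sequence of class labels
    pointers: list = field(default_factory=list)  # (doc_id, offset) pairs


class ClassIndex:
    def __init__(self, size=101):
        self.slots = [None] * size  # slot number serves as the entry identifier

    def identifier(self, key):
        # Assumed "predetermined manner": CRC-32 of the key, modulo table size.
        return zlib.crc32("|".join(key).encode()) % len(self.slots)

    def _find_slot(self, key):
        """Linear probing (assumed): return the slot holding `key`, else the
        first empty slot on its probe path, else None if the table is full."""
        slot = self.identifier(key)
        for _ in range(len(self.slots)):
            entry = self.slots[slot]
            if entry is None or entry.key == key:  # compare the stored key
                return slot
            slot = (slot + 1) % len(self.slots)
        return None

    def add_document(self, doc_id, phonemes):
        """Index every length-KEY_LEN window of a database phoneme string."""
        for offset, key in windows(classify(phonemes)):
            slot = self._find_slot(key)
            if slot is None:
                raise RuntimeError("index is full; use a larger size")
            if self.slots[slot] is None:
                self.slots[slot] = Entry(key)
            self.slots[slot].pointers.append((doc_id, offset))

    def lookup(self, query_phonemes):
        """Return pointers to database portions to compare with the query."""
        hits = []
        for _, key in windows(classify(query_phonemes)):
            slot = self._find_slot(key)
            if slot is not None and self.slots[slot] is not None:
                hits.extend(self.slots[slot].pointers)
        return hits


index = ClassIndex()
index.add_document("doc1", ["k", "ae", "t", "s"])
index.add_document("doc2", ["d", "ih", "p"])
print(index.lookup(["g", "aa", "d"]))  # [('doc1', 0)]
```

Here the query "gad" still retrieves the portion of `doc1` beginning with "kat", because /g/ and /k/, /aa/ and /ae/, and /d/ and /t/ share (hypothetical) classes. The retrieved pointers only nominate candidates; the claim leaves the detailed comparison with the query to a later stage.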
15. An apparatus for identifying one or more portions of data in a database for comparison with a query input by a user, the query and the portions of data each comprising a sequence of features, the apparatus comprising:
a memory for storing data defining a plurality of feature classes, each class comprising features that are confusable with other features in the same class;
a memory for storing an index having a plurality of entries, each of which comprises:
(i) an identifier for identifying the entry;
(ii) a key associated with the entry and which is related to the identifier for the entry in a predetermined manner; and
(iii) a number of pointers which point to portions of data in the database which correspond to the key for the entry;
wherein each key comprises a sequence of feature classifications which is derived from a corresponding sequence of features appearing in the database by classifying each of the features in the sequence into one of the plurality of feature classes;
means for classifying each of the features in the input query into one of the plurality of feature classes and for defining one or more sub-sequences of query feature classifications;
means for determining a corresponding identifier for an entry in said index for each of said one or more sub-sequences of query feature classifications;
means for comparing the key associated with each of the determined identifiers with the corresponding sub-sequence of query feature classifications; and
means for retrieving one or more pointers from said index in dependence upon the output of said comparing means, which one or more pointers identify said one or more portions of data in the database for comparison with the input query.
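Claim 15 generalises sub-word units to arbitrary "features". As a rough Python sketch of that generality, the helpers below accept any hashable feature type and any class map. The names `classify_features` and `query_subsequences`, the feature labels `f1` to `f5`, and the use of overlapping fixed-length windows as the query sub-sequences are all assumptions; the claim only requires "one or more sub-sequences".

```python
from collections.abc import Hashable, Iterator, Mapping, Sequence
from typing import TypeVar

F = TypeVar("F", bound=Hashable)  # feature type (phoneme, frame code, ...)
C = TypeVar("C", bound=Hashable)  # class label type


def classify_features(features: Sequence[F], classes: Mapping[F, C]) -> tuple[C, ...]:
    """Replace every feature with the label of the class it belongs to."""
    return tuple(classes[f] for f in features)


def query_subsequences(features: Sequence[F], classes: Mapping[F, C],
                       n: int) -> Iterator[tuple[C, ...]]:
    """Yield each length-n run of query classifications (overlapping windows)."""
    labels = classify_features(features, classes)
    for i in range(len(labels) - n + 1):
        yield labels[i:i + n]


# Hypothetical feature alphabet split into two confusable groups.
classes = {"f1": "A", "f2": "A", "f3": "B", "f4": "B", "f5": "B"}
print(list(query_subsequences(["f1", "f3", "f2", "f5"], classes, n=2)))
# [('A', 'B'), ('B', 'A'), ('A', 'B')]
```

Nothing in the helpers depends on what a feature is, which mirrors how claim 15 restates claim 1 without reference to sub-word units.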
16. Data defining an index for use in searching a database, the data comprising:
data defining a respective identifier for each of a plurality of entries in the index;
data defining a respective key for each of the plurality of entries, which keys are related to the corresponding identifiers in a predetermined manner; and
data defining a respective one or more pointers for a plurality of the entries, which pointers point to locations within the database corresponding to the key for the entry;
wherein each key comprises a sequence of sub-word unit classifications which is derived from a corresponding sequence of sub-word units appearing in the database by classifying each of the sub-word units in the sequence into one of a plurality of sub-word unit classes, the sub-word unit classes being defined in advance and each comprising sub-word units that are confusable with other sub-word units in the same class.
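Claim 16 covers the index data itself. One hypothetical byte layout for a single entry is sketched below in Python using the standard `struct` module: a 32-bit identifier, a one-byte key length, one byte per class code, a 16-bit pointer count, then 32-bit pointers into the database. The field widths, the integer class codes, and the CRC-32 rule in `identifier_for` are assumptions, not taken from the source.

```python
import struct
import zlib


def identifier_for(key, table_size):
    """Assumed key-to-identifier rule: CRC-32 of the class codes, mod table size."""
    return zlib.crc32(bytes(key)) % table_size


def pack_entry(identifier, key, pointers):
    """Serialise one entry: identifier, key length, class codes, pointers."""
    return (struct.pack("<IB", identifier, len(key))
            + struct.pack(f"<{len(key)}B", *key)
            + struct.pack("<H", len(pointers))
            + struct.pack(f"<{len(pointers)}I", *pointers))


def unpack_entry(buf, pos=0):
    """Inverse of pack_entry; returns (identifier, key, pointers, next_pos)."""
    identifier, key_len = struct.unpack_from("<IB", buf, pos)
    pos += struct.calcsize("<IB")
    key = struct.unpack_from(f"<{key_len}B", buf, pos)
    pos += key_len
    (n_ptrs,) = struct.unpack_from("<H", buf, pos)
    pos += struct.calcsize("<H")
    pointers = list(struct.unpack_from(f"<{n_ptrs}I", buf, pos))
    pos += struct.calcsize(f"<{n_ptrs}I")
    return identifier, key, pointers, pos


key = (3, 7, 2)  # hypothetical class codes for a three-class key
ident = identifier_for(key, 1024)
blob = pack_entry(ident, key, [120, 4587])  # hypothetical database offsets
print(unpack_entry(blob) == (ident, key, [120, 4587], len(blob)))  # True
```

Because `unpack_entry` returns the position after the entry, a file of concatenated entries can be read back sequentially, and a reader can recompute `identifier_for(key, ...)` to check that each stored key still agrees with its identifier.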
17. A method of identifying one or more portions of data in a database for comparison with a query input by a user, the query and the portions of data each comprising a sequence of sub-word units, the method comprising the steps of:
storing data defining a plurality of sub-word unit classes, each class comprising sub-word units that are confusable with other sub-word units in the same class;
storing an index having a plurality of entries, each of which comprises:
(i) an identifier for identifying the entry;
(ii) a key associated with the entry and which is related to the identifier for the entry in a predetermined manner; and
(iii) a number of pointers which point to portions of data in the database which correspond to the key for the entry;
wherein each key comprises a sequence of sub-word unit classifications which is derived from a corresponding sequence of sub-word units appearing in the database by classifying each of the sub-word units in the sequence into one of the plurality of sub-word unit classes;
classifying each of the sub-word units in the input query into one of the plurality of sub-word unit classes and defining one or more sub-sequences of query sub-word unit classifications;
determining a corresponding identifier for an entry in said index for each of said one or more sub-sequences of query sub-word unit classifications;
comparing the key associated with each of the determined identifiers with the corresponding sub-sequence of query sub-word unit classifications; and
retrieving one or more pointers from said index in dependence upon the output of said comparing step, which one or more pointers identify said one or more portions of data in the database for comparison with the input query.
33. An apparatus for identifying one or more portions of data in a database for comparison with a query input by a user, the query and the portions of data each comprising a sequence of sub-word units, the apparatus being characterised by an index having a plurality of entries, each of which includes a key comprising a sequence of sub-word unit classifications, which key is derived from a corresponding sequence of sub-word units appearing in the database by classifying each of the sub-word units in the sequence into one of a plurality of sub-word unit classes, each class comprising sub-word units that are confusable with other sub-word units in the same class.
Specification