Speech recognition system utilizing pre-calculated similarity measurements
Abstract
An input utterance is converted to a sequence of standard or prototype data frames which are compared with word models which are represented by respective sequences of standard or prototype probability states, there being a pre-calculable distance metric representing the degree of match between each prototype data frame and each prototype model state. Only distance measurements better than a calculated threshold are considered meaningful and those meaningful metrics are stored in a packed list. Also stored is an address array of offsets for locating particular meaningful metrics in the list, the address array being accessed by the corresponding frame and state indices. Also stored is an array for distinguishing meaningful and non-meaningful metrics. Accordingly, an input utterance can be evaluated by locating meaningful metrics in the packed list using the address array and by utilizing a default value for any non-meaningful metric.
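The lookup described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the table contents, the names `packed`, `check`, `offsets`, `defaults` and `lookup`, and the choice to have the distinguishing array record which state owns each packed slot are all assumptions made for the example.

```python
# Hypothetical tables for 2 states x 4 frames; every value is illustrative.
# Uncompressed matrix ("-" marks a non-meaningful metric):
#   state 0: [3, -, 2, -]
#   state 1: [-, 5, -, 7]
packed = [3, 5, 2, 7]    # packed list holding only the meaningful metrics
check = [0, 1, 0, 1]     # assumed form of the distinguishing array: owner state per slot
offsets = [0, 0]         # assumed address array: one offset per state
defaults = [40, 50]      # per-state default used for non-meaningful metrics


def lookup(state, frame):
    """Return the stored metric if it is meaningful, else the state's default."""
    slot = offsets[state] + frame
    if 0 <= slot < len(packed) and check[slot] == state:
        return packed[slot]
    return defaults[state]
```

Under these assumptions, `lookup(1, 3)` finds the stored metric 7, while `lookup(0, 1)` lands on a slot owned by state 1 and so falls back to state 0's default of 40.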
20 Claims
1. A method of preparing a compressed matrix of precalculated distance metrics for comparing an input utterance which is represented by a sequence of prototype data frames selected from a preselected set of prototype data frames with at least some of a vocabulary of word models each of which is represented by a sequence of prototype states selected from a preselected set of prototype states, said method comprising:
generating an array of distance metrics for all combinations of prototype frames and prototype states;
for each state, identifying the frames for which the corresponding metric is meaningful;
for each state, determining a common default value for non-meaningful metrics;
for at least one group of frames corresponding to each state, determining the locations within said array containing meaningful metrics;
building a combined list of meaningful metrics by adding the meaningful metrics from successive groups of frames using an offset for each group which allows the meaningful metrics for each group to fit into currently unused positions in the list, the relative positions of meaningful metrics within each group being maintained in the list;
building an array of said offset values accessed by the corresponding state;
building an array distinguishing meaningful and non-meaningful entries in the original array of distance metrics;
whereby a measure of match between an input utterance and a vocabulary word model is obtainable by combining corresponding metrics using a default value for non-meaningful metrics and locating respective meaningful metrics in said combined list using said array of offset values.
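The list-building step of claim 1 can be illustrated with a first-fit packing routine. The sketch below assumes one group per state (a whole row), non-negative offsets, and `None` as the marker for a non-meaningful metric; the name `pack_rows` and the sample matrix are hypothetical, and the claim itself does not prescribe a particular search strategy for the offset.

```python
def pack_rows(rows):
    """Overlay sparse rows into one list; rows[state][frame] is None if not meaningful.

    Returns (packed, check, offsets), where check[slot] names the owning state
    (None for an unused slot) and offsets[state] is the shift applied to that row.
    """
    packed, check, offsets = [], [], []
    for state, row in enumerate(rows):
        cols = [f for f, m in enumerate(row) if m is not None]
        offset = 0
        # First fit: shift the row until each meaningful entry lands on a free slot.
        while any(offset + f < len(packed) and check[offset + f] is not None
                  for f in cols):
            offset += 1
        grow = offset + (cols[-1] + 1 if cols else 0) - len(packed)
        if grow > 0:
            packed.extend([None] * grow)
            check.extend([None] * grow)
        for f in cols:  # relative positions within the row are preserved
            packed[offset + f] = row[f]
            check[offset + f] = state
        offsets.append(offset)
    return packed, check, offsets


# Illustrative 3-state x 4-frame matrix.
rows = [[3, None, 2, None],
        [None, 5, None, 7],
        [4, None, None, 1]]
packed, check, offsets = pack_rows(rows)
```

For this sample the first two rows interleave at offset 0 and the third is placed at offset 4, so `offsets` is `[0, 0, 4]` and the 12-entry matrix is held in an 8-slot list with two unused positions.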
2. A method of preparing a compressed matrix of precalculated distance metrics for comparing an input utterance which is represented by a sequence of prototype data frames selected from a preselected set of prototype data frames with at least some of a vocabulary of word models each of which is represented by a sequence of prototype states selected from a preselected set of prototype states, said method comprising:
generating an array of distance metrics for all combinations of prototype frames and prototype states;
for each state, identifying the frames for which the corresponding metric is meaningful;
for each state, determining a common default value for non-meaningful metrics;
for subgroups of frames corresponding to each state, determining the locations within said array containing meaningful metrics;
building a combined list of meaningful metrics by adding the meaningful metrics from successive subgroups of frames using an offset for each subgroup which allows the meaningful metrics for each group to fit into currently unused positions in the list, the relative positions of meaningful metrics within each subgroup being maintained in the list;
building an array of said offset values accessed by the corresponding subgroup and state;
building an array distinguishing meaningful and non-meaningful entries in the original array of distance metrics;
whereby a measure of match between an input utterance and a vocabulary word model is obtainable by combining corresponding metrics using a default value for non-meaningful metrics and locating respective meaningful metrics in said combined list using said array of offset values.
3. A method of preparing a compressed matrix of precalculated distance metrics for comparing an input utterance which is represented by a sequence of prototype data frames selected from a preselected set of prototype data frames with at least some of a vocabulary of word models each of which is represented by a sequence of prototype states selected from a preselected set of prototype states, said method comprising:
generating a matrix of distance metrics for all combinations of prototype frames and prototype states;
for each state, ordering the metrics, best matches first;
cumulating the metrics, best matches first, for each frame until a preselected threshold is reached;
for each state, assigning a common default value for all metrics which did not contribute to the cumulation;
for the frames corresponding to each state, determining the locations within said array containing contributing metrics, said metrics being then considered meaningful;
building a combined list of meaningful metrics by adding the meaningful metrics from successive groups of contributing frames using an offset for each group which allows the meaningful metrics for each group to fit into currently unused positions in the list, the relative positions of meaningful metrics within each group being maintained in the list;
building an array of said offset values accessed by the corresponding frame and state;
building an array distinguishing meaningful and non-meaningful entries in the original array of distance metrics;
whereby a measure of match between an input utterance and a vocabulary word model is obtainable by combining corresponding metrics using a default value for non-meaningful metrics and locating respective meaningful metrics in said combined list using said array of offset values.
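One possible reading of the ordering and cumulating steps in claim 3 is sketched below. It assumes that each state carries a strictly positive probability for every prototype frame, that the distance metric is the negative log of that probability, and that the default is the distance of the average leftover probability. None of these choices is stated in the claim, and the name `select_meaningful` and the 0.85 threshold are hypothetical.

```python
import math


def select_meaningful(probs, threshold=0.85):
    """Pick the meaningful frames for one state.

    probs[frame] is an assumed probability of that prototype frame given the
    state. Frames are taken best first until the cumulated probability reaches
    the threshold; the remaining frames share one default distance.
    """
    order = sorted(range(len(probs)), key=lambda f: probs[f], reverse=True)
    kept, total = {}, 0.0
    for f in order:
        if total >= threshold:
            break
        kept[f] = -math.log(probs[f])   # assumed metric: negative log probability
        total += probs[f]
    rest = [probs[f] for f in range(len(probs)) if f not in kept]
    # Assumed default: distance of the mean probability of the frames left out.
    default = -math.log(sum(rest) / len(rest)) if rest else None
    return kept, default


meaningful, default = select_meaningful([0.5, 0.05, 0.3, 0.05, 0.1])
```

With the sample probabilities, frames 0, 2 and 4 contribute to the cumulation and are kept, while frames 1 and 3 share the default distance.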
10. A method of preparing a compressed matrix of precalculated distance metrics for comparing an input utterance which is represented by a sequence of prototype data frames selected from a preselected set of prototype data frames with at least some of a vocabulary of word models each of which is represented by a sequence of prototype states selected from a preselected set of prototype states, said method comprising:
generating a matrix of distance metrics for all combinations of prototype frames and prototype states, locations within said matrix being addressable by respective state and frame multibit indices;
for each state, ordering the metrics, best matches first;
cumulating the metrics, best matches first, for each frame until a preselected threshold is reached;
for each state, assigning a common default value for all metrics which did not contribute to the cumulation;
for the frames corresponding to each state, determining the locations within said array containing contributing metrics, said metrics being then considered meaningful;
dividing the group of frames corresponding to each state into subgroups identifiable by the higher order bits of the frame multibit index;
building a combined list of meaningful metrics by adding the meaningful metrics from successive sub-groups of contributing frames, using an offset for each sub-group which allows the meaningful entries to fit into currently unused positions in the list, the relative positions of meaningful metrics within each sub-group being maintained in the list;
building an array of said offset values accessed by the corresponding state index and the higher order bits of the frame multibit index;
building a check list paralleling said combined list and containing entries distinguishing meaningful and non-meaningful entries in the original array of distance metrics;
whereby a measure of match between an input utterance and a vocabulary word model is obtainable by combining corresponding metrics using a default value for non-meaningful metrics and locating respective meaningful metrics in said combined list using said array of offset values.
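The subgroup addressing of claim 10 can be sketched as follows: the higher order bits of the frame index select a subgroup, the offset table is keyed by state and subgroup, and the lower order bits give the position within the subgroup. The 2-bit split, the tuple stored in the check list, and the names `build` and `lookup` are assumptions for illustration only.

```python
LOW_BITS = 2                   # assumed split: 4 frames per subgroup
MASK = (1 << LOW_BITS) - 1


def build(rows):
    """Pack each (state, subgroup) separately using first-fit offsets."""
    packed, check, offsets = [], [], {}
    for state, row in enumerate(rows):
        for start in range(0, len(row), 1 << LOW_BITS):
            group = start >> LOW_BITS          # higher order bits of the frame index
            cols = [f & MASK
                    for f in range(start, min(start + MASK + 1, len(row)))
                    if row[f] is not None]
            off = 0
            while any(off + c < len(packed) and check[off + c] is not None
                      for c in cols):
                off += 1
            grow = off + (max(cols) + 1 if cols else 0) - len(packed)
            if grow > 0:
                packed.extend([None] * grow)
                check.extend([None] * grow)
            for c in cols:
                packed[off + c] = row[start + c]
                check[off + c] = (state, group)  # check list parallels the packed list
            offsets[(state, group)] = off
    return packed, check, offsets


def lookup(tables, defaults, state, frame):
    """Locate a metric via the offset table, else return the state's default."""
    packed, check, offsets = tables
    group = frame >> LOW_BITS
    slot = offsets[(state, group)] + (frame & MASK)
    if slot < len(packed) and check[slot] == (state, group):
        return packed[slot]
    return defaults[state]


# Illustrative 2-state x 8-frame matrix; None marks a non-meaningful metric.
rows = [[3, None, None, 2, None, None, 6, None],
        [None, 5, None, None, 9, None, None, 7]]
tables = build(rows)
defaults = [40, 50]
```

Packing by subgroup rather than by whole row gives the first-fit search smaller pieces to place, which tends to leave fewer unused slots at the cost of a larger offset table. In this sample the 16-entry matrix is held in 8 slots.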
12. A method of preparing a compressed matrix of precalculated distance metrics for comparing an input utterance which is represented by a sequence of prototype data frames selected from a preselected set of prototype data frames with at least some of a vocabulary of word models each of which is represented by a sequence of prototype states selected from a preselected set of prototype states, said method comprising:
generating an array of distance metrics for all combinations of prototype frames and prototype states;
designating as meaningful metrics which represent a probability better than a preselected criteria, said criteria being selected to produce meaningful metrics in less than about twenty percent of said entries thereby to generate a sparse matrix of meaningful entries;
generating a data structure identifying which metrics in said matrix have been designated as meaningful;
generating at least one default value for non-meaningful metrics, each default value being common to a substantial number of said non-meaningful metrics; and
compressing said sparse matrix into a packed list comprised essentially entirely of said meaningful entries.
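The designation step of claim 12 can be illustrated by choosing a cutoff so that at most a fixed fraction of entries is kept. The sketch assumes that a smaller distance means a better match, uses a single default equal to the mean of the discarded entries, and covers only designation and stripping, not the addressing needed to locate entries afterwards. The name `designate` and the sample matrix are hypothetical.

```python
def designate(matrix, max_fraction=0.2):
    """Mark the best entries as meaningful, aiming for at most max_fraction of them.

    Assumes smaller distance is better. Ties at the cutoff are all kept, so the
    kept fraction can slightly exceed max_fraction when values repeat.
    """
    flat = sorted(v for row in matrix for v in row)
    keep = int(len(flat) * max_fraction)
    cutoff = flat[keep - 1] if keep > 0 else float("-inf")
    mask = [[v <= cutoff for v in row] for row in matrix]   # identifies meaningful entries
    packed = [v for row, m in zip(matrix, mask) for v, k in zip(row, m) if k]
    rest = [v for row, m in zip(matrix, mask) for v, k in zip(row, m) if not k]
    default = sum(rest) / len(rest) if rest else None       # assumed single default
    return mask, packed, default


# Illustrative 2-state x 5-frame distance matrix.
matrix = [[9, 1, 8, 7, 6],
          [5, 9, 2, 8, 7]]
mask, packed, default = designate(matrix)
```

Here 2 of the 10 entries are designated meaningful, and the other 8 share one default.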
14. Speech recognition apparatus comprising:
means for converting an input utterance to a sequence of utterance data frames;
means for converting each of said utterance data frames to a closely matching prototype data frame selected from a preselected set of prototype data frames thereby to obtain a corresponding sequence of prototype data frames;
means for storing a vocabulary of word models each of which is represented by a sequence of prototype states selected from a preselected set of prototype states, there being a precalculable distance metric for each possible pairing of prototype data frame and prototype state, distance metrics better than a calculated value being designated as meaningful;
means for storing a list of only the precalculated metrics which are designated as meaningful, an array distinguishing the meaningful and non-meaningful metrics, and an address array of offsets for locating particular meaningful metrics in said list, said address array being accessed by the corresponding frame and state;
means for comparing an input utterance with at least some of said word models including a combining of the respective metrics, wherein said comparing means utilizes said distinguishing array to determine if a corresponding metric is meaningful or non-meaningful and utilizes, in the combining, a default value for any non-meaningful metric; and
locates corresponding meaningful metrics in said list using said address array.
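The comparing means of claim 14 can be sketched as a scoring loop over the compressed tables. The example assumes that the input frames and the model states are already aligned one to one and that the metrics are combined by summation; a practical recognizer would normally use a time alignment procedure instead. All table values and the names `metric`, `score` and `vocabulary` are hypothetical.

```python
# Hypothetical compressed tables for 2 prototype states x 4 prototype frames.
packed = [3, 5, 2, 7]    # list of meaningful metrics only
check = [0, 1, 0, 1]     # assumed distinguishing array: owner state per slot
offsets = [0, 0]         # assumed address array, indexed by state
defaults = [40, 50]      # default metric per state


def metric(state, frame):
    """Meaningful metric from the packed list, else the state's default."""
    slot = offsets[state] + frame
    if 0 <= slot < len(packed) and check[slot] == state:
        return packed[slot]
    return defaults[state]


def score(frames, states):
    """Combine metrics by summation, assuming a one-to-one alignment."""
    return sum(metric(s, f) for f, s in zip(frames, states))


utterance = [0, 1, 3]                               # prototype frame indices
vocabulary = {"A": [0, 1, 1], "B": [1, 0, 0]}       # prototype state sequences
scores = {word: score(utterance, model) for word, model in vocabulary.items()}
best = min(scores, key=scores.get)                  # smaller total distance wins
```

With these tables, word "A" scores 15 because every pairing is meaningful, while word "B" scores 130 because each pairing falls back to a default, so "A" is selected.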
17. Speech recognition apparatus comprising:
means for converting an input utterance to a sequence of utterance data frames;
means for converting each of said utterance data frames to a closely matching prototype data frame selected from a preselected set of prototype data frames thereby to obtain a corresponding sequence of prototype data frames;
means for storing a vocabulary of word models each of which is represented by a sequence of prototype states selected from a preselected set of prototype states, there being a precalculable distance metric for each possible pairing of prototype data frame and prototype state, distance metrics better than a preselected criteria being designated as meaningful;
means for storing a packed list of only the precalculated metrics which are designated as meaningful, a check list paralleling said packed list and containing entries distinguishing meaningful and non-meaningful metrics, a list of default values for the respective states, and an address array of offsets for locating particular meaningful metrics in said list, said address array being accessed by the corresponding frame and state;
means for comparing an input utterance with at least some of said word models including a combining of the respective metrics, wherein said comparing means utilizes said distinguishing array to determine if a corresponding metric is meaningful or non-meaningful and utilizes, in the combining, a respective default value for any non-meaningful metric; and
locates corresponding meaningful metrics in said list using said address array.
Specification