Synthesizing word baseforms used in speech recognition
Abstract
Apparatus and method for synthesizing word baseforms for words not spoken during a training session, wherein each synthesized baseform represents a series of models from a first set of models, the method including the steps of: (a) uttering speech during a training session and representing the uttered speech as a sequence of models from a second set of models; (b) for each of at least some of the second set models spoken in a given phonetic model context during the training session, storing a respective string of first set models; and (c) constructing a word baseform of first set models for a word not spoken during the training session, including the step of representing each piece of a word that corresponds to a second set model in a given context by the stored respective string, if any, corresponding thereto.
19 Claims
1. A method of synthesizing word baseforms for words wherein each synthesized baseform represents a series of models from a first set of models, each model of the first set corresponding to an output-related model wherein each output-related model correlates to an output generable by an acoustic processor, the method comprising the steps of:
(a) forming words as respective sequences of models from a second set of models, each model of the second set corresponding to a phonetic model;
(b) for a second set model occurring in a given context of second set models in step (a), storing a respective string of first set models;
(c) performing step (b) for each of at least one second set model; and
(d) constructing a word baseform of first set models for a word formed in step (a), including the step of representing each piece of a word that corresponds to a second set model in a given context by the stored respective string of first set models, if any, corresponding thereto.
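Steps (b) through (d) amount to a lookup table keyed by a phonetic model in its context. The Python sketch below is illustrative only: it assumes the context is the immediate left and right phonetic neighbour, uses a hypothetical `#` word-boundary marker, and invents the fenemic labels (such as `f3` and `f12`). The helper names `contexts` and `construct_baseform` are hypothetical. A piece with no stored string is emitted as `None`, mirroring the "if any" qualifier in step (d).

```python
from typing import Dict, List, Optional, Tuple

# Assumed context key: (left phone, phone, right phone); "#" marks a word boundary.
Context = Tuple[str, str, str]

def contexts(phones: List[str]) -> List[Context]:
    """Pair each phonetic model with its immediate left and right neighbours."""
    padded = ["#"] + phones + ["#"]
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]

def construct_baseform(phones: List[str],
                       table: Dict[Context, List[str]]) -> List[Optional[str]]:
    """Replace each phone-in-context by its stored fenemic string, if any.

    Pieces without a stored string are kept as None so the gap stays visible.
    """
    baseform: List[Optional[str]] = []
    for ctx in contexts(phones):
        stored = table.get(ctx)
        if stored is None:
            baseform.append(None)  # no string stored for this phone in this context
        else:
            baseform.extend(stored)
    return baseform

# Illustrative stored strings (labels invented for the example).
table: Dict[Context, List[str]] = {
    ("#", "K", "AE"): ["f12", "f7"],
    ("K", "AE", "T"): ["f3", "f3", "f40"],
    ("AE", "T", "#"): ["f55"],
}
print(construct_baseform(["K", "AE", "T"], table))
# prints ['f12', 'f7', 'f3', 'f3', 'f40', 'f55']
```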
2. A method of synthesizing word baseforms for words not spoken during a training session, wherein each synthesized baseform represents a series of output-related models and wherein each output-related model correlates to an output generatable by an acoustic processor, the method comprising the steps of:
(a) representing each of N words by a respective sequence of phonetic models, the positioning of a subject phonetic model relative to other phonetic models forming a phonetic context for the subject phonetic model;
(b) representing M words spoken during a training session by a series of output-related models, where the M words form a subset of the N words;
(c) for at least one subject word, aligning the phonetic models for the subject word against the output-related models for the subject word, the subject word having been spoken during the training session;
(d) from the alignment of output-related models and phonetic models, associating a string of output-related models with each of at least one respective phonetic model in a given context; and
(e) constructing an output-related model baseform for a word not spoken during the training session, including the steps of:
(i) correlating a piece of said word not spoken during the training session to a phonetic model in a defined context;
(ii) determining if the phonetic model in said defined context corresponds to a similarly contexted phonetic model that has an associated string of output-related models; and
(iii) representing said word piece by said associated string.
Dependent claims: 3, 4, 5.
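Claim 2 does not say how the alignment in step (c) is computed. As a stand-in, the sketch below splits a training word's label string into one non-empty contiguous segment per phonetic model by dynamic programming, maximising a summed per-label score; a real system would more likely score against trained phonetic Markov models. The toy score table, the `#` boundary marker, and the names `align` and `associate` are assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Context = Tuple[str, str, str]  # assumed: (left phone, phone, right phone)

def align(phones: List[str], labels: List[str],
          score: Callable[[str, str], float]) -> List[List[str]]:
    """Split `labels` into len(phones) non-empty contiguous segments that
    maximise the summed score(phone, label) over all label positions."""
    n, m = len(phones), len(labels)
    if m < n:
        raise ValueError("need at least one label per phone")
    # best[i][j]: best score aligning the first i phones to the first j labels
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[0] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(i, m + 1):
            seg = 0.0
            # Grow phone i's segment leftwards: it covers labels[k:j].
            for k in range(j - 1, i - 2, -1):
                seg += score(phones[i - 1], labels[k])
                cand = best[i - 1][k] + seg
                if cand > best[i][j]:
                    best[i][j], back[i][j] = cand, k
    # Trace the segment boundaries back from the full alignment.
    segments: List[List[str]] = []
    j = m
    for i in range(n, 0, -1):
        k = back[i][j]
        segments.append(labels[k:j])
        j = k
    return segments[::-1]

def associate(phones: List[str], labels: List[str],
              score: Callable[[str, str], float],
              table: Dict[Context, List[str]]) -> None:
    """Step (d): attach each aligned segment to its phone-in-context."""
    padded = ["#"] + phones + ["#"]
    for i, seg in enumerate(align(phones, labels, score)):
        table.setdefault((padded[i], padded[i + 1], padded[i + 2]), seg)

# Toy log-scores (invented); unlisted (phone, label) pairs score -5.
toy = {("K", "f1"): 0.0, ("K", "f2"): 0.0, ("AE", "f3"): 0.0, ("T", "f4"): 0.0}
score = lambda phone, label: toy.get((phone, label), -5.0)

table: Dict[Context, List[str]] = {}
associate(["K", "AE", "T"], ["f1", "f2", "f3", "f3", "f4"], score, table)
print(table)
```

Only the first string seen for each context is kept here (`setdefault`); how to pool several observations of the same context is a separate design choice.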
6. A method of synthesizing word baseforms for words not spoken during a training session, wherein each synthesized baseform represents a series of fenemic models and wherein each fenemic model correlates to a label output generatable by an acoustic processor, the method comprising the steps of:
(a) uttering known words during a training session, each known word uttered during the training session being formed of a known sequence of known word phonetic models;
(b) determining a context for a subject known word phonetic model based on the positioning of other phonetic models proximate to said subject phonetic model;
(c) repeating step (b) for one known word phonetic model after another as said subject phonetic model;
(d) representing at least some of the known words by a series of fenemic models;
(e) for a subject known word, aligning the phonetic models therefor against fenemic models representing said subject word;
(f) repeating step (e) for each of a plurality of subject words;
(g) associating a string of fenemic models with a respective known word phonetic model in a given context; and
(h) constructing a fenemic baseform for a new word not spoken during the training session, including the steps of:
(i) correlating a piece of said new word not spoken during the training session with a new word phonetic model in a defined context;
(ii) determining if the new word phonetic model in said defined context corresponds to a similarly contexted known word phonetic model that has an associated string of fenemic models; and
(iii) representing said word piece by said associated string.
Dependent claims: 7, 8, 9, 10, 11, 12.
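Steps (b), (c), (f), and (g) of claim 6 can yield several fenemic strings for the same phonetic model in the same context, one per training utterance. The claim does not say which string is associated; the sketch below assumes, purely for illustration, that the most frequently observed string is kept. The context is again taken to be the immediate neighbours with a hypothetical `#` boundary marker, the input is assumed to be already aligned into per-phone segments, and the names `phone_contexts` and `pool_strings` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Tuple

Context = Tuple[str, str, str]  # assumed: (left phone, phone, right phone)

def phone_contexts(phones: List[str], boundary: str = "#") -> List[Context]:
    """Steps (b)-(c): derive each phone's context from its neighbours."""
    padded = [boundary] + phones + [boundary]
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]

def pool_strings(aligned: List[Tuple[List[str], List[List[str]]]]) -> Dict[Context, List[str]]:
    """Step (g): one fenemic string per phone-in-context.

    `aligned` holds (phones, per-phone fenemic segments) for each utterance.
    Assumed rule: keep the most frequently observed string for each context.
    """
    seen: DefaultDict[Context, Counter] = defaultdict(Counter)
    for phones, segments in aligned:
        for ctx, seg in zip(phone_contexts(phones), segments):
            seen[ctx][tuple(seg)] += 1
    return {ctx: list(counts.most_common(1)[0][0]) for ctx, counts in seen.items()}

# Invented, already-aligned training utterances.
aligned = [
    (["K", "AE", "T"], [["f1"], ["f3", "f3"], ["f4"]]),
    (["K", "AE", "T"], [["f1"], ["f3"], ["f4"]]),
    (["K", "AE", "T"], [["f1"], ["f3", "f3"], ["f4"]]),
    (["B", "AE", "T"], [["f6"], ["f3"], ["f4", "f4"]]),
]
pooled = pool_strings(aligned)
print(pooled[("K", "AE", "T")])  # prints ['f3', 'f3']
```

Note that `("AE", "T", "#")` collects observations from both words, so a context seen in one training word can serve another.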
13. In a speech recognition system which has a known phonetic baseform for each word, apparatus for synthesizing fenemic word baseforms for words not uttered during a training session, wherein each synthesized baseform represents a series of fenemic models based on label outputs generatable by an acoustic processor, the apparatus comprising:
(a) means for storing information identifying phonetic models and respective phonetic model contexts occurring during the training session;
(b) means for storing a respective string of fenemic models for a stored phonetic model in a given context occurring during the training session; and
(c) means for constructing a word baseform of fenemic models for a new word not spoken during the training session, which includes:
means for determining, for each new word phonetic model corresponding to a piece of the new word, a stored phonetic model and context which at least partially corresponds to the new word phonetic model in context; and
means for assigning to each piece of the new word at least a portion of the fenemic string stored for the determined stored phonetic model and context therefor.
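Claim 13 requires only that the stored phonetic model and context "at least partially" correspond to the new word's phonetic model in context. One simple reading, assumed here and not taken from the claim, is to require the same phonetic model and prefer the stored entry that shares the most neighbours. The sketch assigns the whole stored string, although the claim also allows assigning only a portion of it. The names `best_match` and `synthesize_new_word`, the `#` boundary marker, and all labels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Context = Tuple[str, str, str]  # assumed: (left phone, phone, right phone)

def best_match(ctx: Context, table: Dict[Context, List[str]]) -> Optional[Context]:
    """Return the stored context with the same phone and the most shared neighbours."""
    left, phone, right = ctx
    best_key: Optional[Context] = None
    best_score = -1
    for key in table:
        if key[1] != phone:
            continue  # the phonetic model itself must match
        shared = int(key[0] == left) + int(key[2] == right)
        if shared > best_score:
            best_key, best_score = key, shared
    return best_key

def synthesize_new_word(phones: List[str], table: Dict[Context, List[str]]) -> List[str]:
    """Build a fenemic baseform for a word not spoken during training."""
    padded = ["#"] + phones + ["#"]
    out: List[str] = []
    for i in range(1, len(padded) - 1):
        key = best_match((padded[i - 1], padded[i], padded[i + 1]), table)
        if key is None:
            raise KeyError(f"no stored string for phone {padded[i]!r}")
        out.extend(table[key])
    return out

# Strings stored from invented training words "K AE T" and "B IH T".
table: Dict[Context, List[str]] = {
    ("#", "K", "AE"): ["f1"],
    ("K", "AE", "T"): ["f3", "f3"],
    ("AE", "T", "#"): ["f4"],
    ("#", "B", "IH"): ["f6"],
    ("B", "IH", "T"): ["f8"],
}
print(synthesize_new_word(["B", "AE", "T"], table))  # prints ['f6', 'f3', 'f3', 'f4']
```

Here "B AE T" was never spoken: `B` borrows the string stored for `("#", "B", "IH")` (left neighbour matches), and `AE` borrows the one stored for `("K", "AE", "T")` (right neighbour matches).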
14. A method of producing a synthesized word baseform for a first word not spoken during a training session, said synthesized baseform comprising a series of models from a first set of models, each model in the first set of models representing a unit of speech, each unit of speech having a size, said method comprising the steps of:
forming an intermediate word baseform of the first word, said intermediate word baseform comprising at least a second model from a second set of models, said second model having a context in the intermediate word baseform, each model in the second set representing a unit of speech different from the units of speech represented by the models of the first set of models, each unit of speech represented by the models of the second set having a size larger than the size of the largest unit of speech represented by a model of the first set;
correlating the second model, in its context in the intermediate word baseform, with a first series of models from the first set of models; and
replacing the second model in the intermediate word baseform with the first series of models to produce a synthesized word baseform.
Dependent claims: 15, 16, 17, 18, 19.
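Claim 14 is phrased generically: the second-set units need only be larger than the first-set units, so they could be phones or some other larger unit. The sketch below therefore keeps both unit types generic and takes the correlation step as a caller-supplied function. The name `replace_in_context`, the example `correlate` function with its fallback to a context-free table, and all labels are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple, TypeVar

Big = TypeVar("Big")      # second-set unit, e.g. a phonetic model
Small = TypeVar("Small")  # first-set unit, e.g. a fenemic model

def replace_in_context(
    intermediate: Sequence[Big],
    correlate: Callable[[Optional[Big], Big, Optional[Big]], List[Small]],
) -> List[Small]:
    """Replace each second-set model, seen in its context, by a series of first-set models."""
    result: List[Small] = []
    for i, model in enumerate(intermediate):
        left = intermediate[i - 1] if i > 0 else None                       # None at word start
        right = intermediate[i + 1] if i + 1 < len(intermediate) else None  # None at word end
        result.extend(correlate(left, model, right))
    return result

# Hypothetical correlator: context-dependent entry if stored, else a context-free one.
in_context: Dict[Tuple[Optional[str], str, Optional[str]], List[str]] = {
    (None, "K", "AE"): ["f12", "f7"],
}
context_free: Dict[str, List[str]] = {"K": ["f12"], "AE": ["f3", "f40"], "T": ["f55"]}

def correlate(left: Optional[str], model: str, right: Optional[str]) -> List[str]:
    return in_context.get((left, model, right), context_free[model])

print(replace_in_context(["K", "AE", "T"], correlate))
# prints ['f12', 'f7', 'f3', 'f40', 'f55']
```

Passing `None` for a missing neighbour lets the correlator distinguish word-initial and word-final positions without reserving a boundary symbol.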
Specification