System and method for generating a phonetic baseform for a word and using the generated baseform for speech recognition
Abstract
Out-of-vocabulary word models for a speech recognizer vocabulary are generated by forming phonemic transcriptions (phonetic baseforms) of a user's utterances in terms of existing reference phonemes, using a speech recognition algorithm to match input sub-word feature sample sequences to suitably constrained allowable sequences of existing reference phoneme features. The resultant new-vocabulary-word phonetic baseform models are stored for subsequent speech recognition using the same recognition algorithm.
20 Claims
1. A method for generating a vocabulary for a speech recognition apparatus comprising the steps of:
receiving an input speech signal representing a word;
deriving feature samples from the received speech signal;
comparing the feature samples with allowable sequences of reference sub-word representations, at least one of said sub-word representations being capable of representing a sequence of more than one feature sample;
identifying, using an algorithm that is employed by a subsequent recognizer, the allowable sequence of reference sub-word representations that most closely resembles the received speech signal, and generating a coded representation from said identified representations; and
storing the generated coded representation of the word for subsequent recognition of another speech signal.
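Claim 1 does not fix how the feature samples are derived from the speech signal. As a minimal sketch only, assuming a toy front end that cuts the sampled signal into overlapping frames and emits one log-energy value per frame (the function name `derive_features` and the `frame_len`/`hop` parameters are hypothetical; practical recognizers typically use multi-dimensional spectral or cepstral vectors instead):

```python
import math
from typing import List, Sequence


def derive_features(signal: Sequence[float], frame_len: int = 4, hop: int = 2) -> List[float]:
    """Return one log-energy feature sample per overlapping frame of `signal`.

    Toy front end for illustration: `frame_len` and `hop` (in samples) are
    hypothetical parameters, not values taken from the patent.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    features: List[float] = []
    # Slide a window of frame_len samples forward by hop samples at a time;
    # a signal shorter than one frame yields no features.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len   # mean power of the frame
        features.append(math.log(energy + 1e-10))        # small floor avoids log(0) on silence
    return features
```

Each feature sample here is a single number, which keeps the matching sketches for the later claims simple.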
3. A method for generating a vocabulary for a speech recognition apparatus comprising the steps of:
receiving an input speech signal representing a word;
deriving feature samples from the received speech signal;
comparing the feature samples with allowable sequences of reference sub-word representations, at least one of said sub-word representations being capable of representing a sequence of more than one feature sample;
identifying, using a Viterbi algorithm, the allowable sequence of reference sub-word representations that most closely resembles the received speech signal, and generating a coded representation from said identified representations; and
storing the generated coded representation of the word for subsequent recognition of another speech signal using a Viterbi algorithm.
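The Viterbi search of claim 3 can be illustrated with a minimal sketch under heavily simplified, hypothetical assumptions: each reference sub-word (phoneme) is a single scalar mean feature value scored by squared distance; a phoneme may absorb any number of consecutive frames through a free self-loop, so one sub-word can represent more than one feature sample; any phoneme may follow any other; and a hypothetical fixed `switch_penalty` discourages spurious phoneme changes. The function `viterbi_baseform` and its parameters are illustrative names, not taken from the patent.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi_baseform(features: Sequence[float],
                     references: Dict[str, float],
                     switch_penalty: float = 1.0) -> Tuple[List[str], float]:
    """Decode feature samples into the lowest-cost phoneme sequence (a baseform).

    Hypothetical model: each reference phoneme is one scalar mean value scored by
    squared distance; a phoneme may span several frames via a free self-loop, and
    moving to a different phoneme costs `switch_penalty`.
    """
    labels = list(references)
    n, T = len(labels), len(features)
    if n == 0 or T == 0:
        return [], 0.0
    # cost[j]: best cumulative cost of the frames so far, ending in phoneme j
    cost = [(features[0] - references[p]) ** 2 for p in labels]
    back: List[List[int]] = []  # back[t-1][j]: phoneme at frame t-1 on the best path into j at frame t
    for t in range(1, T):
        new_cost, ptr = [], []
        for j, p in enumerate(labels):
            best_i, best = j, cost[j]               # stay in phoneme j for one more frame
            for i in range(n):
                if i != j and cost[i] + switch_penalty < best:
                    best_i, best = i, cost[i] + switch_penalty  # switch from phoneme i
            new_cost.append(best + (features[t] - references[p]) ** 2)
            ptr.append(best_i)
        cost = new_cost
        back.append(ptr)
    # Trace back the best frame-level path from the cheapest final phoneme.
    j = min(range(n), key=cost.__getitem__)
    total, path = cost[j], [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    # Collapse runs of the same phoneme: one label per sub-word segment.
    baseform = [labels[path[0]]]
    for k in path[1:]:
        if labels[k] != baseform[-1]:
            baseform.append(labels[k])
    return baseform, total
```

For example, with references `{"a": 0.0, "b": 10.0}` and feature samples `[0.1, -0.1, 9.9, 10.2, 0.0]`, the sketch returns the baseform `['a', 'b', 'a']` with a total cost of about 2.07; the two-frame runs of `a` and `b` are single sub-words each covering more than one feature sample.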
10. An apparatus for generating a vocabulary from an input speech signal, comprising:
a store containing a plurality of reference sub-word representations;
a feature deriver for receiving the input speech signal and operable to generate feature samples;
a recognizer connected to receive the generated feature samples, the recognizer having a vocabulary of allowable sequences of sub-word representations, at least one of said sub-word representations being capable of representing a sequence of more than one feature sample;
the recognizer being arranged in operation to employ a Viterbi algorithm to:
compare the received feature samples with the allowable sequences of reference sub-word representations; and
generate a coded representation by identifying an allowable sequence of reference sub-word representations that most closely resembles the input speech signal; and
a first store for storing the coded representation of the input speech signal for subsequent recognition of another speech signal using a Viterbi algorithm.
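One way to picture how the components of claim 10 fit together is the following wiring sketch. The class and field names are hypothetical, the stored baseform is keyed by a word label (an assumption; the claim only requires that the coded representation be stored), and the recognizer is passed in as a callable so that the same decoder could later serve for recognition. The `nearest_label_stub` decoder is deliberately not a Viterbi search; it only lets the wiring run and would be replaced by a real Viterbi phoneme-loop decoder.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

# A decoder maps (feature samples, reference store) to (baseform, score).
Decoder = Callable[[Sequence[float], Dict[str, float]], Tuple[List[str], float]]


@dataclass
class VocabularyGenerator:
    """Hypothetical wiring of the components recited in claim 10."""
    reference_store: Dict[str, float]                          # reference sub-word representations
    feature_deriver: Callable[[Sequence[float]], List[float]]  # speech signal -> feature samples
    recognizer: Decoder                                        # intended to be a Viterbi phoneme-loop search
    vocabulary_store: Dict[str, List[str]] = field(default_factory=dict)  # the "first store"

    def enroll(self, word: str, signal: Sequence[float]) -> List[str]:
        """Derive features, decode them into a baseform, and store it under `word`."""
        features = self.feature_deriver(signal)
        baseform, _score = self.recognizer(features, self.reference_store)
        self.vocabulary_store[word] = baseform
        return baseform


def nearest_label_stub(features: Sequence[float],
                       references: Dict[str, float]) -> Tuple[List[str], float]:
    """Placeholder decoder (NOT Viterbi): nearest reference per frame, runs collapsed."""
    labels = [min(references, key=lambda p: (x - references[p]) ** 2) for x in features]
    baseform = [lab for k, lab in enumerate(labels) if k == 0 or lab != labels[k - 1]]
    return baseform, 0.0
```

Keeping the decoder injectable mirrors the claim's point that the vocabulary-generating recognizer and the later recognizer share the same algorithm.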
19. A method for generating a vocabulary for a speech recognition apparatus comprising the steps of:
receiving an input speech signal representing a word;
deriving feature samples from the received speech signal;
comparing the feature samples with allowable sequences of reference sub-word representations, at least one of said sub-word representations being capable of representing a sequence of more than one feature sample;
identifying, using a Viterbi algorithm which imposes transitional probabilities between each pair of sub-words, the allowable sequence of reference sub-word representations that most closely resembles the received speech signal, and generating a coded representation from said identified representations; and
storing the generated coded representation of the word for subsequent recognition of another speech signal using a Viterbi algorithm.
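Claim 19 adds transition probabilities between pairs of sub-words. A minimal sketch, again assuming hypothetical scalar phoneme templates scored by squared distance, treats the probabilities as a bigram table and adds -log P(next | previous) whenever the path switches phonemes; pairs that are absent or have zero probability are disallowed, which is one way to express "allowable sequences". Staying in the same phoneme carries no cost and there is no start-phoneme probability; both are simplifying assumptions. `viterbi_with_bigrams` and `trans_prob` are illustrative names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi_with_bigrams(features: Sequence[float],
                         references: Dict[str, float],
                         trans_prob: Dict[Tuple[str, str], float]) -> Tuple[List[str], float]:
    """Phoneme-loop Viterbi in which every phoneme-to-phoneme switch costs
    -log P(next | previous). Absent or zero-probability pairs are not allowed;
    staying in the same phoneme for another frame is free (an assumption)."""
    labels = list(references)
    n, T = len(labels), len(features)
    if n == 0 or T == 0:
        return [], 0.0

    def switch_cost(i: int, j: int) -> float:
        p = trans_prob.get((labels[i], labels[j]), 0.0)
        return -math.log(p) if p > 0 else math.inf

    cost = [(features[0] - references[p]) ** 2 for p in labels]
    back: List[List[int]] = []
    for t in range(1, T):
        new_cost, ptr = [], []
        for j, p in enumerate(labels):
            best_i, best = j, cost[j]                        # self-loop: phoneme j covers frame t too
            for i in range(n):
                if i != j:
                    candidate = cost[i] + switch_cost(i, j)  # enter j from a different phoneme i
                    if candidate < best:
                        best_i, best = i, candidate
            new_cost.append(best + (features[t] - references[p]) ** 2)
            ptr.append(best_i)
        cost = new_cost
        back.append(ptr)
    # Trace back from the cheapest final phoneme, then collapse repeated labels.
    j = min(range(n), key=cost.__getitem__)
    total, path = cost[j], [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    baseform = [labels[path[0]]]
    for k in path[1:]:
        if labels[k] != baseform[-1]:
            baseform.append(labels[k])
    return baseform, total
```

With references `{"a": 0.0, "c": 5.0, "b": 10.0}` and features `[0.0, 4.0, 4.0, 10.0]`, uniform switch probabilities of 0.5 yield `['a', 'c', 'b']`; removing the pair `("a", "c")` from that table makes the sketch fall back to `['c', 'b']`, showing how the transition table constrains the resulting baseform.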
20. An apparatus for generating a vocabulary from an input speech signal, comprising:
a store containing a plurality of reference sub-word representations;
a feature deriver for receiving the input speech signal and operable to generate feature samples;
a recognizer connected to receive the generated feature samples, the recognizer having a vocabulary of allowable sequences of sub-word representations, at least one of said sub-word representations being capable of representing a sequence of more than one feature sample;
the recognizer being arranged in operation to employ a Viterbi algorithm to:
compare the received feature samples with the allowable sequences of reference sub-word representations; and
generate a coded representation by identifying an allowable sequence of reference sub-word representations that most closely resembles the input speech signal, taking into account transition probabilities between pairs of sub-words; and
a first store for storing the coded representation of the input speech signal for subsequent recognition of another speech signal using a Viterbi algorithm.
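The subsequent recognition of another speech signal using a Viterbi algorithm can be sketched, under the same hypothetical scalar-template assumptions, as a left-to-right Viterbi alignment of the new feature samples against each stored baseform (every phoneme covering at least one frame, in order), followed by choosing the stored word with the lowest alignment cost. `align_cost`, `recognize`, and the word-keyed vocabulary dictionary are illustrative names, not the patent's.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def align_cost(features: Sequence[float], baseform: Sequence[str],
               references: Dict[str, float]) -> float:
    """Left-to-right Viterbi alignment of features to one stored baseform.

    Each phoneme covers one or more consecutive frames, in baseform order;
    returns the minimum total squared distance (inf if no alignment exists).
    """
    K, T = len(baseform), len(features)
    if K == 0 or T < K:
        return math.inf
    # prev[k]: best cost of the frames processed so far, ending at baseform position k
    prev = [math.inf] * K
    prev[0] = (features[0] - references[baseform[0]]) ** 2
    for t in range(1, T):
        cur = [math.inf] * K
        for k in range(K):
            # either stay at position k or advance from position k - 1
            best = prev[k] if k == 0 else min(prev[k], prev[k - 1])
            if best < math.inf:
                cur[k] = best + (features[t] - references[baseform[k]]) ** 2
        prev = cur
    return prev[K - 1]


def recognize(features: Sequence[float],
              vocabulary: Dict[str, List[str]],
              references: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Return the stored word whose baseform aligns to `features` at lowest cost."""
    best_word, best_cost = None, math.inf
    for word, baseform in vocabulary.items():
        c = align_cost(features, baseform, references)
        if c < best_cost:
            best_word, best_cost = word, c
    return best_word, best_cost
```

For instance, with references `{"a": 0.0, "c": 5.0, "b": 10.0}`, a stored vocabulary `{"acb": ["a", "c", "b"], "cb": ["c", "b"], "ab": ["a", "b"]}`, and new features `[0.1, 4.8, 5.1, 9.9]`, the sketch selects `"acb"` with a cost of about 0.07.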
Specification