System and method for generating a phonetic baseform for a word and using the generated baseform for speech recognition

US 6,389,395 B1
Filed: 04/04/1997
Issued: 05/14/2002
Est. Priority Date: 11/01/1994
Status: Expired due to Term

Abstract

Out-of-vocabulary word models for a speech recognizer vocabulary are generated by forming phonemic transcriptions (phonetic baseforms) of user's utterances in terms of existing reference phonemes by using a speech recognition algorithm to match input sub-word feature sample sequences to suitably-constrained allowable sequences of existing reference phoneme features. The resultant new-vocabulary-word phonetic baseform models are stored for subsequent speech recognition using the same recognition algorithm.
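
The enrolment idea summarised in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the scalar features, the single-state reference models in `REFERENCE_MODELS`, the squared-distance cost, `SWITCH_PENALTY` and the function names are all assumptions chosen to keep the example small. It shows a Viterbi search over freely looping reference sub-words, where each sub-word may absorb several consecutive feature samples, followed by storage of the resulting baseform.

```python
# Hypothetical reference sub-word models: name -> scalar mean.
# A real recognizer would use multi-state models over feature vectors.
REFERENCE_MODELS = {"sil": 0.0, "k": 2.0, "ae": 5.0, "t": 8.0}
SWITCH_PENALTY = 1.0  # assumed cost of moving to a different sub-word

vocabulary = {}  # word label -> stored baseform (list of sub-word names)


def frame_cost(model, sample):
    """Squared distance between a feature sample and a model mean."""
    return (sample - REFERENCE_MODELS[model]) ** 2


def step_cost(prev, nxt):
    """Self-loops are free, so one sub-word can span many samples."""
    return 0.0 if prev == nxt else SWITCH_PENALTY


def viterbi_baseform(samples):
    """Return the lowest-cost sub-word sequence for the feature samples."""
    names = list(REFERENCE_MODELS)
    cost = {n: frame_cost(n, samples[0]) for n in names}
    backpointers = []
    for sample in samples[1:]:
        new_cost, ptr = {}, {}
        for n in names:
            best = min(names, key=lambda p: cost[p] + step_cost(p, n))
            ptr[n] = best
            new_cost[n] = cost[best] + step_cost(best, n) + frame_cost(n, sample)
        cost = new_cost
        backpointers.append(ptr)
    # Trace back from the cheapest final state.
    state = min(names, key=cost.get)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    # Collapse runs of the same sub-word into one symbol.
    return [m for i, m in enumerate(path) if i == 0 or m != path[i - 1]]


def enrol(word, samples):
    """Generate a baseform for a new word and store it for later recognition."""
    vocabulary[word] = viterbi_baseform(samples)
    return vocabulary[word]


enrol("cat", [0.1, 0.0, 2.1, 1.9, 5.0, 5.2, 8.1, 7.9, 0.0])
print(vocabulary)
```

In this toy setting the stored entry is a list of reference sub-word names, which is one possible form of the "coded representation" that a later recognition pass could consume.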

20 Claims

1. A method for generating a vocabulary for a speech recognition apparatus comprising the steps of:
- receiving an input speech signal representing a word;
  
  deriving feature samples from the received speech signal;
  
  comparing the feature samples with allowable sequences of reference sub-word representations, at least one of said sub-word representations being capable of representing a sequence of more than one feature sample;
  
  identifying, using an algorithm that is employed by a subsequent recognizer, the allowable sequence of reference sub-word representations that most closely resembles the received speech signal, and generating a coded representation from said identified representations; and
  
  storing the generated coded representation of the word for subsequent recognition of another speech signal.
- Dependent claims (2):
  - 2. The method of claim 1, wherein the reference sub-word models are Hidden Markov Models.

3. A method for generating a vocabulary for a speech recognition apparatus comprising the steps of:
- receiving an input speech signal representing a word;
  
  deriving feature samples from the received speech signal;
  
  comparing the feature samples with allowable sequences of reference sub-word representations, at least one of said sub-word representations being capable of representing a sequence of more than one feature sample;
  
  identifying, using a Viterbi algorithm, the allowable sequence of reference sub-word representations that most closely resembles the received speech signal, and generating a coded representation from said identified representations; and
  
  storing the generated coded representation of the word for subsequent recognition of another speech signal using a Viterbi algorithm.
- Dependent claims (4, 5, 6, 7, 8, 9):
  - 4. The method according to claim 3, wherein all possible sequences of the reference sub-word representations are allowable.
  - 5. The method according to claim 3, wherein the allowable sequences of sub-word representations are constrained to sequences that comprise sub-word representations that represent noise followed by sub-word representations that represent speech followed by sub-word representations that represent noise.
  - 6. The method according to claim 3, wherein the step of identifying includes consideration of stored parameters each representing a transition probability of a sub-word representation following a previous sub-word representation.
  - 7. The method according to claim 3, further comprising the step of generating a recognition network from one or more stored sub-word representations, said network representing allowable sequences of sub-word representations in the generated vocabulary.
  - 8. The method according to claim 3, wherein the sub-word representations are statistical models.
  - 9. The method according to claim 8, wherein the sub-word representations are Hidden Markov Models.
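
The difference between allowing all sequences (claim 4) and constraining them to noise, then speech, then noise (claim 5) can be sketched as follows. This is an illustrative toy, not the claimed method: the scalar features, the model means, the stage-based state space and the function names are assumptions. The constrained decoder expands each state to a `(stage, model)` pair and only permits a path to stay in its stage or advance to the next one. It needs at least three samples so that every stage receives one.

```python
# Hypothetical reference models: name -> scalar mean (illustrative values).
NOISE = {"sil": 0.0}
SPEECH = {"k": 2.0, "ae": 5.0, "t": 8.0}
STAGES = [NOISE, SPEECH, NOISE]  # leading noise, speech, trailing noise


def emit_cost(state, sample):
    stage, model = state
    return (sample - STAGES[stage][model]) ** 2


def allowed(prev, nxt):
    """A path may stay in its stage or move to the next one, never back."""
    return nxt[0] in (prev[0], prev[0] + 1)


def collapse(path):
    """Merge consecutive repeats of the same symbol."""
    return [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]


def decode_constrained(samples):
    """Viterbi search restricted to noise -> speech -> noise sequences."""
    states = [(i, m) for i, models in enumerate(STAGES) for m in models]
    inf = float("inf")
    # Paths must start in the leading-noise stage.
    cost = {s: emit_cost(s, samples[0]) if s[0] == 0 else inf for s in states}
    backpointers = []
    for sample in samples[1:]:
        new_cost, ptr = {}, {}
        for s in states:
            preds = [p for p in states if allowed(p, s)]
            best = min(preds, key=cost.get)
            ptr[s] = best
            new_cost[s] = cost[best] + emit_cost(s, sample)
        cost = new_cost
        backpointers.append(ptr)
    # Paths must end in the trailing-noise stage.
    finals = [s for s in states if s[0] == len(STAGES) - 1]
    state = min(finals, key=cost.get)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [model for _, model in collapse(path)]


def decode_unconstrained(samples):
    """With every sequence allowed and no penalties, pick the nearest model per sample."""
    models = {**NOISE, **SPEECH}
    return collapse([min(models, key=lambda m: (x - models[m]) ** 2) for x in samples])


samples = [0.0, 2.0, 0.3, 5.0, 0.1]  # 0.3 mimics a quiet frame inside the word
print(decode_unconstrained(samples))  # ['sil', 'k', 'sil', 'ae', 'sil']
print(decode_constrained(samples))    # ['sil', 'k', 'ae', 'sil']
```

In this example the constraint prevents a quiet frame in the middle of the word from being transcribed as noise, at the price of a slightly worse acoustic match for that frame.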

10. An apparatus for generating a vocabulary from an input speech signal, comprising:
- a store containing a plurality of reference sub-word representations;
  
  a feature deriver for receiving the input speech signal and operable to generate feature samples;
  
  a recognizer connected to receive the generated feature samples, the recognizer having a vocabulary of allowable sequences of sub-word representations, at least one of said sub-word representations being capable of representing a sequence of more than one feature sample;
  
  the recognizer being arranged in operation to employ a Viterbi algorithm to:
  
  compare the received feature samples with the allowable sequences of reference sub-word representations; and
  
  generate a coded representation by identifying an allowable sequence of reference sub-word representations that most closely resembles the input speech signal; and
  
  a first store for storing the coded representation of the input speech signal for subsequent recognition of another speech signal using a Viterbi algorithm.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18):
  - 11. The apparatus according to claim 10, further comprising:
    - a recognizer arranged to compare input speech signals with coded representations in the first store and to output a signal indicative of recognition.
  - 12. The apparatus according to claim 11, further comprising:
    - a second store of coded representations of words, said coded representations having been generated in a manner different from the coded representations stored in the first store.
  - 13. The apparatus according to claim 12, wherein the coded representations of words identify a sequence of the reference sub-word representations.
  - 14. The apparatus according to claim 10, wherein the vocabulary defines sequences of sub-word representations that comprise sub-word representations that represent noise followed by sub-word representations that represent speech followed by sub-word representations that represent noise.
  - 15. The apparatus according to claim 10, wherein the vocabulary defines all possible sequences of sub-word representations.
  - 16. The apparatus according to claim 10, wherein generation of the allowable sequence of reference sub-word representations that most closely resembles the received speech signal includes consideration of stored parameters each representing a transition probability of a sub-word representation following a previous sub-word representation.
  - 17. The apparatus according to claim 10, wherein the sub-word representations are statistical models.
  - 18. The apparatus according to claim 17, wherein the sub-word representations are Hidden Markov Models.
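
The arrangement of claims 11 to 13, in which a recognizer compares input speech against coded representations held in two stores, might be sketched like this. The sketch is hypothetical throughout: the scalar features, the model means, the store contents, and the names `alignment_cost` and `recognise` are assumptions, and the second store is simply assumed to hold baseforms produced some other way (for example from text). Each stored baseform is scored by a left-to-right Viterbi alignment in which every sub-word must absorb at least one sample.

```python
# Hypothetical reference sub-word models: name -> scalar mean.
MEANS = {"sil": 0.0, "k": 2.0, "ae": 5.0, "t": 8.0}

# First store: baseforms generated from a user's utterances (assumed contents).
first_store = {"cat": ["sil", "k", "ae", "t", "sil"]}
# Second store: baseforms generated in a different manner (assumed contents).
second_store = {"tack": ["sil", "t", "ae", "k", "sil"]}


def alignment_cost(samples, baseform):
    """Viterbi forced alignment of the samples to one baseform.

    Returns the cheapest total squared distance when the sub-words are
    visited strictly left to right; infinite if there are too few samples.
    """
    inf = float("inf")
    cost = [inf] * len(baseform)
    cost[0] = (samples[0] - MEANS[baseform[0]]) ** 2
    for sample in samples[1:]:
        new_cost = [inf] * len(baseform)
        for j, model in enumerate(baseform):
            stay = cost[j]
            advance = cost[j - 1] if j > 0 else inf
            new_cost[j] = min(stay, advance) + (sample - MEANS[model]) ** 2
        cost = new_cost
    return cost[-1]  # the alignment must finish in the last sub-word


def recognise(samples, *stores):
    """Return the word whose stored baseform best matches the samples."""
    candidates = {}
    for store in stores:
        candidates.update(store)
    return min(candidates, key=lambda w: alignment_cost(samples, candidates[w]))


print(recognise([0.0, 2.1, 5.1, 7.9, 0.1], first_store, second_store))  # cat
```

Because both stores hold sequences of the same reference sub-words, a single scoring routine can serve entries from either store.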

19. A method for generating a vocabulary for a speech recognition apparatus comprising the steps of:
- receiving an input speech signal representing a word;
  
  deriving feature samples from the received speech signal;
  
  comparing the feature samples with allowable sequences of reference sub-word representations, at least one of said sub-word representations being capable of representing a sequence of more than one feature sample;
  
  identifying, using a Viterbi algorithm which imposes transitional probabilities between each pair of sub-words, the allowable sequence of reference sub-word representations that most closely resembles the received speech signal, and generating a coded representation from said identified representations; and
  
  storing the generated coded representation of the word for subsequent recognition of another speech signal using a Viterbi algorithm.
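
The effect of imposing transition probabilities between pairs of sub-words, as recited in claims 19 and 20, can be illustrated with a small log-domain Viterbi sketch. All values here are invented for illustration: the scalar features, the unit-variance Gaussian scoring, the three models in `MEANS` and the table `PAIR_PROB` are assumptions, not data from the patent. The example is built so that two sub-words are acoustically close and the pair probabilities decide between them.

```python
import math

# Hypothetical models: "k" and "g" are deliberately close acoustically.
MEANS = {"k": 2.0, "g": 2.4, "ae": 5.0}

# Assumed probability of sub-word `b` following sub-word `a`, keyed (a, b).
# Each row sums to 1; the diagonal entries are self-loops.
PAIR_PROB = {
    ("k", "k"): 0.5, ("k", "ae"): 0.45, ("k", "g"): 0.05,
    ("g", "g"): 0.5, ("g", "k"): 0.45, ("g", "ae"): 0.05,
    ("ae", "ae"): 0.5, ("ae", "k"): 0.25, ("ae", "g"): 0.25,
}


def log_emit(model, sample):
    """Log-likelihood of a unit-variance Gaussian, constant term dropped."""
    return -0.5 * (sample - MEANS[model]) ** 2


def decode(samples, pair_prob=None):
    """Viterbi search; with pair_prob=None all transitions score equally."""
    names = list(MEANS)

    def log_trans(prev, nxt):
        return 0.0 if pair_prob is None else math.log(pair_prob[(prev, nxt)])

    score = {n: log_emit(n, samples[0]) for n in names}
    backpointers = []
    for sample in samples[1:]:
        new_score, ptr = {}, {}
        for n in names:
            best = max(names, key=lambda p: score[p] + log_trans(p, n))
            ptr[n] = best
            new_score[n] = score[best] + log_trans(best, n) + log_emit(n, sample)
        score = new_score
        backpointers.append(ptr)
    state = max(names, key=score.get)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    # Collapse runs of the same sub-word into one symbol.
    return [m for i, m in enumerate(path) if i == 0 or m != path[i - 1]]


samples = [2.25, 2.25, 5.0, 5.0]
print(decode(samples))             # ['g', 'ae']  acoustics alone
print(decode(samples, PAIR_PROB))  # ['k', 'ae']  pair probabilities applied
```

Here the acoustic scores alone slightly favour "g", but the low assumed probability of "ae" following "g" tips the search towards "k" once the pair probabilities are included.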

20. An apparatus for generating a vocabulary from an input speech signal, comprising:
- a store containing a plurality of reference sub-word representations;
  
  a feature deriver for receiving the input speech signal and operable to generate feature samples;
  
  a recognizer connected to receive the generated feature samples, the recognizer having a vocabulary of allowable sequences of sub-word representations, at least one of said sub-word representations being capable of representing a sequence of more than one feature sample;
  
  the recognizer being arranged in operation to employ a Viterbi algorithm to:
  
  compare the received feature samples with the allowable sequences of reference sub-word representations; and
  
  generate a coded representation by identifying an allowable sequence of reference sub-word representations that most closely resembles the input speech signal, taking into account transition probabilities between pairs of sub-words; and
  
  a first store for storing the coded representation of the input speech signal for subsequent recognition of another speech signal using a Viterbi algorithm.

Current Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Original Assignee: British Telecommunications PLC (BT Group PLC)
Inventors: Ringland, Simon P.
Primary Examiner(s): SMITS, TALIVALDIS IVARS
Application Number: US08/817,072
Time in Patent Office: 1,866 Days
Field of Search: 704/250, 704/251, 704/256, 704/276, 704/241, 704/243, 704/244, 704/245, 704/270, 704/275, 704/242, 704/254
US Class Current: 704/254
CPC Class Codes: G10L 15/063 Training; G10L 2015/025 Phonemes, fenemes or fenone...
