Apparatus and method of grouping utterances of a phoneme into context-dependent categories based on sound-similarity for automatic speech recognition

  • US 5,195,167 A
  • Filed: 04/17/1992
  • Issued: 03/16/1993
  • Est. Priority Date: 01/23/1990
  • Status: Expired due to Fees
First Claim

1. A method of automatically grouping utterances of a phoneme into similar categories and correlating the groups of utterances with different contexts, said method comprising the steps of:

    providing a training script comprising a series of phonemes, said training script comprising a plurality of occurrences of a selected phoneme, each occurrence of the selected phoneme having a context of one or more other phonemes preceding or following the selected phoneme in the training script;

    measuring the value of an acoustic feature of an utterance of the phonemes in the training script during each of a series of time intervals to produce a series of acoustic feature vector signals representing the acoustic feature values of the utterance, each acoustic feature vector signal corresponding to an occurrence of a phoneme in the training script;

    selecting a pair of first and second subsets of the set of occurrences of the selected phoneme in the training script, each occurrence of the selected phoneme in the first subset having a first context, each occurrence of the selected phoneme in the second subset having a second context different from the first context;

    selecting a pair of third and fourth subsets of the set of occurrences of the selected phoneme in the training script, each occurrence of the selected phoneme in the third subset having a third context different from the first and second contexts, each occurrence of the selected phoneme in the fourth subset having a fourth context different from the first, second, and third contexts;

    for each pair of subsets, determining the similarity of the acoustic feature values of the acoustic feature vector signals corresponding to the occurrences of the selected phoneme in one subset of the pair, and determining the similarity of the acoustic feature values of the acoustic feature vector signals corresponding to the occurrences of the selected phoneme in the other subset of the pair, the combined similarities for both subsets in the pair being a "goodness of fit" which estimates how well the contexts of the selected phoneme explain variations in the acoustic feature values of the utterances of the selected phoneme;

    identifying first and second best contexts associated with the pair of subsets having the best "goodness of fit"; and

    grouping the utterances of the selected phoneme into a first output set of utterances of the selected phoneme having the first best context, and grouping the utterances of the selected phoneme into a second output set of utterances of the selected phoneme having the second best context.
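
The selection step recited above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes each occurrence carries a single scalar feature value, whereas the claim recites acoustic feature vector signals. It also assumes similarity within a subset is measured by the sum of squared deviations from the subset mean, with lower values meaning more similar; the claim does not fix a particular similarity measure. The names `scatter`, `best_context_pair`, and the sample data are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def scatter(values):
    """Sum of squared deviations from the mean (assumed similarity measure).

    Lower means the values are more similar. Returns 0.0 for an empty list.
    """
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_context_pair(occurrences, candidate_pairs):
    """Pick the pair of contexts whose two subsets are internally most similar.

    occurrences: list of (context, feature_value) for one selected phoneme.
    candidate_pairs: list of (context_a, context_b) tuples to compare.
    Returns (score, context_a, context_b, group_a, group_b) for the best pair,
    where score is the combined within-subset scatter (assumed "goodness of
    fit", lower is better).
    """
    best = None
    for ctx_a, ctx_b in candidate_pairs:
        group_a = [x for c, x in occurrences if c == ctx_a]
        group_b = [x for c, x in occurrences if c == ctx_b]
        score = scatter(group_a) + scatter(group_b)
        if best is None or score < best[0]:
            best = (score, ctx_a, ctx_b, group_a, group_b)
    return best


# Hypothetical data: occurrences of one phoneme, keyed by the preceding phoneme.
occurrences = [
    ("K", 1.0), ("K", 1.2),
    ("B", 3.0), ("B", 3.1),
    ("S", 1.1), ("S", 2.9),
    ("T", 0.2), ("T", 4.0),
]

# First pair of subsets (contexts K, B) and second pair (contexts S, T).
candidate_pairs = [("K", "B"), ("S", "T")]

score, first_best, second_best, first_output, second_output = best_context_pair(
    occurrences, candidate_pairs
)
print(first_best, second_best, first_output, second_output)
```

In this toy data, the subsets for contexts K and B are each tightly clustered, while those for S and T are spread out, so the K/B pair is selected and its two subsets become the two output sets.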
