Apparatuses and methods for developing and using models for speech recognition

  • US 5,715,367 A
  • Filed: 01/23/1995
  • Issued: 02/03/1998
  • Est. Priority Date: 01/23/1995
  • Status: Expired due to Term
First Claim

1. A computerized method for automatically creating models of speech sounds to be used in speech recognition comprising the steps of:

    receiving training signals representing the sound of spoken words;

    storing a plurality of phonetic context units, each representing a speech sound in a phonetic context defined by one or more phonetic features, and associating with each phonetic context unit an initial acoustic model to represent its associated speech sound;

    time aligning successive time frames of the training signals against the initial models of the phonetic context units of the words corresponding to those training signals, to associate each frame with the phonetic context unit whose sound it represents;

    storing a set of classifications, each representing a possible set of one or more of the phonetic features which can be associated with one of said phonetic context units;

    using an automatic classification routine to select a plurality of sub-sets of said classifications which divide the phonetic context units into phonetic context groups, such that the phonetic context units in each such phonetic context group tend to be time aligned against acoustically similar frames;

    developing shared acoustic model components for a plurality of phonetic context groups whose associated sub-sets of classifications share a sub-sub-set of classifications and whose frames have a certain acoustic similarity, which shared acoustic model components contain statistical information derived from frames time aligned against the phonetic context units in different ones of said plurality of phonetic context groups; and

    developing an acoustic model for each given phonetic context group in said plurality of phonetic context groups which contains a combination of said statistical information contained in the model components shared by said plurality of groups and more specific statistical information representing the frames time aligned against the phonetic context units in the given individual phonetic context group.
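The final two steps of the claim (pooling statistics across related phonetic context groups, then combining the pooled statistics with group-specific statistics) can be illustrated with a small sketch. The claim does not state how the combination is computed, so everything here is an assumption made for illustration: frames are reduced to a single number, each model is just a mean, and the combination is a count-weighted interpolation controlled by a hypothetical constant `tau`. The names `Stats`, `pool`, `combined_mean`, and `build_group_models` are likewise hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Stats:
    """Sufficient statistics for a 1-D mean (illustrative stand-in for an acoustic model)."""
    count: int
    total: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def accumulate(frames: Sequence[float]) -> Stats:
    """Collect statistics from the frames time aligned against one group."""
    return Stats(count=len(frames), total=float(sum(frames)))


def pool(groups: Sequence[Stats]) -> Stats:
    """Build the shared component from frames of several related groups."""
    return Stats(
        count=sum(g.count for g in groups),
        total=sum(g.total for g in groups),
    )


def combined_mean(specific: Stats, shared: Stats, tau: float = 10.0) -> float:
    """Interpolate group-specific and shared means.

    Assumption: the weight on the group-specific mean grows with the number
    of frames seen for that group, so sparsely trained groups lean on the
    shared component.
    """
    if specific.count == 0:
        return shared.mean
    weight = specific.count / (specific.count + tau)
    return weight * specific.mean + (1.0 - weight) * shared.mean


def build_group_models(
    frames_by_group: Dict[str, Sequence[float]], tau: float = 10.0
) -> Dict[str, float]:
    """Return one combined mean per phonetic context group."""
    per_group = {name: accumulate(f) for name, f in frames_by_group.items()}
    shared = pool(list(per_group.values()))
    return {name: combined_mean(s, shared, tau) for name, s in per_group.items()}


if __name__ == "__main__":
    # Two hypothetical groups assumed to share a sub-sub-set of classifications.
    frames = {"group_a": [1.0] * 10, "group_b": [3.0, 3.0]}
    for name, mean in build_group_models(frames).items():
        print(name, round(mean, 4))
```

In this sketch the well-trained group keeps a mean close to its own data, while the group with only two frames is pulled strongly toward the pooled mean, which is the practical benefit of sharing model components among acoustically similar groups.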
