
Method, apparatus and system for building a compact language model for large vocabulary continuous speech recognition (LVCSR) system

  • US 7,418,386 B2
  • Filed: 04/03/2001
  • Issued: 08/26/2008
  • Est. Priority Date: 04/03/2001
  • Status: Expired due to Fees
First Claim

1. A method comprising:

    digitizing input speech;

    converting the digitized input speech into a sequence of feature vectors;

    classifying a set of probabilistic attributes in an N-gram language model into a plurality of classes, the probabilistic attributes representing various conditional probabilities of a set of words in the language of the input speech;

    clustering each resultant class into a plurality of segments to build a codebook for the respective class using a modified K-means clustering process which dynamically adjusts the size and centroid of each segment during each iteration in the modified K-means clustering process;

    representing a probabilistic attribute in each class by the centroid of the corresponding segment to which the respective probabilistic attribute belongs; and

    applying the feature vectors to the probabilistic attributes to select a set of words corresponding to the input speech.
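The classify, cluster and represent steps of claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. It assumes the probabilistic attributes are log10 values keyed by a hypothetical (class_label, ngram) pair, with class labels such as ("prob", order) or ("backoff", order); the claim itself does not say how classes are chosen. The claim also does not spell out the modified K-means process, so a plain one-dimensional Lloyd (K-means) iteration stands in for it: recomputing segment boundaries at the midpoints between neighbouring centroids lets each segment's size and centroid change on every pass. The names kmeans_1d, nearest and build_codebooks are illustrative only.

```python
from bisect import bisect_left
from collections import defaultdict


def kmeans_1d(values, k, max_iter=100, tol=1e-9):
    """Cluster scalar values into at most k contiguous segments.

    Stand-in for the patent's (unspecified here) modified K-means: a plain
    1-D Lloyd iteration. Each pass recomputes the segment boundaries, so both
    the membership (size) and the centroid of every segment can change.
    """
    if not values:
        return []
    uniq = sorted(set(values))
    k = min(k, len(uniq))
    # Initialise with k distinct values spread evenly over the sorted values.
    centroids = [uniq[(2 * i + 1) * len(uniq) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        # Segment boundaries are midpoints between neighbouring centroids.
        bounds = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
        sums, counts = [0.0] * k, [0] * k
        for v in values:
            j = bisect_left(bounds, v)  # index of the segment v falls into
            sums[j] += v
            counts[j] += 1
        # New centroid = mean of the segment; an empty segment keeps its old one.
        new = sorted(s / c if c else old
                     for s, c, old in zip(sums, counts, centroids))
        converged = max(abs(a - b) for a, b in zip(new, centroids)) <= tol
        centroids = new
        if converged:
            break
    return centroids


def nearest(codebook, v):
    """Index of the codebook centroid closest to v (codebook is sorted)."""
    bounds = [(a + b) / 2 for a, b in zip(codebook, codebook[1:])]
    return bisect_left(bounds, v)


def build_codebooks(attributes, k):
    """attributes: {(class_label, ngram): log10 value} (hypothetical layout).

    Returns one codebook per class and, for every attribute, the index of the
    centroid that now represents it.
    """
    by_class = defaultdict(list)
    for (label, _ngram), v in attributes.items():
        by_class[label].append(v)
    codebooks = {label: kmeans_1d(vs, k) for label, vs in by_class.items()}
    indices = {key: nearest(codebooks[key[0]], v)
               for key, v in attributes.items()}
    return codebooks, indices


if __name__ == "__main__":
    # Toy unigram attributes; the values are made up for illustration.
    attrs = {
        (("prob", 1), ("the",)): -1.0,
        (("prob", 1), ("cat",)): -1.1,
        (("prob", 1), ("sat",)): -3.0,
        (("prob", 1), ("mat",)): -3.2,
        (("backoff", 1), ("the",)): -0.5,
        (("backoff", 1), ("cat",)): -0.4,
    }
    codebooks, indices = build_codebooks(attrs, k=2)
    print(codebooks)
    print(indices)
```

Storing a small codebook index per attribute instead of a full floating-point value is where the compaction comes from; with k at most 256, each index fits in a single byte.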
