
Sampling training data for an automatic speech recognition system based on a benchmark classification distribution

  • US 9,202,461 B2
  • Filed: 01/18/2013
  • Issued: 12/01/2015
  • Est. Priority Date: 04/26/2012
  • Status: Active Grant
First Claim

1. A method comprising:

  • obtaining a benchmark classification distribution of topic classifications for benchmark text strings;

    selecting, by a computing device, training text strings from a corpus of text strings, wherein the training text strings are associated with respective topic classifications, and wherein selecting the training text strings includes (a) determining to select t training text strings, (b) determining that a frequency of topic i in the benchmark classification distribution is B(i), wherein B(i) is inclusively between 0 and 1, (c) determining that a number of text strings classified with topic i in the corpus of text strings is N(i), and (d) selecting a training text string of topic i from the corpus of text strings based on probability t × B(i)/N(i); and

    training a language model of an automatic speech recognition (ASR) system using the training text strings.
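
The selection step in the claim can be illustrated with a short Python sketch. This is one possible reading, not the patented implementation: it assumes each corpus string is accepted independently with probability t × B(i)/N(i), and it clamps that value to 1, which the claim does not specify. The function names, the (text, topic) tuple layout, and the dictionary representation of the benchmark distribution are all hypothetical.

```python
import random
from collections import Counter


def selection_probability(t, b_i, n_i):
    """Per-string probability t * B(i) / N(i), clamped to 1 (the clamp is an assumption)."""
    if n_i == 0:
        return 0.0
    return min(1.0, t * b_i / n_i)


def sample_training_strings(corpus, benchmark_dist, t, rng=None):
    """Sample training strings so the topic mix approximates the benchmark.

    corpus: list of (text, topic) pairs (hypothetical layout).
    benchmark_dist: dict mapping topic i to its benchmark frequency B(i).
    t: target number of training strings.
    """
    rng = rng or random.Random()
    # N(i): number of corpus strings classified with topic i.
    counts = Counter(topic for _, topic in corpus)
    selected = []
    for text, topic in corpus:
        # Topics absent from the benchmark get B(i) = 0 and are never selected.
        p = selection_probability(t, benchmark_dist.get(topic, 0.0), counts[topic])
        if rng.random() < p:
            selected.append((text, topic))
    return selected
```

Under this reading, the expected number of strings selected for topic i is N(i) × t × B(i)/N(i) = t × B(i), so the expected total is about t when the B(i) values sum to 1 and no probability is clamped. For example, with t = 100, B(i) = 0.5, and N(i) = 1000, each string of topic i is kept with probability 0.05, giving 50 strings of that topic on average.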
