Generating speech data collection prompts

  • US 8,700,396 B1
  • Filed: 10/08/2012
  • Issued: 04/15/2014
  • Est. Priority Date: 09/11/2012
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

    receiving, at a computer system, a request to generate a textual prompt to provide to a user for generating speech data in a particular language;

    in response to receiving the request, determining frequencies of occurrence of linguistic features of the particular language in one or more corpora that are associated with the particular language, wherein the one or more corpora include content that was generated by people who use the particular language and that reflects current use of the particular language;

    identifying, by the computer system, quantities of speech samples that include the linguistic features from a repository of previously recorded speech samples;

    weighting the frequencies of occurrence of the linguistic features based on the quantities of speech samples that include the linguistic features, wherein the weighting generates weighted frequencies for the linguistic features, wherein a first linguistic feature is determined to have a weighted frequency that is greater than a weighted frequency for a second linguistic feature as a result of the computer system executing computer code that includes both of the following conditions and determining that one or more of the following conditions are satisfied:

    (i) the first linguistic feature has a same or greater frequency of occurrence in the one or more corpora and has fewer speech samples in the repository of previously recorded speech samples than the second linguistic feature, and

    (ii) the first linguistic feature has a greater frequency of occurrence in the one or more corpora and has the same or fewer speech samples in the repository of previously recorded speech samples than the second linguistic feature;

    generating, by the computer system, one or more textual prompts based on the weighted frequencies for the linguistic features, wherein each of the one or more textual prompts comprises a combination of two or more of the linguistic features; and

    providing, by the computer system, the generated one or more textual prompts.
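
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not state a weighting formula, so the ratio `frequency / (1 + samples)` below is an assumption chosen only because it satisfies conditions (i) and (ii) whenever corpus frequencies are positive. The function names, the use of whole words as "linguistic features", and the way features are joined into a prompt are likewise hypothetical.

```python
from collections import Counter


def feature_frequencies(corpus_sentences):
    """Count occurrences of each feature (here, hypothetically, a lowercase word)."""
    counts = Counter()
    for sentence in corpus_sentences:
        counts.update(sentence.lower().split())
    return counts


def weighted_frequencies(corpus_freq, sample_counts):
    """Assumed weighting: corpus frequency divided by (1 + recorded samples).

    For positive frequencies, a feature with the same or greater frequency and
    fewer samples, or a greater frequency and the same or fewer samples,
    always receives the larger weight.
    """
    return {
        feature: freq / (1 + sample_counts.get(feature, 0))
        for feature, freq in corpus_freq.items()
    }


def generate_prompts(weights, features_per_prompt=2, num_prompts=1):
    """Combine the highest-weighted features into prompts of two or more features."""
    if features_per_prompt < 2:
        raise ValueError("each prompt must combine two or more features")
    # Sort by descending weight; ties are broken alphabetically for determinism.
    ranked = sorted(weights, key=lambda f: (-weights[f], f))
    prompts = []
    for i in range(num_prompts):
        chunk = ranked[i * features_per_prompt:(i + 1) * features_per_prompt]
        if len(chunk) < features_per_prompt:
            break  # not enough features left for a full prompt
        prompts.append(" ".join(chunk))
    return prompts


# Illustrative data only; these numbers are made up for the example.
corpus_freq = {"weather": 50, "today": 50, "forecast": 10, "rain": 20}
sample_counts = {"weather": 99, "today": 9, "forecast": 0, "rain": 0}

weights = weighted_frequencies(corpus_freq, sample_counts)
prompts = generate_prompts(weights, features_per_prompt=2, num_prompts=2)
```

With this illustrative data, "today" outweighs "weather" (same corpus frequency, fewer recorded samples, condition (i)), and "rain" outweighs "forecast" (greater corpus frequency, same number of samples, condition (ii)). A real system would turn the selected features into natural sentences rather than joining them with spaces.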
