Method and system for creating frugal speech corpus using internet resources and conventional speech corpus

  • US 8,756,064 B2
  • Filed: 06/26/2012
  • Issued: 06/17/2014
  • Est. Priority Date: 07/28/2011
  • Status: Active Grant
First Claim

1. A speech corpus creation method, implementing extraction of a first speech data from at least one first source and mixing with at least one second source, the method comprising processor implemented steps of:

  • identifying at least one publicly accessible first source of the first speech data and its corresponding first text transcription;

  • extracting a second speech data of at least one accessible encoding format from the first speech data;

  • extracting a second text transcription data with at least one encoding format from the first text transcription data;

  • matching and aligning the transcription to the extracted second speech data at a sentence, word, phoneme level, or combination thereof to form a first and a second speech corpus;

  • analyzing the text transcriptions in the second speech corpus to identify the short speech segments to produce a phonetically balanced, segmented, text aligned third speech corpus; and

  • conditioning the third speech corpus by inserting a context and associated environment richer corpus therein the third speech corpus from at least one second source to form the final speech corpus.
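
The "analyzing" step above, which picks short segments to obtain a phonetically balanced corpus, can be illustrated with a small sketch. This is not the patented method: the claim does not say how balance is achieved, so the greedy coverage heuristic below is an assumption chosen for illustration. The names `Segment`, `select_balanced`, `max_duration_s` and `target_per_phoneme` are hypothetical, and the phoneme labels are assumed to come from an earlier alignment step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One text-aligned speech segment (hypothetical structure)."""
    text: str
    phonemes: tuple      # phoneme labels, assumed to come from an aligner
    duration_s: float    # segment length in seconds


def select_balanced(segments, max_duration_s=5.0, target_per_phoneme=1):
    """Greedily pick short segments that add under-represented phonemes.

    Assumption: "phonetically balanced" is approximated by requiring each
    phoneme to appear in at least `target_per_phoneme` chosen segments.
    """
    # Keep only short segments, as the claim refers to short speech segments.
    remaining = [s for s in segments if s.duration_s <= max_duration_s]
    counts = {}
    chosen = []

    def gain(seg):
        # Number of distinct phonemes in seg that are still below the target.
        return sum(1 for p in set(seg.phonemes)
                   if counts.get(p, 0) < target_per_phoneme)

    while remaining:
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # no remaining segment improves coverage
        chosen.append(best)
        remaining.remove(best)
        for p in set(best.phonemes):
            counts[p] = counts.get(p, 0) + 1
    return chosen, counts
```

A greedy pass is used here only because it is short and easy to follow; a real system might instead weight phonemes by their frequency in the target language or balance diphones rather than single phonemes.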
