FRUGAL METHOD AND SYSTEM FOR CREATING SPEECH CORPUS
First Claim
1. A speech corpus creation method, implementing extraction of a first speech data from at least one first source and mixing with at least one second source, the method comprising processor implemented steps of:
identifying at least one publicly accessible first source of the first speech data and its corresponding first text transcription;
extracting a second speech data of at least one accessible encoding format from the first speech data;
extracting a second text transcription data with at least one encoding format from the first text transcription data;
matching and aligning the transcription to the extracted second speech data at a sentence, word, phoneme level, or combination thereof to form a first and a second speech corpus;
analyzing the text transcriptions in the second speech corpus to identify the short speech segments to produce a phonetically balanced, segmented, text aligned third speech corpus; and
conditioning the third speech corpus by inserting a context and associated environment richer corpus therein the third speech corpus from at least one second source to form the final speech corpus.
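The analyzing step of claim 1 selects short segments so that the resulting corpus is phonetically balanced. The claim does not say how the selection is made, so the following is only a minimal sketch under stated assumptions: it uses greedy phoneme coverage as a simple stand-in for balance, and `TOY_LEXICON`, `phonemes`, and `select_balanced` are hypothetical names introduced here for illustration, not part of the patent.

```python
# Hypothetical toy pronunciation lexicon (assumption for illustration only).
# A real system would use a full pronunciation dictionary or a
# grapheme-to-phoneme model.
TOY_LEXICON = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "dog": ["D", "AO", "G"],
    "ran": ["R", "AE", "N"],
}


def phonemes(sentence):
    """Map a transcript sentence to a phoneme list, skipping unknown words."""
    out = []
    for word in sentence.lower().split():
        out.extend(TOY_LEXICON.get(word, []))
    return out


def select_balanced(segments, max_segments):
    """Greedily pick the segments that add the most not-yet-covered phonemes.

    Returns the chosen segment transcripts and the set of phonemes covered.
    Stops early once no remaining segment contributes a new phoneme.
    """
    covered = set()
    chosen = []
    remaining = list(segments)
    while remaining and len(chosen) < max_segments:
        # Segment whose transcript contributes the most new phonemes.
        best = max(remaining, key=lambda s: len(set(phonemes(s)) - covered))
        gain = set(phonemes(best)) - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered
```

For example, `select_balanced(["the cat sat", "the dog ran", "the cat"], 3)` would pick two segments and skip "the cat", since it adds no phoneme the other two lack. Coverage is a coarser criterion than true frequency balance; a weighting by target phoneme distribution could replace the gain function.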
Abstract
The present invention provides a frugal method for extracting speech data and its associated transcription from a plurality of web resources (the internet) for speech corpus creation, characterized by automation of the speech corpus creation and by cost reduction. An existing speech corpus is integrated with the extracted speech data and its transcription from the web resources to build an aggregated, rich speech corpus that is effective and easy to adapt for generating acoustic and language models for Automatic Speech Recognition (ASR) systems.
57 Citations
9 Claims
1. A speech corpus creation method, implementing extraction of a first speech data from at least one first source and mixing with at least one second source, the method comprising processor implemented steps of:
identifying at least one publicly accessible first source of the first speech data and its corresponding first text transcription;
extracting a second speech data of at least one accessible encoding format from the first speech data;
extracting a second text transcription data with at least one encoding format from the first text transcription data;
matching and aligning the transcription to the extracted second speech data at a sentence, word, phoneme level, or combination thereof to form a first and a second speech corpus;
analyzing the text transcriptions in the second speech corpus to identify the short speech segments to produce a phonetically balanced, segmented, text aligned third speech corpus; and
conditioning the third speech corpus by inserting a context and associated environment richer corpus therein the third speech corpus from at least one second source to form the final speech corpus.
Dependent claims: 2, 3, 4, 5
6. A speech corpus creation system, implementing extraction of a first speech data from at least one first source and mixing with at least one second source, the system comprising:
a speech data extractor adapted to extract a second speech data of at least one encoding format from the first speech data;
a text data extractor adapted to extract a second text transcription data of at least one encoding format from a first text transcription of the first speech data;
a speech alignment module adapted to match and align the first text transcription to the corresponding extracted long speech data in the first speech data, at a sentence word level, or combination thereof to form a first and a second speech corpus;
a phonetically balanced data extractor for analyzing the text transcriptions in the second speech corpus and to identify the short speech segments to form a phonetically balanced, segmented, text aligned third speech corpus; and
a compensator means adapted to identify at least one contextual gap in the third speech corpus and to condition the third speech corpus by inserting a context and associated environment richer corpus therein the third speech corpus from the at least one second source to form a final speech corpus.
Dependent claims: 7, 8, 9
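The speech alignment module of claim 6 aligns the transcription to long speech data at sentence and word level. As a rough illustration only, the sketch below assumes word-level timestamps are already available (for instance from a forced aligner, which is not shown) and groups them into sentence-level segments. `TimedWord`, `Segment`, and `split_into_sentences` are hypothetical names introduced here, and splitting on sentence-final punctuation is an assumption, not the claimed method.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One transcript word with its start and end time in seconds."""
    text: str
    start: float
    end: float


@dataclass
class Segment:
    """A sentence-level segment: transcript text plus its time span."""
    text: str
    start: float
    end: float


def _close(words):
    """Build a Segment spanning the first to the last word in `words`."""
    return Segment(" ".join(w.text for w in words), words[0].start, words[-1].end)


def split_into_sentences(words, terminators=(".", "?", "!")):
    """Group time-aligned words into sentence segments.

    A sentence ends at any word whose text ends with one of `terminators`.
    Trailing words without a terminator form a final segment.
    """
    segments = []
    current = []
    for word in words:
        current.append(word)
        if word.text.endswith(terminators):
            segments.append(_close(current))
            current = []
    if current:
        segments.append(_close(current))
    return segments
```

Each returned segment carries the time span needed to cut the matching audio out of the long recording, which is what turns long aligned speech into the short, text-aligned segments the later elements operate on.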
Specification