Method and system for creating frugal speech corpus using internet resources and conventional speech corpus
First Claim
1. A speech corpus creation method, implementing extraction of a first speech data from at least one first source and mixing with at least one second source, the method comprising processor implemented steps of:
identifying at least one publicly accessible first source of the first speech data and its corresponding first text transcription;
extracting a second speech data of at least one accessible encoding format from the first speech data;
extracting a second text transcription data with at least one encoding format from the first text transcription data;
matching and aligning the transcription to the extracted second speech data at a sentence, word, phoneme level, or combination thereof to form a first and a second speech corpus;
analyzing the text transcriptions in the second speech corpus to identify the short speech segments to produce a phonetically balanced, segmented, text aligned third speech corpus; and
conditioning the third speech corpus by inserting a context and associated environment richer corpus therein the third speech corpus from at least one second source to form the final speech corpus.
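The analyzing step of the claim (picking short segments so that the resulting corpus is phonetically balanced) can be illustrated with a small sketch. The claim does not say how balance is achieved; the greedy coverage heuristic below is only one common approach and is an assumption, as are the names `Segment`, `select_balanced`, `max_duration_s`, and `target_per_phoneme`. It also assumes the preceding alignment step has already attached a phoneme sequence and a duration to each segment.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical record for one text-aligned speech segment."""
    text: str
    phonemes: tuple      # phoneme labels, assumed to come from the alignment step
    duration_s: float    # segment length in seconds


def select_balanced(segments, max_duration_s=10.0, target_per_phoneme=1):
    """Greedily pick short segments until no candidate improves coverage.

    A phoneme counts as under-covered while fewer than
    `target_per_phoneme` chosen segments contain it.
    """
    # Keep only the short segments, mirroring "identify the short speech segments".
    remaining = [s for s in segments if s.duration_s <= max_duration_s]
    counts = Counter()   # phoneme -> number of chosen segments containing it
    chosen = []

    def gain(seg):
        # Number of still under-covered phonemes this segment would add.
        return sum(1 for p in set(seg.phonemes) if counts[p] < target_per_phoneme)

    while remaining:
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break        # nothing left improves the balance
        chosen.append(best)
        counts.update(set(best.phonemes))
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    demo = [
        Segment("cat", ("k", "ae", "t"), 1.0),
        Segment("ka", ("k", "ae"), 1.0),
        Segment("dog", ("d", "ao", "g"), 2.0),
        Segment("long talk", ("z",), 30.0),   # too long, filtered out
    ]
    for seg in select_balanced(demo):
        print(seg.text)
```

Raising `target_per_phoneme` asks for more occurrences of each phoneme before the selection stops, which trades corpus size against evenness of coverage.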
Abstract
A speech corpus creation method and system are disclosed. The method comprises: identifying a publicly accessible first source of the first speech data and its corresponding first text transcription; extracting a second speech data of an accessible encoding format from the first speech data; extracting a second text transcription data with at least one encoding format from the first text transcription data; matching and aligning the transcription to the extracted second speech data at a sentence, word, phoneme level, or combination thereof to form a first and a second speech corpus; analyzing the text transcriptions in the second speech corpus to identify the short speech segments to produce a phonetically balanced, segmented, text aligned third speech corpus; and conditioning the third speech corpus by inserting a context and associated environment richer corpus therein the third speech corpus from at least one second source to form the final speech corpus.
9 Claims
1. A speech corpus creation method, implementing extraction of a first speech data from at least one first source and mixing with at least one second source (independent claim; its steps are recited in full under First Claim above).
Dependent claims: 2, 3, 4, 5
6. A speech corpus creation system, implementing extraction of a first speech data from at least one first source and mixing with at least one second source, the system comprising:
a speech data extractor adapted to extract a second speech data of at least one encoding format from the first speech data;
a text data extractor adapted to extract a second text transcription data of at least one encoding format from a first text transcription of the first speech data;
a speech alignment module adapted to match and align the first text transcription to the corresponding extracted long speech data in the first speech data, at a sentence word level, or combination thereof to form a first and a second speech corpus;
a phonetically balanced data extractor for analyzing the text transcriptions in the second speech corpus and to identify the short speech segments to form a phonetically balanced, segmented, text aligned third speech corpus; and
a compensator means adapted to identify at least one contextual gap in the third speech corpus and to condition the third speech corpus by inserting a context and associated environment richer corpus therein the third speech corpus from the at least one second source to form a final speech corpus.
Dependent claims: 7, 8, 9
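The compensator element of claim 6 (find a contextual gap in the third corpus, then fill it from the second source) could be read along the following lines. This is a rough sketch under stated assumptions, not the patented implementation: the claim does not define how a contextual gap is represented, so the example assumes each utterance carries a set of free-form context tags such as `env:car` or `domain:news`, and treats a gap as a required tag that no utterance in the third corpus covers. The function names `find_gaps` and `condition_corpus` and the dictionary layout are hypothetical.

```python
def find_gaps(corpus, required_tags):
    """Return the required context tags that no utterance in `corpus` covers."""
    covered = set()
    for utt in corpus:
        covered |= utt["tags"]
    return set(required_tags) - covered


def condition_corpus(third_corpus, second_source, required_tags):
    """Append second-source utterances that fill gaps in the third corpus.

    Returns the conditioned corpus and any gaps that remain unfilled.
    All data structures here are illustrative assumptions.
    """
    gaps = find_gaps(third_corpus, required_tags)
    final = list(third_corpus)          # leave the input list untouched
    for utt in second_source:
        if not gaps:
            break                       # every gap has been filled
        filled = utt["tags"] & gaps
        if filled:
            final.append(utt)
            gaps -= filled
    return final, gaps


if __name__ == "__main__":
    third = [{"id": "w1", "tags": {"env:studio", "domain:news"}}]
    second = [
        {"id": "c1", "tags": {"env:studio"}},
        {"id": "c2", "tags": {"env:car", "domain:banking"}},
    ]
    required = {"env:studio", "env:car", "domain:news", "env:street"}
    final, unfilled = condition_corpus(third, second, required)
    print([u["id"] for u in final], sorted(unfilled))
```

Returning the unfilled gaps alongside the corpus makes it visible when the second source cannot supply a required context, rather than silently producing an incomplete final corpus.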
Specification