FRUGAL METHOD AND SYSTEM FOR CREATING SPEECH CORPUS

US 20130030810A1
Filed: 06/26/2012
Published: 01/31/2013
Est. Priority Date: 07/28/2011
Status: Active Grant

Abstract

The present invention provides a frugal method for extracting speech data and the associated transcription from a plurality of web resources (the internet) for speech corpus creation, characterized by automation of the speech corpus creation and by cost reduction. An existing speech corpus is integrated with the speech data and transcription extracted from the web resources to build an aggregated, rich speech corpus that is effective and easy to adapt for generating acoustic and language models for Automatic Speech Recognition (ASR) systems.

9 Claims

1. A speech corpus creation method, implementing extraction of a first speech data from at least one first source and mixing with at least one second source, the method comprising processor implemented steps of:
- identifying at least one publicly accessible first source of the first speech data and its corresponding first text transcription;
  
  extracting a second speech data of at least one accessible encoding format from the first speech data;
  
  extracting a second text transcription data with at least one encoding format from the first text transcription data;
  
  matching and aligning the transcription to the extracted second speech data at a sentence, word, phoneme level, or combination thereof to form a first and a second speech corpus;
  
  analyzing the text transcriptions in the second speech corpus to identify the short speech segments to produce a phonetically balanced, segmented, text aligned third speech corpus; and
  
  conditioning the third speech corpus by inserting a context and associated environment richer corpus therein the third speech corpus from at least one second source to form the final speech corpus.
- Dependent claims (2, 3, 4, 5):
  - 2. A method as claimed in claim 1, wherein speech data extractor engine extracts the first speech data along with transcription thereof from publicly accessible sources that are relevant to a desired corpus.
  - 3. A method as claimed in claim 1, wherein while aligning the speech, a two stage alignment process is carried out comprising a first level syllable matching adapted to match the extracted transcription to the corresponding extracted long speech data at a syllable level and a subsequent second level matching adapted to align, using automatic speech recognition engine, short speech segments of the long syllable aligned speech data at sentence, word or phoneme level.
  - 4. A method as claimed in claim 1, wherein the matching and aligning comprises the steps of:
    - detecting a plurality of syllables in the second speech data;
      
      detecting a plurality of syllables in the second text transcription data by employing a text syllable annotator;
      
      annotating and indexing each detected syllable in the second speech data and in the second text transcription data;
      
      aligning the syllable annotated second speech data with the syllable annotated second text data by matching the corresponding syllable indexes, to form a first syllable aligned speech corpus;
      
      segmenting the said first aligned corpus into a plurality of short speech segments of uniform length; and
      
      aligning each short segment with the corresponding extracted text transcription to form a segmented text aligned second speech corpus, featuring alignment at sentence, word or phoneme level.
  - 5. A method as claimed in claim 1, wherein the third speech corpus, derived from a public source of speech data and its transcription, is conditioned with a context and associated environment richer corpus, collected using traditional procedure, to form the final speech corpus.
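
The first-level syllable matching and uniform segmentation recited in claim 4 can be illustrated with a minimal Python sketch. This is not the patented implementation: the vowel-group syllabifier, the assumption that speech syllables arrive as `(start, end)` time spans from some external detector, the reading of "uniform length" as a fixed number of syllables per segment, and all names (`AlignedSyllable`, `annotate_text_syllables`, `align_by_index`, `segment_uniform`) are hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class AlignedSyllable:
    index: int    # shared syllable index used for matching
    text: str     # syllable taken from the transcription
    start: float  # start time in the audio, in seconds
    end: float    # end time in the audio, in seconds


def annotate_text_syllables(transcript):
    """Naive text syllable annotator (assumption): one syllable per vowel group."""
    syllables = []
    for word in re.findall(r"[a-z]+", transcript.lower()):
        # Leading consonants + vowel group, plus trailing consonants at word end.
        chunks = re.findall(r"[^aeiouy]*[aeiouy]+(?:[^aeiouy]+$)?", word)
        syllables.extend(chunks or [word])  # words without vowels stay whole
    return syllables


def align_by_index(speech_spans, text_syllables):
    """Pair the i-th detected speech syllable with the i-th text syllable."""
    n = min(len(speech_spans), len(text_syllables))
    return [
        AlignedSyllable(i, text_syllables[i], speech_spans[i][0], speech_spans[i][1])
        for i in range(n)
    ]


def segment_uniform(aligned, syllables_per_segment):
    """Cut the syllable-aligned stream into segments of equal syllable count."""
    segments = []
    for k in range(0, len(aligned), syllables_per_segment):
        chunk = aligned[k:k + syllables_per_segment]
        segments.append((chunk[0].start, chunk[-1].end, [s.text for s in chunk]))
    return segments


# Hypothetical input: six detected speech syllables of 0.25 s each.
spans = [(i * 0.25, (i + 1) * 0.25) for i in range(6)]
syllables = annotate_text_syllables("Hello world speech corpus")
for segment in segment_uniform(align_by_index(spans, syllables), 2):
    print(segment)
```

Each resulting short segment would then go to the second-level alignment, which the claims describe as using an automatic speech recognition engine; that step is outside the scope of this sketch.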

6. A speech corpus creation system, implementing extraction of a first speech data from at least one first source and mixing with at least one second source, the system comprising:
- a speech data extractor adapted to extract a second speech data of at least one encoding format from the first speech data;
  
  a text data extractor adapted to extract a second text transcription data of at least one encoding format from a first text transcription of the first speech data;
  
  a speech alignment module adapted to match and align the first text transcription to the corresponding extracted long speech data in the first speech data, at a sentence or word level, or combination thereof to form a first and a second speech corpus;
  
  a phonetically balanced data extractor for analyzing the text transcriptions in the second speech corpus and to identify the short speech segments to form a phonetically balanced, segmented, text aligned third speech corpus; and
  
  a compensator means adapted to identify at least one contextual gap in the third speech corpus and to condition the third speech corpus by inserting a context and associated environment richer corpus therein the third speech corpus from the at least one second source to form a final speech corpus.
- Dependent claims (7, 8, 9):
  - 7. A system as claimed in claim 6, wherein speech data extractor engine extracts the first speech data along with transcription thereof from publicly accessible sources that are relevant to a desired corpus domain.
  - 8. A system as claimed in claim 6, wherein while aligning the speech, a two stage alignment is carried out comprising a first level syllable matching adapted to match the extracted transcription to the corresponding extracted long speech data at a syllable level and a subsequent second level of matching adapted to align, using automatic speech recognition, short speech segments of the syllable aligned long speech data, at least at sentence, word and phoneme level.
  - 9. A system as claimed in claim 6, wherein the speech alignment module comprises:
    - a speech syllable annotator adapted to annotate and index a plurality of syllables in the second speech data;
      
      a text syllable annotator adapted to annotate and index the syllables in the second text transcription data;
      
      a syllable based aligner adapted to align the syllable indexed second speech data to the syllable indexed second text data by matching syllable indexes, to form a first syllable aligned long speech corpus;
      
      a long speech segmenter adapted to segment the first syllable aligned long speech corpus into a plurality of uniform segments; and
      
      a short speech aligner adapted to align each short speech segment at least at sentence, word and phoneme level with the corresponding transcription using an automatic speech recognition engine to form a segmented text aligned second speech corpus.
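
The phonetically balanced data extractor and the compensator of claim 6 can be sketched in the same hedged way. The claims do not say how balance is measured or how a contextual gap is detected, so everything below is an assumption: balance is approximated by a greedy selection that tries to reach a minimum count for every phoneme, a "gap" is reduced to a phoneme from a required inventory that remains under-covered, and the names `select_balanced`, `coverage_gaps` and `fill_gaps` are hypothetical. Phoneme sequences per segment are assumed to come from a pronunciation lexicon that is not shown.

```python
from collections import Counter


def select_balanced(segments, min_count=1):
    """Greedily pick segment ids until each phoneme is seen min_count times."""
    deficit = {p: min_count for phones in segments.values() for p in phones}
    remaining = dict(segments)
    chosen = []

    def gain(seg_id):
        # How much of the outstanding phoneme deficit this segment would cover.
        counts = Counter(remaining[seg_id])
        return sum(min(c, deficit[p]) for p, c in counts.items())

    while remaining and any(v > 0 for v in deficit.values()):
        best = max(sorted(remaining), key=gain)  # ties go to the smallest id
        if gain(best) == 0:
            break
        chosen.append(best)
        for p, c in Counter(remaining.pop(best)).items():
            deficit[p] = max(0, deficit[p] - c)
    return chosen


def coverage_gaps(chosen, segments, inventory, min_count=1):
    """Phonemes of the required inventory still under-covered after selection."""
    counts = Counter(p for seg_id in chosen for p in segments[seg_id])
    return sorted(p for p in inventory if counts[p] < min_count)


def fill_gaps(second_source, gaps):
    """Ids of second-source segments that contain at least one missing phoneme."""
    missing = set(gaps)
    return sorted(sid for sid, phones in second_source.items() if missing & set(phones))


# Hypothetical web-derived segments with their phoneme sequences.
web_segments = {
    "s1": ["HH", "AH", "L", "OW"],
    "s2": ["W", "ER", "L", "D"],
    "s3": ["HH", "AH", "L", "OW"],
    "s4": ["K", "AO", "R", "P", "AH", "S"],
}
picked = select_balanced(web_segments)
required = {p for phones in web_segments.values() for p in phones} | {"ZH"}
gaps = coverage_gaps(picked, web_segments, required)
print(picked, gaps, fill_gaps({"t1": ["ZH", "ER"], "t2": ["L", "OW"]}, gaps))
```

A real compensator would likely also consider acoustic environment and speaking context, which the claims mention but this phoneme-only sketch does not model.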

Current Assignee: TATA Consultancy Services Limited (Tata Sons Pvt Ltd.)
Original Assignee: TATA Consultancy Services Limited (Tata Sons Pvt Ltd.)
Inventors: Kopparapu, Sunil Kumar; Sheikh, Imran Ahmed
Granted Patent: US 8,756,064 B2
US Class Current: 704/260
CPC Class Codes:
G06F 16/954 Navigation, e.g. using cate...
G10L 15/06 Creation of reference templ...
