Method and system for creating frugal speech corpus using internet resources and conventional speech corpus

US 8,756,064 B2
Filed: 06/26/2012
Issued: 06/17/2014
Est. Priority Date: 07/28/2011
Status: Active Grant

1 Assignment

0 Petitions

Abstract

A speech corpus creation method and system are disclosed. The method comprises identifying a publicly accessible first source of first speech data and its corresponding first text transcription; extracting second speech data of an accessible encoding format from the first speech data; extracting second text transcription data with at least one encoding format from the first text transcription data; matching and aligning the transcription to the extracted second speech data at the sentence, word or phoneme level, or a combination thereof, to form a first and a second speech corpus; analyzing the text transcriptions in the second speech corpus to identify short speech segments and produce a phonetically balanced, segmented, text-aligned third speech corpus; and conditioning the third speech corpus by inserting into it a corpus richer in context and associated environment, taken from at least one second source, to form the final speech corpus.
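The abstract does not say how the phonetically balanced third corpus is selected. One common way to approximate phonetic balance, shown here only as an illustrative sketch and not as the patented method, is greedy set cover: repeatedly keep the transcription that adds the most not-yet-covered diphones. The toy lexicon, the choice of diphones as the unit, and the names `diphones` and `select_balanced` are hypothetical assumptions.

```python
from __future__ import annotations


def diphones(sentence: str, lexicon: dict[str, list[str]]) -> set[str]:
    """Return the set of diphones in a sentence, padded with silence.

    Assumes every word appears in the (hypothetical) pronunciation lexicon.
    """
    phones = ["sil"]
    for word in sentence.lower().split():
        phones.extend(lexicon[word])
    phones.append("sil")
    return {f"{a}-{b}" for a, b in zip(phones, phones[1:])}


def select_balanced(
    sentences: list[str], lexicon: dict[str, list[str]], max_sentences: int
) -> tuple[list[str], set[str]]:
    """Greedy set cover: pick the sentence adding the most uncovered diphones.

    Ties go to the shorter sentence (less audio to keep), then to the
    earlier one. Stops when no remaining candidate adds new coverage.
    """
    pool = {s: diphones(s, lexicon) for s in sentences}
    covered: set[str] = set()
    chosen: list[str] = []
    while pool and len(chosen) < max_sentences:
        best = max(pool, key=lambda s: (len(pool[s] - covered), -len(s)))
        gain = pool[best] - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
        del pool[best]
    return chosen, covered


# Toy ARPAbet-style lexicon (hypothetical, for illustration only).
LEXICON = {
    "the": ["dh", "ah"],
    "cat": ["k", "ae", "t"],
    "sat": ["s", "ae", "t"],
    "a": ["ah"],
    "hat": ["hh", "ae", "t"],
}

if __name__ == "__main__":
    candidates = ["the cat sat", "the cat", "a hat", "the hat sat"]
    picked, units = select_balanced(candidates, LEXICON, max_sentences=10)
    print(picked)      # ['the cat sat', 'a hat']
    print(len(units))  # 11
```

Greedy selection is a standard approximation for this kind of coverage problem; a real system would use a full pronunciation dictionary and might weight rare units more heavily.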

12 Citations

9 Claims

1. A speech corpus creation method, implementing extraction of a first speech data from at least one first source and mixing with at least one second source, the method comprising processor implemented steps of:
   - identifying at least one publicly accessible first source of the first speech data and its corresponding first text transcription;
   - extracting a second speech data of at least one accessible encoding format from the first speech data;
   - extracting a second text transcription data with at least one encoding format from the first text transcription data;
   - matching and aligning the transcription to the extracted second speech data at a sentence, word, phoneme level, or combination thereof to form a first and a second speech corpus;
   - analyzing the text transcriptions in the second speech corpus to identify the short speech segments to produce a phonetically balanced, segmented, text aligned third speech corpus; and
   - conditioning the third speech corpus by inserting into the third speech corpus a context and associated environment richer corpus from at least one second source to form the final speech corpus.

2. A method as claimed in claim 1, wherein a speech data extractor engine extracts the first speech data along with transcription thereof from publicly accessible sources that are relevant to a desired corpus.

3. A method as claimed in claim 1, wherein while aligning the speech, a two stage alignment process is carried out comprising a first level syllable matching adapted to match the extracted transcription to the corresponding extracted long speech data at a syllable level and a subsequent second level matching adapted to align, using an automatic speech recognition engine, short speech segments of the long syllable aligned speech data at sentence, word or phoneme level.

4. A method as claimed in claim 1, wherein the matching and aligning comprises the steps of:
   - detecting a plurality of syllables in the second speech data;
   - detecting a plurality of syllables in the second text transcription data by employing a text syllable annotator;
   - annotating and indexing each detected syllable in the second speech data and in the second text transcription data;
   - aligning the syllable annotated second speech data with the syllable annotated second text data by matching the corresponding syllable indexes, to form a first syllable aligned speech corpus;
   - segmenting the said first aligned corpus into a plurality of short speech segments of uniform length; and
   - aligning each short segment with the corresponding extracted text transcription to form a segmented text aligned second speech corpus, featuring alignment at sentence, word or phoneme level.

5. A method as claimed in claim 1, wherein the third speech corpus, derived from a public source of speech data and its transcription, is conditioned with a context and associated environment richer corpus, collected using a traditional procedure, to form the final speech corpus.

6. A speech corpus creation system, implementing extraction of a first speech data from at least one first source and mixing with at least one second source, the system comprising:
   - a speech data extractor adapted to extract a second speech data of at least one encoding format from the first speech data;
   - a text data extractor adapted to extract a second text transcription data of at least one encoding format from a first text transcription of the first speech data;
   - a speech alignment module adapted to match and align the first text transcription to the corresponding extracted long speech data in the first speech data, at a sentence level, word level, or combination thereof to form a first and a second speech corpus;
   - a phonetically balanced data extractor for analyzing the text transcriptions in the second speech corpus and identifying the short speech segments to form a phonetically balanced, segmented, text aligned third speech corpus; and
   - a compensator means adapted to identify at least one contextual gap in the third speech corpus and to condition the third speech corpus by inserting into the third speech corpus a context and associated environment richer corpus from the at least one second source to form a final speech corpus.

7. A system as claimed in claim 6, wherein a speech data extractor engine extracts the first speech data along with transcription thereof from publicly accessible sources that are relevant to a desired corpus domain.

8. A system as claimed in claim 6, wherein while aligning the speech, a two stage alignment is carried out comprising a first level syllable matching adapted to match the extracted transcription to the corresponding extracted long speech data at a syllable level and a subsequent second level of matching adapted to align, using automatic speech recognition, short speech segments of the syllable aligned long speech data, at least at sentence, word and phoneme level.

9. A system as claimed in claim 6, wherein the speech alignment module comprises:
   - a speech syllable annotator adapted to annotate and index a plurality of syllables in the second speech data;
   - a text syllable annotator adapted to annotate and index the syllables in the second text transcription data;
   - a syllable based aligner adapted to align the syllable indexed second speech data to the syllable indexed second text data by matching syllable indexes, to form a first syllable aligned long speech corpus;
   - a long speech segmenter adapted to segment the first syllable aligned long speech corpus into a plurality of uniform segments; and
   - a short speech aligner adapted to align each short speech segment at least at sentence, word and phoneme level with the corresponding transcription using an automatic speech recognition engine to form a segmented text aligned second speech corpus.
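Claims 4 and 9 describe a first alignment stage that matches syllable indexes between speech and text, followed by segmentation of the aligned long recording into short, roughly uniform segments that an automatic speech recognition engine then aligns at the sentence, word or phoneme level. The sketch below covers only the index matching and segmentation, under stated assumptions: syllable time spans come from some upstream speech syllable detector (not shown), the text syllable annotator is replaced by a crude English vowel-group rule, each segment is cut at the first syllable boundary at or past a target length, and a syllable count mismatch is rejected rather than repaired. The names `text_syllables`, `align_and_segment` and `Segment` are hypothetical, and the second-stage ASR alignment is not shown.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Crude stand-in for a text syllable annotator (an assumption): one syllable
# per vowel group, leading consonants attached to the following vowel group,
# word-final consonants attached to the last vowel group.
VOWEL_GROUP = re.compile(r"[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$)?")


def text_syllables(transcript: str) -> list[str]:
    """Split a transcript into index-ordered text syllables."""
    syllables: list[str] = []
    for word in re.findall(r"[a-z']+", transcript.lower()):
        # Words with no vowel letters are kept whole as a single syllable.
        syllables.extend(VOWEL_GROUP.findall(word) or [word])
    return syllables


@dataclass
class Segment:
    start: float  # seconds
    end: float  # seconds
    syllables: list[str]


def align_and_segment(
    speech_syllables: list[tuple[float, float]],
    transcript: str,
    target_len: float = 5.0,
) -> list[Segment]:
    """Pair the i-th detected speech syllable with the i-th text syllable,
    then cut the aligned stream into segments of roughly target_len seconds,
    always splitting on a syllable boundary."""
    text = text_syllables(transcript)
    if len(text) != len(speech_syllables):
        raise ValueError(
            f"syllable count mismatch: {len(text)} text vs "
            f"{len(speech_syllables)} speech"
        )
    segments: list[Segment] = []
    current: list[tuple[float, float, str]] = []
    for (start, end), syllable in zip(speech_syllables, text):
        current.append((start, end, syllable))
        if end - current[0][0] >= target_len:
            segments.append(Segment(current[0][0], end, [s for _, _, s in current]))
            current = []
    if current:  # flush the shorter final segment
        segments.append(
            Segment(current[0][0], current[-1][1], [s for _, _, s in current])
        )
    return segments


if __name__ == "__main__":
    spans = [(0.0, 0.2), (0.2, 0.5), (0.5, 0.8), (0.9, 1.0), (1.0, 1.1), (1.1, 1.5)]
    for seg in align_and_segment(spans, "the cat sat on a mat", target_len=0.75):
        print(seg.start, seg.end, seg.syllables)
    # 0.0 0.8 ['the', 'cat', 'sat']
    # 0.9 1.5 ['on', 'a', 'mat']
```

Pure index matching is brittle: a single missed or spurious syllable in the detector output shifts every later pair, which is presumably why the claims follow it with a second, ASR-based alignment on the short segments.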

Current Assignee
TATA Consultancy Services Limited (Tata Sons Pvt Ltd.)
Original Assignee
TATA Consultancy Services Limited (Tata Sons Pvt Ltd.)
Inventors
Sheikh, Imran Ahmed, Kopparapu, SunilKumar
Primary Examiner(s)
Opsasnick, Michael N

Application Number
US13/533,174
Publication Number
US 20130030810A1
Time in Patent Office
721 Days
Field of Search
704/260
US Class Current
704/260
CPC Class Codes
G06F 16/954 Navigation, e.g. using cate...
G10L 15/06 Creation of reference templ...
