Methods and Apparatus to Generate a Speech Recognition Library

US 20090287486A1
Filed: 05/14/2008
Published: 11/19/2009
Est. Priority Date: 05/14/2008
Status: Active Grant

Abstract

Methods and apparatus to generate a speech recognition library for use by a speech recognition system are disclosed. An example method comprises identifying a plurality of video segments having closed caption data corresponding to a phrase, the plurality of video segments associated with respective ones of a plurality of audio data segments, computing a plurality of difference metrics between a baseline audio data segment associated with the phrase and respective ones of the plurality of audio data segments, selecting a set of the plurality of audio data segments based on the plurality of difference metrics, identifying a first one of the audio data segments in the set as a representative audio data segment, determining a first phonetic transcription of the representative audio data segment, and adding the first phonetic transcription to a speech recognition library when the first phonetic transcription differs from a second phonetic transcription associated with the phrase in the speech recognition library.

20 Claims

1. A method comprising:
- identifying a plurality of video segments having closed caption data corresponding to a phrase, the plurality of video segments associated with respective ones of a plurality of audio data segments;
  
  computing a plurality of difference metrics between a baseline audio data segment associated with the phrase and respective ones of the plurality of audio data segments;
  
  selecting a set of the plurality of audio data segments based on the plurality of difference metrics;
  
  identifying a first one of the audio data segments in the set as a representative audio data segment;
  
  determining a first phonetic transcription of the representative audio data segment; and
  
  adding the first phonetic transcription to a speech recognition library when the first phonetic transcription differs from a second phonetic transcription associated with the phrase in the speech recognition library.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. A method as defined in claim 1, wherein the phrase comprises a single word.
  - 3. A method as defined in claim 1, wherein the phrase comprises at least one of a proper name, a title or a location.
  - 4. A method as defined in claim 1, further comprising associating the first phonetic transcription with the phrase in the speech recognition library.
  - 5. A method as defined in claim 1, further comprising adding the representative audio data segment to the speech recognition library when the first phonetic transcription is added to the speech recognition library.
  - 6. A method as defined in claim 5, wherein identifying the representative audio data segment comprises determining which of the audio data segments in the set has the smallest difference metric.
  - 7. A method as defined in claim 1, further comprising:
    - identifying a second plurality of video segments having closed caption data corresponding to the phrase, the second plurality of video segments associated with respective ones of a second plurality of audio data segments;
      
      computing a second plurality of difference metrics between respective ones of the second plurality of audio data segments and the representative audio data segment;
      
      computing a third plurality of difference metrics between the baseline audio data and respective ones of the second plurality of audio data segments;
      
      identifying a subset of the second plurality of audio data segments based on the second and third plurality of difference metrics;
      
      identifying a first one of the audio data segments in the subset as a second representative audio data segment;
      
      determining a third phonetic transcription of the second representative audio data segment; and
      
      adding the third phonetic transcription to the speech recognition library when the third phonetic transcription differs from the first and second phonetic transcriptions.
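The flow recited in claims 1 and 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The claims do not say how a difference metric is computed, how the set is selected, or how a phonetic transcription is produced, so the mean-absolute-difference metric, the `min_difference` selection rule, the `transcribe` callback, and every name below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set


@dataclass
class AudioSegment:
    segment_id: str
    features: Sequence[float]  # hypothetical fixed-length feature vector


def difference_metric(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Assumed metric: mean absolute difference of equal-length feature vectors."""
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("feature vectors must be non-empty and of equal length")
    return sum(abs(b - c) for b, c in zip(baseline, candidate)) / len(baseline)


def update_library(
    phrase: str,
    baseline: AudioSegment,
    segments: List[AudioSegment],
    library: Dict[str, Set[str]],
    transcribe: Callable[[AudioSegment], str],
    min_difference: float,
) -> bool:
    """Return True if a new phonetic transcription was added for the phrase."""
    # One difference metric per candidate segment, measured against the baseline.
    metrics = {s.segment_id: difference_metric(baseline.features, s.features)
               for s in segments}
    # Assumed selection rule: keep segments that differ noticeably from the baseline.
    selected = [s for s in segments if metrics[s.segment_id] >= min_difference]
    if not selected:
        return False
    # Claim 6: the representative is the segment with the smallest difference metric.
    representative = min(selected, key=lambda s: metrics[s.segment_id])
    transcription = transcribe(representative)
    # Add only when the transcription differs from those already stored.
    known = library.setdefault(phrase, set())
    if transcription in known:
        return False
    known.add(transcription)
    return True
```

Here the library is modeled as a mapping from a phrase to the set of its known transcriptions, which makes the "differs from a second phonetic transcription" test a simple membership check.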

8. An apparatus comprising:
- an audio segment selector to identify a plurality of video segments having closed caption data corresponding to a phrase, the plurality of video segments associated with respective ones of a plurality of audio data segments;
  
  an audio comparator to compute a plurality of difference metrics between a baseline audio data segment associated with the phrase and respective ones of the plurality of audio data segments;
  
  an audio segment grouper to identify a set of the plurality of audio data segments based on the plurality of difference metrics;
  
  a phonetic transcriber to determine a first phonetic transcription corresponding to the set of audio data segments; and
  
  a database manager to add the first phonetic transcription to a speech recognition library when the first phonetic transcription differs from a second phonetic transcription associated with the phrase in the speech recognition library.
- Dependent claims (9, 10):
  - 9. An apparatus as defined in claim 8, wherein the phrase comprises a single word.
  - 10. An apparatus as defined in claim 8, wherein the speech recognition library comprises:
    - a first field representing the phrase;
      
      a second field associated with the first field representing the baseline audio data segment;
      
      a third field associated with the first field representing the second phonetic transcription; and
      
      a fourth field associated with the first field representing the first phonetic transcription when the first phonetic transcription differs from a second phonetic transcription associated with the phrase in the speech recognition library.
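The four fields recited in claim 10 could be modeled as a single record. The sketch below is only an assumed layout: the claim does not prescribe types or storage, so the field names, the use of `bytes` for audio, and the `record_transcription` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LibraryEntry:
    phrase: str                    # first field: the phrase
    baseline_audio: bytes          # second field: baseline audio data segment
    baseline_transcription: str    # third field: the existing ("second") transcription
    alternate_transcription: Optional[str] = None  # fourth field: the new ("first") transcription

    def record_transcription(self, transcription: str) -> bool:
        """Fill the fourth field only when the transcription differs from the third."""
        if transcription == self.baseline_transcription:
            return False
        # The claim recites a single fourth field, so a later call overwrites it.
        self.alternate_transcription = transcription
        return True
```

A real library would likely need to hold more than one alternate pronunciation per phrase; this sketch keeps one only because the claim names one.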

11. An article of manufacture storing machine readable instructions which, when executed, cause a machine to:
- identify a plurality of video segments having closed caption data corresponding to a phrase, the plurality of video segments associated with respective ones of a plurality of audio data segments;
  
  compute a plurality of difference metrics between a baseline audio data segment associated with the phrase and respective ones of the plurality of audio data segments;
  
  select a set of the plurality of audio data segments based on the plurality of difference metrics;
  
  identify a first one of the audio data segments in the set as a representative audio data segment;
  
  determine a first phonetic transcription of the representative audio data segment; and
  
  add the first phonetic transcription to a speech recognition library when the first phonetic transcription differs from a second phonetic transcription associated with the phrase in the speech recognition library.
- Dependent claims (12, 13, 14):
  - 12. An article of manufacture as defined in claim 11, wherein the machine readable instructions, when executed, cause the machine to associate the first phonetic transcription with the phrase in the speech recognition library.
  - 13. An article of manufacture as defined in claim 11, wherein the machine readable instructions, when executed, cause the machine to add the representative audio data segment to the speech recognition library when the first phonetic transcription is added to the speech recognition library.
  - 14. An article of manufacture as defined in claim 11, wherein the machine readable instructions, when executed, cause the machine to:
    - identify a second plurality of video segments having closed caption data corresponding to the phrase, the second plurality of video segments associated with respective ones of a second plurality of audio data segments;
      
      compute a second plurality of difference metrics between respective ones of the second plurality of audio data segments and the representative audio data segment;
      
      compute a third plurality of difference metrics between the baseline audio data and respective ones of the second plurality of audio data segments;
      
      identify a subset of the second plurality of audio data segments based on the second and third plurality of difference metrics;
      
      identify a first one of the audio data segments in the subset as a second representative audio data segment;
      
      determine a third phonetic transcription of the second representative audio data segment; and
      
      add the third phonetic transcription to the speech recognition library when the third phonetic transcription differs from the first and second phonetic transcriptions.
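The second pass recited in claim 14 (and in claim 7) compares each new segment against both the baseline and the earlier representative. A minimal sketch follows, assuming feature vectors as plain lists of floats. The metric, the rule that a segment must differ from both references by at least `min_difference`, the choice of the second representative, and all names are assumptions; the claims leave them open.

```python
from typing import Callable, Dict, Optional, Sequence, Set

Features = Sequence[float]


def difference_metric(a: Features, b: Features) -> float:
    """Assumed metric: mean absolute difference of equal-length feature vectors."""
    if not a or len(a) != len(b):
        raise ValueError("feature vectors must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def second_pass(
    phrase: str,
    baseline: Features,
    representative: Features,
    new_segments: Dict[str, Features],
    library: Dict[str, Set[str]],
    transcribe: Callable[[str], str],
    min_difference: float,
) -> Optional[str]:
    """Return the newly added transcription, or None if nothing was added."""
    # Second plurality of metrics: new segments vs. the earlier representative.
    vs_rep = {sid: difference_metric(representative, f) for sid, f in new_segments.items()}
    # Third plurality of metrics: new segments vs. the baseline.
    vs_base = {sid: difference_metric(baseline, f) for sid, f in new_segments.items()}
    # Assumed subset rule: keep segments that differ from both references.
    subset = [sid for sid in new_segments
              if vs_rep[sid] >= min_difference and vs_base[sid] >= min_difference]
    if not subset:
        return None
    # Assumed choice: the subset member closest to the baseline.
    second_representative = min(subset, key=lambda sid: vs_base[sid])
    transcription = transcribe(second_representative)
    # The stored set already holds the first and second transcriptions.
    known = library.setdefault(phrase, set())
    if transcription in known:
        return None
    known.add(transcription)
    return transcription
```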

15. A method comprising:
- identifying a plurality of video segments having closed caption data corresponding to a phrase, the plurality of video segments associated with respective ones of a plurality of audio data segments;
  
  determining a plurality of phonetic transcriptions for respective ones of the plurality of audio data segments;
  
  identifying a set of the plurality of audio data segments having a first phonetic transcription different from a second phonetic transcription associated with the phrase in a speech recognition library; and
  
  adding the first phonetic transcription to the speech recognition library.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. A method as defined in claim 15, wherein the phrase comprises a single word.
  - 17. A method as defined in claim 15, wherein the phrase comprises at least one of a proper name, a title or a location.
  - 18. A method as defined in claim 15, further comprising associating the first phonetic transcription with the phrase in the speech recognition library.
  - 19. A method as defined in claim 15, further comprising adding a first of the set of the plurality of audio data segments to the speech recognition library.
  - 20. A method as defined in claim 19, wherein identifying the first of the set of audio data segments comprises determining which of the audio data segments in the set has the smallest difference metric.
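Claim 15 takes a different route from claim 1: it transcribes every segment first and then looks for a transcription the library does not yet hold. The sketch below illustrates one way that could work. Grouping segments by transcription and preferring the largest group is an assumption, as are the `transcribe` callback and all names.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set


def add_novel_transcription(
    phrase: str,
    segment_ids: List[str],
    library: Dict[str, Set[str]],
    transcribe: Callable[[str], str],
) -> Optional[str]:
    """Return the transcription added to the library, or None if none was new."""
    known = library.setdefault(phrase, set())
    # Group segments by transcription, ignoring transcriptions already in the library.
    groups: Dict[str, List[str]] = defaultdict(list)
    for sid in segment_ids:
        transcription = transcribe(sid)
        if transcription not in known:
            groups[transcription].append(sid)
    if not groups:
        return None
    # Assumed rule: prefer the novel transcription shared by the most segments;
    # ties go to the transcription seen first.
    novel = max(groups, key=lambda t: len(groups[t]))
    known.add(novel)
    return novel
```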

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Chang, Hisao M.
Granted Patent: US 9,202,460 B2
US Class Current: 704/235

CPC Class Codes:
G06F 40/10   Text processing natural lan...
G10L 13/02   Methods for producing synth...
G10L 13/04   Details of speech synthesis...
G10L 13/06   Elementary speech units use...
G10L 13/08   Text analysis or generation...
G10L 15/06   Creation of reference templ...
G10L 15/063   Training
G10L 15/187   Phonemic context, e.g. pron...
G10L 15/26   Speech to text systems G10L...
G10L 2015/025   Phonemes, fenemes or fenone...
G10L 25/57   for processing of video sig...