PROCESSING OF AUDIO DATA

US 20160133251A1
Filed: 05/31/2013
Published: 05/12/2016
Est. Priority Date: 05/31/2013
Status: Abandoned Application

First Claim

1. A method for processing audio data, comprising:

generating a transcript language model based on text data representative of a transcript associated with said audio data;

processing said audio data with a transcription engine to determine at least a set of confidence values for a plurality of language elements in a text output of the transcription engine, the transcription engine using said transcript language model; and

determining whether the text data is associated with said audio data based on said set of confidence values.
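
The first step of the claim, generating a transcript language model, might be sketched as follows. This is a minimal illustration only, assuming a maximum-likelihood bigram model and a simple lowercase tokenizer; the function names `normalize` and `build_bigram_model` are hypothetical and do not come from the application.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase the transcript and keep only word-like tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_bigram_model(tokens: list[str]) -> dict[tuple[str, str], float]:
    """Return maximum-likelihood bigram probabilities P(w2 | w1)."""
    # Pad with sentence boundary markers so the first and last words are modeled.
    padded = ["<s>"] + tokens + ["</s>"]
    history_counts = Counter(padded[:-1])
    bigram_counts = Counter(zip(padded, padded[1:]))
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }


transcript = "The cat sat. The cat ran!"
model = build_bigram_model(normalize(transcript))
print(model[("the", "cat")])  # 1.0
print(model[("cat", "sat")])  # 0.5
```

A model built this way assigns high probability only to word sequences that occur in the transcript, which is what lets the confidence values later indicate whether the audio actually follows that transcript.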

Abstract

Examples of processing audio data are described. In certain examples, a transcript language model is based on text data representative of a transcript associated with the audio data. The audio data is processed to determine at least a set of confidence values for language elements in a text output of the processing, wherein the processing uses the transcript language model. The set of confidence values enables a determination of whether the text data is associated with the audio data.
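
The determination described in the abstract could, for instance, be reduced to a threshold test on the confidence values. The sketch below is an assumption rather than the application's method: the use of a mean, the function name `is_transcript_match`, and the 0.7 threshold are all hypothetical.

```python
from statistics import mean


def is_transcript_match(confidences: list[float], threshold: float = 0.7) -> bool:
    """Decide whether the transcript belongs to the audio.

    `confidences` holds one value per recognized language element (e.g. per word).
    The mean-versus-threshold rule and the default threshold are illustrative
    assumptions, not values taken from the application.
    """
    if not confidences:
        return False  # nothing was recognized, so there is no evidence of a match
    return mean(confidences) >= threshold


print(is_transcript_match([0.9, 0.8, 0.95]))  # True
print(is_transcript_match([0.2, 0.4, 0.3]))   # False
```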

15 Claims

1. A method for processing audio data, comprising:
- generating a transcript language model based on text data representative of a transcript associated with said audio data;
  
  processing said audio data with a transcription engine to determine at least a set of confidence values for a plurality of language elements in a text output of the transcription engine, the transcription engine using said transcript language model; and
  
  determining whether the text data is associated with said audio data based on said set of confidence values.
- - 2. The method of claim 1, wherein said audio data comprises a plurality of audio tracks for a media item, each audio track having an associated language and the method further comprises:
    - accessing a plurality of transcripts, each transcript being associated with a particular language;
      
      wherein the step of generating a transcript language model comprises generating a transcript language model for each transcript in the plurality of transcripts;
      
      wherein the step of processing said audio data comprises processing at least one audio track with the transcription engine to determine confidence values associated with use of each transcription language model; and
      
      wherein the step of determining whether the text data is associated with at least a portion of said audio data comprises determining a match between at least one audio track and at least one transcript based on the determined confidence values.
  - 3. The method of claim 1, wherein the step of processing said audio data comprises producing a text output with associated timing information and the method further comprises:
    - responsive to a determination that the text data is associated with at least a portion of said audio data, reconciling the text output with the text data representative of said transcript so as to append the timing information to the transcript.
  - 4. The method of claim 1, wherein processing said audio data comprises determining a matrix of confidence values.
  - 5. The method of claim 1, wherein the transcript language model is a statistical N-gram model that is configured using said text data representative of said transcript.
  - 6. The method of claim 1, wherein the transcription engine uses an acoustic model representative of phonemic sound patterns in a spoken language.
  - 7. The method of claim 6, wherein the transcription language model embodies statistical data on at least occurrences of words within the spoken language and wherein the transcription engine uses a pronunciation dictionary to map words to phonemic sound patterns.
  - 8. The method of claim 1, further comprising, prior to generating a transcript language model:
    - normalizing the text data representative of said transcript.
  - 9. The method of claim 1, wherein said audio data forms part of a media broadcast and the transcript comprises closed-caption data for said media broadcast.

10. A system processing media data, the media data comprising at least an audio portion, the system comprising:
- a first component to instruct configuration of a language model based on text data representative of audible language elements within said audio portion; and
  
  a second component to instruct conversion of the audio portion of the media data to a text equivalent based on said language model, said conversion outputting a set of confidence values for a plurality of language elements in the text equivalent, wherein the system determines whether the text data is associated with said audio data based on said set of confidence values.
- - 11. The system of claim 10, further comprising:
    - a third component to compare the text equivalent with the received text data so as to add said timing information to the received text data; and
      
      a fourth component to determine whether the text data is associated with at least a portion of said audio data based on said set of confidence values, wherein the third component is arranged to perform a comparison responsive to a positive determination from the fourth component.
  - 12. The system of claim 10, comprising:
    - a speech-to-text engine communicatively coupled to the second component to convert the audio portion of the media data to the text equivalent, the speech-to-text engine making use of the language model and a sound model, the sound model being representative of sound patterns in a spoken language and the language model being representative of word patterns in a written language.
  - 13. The system of claim 10, further comprising:
    - an interface to receive at least text data associated with the media data, wherein the interface is arranged to convert said received text data to a canonical form.
  - 14. The system of claim 10, wherein:
    - the media data comprises a plurality of audio portions, each audio portion being associated with a respective language;
      
      the text data comprises a plurality of text portions, each text portion being associated with a respective language;
      
      the first component instructs configuration of a plurality of language models, each language model being based on a respective text portion;
      
      the second component instructs conversion of at least one audio portion of the media data to a plurality of text equivalents, the conversion of a particular audio portion being repeated for each of the plurality of language models; and
      
      the system further comprises:
      
      a fourth component to receive probability variables for language elements within each text equivalent and to determine a language from the set of languages for a particular audio portion based on said probability variables.

15. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to:
- generate a transcript language model based on text data representative of a transcript associated with said audio data;
  
  process said audio data with a transcription engine to determine at least a set of confidence values for a plurality of language elements in a text output of the transcription engine, the transcription engine using said transcript language model; and
  
  determine whether the text data is associated with said audio data based on said set of confidence values.

Current Assignee
Longsand Limited (Open Text Corporation)
Original Assignee
Longsand Limited (Open Text Corporation)
Inventors
Pye, David; Roscher, Travis Barton; Kadirkamanathan, Maha

Application Number

US14/890,538
Publication Number

US 20160133251A1
Field of Search
US Class Current

1/1
CPC Class Codes

G06F 16/685   using automatically derived...

G06F 40/151   Transformation

G06F 40/216   using statistical methods

G10L 15/065   Adaptation

G10L 15/197   Probabilistic grammars, e.g...

G10L 15/26   Speech to text systems G10L...

G10L 2015/0633   using lexical or orthograph...

G10L 2015/226   using non-speech characteri...
