
Automatic indexing and aligning of audio and text using speech recognition

  • US 5,649,060 A
  • Filed: 10/23/1995
  • Issued: 07/15/1997
  • Est. Priority Date: 10/18/1993
  • Status: Expired due to Term
First Claim

1. An apparatus for indexing an audio recording comprising:

    an acoustic recorder for storing an ordered series of acoustic information signal units representing sounds generated from spoken words, said acoustic recorder having a plurality of recording locations, each recording location storing at least one acoustic information signal unit;

    a timer connected to said acoustic recorder for time stamping said acoustic information signal units;

    a speech recognizer connected to said acoustic recorder for generating an ordered series of recognized words having a high conditional probability of occurrence given the occurrence of the sounds represented by the acoustic information signal units from said acoustic recorder, each recognized word corresponding to at least one acoustic information signal unit and comprising a series of one or more characters, each recognized word having a context of at least one preceding or following recognized word;

    a time alignment device connected to said speech recognizer and receiving time stamps of said acoustic information signal units for aligning said acoustic information signal units according to respective time stamps of said acoustic information signal units;

    a text storage device for storing a transcript of text of the spoken words corresponding to ordered series of acoustic information signal units stored on said acoustic recorder;

    mapping means connected to said text storage device for determining a size of an acoustic information signal unit to be passed to said speech recognizer from said acoustic recorder, said mapping means generating an ordered series of index words, said ordered series of index words comprising a representation of at least some of the spoken words represented by the acoustic information signal units, each index word having a context of at least one preceding or following index word and comprising a series of one or more characters;

    a segmenter controlled by said mapping means for controlling playback of acoustic information signal units to said speech recognizer; and

    alignment means connected to said acoustic recorder and to said mapping means for comparing the ordered series of recognized words with the ordered series of index words to pair recognized words and index words which are the same word and which have matching contexts, a recognized word being the same as an index word when both words comprise the same series of characters, a context of a target recognized word comprises the number of other recognized words preceding and following the target recognized word in the ordered series of recognized words, a context of a target index word comprises the number of other index words preceding and following the target index word in the ordered series of index words, and the context of a recognized word matches the context of an index word if the context of the target recognized word is within a selected threshold value of the context of the target index word, said alignment means tagging each paired index word with the recording location of the acoustic information signal unit corresponding to the recognized word paired with the index word.
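
The pairing rule recited for the alignment means can be illustrated with a short Python sketch. This is not the patented implementation, only one possible reading of the claim language under stated assumptions: the "context" of a word is taken to be the pair (number of words before it, number of words after it) in its own series; two contexts "match" when both counts differ by no more than the threshold; and words are paired greedily in order. The names RecognizedWord, IndexWord, context, and align are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RecognizedWord:
    text: str
    location: int  # hypothetical recording location (e.g. a sample offset)


@dataclass
class IndexWord:
    text: str
    location: Optional[int] = None  # filled in when the word is paired


def context(position: int, length: int) -> Tuple[int, int]:
    """Assumed reading of 'context': (words preceding, words following)."""
    return position, length - position - 1


def align(recognized: List[RecognizedWord],
          index_words: List[IndexWord],
          threshold: int) -> List[IndexWord]:
    """Tag index words with the location of a matching recognized word.

    Two words pair when their characters are identical and both context
    counts differ by at most `threshold`. Pairing is greedy and in order
    (an assumption; the claim does not prescribe a search strategy).
    """
    n_rec, n_idx = len(recognized), len(index_words)
    next_rec = 0  # never pair a recognized word earlier than the last match
    for i, iw in enumerate(index_words):
        before_i, after_i = context(i, n_idx)
        for r in range(next_rec, n_rec):
            rw = recognized[r]
            before_r, after_r = context(r, n_rec)
            same_word = rw.text == iw.text
            contexts_match = (abs(before_r - before_i) <= threshold
                              and abs(after_r - after_i) <= threshold)
            if same_word and contexts_match:
                iw.location = rw.location
                next_rec = r + 1
                break
    return index_words


# Transcript (index words) versus recognizer output with one misrecognition.
transcript = [IndexWord(w) for w in "the quick brown fox jumps".split()]
decoded = [RecognizedWord(w, 10 * k)
           for k, w in enumerate("the quick crown fox jumps".split())]
tagged = align(decoded, transcript, threshold=1)
```

In this example the misrecognized word ("crown" for "brown") leaves the index word "brown" untagged, while the other four index words receive the recording locations of their recognized counterparts. A larger threshold tolerates more inserted or dropped words in the recognizer output, at the cost of more spurious pairings of repeated words.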
