Discriminative training of document transcription system

  • US 8,412,521 B2
  • Filed: 09/16/2005
  • Issued: 04/02/2013
  • Est. Priority Date: 08/20/2004
  • Status: Active Grant
First Claim

1. In a computer system including a first document tangibly stored in a first computer-readable medium and containing at least some information in common with a spoken audio stream, a method performed by at least one computer processor executing computer program instructions tangibly stored in a second computer-readable medium, the method comprising steps of:

  • (A) identifying first text tangibly stored in the first document on a third computer-readable medium, wherein the first text represents a first instance of a concept;

    (B) identifying, based on the identified first text, a first plurality of at least three spoken forms of the first instance of the concept, including at least one spoken form not contained in the first document;

    (C) replacing the identified first text with a first context-free grammar specifying the first plurality of spoken forms of the first instance of the concept to produce a second document tangibly stored in a fourth computer-readable medium;

    (D) identifying second text tangibly stored in the first document on the third computer-readable medium, wherein the second text represents a second instance of the concept;

    (E) identifying, based on the identified second text, a second plurality of at least three spoken forms of the second instance of the concept, wherein the first plurality of spoken forms differs from the second plurality of spoken forms;

    (F) replacing the identified second text with a second context-free grammar specifying the second plurality of spoken forms of the second instance of the concept within the second document;

    (G) generating a first language model, tangibly stored in a fifth computer-readable medium, based on the second document;

    (H) using the first language model in a speech recognition process to recognize the spoken audio stream and thereby to produce a third document tangibly stored in a sixth computer-readable medium;

    (I) filtering text from the third document by reference to the second document to produce a filtered document, tangibly stored in a seventh computer-readable medium, in which text filtered from the third document is marked as unreliable; and

    (J) using the filtered document and the spoken audio stream to train an acoustic model, tangibly stored in an eighth computer-readable medium, by performing steps of:

    (J)(1) applying a first speech recognition process to the spoken audio stream using a set of base acoustic models and a grammar network based on the filtered document to produce a first set of recognition structures tangibly stored in a ninth computer-readable medium;

    (J)(2) applying a second speech recognition process to the spoken audio stream using the set of base acoustic models and a second language model to produce a second set of recognition structures tangibly stored in a tenth computer-readable medium; and

    (J)(3) performing discriminative training of the acoustic model using the first set of recognition structures, the second set of recognition structures, the filtered document, and only those portions of the spoken audio stream corresponding to text not marked as unreliable in the filtered document.
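
As an informal illustration only, and not the patented implementation, steps (A) through (F) and the filtering of step (I) might be sketched as follows. Everything named here is an assumption made for the example: the "concept" is a dosage such as "5 mg", `NUMBER_WORDS` and `DOSE` are hypothetical, and each "grammar" is reduced to a flat list of alternative word sequences rather than a full context-free grammar. Language-model generation, recognition, and acoustic-model training (steps (G), (H), and (J)) are not shown.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny number lexicon for the example.
NUMBER_WORDS = {"5": "five", "10": "ten", "20": "twenty"}
# Hypothetical concept detector: a dosage such as "5 mg".
DOSE = re.compile(r"\b(5|10|20) mg\b")


def words(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def spoken_forms(match):
    """Steps (B)/(E): three spoken forms for one dosage instance."""
    n = NUMBER_WORDS[match.group(1)]
    return [f"{n} milligrams", f"{n} milligram", f"{n} m g"]


def to_grammar_document(text):
    """Steps (A)-(F): replace each dosage with its list of alternatives.

    Returns a list of slots; each slot is a list of alternative phrases.
    Ordinary words become single-alternative slots.
    """
    slots, pos = [], 0
    for m in DOSE.finditer(text):
        slots += [[w] for w in words(text[pos:m.start()])]
        slots.append(spoken_forms(m))
        pos = m.end()
    slots += [[w] for w in words(text[pos:])]
    return slots


def choose_reference(slots, recognized):
    """Pick, per slot, an alternative that occurs in the recognized text
    (falling back to the first alternative) and flatten to a word list."""
    padded = " " + " ".join(recognized) + " "
    reference = []
    for alternatives in slots:
        chosen = next((a for a in alternatives if f" {a} " in padded),
                      alternatives[0])
        reference.extend(chosen.split())
    return reference


def filter_transcript(slots, recognized):
    """Step (I), simplified: recognized words that do not align with the
    grammar document are marked unreliable (False)."""
    reference = choose_reference(slots, recognized)
    reliable = [False] * len(recognized)
    matcher = SequenceMatcher(a=reference, b=recognized, autojunk=False)
    for block in matcher.get_matching_blocks():
        for j in range(block.b, block.b + block.size):
            reliable[j] = True
    return list(zip(recognized, reliable))
```

Given a grammar document built from "Patient takes 5 mg daily and 10 mg nightly." and a recognized word list, `filter_transcript` returns each recognized word paired with a reliability flag. In a pipeline of the kind the claim describes, only audio aligned with words flagged reliable would then be passed on to training, per step (J)(3).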
