Document transcription system training
Abstract
A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
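The abstract's step of recovering the spoken form that actually appears in the audio can be illustrated with a small sketch. It is not the patented method. It assumes that a word-level recognition hypothesis for the audio region behind each piece of transcript text is already available and aligned. It then picks the candidate spoken form with the smallest word-level Levenshtein distance to that hypothesis. The function names and the example spoken forms for the written date "5/3/2008" are hypothetical.

```python
from typing import List


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two word sequences (unit cost per edit)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                # delete wa
                cur[j - 1] + 1,             # insert wb
                prev[j - 1] + (wa != wb),   # substitute (free if words match)
            ))
        prev = cur
    return prev[-1]


def choose_spoken_form(candidates: List[str], hypothesis: str) -> str:
    """Pick the candidate spoken form closest to the recognizer's hypothesis.

    Assumption: `hypothesis` is the recognized word string for the audio region
    that produced the transcript text. Ties go to the earlier candidate.
    """
    hyp_words = hypothesis.split()
    return min(candidates, key=lambda c: word_edit_distance(c.split(), hyp_words))


# Hypothetical spoken forms for the written date "5/3/2008".
forms = [
    "may third two thousand eight",
    "five three two thousand eight",
    "the third of may two thousand eight",
]
print(choose_spoken_form(forms, "may third two thousand and eight"))
# prints: may third two thousand eight
```

In this sketch, the chosen form would replace the written text in the revised transcript that is then used for acoustic model training.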
38 Claims
1. In a system including a first document, the document tangibly stored in a computer-readable medium and containing at least some information in common with a spoken audio stream, a method performed by a computer processor executing instructions tangibly stored in a first computer-readable medium, the method comprising steps of:
(A) identifying text tangibly stored in the first document on a second computer-readable medium, wherein the text represents a concept;
(B) identifying, based on the identified text, a plurality of at least three spoken forms of the concept, including at least one spoken form not contained in the first document, wherein all of the plurality of spoken forms have the same content as each other, wherein (B) comprises:
(B)(1) identifying a name of the identified text; and
(B)(2) using the identified name to identify a corresponding context-free grammar in a grammar repository, wherein the corresponding context-free grammar specifies the plurality of spoken forms of the concept;
(C) replacing the identified text with the corresponding context-free grammar to produce a second document tangibly stored in a third computer-readable medium; and
(D) generating a first language model, tangibly stored in a fourth computer-readable medium, based on the second document.
Dependent claims: 2-19.
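Steps (A) through (D) of claim 1 can be illustrated with a toy grammar repository. This is only a sketch, and everything concrete in it is an assumption rather than something taken from the patent: the concept text "q.d." and its grammar name ONCE_DAILY, the regular-expression concept detector, the "$NAME" token that marks a grammar inside the second document, and the choice of a bigram model that splits each grammar's count evenly across its spoken forms. The grammar enumeration also assumes a non-recursive grammar.

```python
import re
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

# A CFG maps each nonterminal to its productions. Any symbol that is not a key
# is treated as a terminal word. This sketch assumes the grammar is not recursive.
Grammar = Dict[str, List[List[str]]]

# Hypothetical grammar repository: grammar name -> CFG with start symbol "<S>".
GRAMMAR_REPOSITORY: Dict[str, Grammar] = {
    "ONCE_DAILY": {
        "<S>": [["once", "<DAY>"], ["q", "d"], ["daily"]],
        "<DAY>": [["a", "day"], ["daily"], ["per", "day"]],
    },
}

# Hypothetical concept detectors: pattern over document text -> grammar name.
CONCEPT_PATTERNS = [(re.compile(r"\bq\.d\."), "ONCE_DAILY")]


def replace_concepts(text: str) -> List[str]:
    """Steps (A), (B)(1), (C): find concept text, name it, and replace it with a
    grammar reference token "$NAME". Returns the tokenized second document."""
    for pattern, name in CONCEPT_PATTERNS:
        text = pattern.sub(f" ${name} ", text)
    return text.split()


def expand(grammar: Grammar, symbol: str = "<S>") -> List[List[str]]:
    """Enumerate every spoken form (terminal word sequence) derivable from `symbol`."""
    if symbol not in grammar:
        return [[symbol]]
    forms = []
    for production in grammar[symbol]:
        for parts in product(*(expand(grammar, s) for s in production)):
            forms.append([w for part in parts for w in part])
    return forms


def generate_bigram_lm(tokens: List[str], repo: Dict[str, Grammar]) -> Dict[Tuple[str, str], float]:
    """Step (D): maximum-likelihood bigram model in which each grammar reference
    contributes every spoken form with an equal fractional count."""
    options = []
    for tok in tokens:
        if tok.startswith("$"):
            forms = expand(repo[tok[1:]])  # step (B)(2): look up the grammar by name
            options.append([(f, 1.0 / len(forms)) for f in forms])
        else:
            options.append([([tok], 1.0)])
    counts: Counter = Counter()
    for choice in product(*options):
        weight, words = 1.0, ["<s>"]
        for form, w in choice:
            weight *= w
            words.extend(form)
        words.append("</s>")
        for bigram in zip(words, words[1:]):
            counts[bigram] += weight
    totals: Counter = Counter()
    for (w1, _), c in counts.items():
        totals[w1] += c
    return {bg: c / totals[bg[0]] for bg, c in counts.items()}


tokens = replace_concepts("take aspirin q.d. with food")
lm = generate_bigram_lm(tokens, GRAMMAR_REPOSITORY)
print(tokens)                             # ['take', 'aspirin', '$ONCE_DAILY', 'with', 'food']
print(round(lm[("aspirin", "once")], 3))  # 0.6
```

Splitting the count evenly is only one possible choice. Spoken forms could instead be weighted by how often they are observed.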
20. A system comprising:
a first computer-readable medium tangibly storing a first document containing at least some information in common with a spoken audio stream;
a second computer-readable medium tangibly storing computer program instructions for identifying text in the first document representing a concept;
a third computer-readable medium tangibly storing computer program instructions for identifying, based on the identified text, a plurality of at least three spoken forms of the concept, including at least one spoken form not contained in the first document, wherein all of the plurality of spoken forms have the same content as each other, wherein identifying the plurality of spoken forms of the concept comprises:
identifying a name of the identified text; and
using the identified name to identify a corresponding context-free grammar in a grammar repository, wherein the corresponding context-free grammar specifies the plurality of spoken forms of the concept;
a fourth computer-readable medium tangibly storing computer program instructions for replacing the identified text with the corresponding context-free grammar to produce a second document tangibly stored on a fifth computer-readable medium; and
a sixth computer-readable medium tangibly storing computer program instructions for generating a first language model based on the second document.
Dependent claims: 21-29.
30. A method performed by a computer processor executing instructions tangibly stored in a first computer-readable medium, the method comprising steps of:
(A) identifying a first document, tangibly stored in a second computer-readable medium, containing at least some information in common with a spoken audio stream;
(B)(1) identifying text in the first document representing a concept;
(B)(2) identifying a name of the identified text;
(B)(3) using the identified name to identify a corresponding context-free grammar in a grammar repository, wherein the corresponding context-free grammar specifies a plurality of at least three spoken forms of the concept, and wherein the corresponding context-free grammar includes at least one spoken form not contained in the first document, wherein all of the plurality of spoken forms have the same content as each other;
(C) replacing the identified text with the corresponding context-free grammar to produce a second document tangibly stored in a third computer-readable medium; and
(D) replacing text in the second document with normalized text to produce a normalized document, tangibly stored in a fourth computer-readable medium, the normalized document including samples of text belonging to spoken forms in the finite state grammar.
Dependent claims: 31-34.
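Step (D) of claim 30, which fills the normalized document with samples of spoken forms, can be sketched by random derivation from the grammar. In this hypothetical version, each grammar reference in the second document is replaced by a spoken form drawn at random from its grammar. The process is repeated to produce several normalized sentences. The repository contents, the "$NAME" reference token, and the uniform choice among productions at each nonterminal are assumptions made for illustration.

```python
import random
from typing import Dict, List

# A CFG maps each nonterminal to its productions. Any other symbol is a terminal word.
Grammar = Dict[str, List[List[str]]]

# Hypothetical repository entry: grammar name -> CFG with start symbol "<S>".
REPOSITORY: Dict[str, Grammar] = {
    "ONCE_DAILY": {
        "<S>": [["once", "<DAY>"], ["q", "d"], ["daily"]],
        "<DAY>": [["a", "day"], ["daily"], ["per", "day"]],
    },
}


def sample_spoken_form(grammar: Grammar, rng: random.Random, symbol: str = "<S>") -> List[str]:
    """Derive one spoken form, choosing a production uniformly at each nonterminal."""
    if symbol not in grammar:
        return [symbol]  # terminal word
    production = rng.choice(grammar[symbol])
    return [w for s in production for w in sample_spoken_form(grammar, rng, s)]


def normalize(tokens: List[str], repo: Dict[str, Grammar], n_samples: int, seed: int = 0) -> List[str]:
    """Return n_samples normalized sentences in which every grammar reference
    ("$NAME") is replaced by a spoken form sampled from the named grammar."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(n_samples):
        words: List[str] = []
        for tok in tokens:
            if tok.startswith("$"):
                words.extend(sample_spoken_form(repo[tok[1:]], rng))
            else:
                words.append(tok)
        sentences.append(" ".join(words))
    return sentences


second_document = ["take", "aspirin", "$ONCE_DAILY", "with", "food"]
for sentence in normalize(second_document, REPOSITORY, n_samples=3, seed=1):
    print(sentence)
```

Because the choice is uniform over productions rather than over complete spoken forms, "q d" and "daily" are each drawn with probability 1/3, while each of the three "once ..." forms is drawn with probability 1/9. Weighting the productions differently would change that mix.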
35. A device comprising:
a first computer-readable medium tangibly storing computer program instructions for identifying a first document, tangibly stored in a second computer-readable medium, the first document containing at least some information in common with a spoken audio stream;
a third computer-readable medium tangibly storing computer program instructions for:
(1) identifying text in the first document representing a concept;
(2) identifying a name of the identified text; and
(3) using the identified name to identify a corresponding context-free grammar in a grammar repository, wherein the corresponding context-free grammar specifies a plurality of at least three spoken forms of the concept, and wherein the corresponding context-free grammar includes at least one spoken form not contained in the first document, wherein all of the plurality of spoken forms have the same content as each other;
a fourth computer-readable medium tangibly storing computer program instructions for replacing the identified text with the corresponding context-free grammar to produce a second document tangibly stored in a fifth computer-readable medium; and
a sixth computer-readable medium tangibly storing computer program instructions for replacing text in the second document with normalized text to produce a normalized document, tangibly stored in a seventh computer-readable medium, the normalized document including samples of text belonging to spoken forms in the finite state grammar.
Dependent claims: 36-38.
Specification