Document transcription system training
Abstract
A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
83 Claims
1. In a system including a first document containing at least some information in common with a spoken audio stream, a method comprising steps of:
(A) identifying text in the first document representing a concept having a plurality of spoken forms;
(B) replacing the identified text with a context-free grammar specifying the plurality of spoken forms of the concept to produce a second document; and
(C) generating a first language model based on the second document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 27
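Steps (A) to (C) of claim 1 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the date pattern, the `date_grammar` helper, the `<date_N>` placeholder tokens, and the bigram counter that stands in for a language model. A flat list of alternatives is used as a toy stand-in for a context-free grammar.

```python
import re
from collections import Counter

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
ORDINALS = {1: "first", 2: "second", 3: "third"}  # deliberately incomplete

# Hypothetical pattern for one concept: dates written as M/D/YYYY.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")

def date_grammar(month, day):
    """Return alternative spoken forms of a date (toy stand-in for a grammar)."""
    m = MONTHS[month - 1]
    d = ORDINALS.get(day, str(day))
    return [f"{m} {d}", f"{m} the {d}", f"the {d} of {m}"]

def replace_concepts(first_document):
    """Steps (A) and (B): find concept text and swap in a grammar placeholder."""
    grammars = {}

    def substitute(match):
        name = f"<date_{len(grammars)}>"
        grammars[name] = date_grammar(int(match.group(1)), int(match.group(2)))
        return name

    second_document = DATE_RE.sub(substitute, first_document)
    return second_document, grammars

def bigram_counts(second_document):
    """Step (C), simplified: count bigrams, treating each placeholder as one token."""
    words = re.findall(r"<[^>]+>|\w+", second_document.lower())
    tokens = ["<s>"] + words + ["</s>"]
    return Counter(zip(tokens, tokens[1:]))

second, grammars = replace_concepts("Patient seen on 10/1/1993 for follow-up.")
counts = bigram_counts(second)
```

In this sketch the placeholder keeps the position of the concept in the surrounding text, so the counted bigrams condition on the concept as a whole rather than on any one of its spoken forms.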
19. A system comprising:
a first document containing at least some information in common with a spoken audio stream;
means for identifying text in the first document representing a concept having a plurality of spoken forms;
means for replacing the identified text with a context-free grammar specifying the plurality of spoken forms of the concept to produce a second document; and
means for generating a first language model based on the second document.
Dependent claims: 20, 21, 22, 23, 24, 25, 26
28. A method comprising steps of:
(A) generating a first language model based on a first document including text and a context-free grammar specifying a plurality of spoken forms of a concept; and
(B) using the first language model in a speech recognition process to recognize a spoken audio stream and thereby to produce a transcript of the spoken audio stream.
Dependent claims: 29, 30, 31, 32, 33, 34, 35, 40
36. A device comprising:
means for generating a first language model based on a first document including text and a context-free grammar specifying a plurality of spoken forms of a concept; and
means for using the first language model in a speech recognition process to recognize a spoken audio stream and thereby to produce a transcript of the spoken audio stream.
Dependent claims: 37, 38, 39
41. A method comprising steps of:
(A) applying a speech recognition process to recognize a spoken audio stream and thereby to produce a first document using a first language model based on a second document including text and a context-free grammar specifying a plurality of spoken forms of a concept; and
(B) using the first document and the audio stream to train an acoustic model.
Dependent claims: 42, 43, 44, 45, 46, 47
48. A device comprising:
means for applying a speech recognition process to recognize a spoken audio stream and thereby to produce a first document using a first language model based on a second document including text and a context-free grammar specifying a plurality of spoken forms of a concept; and
means for using the first document and the audio stream to train an acoustic model.
Dependent claims: 49, 50, 51, 52
53. A method comprising steps of:
(A) identifying a first document containing at least some information in common with a spoken audio stream;
(B) identifying text in the first document representing a concept;
(C) replacing the identified text with a context-free grammar specifying a plurality of spoken forms of the concept to produce a second document; and
(D) replacing text in the second document with normalized text to produce a normalized document including samples of text belonging to spoken forms in the finite state grammar.
Dependent claims: 54, 55, 56
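One possible reading of step (D) of claim 53 is sketched below in Python. The claim does not say how normalization is carried out, so the choices here are assumptions: `normalize` lowercases and strips punctuation, and `expand_samples` writes out one sample per spoken form of each placeholder. The function names and the `<date_0>` placeholder are hypothetical.

```python
import itertools
import re

def normalize(text):
    """Lowercase and drop punctuation, keeping <placeholder> tokens intact."""
    return " ".join(re.findall(r"<[^>]+>|[a-z0-9]+", text.lower()))

def expand_samples(second_document, grammars):
    """Return normalized samples, one per combination of spoken forms."""
    normalized = normalize(second_document)
    names = [name for name in grammars if name in normalized]
    samples = []
    # Each element of the product picks one spoken form per placeholder.
    for choice in itertools.product(*(grammars[name] for name in names)):
        sample = normalized
        for name, spoken in zip(names, choice):
            sample = sample.replace(name, spoken)
        samples.append(sample)
    return samples

grammars = {"<date_0>": ["october first", "the first of october"]}
samples = expand_samples("Patient seen on <date_0>.", grammars)
```

The number of samples grows multiplicatively with the number of placeholders, so a practical system would likely cap or sample the combinations rather than enumerate them all.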
57. A device comprising:
means for identifying a first document containing at least some information in common with a spoken audio stream;
means for identifying text in the first document representing a concept;
means for replacing the identified text with a context-free grammar specifying a plurality of spoken forms of the concept to produce a second document; and
means for replacing text in the second document with normalized text to produce a normalized document including samples of text belonging to spoken forms in the finite state grammar.
Dependent claims: 58, 59
60. A method comprising steps of:
(A) identifying a normalized document of a spoken audio stream, the normalized document including a context-free grammar specifying a plurality of spoken forms of a concept;
(B) generating a language model based on the normalized document;
(C) using the language model in a speech recognition process to recognize the spoken audio stream and thereby to produce a second document; and
(D) filtering text from the second document by reference to the normalized document to produce a filtered document.
Dependent claims: 61, 62, 63, 64, 65, 66, 67
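Step (D) of claim 60 does not state how the filtering is performed. As a hedged illustration only, the Python sketch below keeps the runs of recognized words that also appear, in order, in a reference text, using the standard library's `difflib.SequenceMatcher`. The function name `filter_by_reference`, the `min_run` threshold, and the use of a single expanded reference string in place of a normalized document with grammars are all assumptions.

```python
from difflib import SequenceMatcher

def filter_by_reference(recognized, reference, min_run=2):
    """Keep only word runs of the recognizer output that match the reference."""
    rec = recognized.split()
    ref = reference.split()
    matcher = SequenceMatcher(a=rec, b=ref, autojunk=False)
    kept = []
    for block in matcher.get_matching_blocks():
        # Short matches are more likely to be coincidental, so drop them.
        if block.size >= min_run:
            kept.extend(rec[block.a:block.a + block.size])
    return " ".join(kept)

recognized = "um patient seen on october first for a follow up"
reference = "patient seen on october first for follow up"
filtered = filter_by_reference(recognized, reference)
```

Words that the recognizer produced but the reference does not support (here the filler "um" and the stray "a") are dropped, which leaves text that is more likely to correspond to what was actually spoken and transcribed.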
68. A device comprising:
means for identifying a normalized document of a spoken audio stream, the normalized document including a context-free grammar specifying a plurality of spoken forms of a concept;
means for generating a language model based on the normalized document;
means for using the language model in a speech recognition process to recognize the spoken audio stream and thereby to produce a second document; and
means for filtering text from the second document by reference to the normalized document to produce a filtered document.
Dependent claims: 69, 70, 71
72. A method comprising steps of:
(A) identifying a normalized document of a spoken audio stream, the normalized document including a context-free grammar specifying a plurality of spoken forms of a concept;
(B) using the language model in a speech recognition process to recognize the spoken audio stream and thereby to produce a second document; and
(C) using a robust parser to filter text from the second document by reference to the normalized document to produce a filtered document.
Dependent claims: 73, 74, 75, 76, 77, 78, 79
80. A device comprising:
means for identifying a normalized document of a spoken audio stream, the normalized document including a context-free grammar specifying a plurality of spoken forms of a concept;
means for using the language model in a speech recognition process to recognize the spoken audio stream and thereby to produce a second document; and
means for using a robust parser to filter text from the second document by reference to the normalized document to produce a filtered document.
Dependent claims: 81, 82, 83
Specification