Document transcription system training

US 8,335,688 B2
Filed: 08/20/2004
Issued: 12/18/2012
Est. Priority Date: 08/20/2004
Status: Active Grant

First Claim

1. In a system including a first document, the document tangibly stored in a computer-readable medium and containing at least some information in common with a spoken audio stream, a method performed by a computer processor executing instructions tangibly stored in a first computer-readable medium, the method comprising steps of:

(A) identifying text tangibly stored in the first document on a second computer-readable medium, wherein the text represents a concept;

(B) identifying, based on the identified text, a plurality of at least three spoken forms of the concept, including at least one spoken form not contained in the first document, wherein all of the plurality of spoken forms have the same content as each other, wherein (B) comprises:

(B) (1) identifying a name of the identified text; and

(B) (2) using the identified name to identify a corresponding context-free grammar in a grammar repository, wherein the corresponding context-free grammar specifies the plurality of spoken forms of the concept;

(C) replacing the identified text with the corresponding context-free grammar to produce a second document tangibly stored in a third computer-readable medium; and

(D) generating a first language model, tangibly stored in a fourth computer-readable medium, based on the second document.

Abstract

A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
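The transcript revision described in the abstract can be sketched as follows. This is a minimal illustration, not the patented method: it assumes the recognizer output is available as a list of words, that the candidate spoken forms are already known, and that exact word matching is good enough. The function names `pick_spoken_form` and `revise_transcript` are hypothetical.

```python
def pick_spoken_form(candidates, recognized_words):
    """Return the longest candidate found as a contiguous word run, else None."""
    # Longest first, so a short form never shadows a longer one that contains it.
    for form in sorted(candidates, key=lambda f: len(f.split()), reverse=True):
        words = form.split()
        for start in range(len(recognized_words) - len(words) + 1):
            if recognized_words[start:start + len(words)] == words:
                return form
    return None


def revise_transcript(transcript, concept_text, candidates, recognized_words):
    """Swap the written concept for the spoken form that was actually heard."""
    form = pick_spoken_form(candidates, recognized_words)
    if form is None:
        return transcript  # no evidence in the audio; keep the original text
    return transcript.replace(concept_text, form)


# Hypothetical example data: a non-literal transcript and a recognizer output.
transcript = "patient seen on 10/1/2004 for follow up"
candidates = [
    "october first two thousand four",
    "the first of october two thousand four",
    "ten one oh four",
]
recognized = "patient seen on ten one oh four for follow up".split()
revised = revise_transcript(transcript, "10/1/2004", candidates, recognized)
print(revised)
```

The revised transcript, rather than the original non-literal one, would then be paired with the audio for acoustic model training.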

38 Claims

1. In a system including a first document, the document tangibly stored in a computer-readable medium and containing at least some information in common with a spoken audio stream, a method performed by a computer processor executing instructions tangibly stored in a first computer-readable medium, the method comprising steps of:
- (A) identifying text tangibly stored in the first document on a second computer-readable medium, wherein the text represents a concept;
  
  (B) identifying, based on the identified text, a plurality of at least three spoken forms of the concept, including at least one spoken form not contained in the first document, wherein all of the plurality of spoken forms have the same content as each other, wherein (B) comprises:
  
  (B) (1) identifying a name of the identified text; and
  
  (B) (2) using the identified name to identify a corresponding context-free grammar in a grammar repository, wherein the corresponding context-free grammar specifies the plurality of spoken forms of the concept;
  
  (C) replacing the identified text with the corresponding context-free grammar to produce a second document tangibly stored in a third computer-readable medium; and
  
  (D) generating a first language model, tangibly stored in a fourth computer-readable medium, based on the second document.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19):
  - 2. The method of claim 1, wherein the concept comprises a semantic concept.
  - 3. The method of claim 2, wherein the concept comprises a date.
  - 4. The method of claim 1, wherein the concept comprises a syntactic concept.
  - 5. The method of claim 4, wherein the concept comprises a sentence.
  - 6. The method of claim 4, wherein the concept comprises the entire second document.
  - 7. The method of claim 1, further comprising a step of:
    - (E) using the first language model in a speech recognition process to recognize the spoken audio stream and thereby to produce a third document tangibly stored in a fifth computer-readable medium.
  - 8. The method of claim 7, further comprising a step of:
    - (F) using the third document and the spoken audio stream to train an acoustic model tangibly stored in a sixth computer-readable medium.
  - 9. The method of claim 8, wherein the step (F) comprises steps of:
    - (F)(1) filtering text from the third document by reference to the second document to produce a filtered document tangibly stored in a seventh computer-readable medium; and
      
      (F)(2) using the filtered document and the spoken audio stream to train the acoustic model.
  - 10. The method of claim 9, wherein the step (F)(1) comprises applying a robust parser to the second and third documents to produce the filtered document.
  - 11. The method of claim 7, wherein the step (E) comprises steps of:
    - (E) (1) interpolating the first language model with a second language model to produce a third language model tangibly stored in a sixth computer-readable medium; and
      
      (E) (2) using the third language model in the speech recognition process to recognize the spoken audio stream and thereby to produce the third document.
  - 12. The method of claim 1, wherein the first document comprises a document generated based on the spoken audio stream.
  - 13. The method of claim 1, further comprising a step of:
    - (E) prior to step (D), normalizing the second document to produce a normalized document tangibly stored in a fifth computer-readable medium.
  - 14. The method of claim 1, further comprising a step of:
    - (E) prior to step (D), repeating steps (A), (B), and (C) for each of a plurality of texts in the first document.
  - 15. The method of claim 1, wherein the step (C) comprises steps of:
    - (C)(1) generating probabilities for the plurality of spoken forms specified by the context-free grammar; and
      
      (C)(2) including the probabilities in the context-free grammar.
  - 16. The method of claim 15, wherein the step (C) further comprises a step of:
    - (C)(3) including the plurality of spoken forms in the context-free grammar.
  - 17. The method of claim 1, wherein the context-free grammar comprises a finite state grammar.
  - 18. The method of claim 1, wherein the first document comprises one of a first plurality of documents tangibly stored in the second computer-readable medium, and wherein the method further comprises a step of:
    - (E) repeating steps (A), (B), and (C) for each of the plurality of documents to produce a second plurality of documents, including the second document, tangibly stored in a fifth computer-readable medium; and
      
      wherein the step (D) comprises a step of generating the first language model based on the second plurality of documents.
  - 19. The method of claim 1, wherein all of the plurality of spoken forms have the same semantic meaning as each other.
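Steps (A) through (D) of claim 1 can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the regular expression used to find date text, the `CONCEPT_PATTERNS` and `GRAMMAR_REPOSITORY` tables, the toy lookup tables, and the choice of a unigram count model with equal weight per spoken form. A plain Python function stands in for a context-free grammar.

```python
import re
from collections import Counter

# Toy lookup tables holding only the entries this example needs.
MONTH = {10: ("october", "ten")}
DAY = {1: ("first", "one")}
YEAR = {2004: "two thousand four"}


def date_grammar(match):
    """Return spoken forms for an M/D/YYYY match; stands in for a grammar."""
    m, d, y = (int(g) for g in match.groups())
    month, month_num = MONTH[m]
    day_ord, day_num = DAY[d]
    year = YEAR[y]
    return [
        f"{month} {day_ord} {year}",
        f"the {day_ord} of {month} {year}",
        f"{month_num} {day_num} {year}",
    ]


# Hypothetical tables: how concept text is found, and the grammar for each name.
CONCEPT_PATTERNS = [("DATE", re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b"))]
GRAMMAR_REPOSITORY = {"DATE": date_grammar}


def replace_concepts(first_document):
    """Steps (A)-(C): split the text into plain strings and (name, forms) slots."""
    name, pattern = CONCEPT_PATTERNS[0]   # single concept type in this sketch
    grammar = GRAMMAR_REPOSITORY[name]    # (B)(2): look the grammar up by name
    segments, pos = [], 0
    for match in pattern.finditer(first_document):
        segments.append(first_document[pos:match.start()])
        segments.append((name, grammar(match)))
        pos = match.end()
    segments.append(first_document[pos:])
    return segments


def unigram_counts(second_document):
    """Step (D): word counts, each spoken form sharing one unit of count."""
    counts = Counter()
    for seg in second_document:
        if isinstance(seg, str):
            counts.update(seg.split())
        else:
            _, forms = seg
            for form in forms:
                for word in form.split():
                    counts[word] += 1 / len(forms)
    return counts


second_document = replace_concepts("seen on 10/1/2004 for follow up")
counts = unigram_counts(second_document)
```

Giving each spoken form an equal share of the count is one simple way to attach probabilities to the forms, in the spirit of claim 15; a real system could estimate them from data instead.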

20. A system comprising:
- a first computer-readable medium tangibly storing a first document containing at least some information in common with a spoken audio stream;
  
  a second computer-readable medium tangibly storing computer program instructions for identifying text in the first document representing a concept;
  
  a third computer-readable medium tangibly storing computer program instructions for identifying, based on the identified text, a plurality of at least three spoken forms of the concept, including at least one spoken form not contained in the first document, wherein all of the plurality of spoken forms have the same content as each other, wherein identifying the plurality of spoken forms of the concept comprises:
  
  identifying a name of the identified text; and
  
  using the identified name to identify a corresponding context-free grammar in a grammar repository, wherein the corresponding context-free grammar specifies the plurality of spoken forms of the concept;
  
  a fourth computer-readable medium tangibly storing computer program instructions for replacing the identified text with the corresponding context-free grammar to produce a second document tangibly stored on a fifth computer-readable medium; and
  
  a sixth computer-readable medium tangibly storing computer program instructions for generating a first language model based on the second document.
- Dependent claims (21, 22, 23, 24, 25, 26, 27, 28, 29):
  - 21. The system of claim 20, wherein the concept comprises a syntactic concept.
  - 22. The system of claim 20, further comprising:
    - a seventh computer-readable medium tangibly storing computer program instructions for using the first language model in a speech recognition process to recognize the spoken audio stream and thereby to produce a third document tangibly stored in an eighth computer-readable medium.
  - 23. The system of claim 22, further comprising:
    - a ninth computer-readable medium tangibly storing computer program instructions for using the third document and the spoken audio stream to train an acoustic model tangibly stored in a tenth computer-readable medium.
  - 24. The system of claim 22, wherein the means for using the first language model comprises:
    - means for interpolating the first language model with a second language model to produce a third language model tangibly stored in a ninth computer-readable medium; and
      
      means for using the third language model in the speech recognition process to recognize the spoken audio stream and thereby to produce the third document.
  - 25. The system of claim 20, wherein the first document comprises a document generated based on the spoken audio stream.
  - 26. The system of claim 20, further comprising:
    - means for normalizing the second document to produce a normalized document tangibly stored in a seventh computer-readable medium.
  - 27. The system of claim 20, wherein the first document comprises one of a first plurality of documents tangibly stored in the first computer-readable medium, and wherein the system further comprises:
    - a seventh computer-readable medium comprising tangibly storing computer program instructions for repeatedly activating the instructions for identifying text, the instructions for identifying the plurality of spoken forms, and the instructions for replacing the identified text for each of the plurality of documents to produce a second plurality of documents, tangibly stored in the fifth computer-readable medium, including the second document; and
      
      wherein the means for generating the first language model comprises means for generating the first language model based on the second plurality of documents.
  - 28. The system of claim 20, wherein the concept comprises a semantic concept.
  - 29. The system of claim 20, wherein all of the plurality of spoken forms have the same semantic meaning as each other.
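Claims 11 and 24 recite interpolating the first language model with a second language model. The claims do not say how; the sketch below assumes linear interpolation of two unigram models stored as word-to-probability dictionaries. The function name `interpolate`, the weight value, and the example probabilities are all hypothetical.

```python
def interpolate(lm_a, lm_b, weight):
    """Linearly mix two unigram models: weight * lm_a + (1 - weight) * lm_b."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    vocab = set(lm_a) | set(lm_b)
    return {
        word: weight * lm_a.get(word, 0.0) + (1.0 - weight) * lm_b.get(word, 0.0)
        for word in vocab
    }


# Hypothetical models: one built from the second document, one background model.
document_lm = {"october": 0.5, "first": 0.5}
background_lm = {"october": 0.1, "the": 0.9}
mixed_lm = interpolate(document_lm, background_lm, weight=0.8)
```

If both inputs are proper probability distributions, the mixture also sums to one, so it can be used directly as the third language model.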

30. A method performed by a computer processor executing instructions tangibly stored in a first computer-readable medium, the method comprising steps of:
- (A) identifying a first document, tangibly stored in a second computer-readable medium, containing at least some information in common with a spoken audio stream;
  
  (B)(1) identifying text in the first document representing a concept;
  
  (B)(2) identifying a name of the identified text;
  
  (B)(3) using the identified name to identify a corresponding context-free grammar in a grammar repository, wherein the corresponding context-free grammar specifies a plurality of at least three spoken forms of the concept, and wherein the corresponding context-free grammar includes at least one spoken form not contained in the first document, wherein all of the plurality of spoken forms have the same content as each other;
  
  (C) replacing the identified text with the corresponding context-free grammar to produce a second document tangibly stored in a third computer-readable medium; and
  
  (D) replacing text in the second document with normalized text to produce a normalized document, tangibly stored in a fourth computer-readable medium, the normalized document including samples of text belonging to spoken forms in the finite state grammar.
- Dependent claims (31, 32, 33, 34):
  - 31. The method of claim 30, wherein the concept comprises a semantic concept.
  - 32. The method of claim 30, wherein the concept comprises a syntactic concept.
  - 33. The method of claim 30, wherein the context-free grammar comprises a finite state grammar.
  - 34. The method of claim 30, wherein all of the plurality of spoken forms have the same semantic meaning as each other.
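Step (D) of claim 30 produces a normalized document that includes samples of the spoken forms. One possible reading is sketched below, under stated assumptions: the second document is represented as a list of plain strings and `(name, spoken_forms)` slots, normalization means lowercasing and dropping punctuation, and samples are drawn uniformly at random. The names `normalize_text` and `normalized_samples` are hypothetical.

```python
import random
import re


def normalize_text(text):
    """Lowercase and drop punctuation so the text resembles spoken words."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def normalized_samples(second_document, n_samples, seed=0):
    """Return n_samples normalized strings, one sampled form per grammar slot."""
    rng = random.Random(seed)  # seeded so repeated runs give the same samples
    samples = []
    for _ in range(n_samples):
        words = []
        for seg in second_document:
            if isinstance(seg, str):
                words.extend(normalize_text(seg))
            else:
                _, forms = seg
                words.extend(rng.choice(forms).split())
        samples.append(" ".join(words))
    return samples


# Hypothetical second document with one grammar slot for a date.
date_forms = ["october first", "the first of october", "ten one"]
second_document = ["Seen on ", ("DATE", date_forms), ", for follow-up."]
samples = normalized_samples(second_document, n_samples=5)
```

Each sample is plain word text, so it can be fed to an ordinary n-gram counting tool that knows nothing about grammars.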

35. A device comprising:
- a first computer-readable medium tangibly storing computer program instructions for identifying a first document, tangibly stored in a second computer-readable medium, the first document containing at least some information in common with a spoken audio stream;
  
  a third computer-readable medium tangibly storing computer program instructions for:
  
  (1) identifying text in the first document representing a concept;
  
  (2) identifying a name of the identified text; and
  
  (3) using the identified name to identify a corresponding context-free grammar in a grammar repository, wherein the corresponding context-free grammar specifies a plurality of at least three spoken forms of the concept, and wherein the corresponding context-free grammar includes at least one spoken form not contained in the first document, wherein all of the plurality of spoken forms have the same content as each other;
  
  a fourth computer-readable medium tangibly storing computer program instructions for replacing the identified text with the corresponding context-free grammar to produce a second document tangibly stored in a fifth computer-readable medium; and
  
  a sixth computer-readable medium tangibly storing computer program instructions for replacing text in the second document with normalized text to produce a normalized document, tangibly stored in a seventh computer-readable medium, the normalized document including samples of text belonging to spoken forms in the finite state grammar.
- Dependent claims (36, 37, 38):
  - 36. The device of claim 35, wherein the concept comprises a semantic concept.
  - 37. The device of claim 35, wherein the concept comprises a syntactic concept.
  - 38. The device of claim 35, wherein all of the plurality of spoken forms have the same semantic meaning as each other.

Current Assignee: Solventum Intellectual Properties Company (Solventum Corp.)
Original Assignee: Multimodal Technologies Incorporated (3M Company)
Inventors: Yegnanarayanan, Girija; Finke, Michael; Fritsch, Juergen; Koll, Detlef; Woszczyna, Monika
Primary Examiner(s): Armstrong, Angela A
Application Number: US10/922,513
Publication Number: US 20060041427A1
Time in Patent Office: 3,042 Days
Field of Search: 704/235, 704/251, 704/257
US Class Current: 704/235
CPC Class Codes:
G10L 15/063   Training
G10L 15/193   Formal grammars, e.g. finit...
G10L 15/26   Speech to text systems G10L...
