Combining Re-Speaking, Partial Agent Transcription and ASR for Improved Accuracy / Human Guided ASR

US 20140163981A1
Filed: 12/12/2012
Published: 06/12/2014
Est. Priority Date: 12/12/2012
Status: Active Grant

First Claim

1. A speech transcription system for producing a representative transcription text from one or more audio signals representing one or more speakers participating in a speech session, the system comprising:

a preliminary transcription module for developing a preliminary transcription of the speech session using automatic speech recognition having a preliminary recognition accuracy performance;

a speech selection module for user selection of one or more portions of the preliminary transcription to receive higher accuracy transcription processing; and

a final transcription module responsive to the user selection for developing a final transcription output for the speech session having a final recognition accuracy performance for the selected one or more portions which is higher than the preliminary recognition accuracy performance.

Abstract

A speech transcription system is described for producing a representative transcription text from one or more different audio signals representing one or more different speakers participating in a speech session. A preliminary transcription module develops a preliminary transcription of the speech session using automatic speech recognition having a preliminary recognition accuracy performance. A speech selection module enables user selection of one or more portions of the preliminary transcription to receive higher accuracy transcription processing. A final transcription module is responsive to the user selection for developing a final transcription output for the speech session having a final recognition accuracy performance for the selected one or more portions which is higher than the preliminary recognition accuracy performance.

26 Claims

1. A speech transcription system for producing a representative transcription text from one or more audio signals representing one or more speakers participating in a speech session, the system comprising:
- a preliminary transcription module for developing a preliminary transcription of the speech session using automatic speech recognition having a preliminary recognition accuracy performance;
  
  a speech selection module for user selection of one or more portions of the preliminary transcription to receive higher accuracy transcription processing; and
  
  a final transcription module responsive to the user selection for developing a final transcription output for the speech session having a final recognition accuracy performance for the selected one or more portions which is higher than the preliminary recognition accuracy performance.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The system according to claim 1, wherein the preliminary transcription module develops the preliminary transcription automatically.
  - 3. The system according to claim 1, wherein the preliminary transcription module develops the preliminary transcription with human assistance.
  - 4. The system according to claim 1, wherein the speech selection module makes the user selection based on one or more specified selection rules.
  - 5. The system according to claim 1, wherein the speech selection module makes the user selection by manual user selection.
  - 6. The system according to claim 1, wherein the final transcription module develops the final transcription output using manual transcription.
  - 7. The system according to claim 1, wherein the final transcription module develops the final transcription output using automatic speech recognition.
  - 8. The system according to claim 7, wherein the final transcription module uses transcription models adapted from information developed by the preliminary transcription module.
  - 9. The system according to claim 7, wherein the final transcription module uses transcription modules adapted from one or more different speakers.
  - 10. The system according to claim 1, wherein there are a plurality of different audio signals representing a plurality of different speakers participating in the speech session.
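
The two-pass flow of claims 1, 4 and 7 can be illustrated with a minimal sketch. It is not the patented implementation: the `Segment` type, the per-segment `confidence` score, the threshold rule, and the `high_accuracy` callable are all hypothetical names introduced here for illustration. The sketch assumes the preliminary pass yields time-stamped segments with confidence scores, that a selection rule flags low-confidence segments, and that only the flagged segments are sent to a second, more accurate pass.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Segment:
    start: float        # segment start time in seconds
    end: float          # segment end time in seconds
    text: str           # recognized text for this segment
    confidence: float   # hypothetical ASR confidence in [0.0, 1.0]


def select_by_rule(prelim: List[Segment], threshold: float) -> List[int]:
    """Illustrative selection rule: flag segments whose confidence is below threshold."""
    return [i for i, seg in enumerate(prelim) if seg.confidence < threshold]


def finalize(
    prelim: List[Segment],
    selected: List[int],
    high_accuracy: Callable[[Segment], Segment],
) -> List[Segment]:
    """Re-process only the selected segments; keep the rest of the preliminary output."""
    chosen = set(selected)
    return [high_accuracy(seg) if i in chosen else seg for i, seg in enumerate(prelim)]


# Stand-in for the preliminary ASR output (made-up data).
prelim = [
    Segment(0.0, 2.0, "good morning everyone", 0.93),
    Segment(2.0, 5.0, "the quarterly wreck venue is up", 0.41),
    Segment(5.0, 7.0, "thank you", 0.88),
]

# Stand-in for the higher-accuracy pass (manual transcription or a stronger
# recognizer); here it is just a lookup keyed by segment start time.
corrections = {2.0: "the quarterly revenue is up"}


def second_pass(seg: Segment) -> Segment:
    return replace(seg, text=corrections.get(seg.start, seg.text), confidence=0.99)


selected = select_by_rule(prelim, threshold=0.6)
final = finalize(prelim, selected, second_pass)
print(" ".join(seg.text for seg in final))
```

Replacing `select_by_rule` with indices chosen by a person would correspond to the manual selection of claim 5; the rest of the sketch would stay the same.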

11. A speech transcription system for producing a representative transcription text from one or more audio signals representing one or more speakers participating in a speech session, the system comprising:
- a keyword transcript module for real time processing of one or more speech signals by a human agent to generate a partial transcript of keywords;
  
  a transcript alignment module for time aligning the partial transcript with the one or more speech signals;
  
  a speech transcription module for performing automatic speech recognition of the one or more speech signals as constrained by the time aligned partial transcript to produce a final transcription output for the speech session containing the keywords.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19):
  - 12. The system according to claim 11, further comprising:
    - an ASR support module for ASR processing of the one or more speech signals in real time to support the human agent while generating the partial transcript.
  - 13. The system according to claim 12, wherein the ASR support module includes one or more language models for predicting likely words in the one or more speech signals for the human agent to consider while generating the partial transcript.
  - 14. The system according to claim 12, wherein the ASR support module provides a low latency initial ASR output for the human agent to consider while generating the partial transcript.
  - 15. The system according to claim 11, wherein the speech transcription module is adapted to allow reordering and realigning of keywords in portions of the partial transcription associated with low recognition confidence.
  - 16. The system according to claim 11, wherein the speech transcription module includes one or more ASR language models that are updated based on the partial transcript prior to performing the automatic speech recognition of the one or more speech signals.
  - 17. The system according to claim 11, further comprising:
    - an indexing module for associating selected keywords with the recognition output for post-recognition information retrieval operations.
  - 18. The system of claim 11, further comprising:
    - a summary module for natural language processing of the partial transcript to develop a narrative summary characterizing the recognition output.
  - 19. The system according to claim 11, wherein there are a plurality of different audio signals representing a plurality of different speakers.
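
One way to picture recognition "as constrained by the time aligned partial transcript" (claim 11) is N-best rescoring, sketched below. This is a simplification chosen for illustration, not the method described in the patent: a real system might instead constrain the decoder or update its language models (claim 16). The `Hypothesis` and `AudioSegment` types, the scores, and the example data are all hypothetical. The sketch assumes each audio segment comes with several candidate transcriptions and that the agent's keywords have already been aligned to timestamps.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hypothesis:
    text: str
    score: float  # hypothetical ASR score; higher is better


@dataclass(frozen=True)
class AudioSegment:
    start: float              # seconds
    end: float                # seconds
    nbest: List[Hypothesis]   # candidate transcriptions for this segment


# An aligned keyword: (time in seconds, keyword typed by the human agent).
Keyword = Tuple[float, str]


def keywords_in(segment: AudioSegment, aligned: List[Keyword]) -> List[str]:
    """Keywords whose aligned time falls inside the segment."""
    return [w.lower() for t, w in aligned if segment.start <= t < segment.end]


def pick(segment: AudioSegment, aligned: List[Keyword]) -> Hypothesis:
    """Prefer hypotheses that contain the agent's keywords; break ties by ASR score."""
    required = keywords_in(segment, aligned)

    def coverage(h: Hypothesis) -> int:
        words = h.text.lower().split()
        return sum(1 for k in required if k in words)

    return max(segment.nbest, key=lambda h: (coverage(h), h.score))


def transcribe(segments: List[AudioSegment], aligned: List[Keyword]) -> str:
    return " ".join(pick(s, aligned).text for s in segments)


# Made-up example: the agent typed one keyword, aligned at 1.2 s.
segments = [
    AudioSegment(0.0, 3.0, [
        Hypothesis("please refill my a tenor law", 0.70),
        Hypothesis("please refill my atenolol", 0.60),
    ]),
    AudioSegment(3.0, 5.0, [
        Hypothesis("by friday", 0.80),
        Hypothesis("buy friday", 0.55),
    ]),
]
aligned = [(1.2, "atenolol")]
print(transcribe(segments, aligned))
```

In the first segment the keyword overrides the raw ASR ranking. In the second segment no keyword applies, so the highest-scoring hypothesis is kept.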

20. A speech transcription system for producing a representative transcription text from one or more audio signals representing one or more speakers participating in a speech session, the system comprising:
- a session monitoring module for user monitoring of the one or more audio signals;
  
  a user re-speak module for receiving a user re-speaking of at least a portion of the speech session;
  
  a session ASR module for generating a session recognition result corresponding to the one or more audio signals for the speech session;
  
  a re-speak ASR module for generating a re-speak recognition result corresponding to the user re-speaking;
  
  a session transcription module for combining the session recognition result and the re-speak recognition result to develop a final transcription output for the speech session.
- Dependent claims (21, 22, 23, 24, 25, 26):
  - 21. The system according to claim 20, wherein the session ASR module uses speaker independent speech recognition for generating the session recognition result.
  - 22. The system according to claim 20, wherein the re-speak ASR module uses speaker dependent speech recognition for generating the re-speak recognition result.
  - 23. The system according to claim 20, further comprising:
    - a user review module for user review and correction of the final transcription output.
  - 24. The system according to claim 23, wherein the user review module automatically highlights lesser reliability portions of the final transcription output for user review and correction.
  - 25. The system according to claim 20, wherein the re-speak ASR receives human assistance in generating the re-speak recognition result.
  - 26. The system according to claim 20, wherein there are a plurality of different audio signals representing a plurality of different speakers.
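
The combining step of claim 20 and the highlighting of claim 24 can be sketched as a per-word confidence comparison. The claims do not specify how the two recognition results are combined, so this rule is only an assumption made here. The `Word` type, the confidence values, and the premise that both results are already aligned slot by slot are hypothetical simplifications. A slot with no re-spoken word is represented by `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Word:
    text: str
    confidence: float  # hypothetical recognizer confidence in [0.0, 1.0]


def combine(session: List[Word], respeak: List[Optional[Word]]) -> List[Word]:
    """For each aligned slot, keep whichever recognizer was more confident."""
    if len(session) != len(respeak):
        raise ValueError("results must be aligned slot by slot")
    merged = []
    for s, r in zip(session, respeak):
        merged.append(r if r is not None and r.confidence > s.confidence else s)
    return merged


def highlight(words: List[Word], threshold: float) -> str:
    """Bracket words below the confidence threshold so a reviewer can check them."""
    return " ".join(
        f"[{w.text}]" if w.confidence < threshold else w.text for w in words
    )


# Made-up session ASR output (speaker independent) and a partial re-speak
# result (speaker dependent); the user re-spoke only the last two words.
session = [Word("send", 0.90), Word("the", 0.85), Word("envoy", 0.35), Word("today", 0.50)]
respeak = [None, None, Word("invoice", 0.92), Word("Tuesday", 0.40)]

final = combine(session, respeak)
print(highlight(final, threshold=0.6))
```

In this example the re-spoken "invoice" replaces the low-confidence "envoy". The last slot keeps the session word "today", but it is still bracketed for review because neither recognizer was confident about it.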

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Ganong, William F. III; Cook, Gary David; Daborn, Andrew Johnathon

Granted Patent: US 9,117,450 B2
US Class Current: 704/235
CPC Class Codes:
- G10L 15/26: Speech to text systems G10L...
- G10L 15/32: Multiple recognisers used i...
- G10L 2015/221: Announcement of recognition...
