Human Transcriptionist Directed Posterior Audio Source Separation

US 20140163982A1
Filed: 12/12/2012
Published: 06/12/2014
Est. Priority Date: 12/12/2012
Status: Active Grant

Abstract

A graphical user interface is described for human guided audio source separation in a multi-speaker automated transcription system receiving audio signals representing speakers participating together in a speech session. A speaker avatar for each speaker is distributed about a user interface display to suggest speaker positions relative to each other during the speech session. There also is a speaker highlight element on the interface display for visually highlighting a specific speaker avatar corresponding to an active speaker in the speech session to aid a human transcriptionist listening to the speech session to identify the active speaker. A speech signal processor performs signal processing of the audio signals to isolate an audio signal corresponding to the highlighted speaker avatar. A session transcription processor performs automatic speech recognition (ASR) of the signal processed audio signal for the speech session as supervised by the human transcriptionist and reflecting position of the speaker highlight element.
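
As an illustration only, the following Python sketch shows one way the position of the speaker highlight element could determine which audio signal is favored. The text above does not disclose an algorithm, so everything here is an assumption: the function names (`angular_distance`, `highlighted_speaker`, `isolate`), the representation of avatar positions as bearings in degrees, the one-signal-per-speaker input, and the simple gain weighting with a `leak` factor, which merely stands in for real source separation.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute difference between two bearings, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def highlighted_speaker(avatar_bearings, highlight_deg):
    """Return the speaker whose avatar lies closest to the highlight bearing.

    avatar_bearings maps a speaker name to the (assumed) bearing of that
    speaker's avatar on the display.
    """
    return min(
        avatar_bearings,
        key=lambda name: angular_distance(avatar_bearings[name], highlight_deg),
    )


def isolate(signals, avatar_bearings, highlight_deg, leak=0.1):
    """Crude stand-in for source separation (an assumption, not the patent's method).

    The highlighted speaker's signal is kept at full gain and every other
    signal is attenuated to `leak`. All signals are assumed to be at least
    as long as the highlighted speaker's signal.
    """
    target = highlighted_speaker(avatar_bearings, highlight_deg)
    length = len(signals[target])
    mixed = [0.0] * length
    for name, samples in signals.items():
        gain = 1.0 if name == target else leak
        for i in range(length):
            mixed[i] += gain * samples[i]
    return target, mixed
```

In this sketch, moving the highlight changes only `highlight_deg`; the avatar layout and the audio inputs stay fixed, which mirrors the idea that the transcriptionist steers the processing by steering the highlight.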

20 Claims

1. A graphical user interface for human guided audio source separation in a multi-speaker automated transcription system receiving a plurality of different audio signals representing a plurality of different speakers participating together in a speech session, the system comprising:
- a speaker avatar for each speaker distributed about a user interface display to suggest speaker positions relative to each other during the speech session;
  
  a speaker highlight element on the interface display for visually highlighting a specific speaker avatar corresponding to an active speaker in the speech session to aid a human transcriptionist listening to the speech session to identify the active speaker;
  
  a speech signal processor for performing signal processing of the audio signals to isolate an audio signal corresponding to the highlighted speaker avatar; and
  
  a session transcription processor for performing automatic speech recognition (ASR) of the signal processed audio signal for the speech session as supervised by the human transcriptionist and reflecting position of the speaker highlight element.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
  - 2. The interface according to claim 1, wherein the speaker highlight element is controllable by the human transcriptionist to visually highlight a specific speaker avatar.
  - 3. The interface according to claim 1, wherein the speaker highlight element is lockable by the human transcriptionist to fix it in place.
  - 4. The interface according to claim 3, further comprising:
    - a highlight lock element on the user interface display indicating to the human transcriptionist when the speaker highlight element is locked.
  - 5. The interface according to claim 1, wherein the speaker highlight element is automatically controllable by the session transcription processor to indicate to the human transcriptionist the speaker avatar for the speaker being treated as the active speaker.
  - 6. The interface according to claim 1, further comprising:
    - an audio energy heat map display element on the user interface display indicating to the human transcriptionist current sources of audio energy.
  - 7. The interface according to claim 1, further comprising:
    - a time-based display of audio energy in a portion of audio space corresponding to the speaker highlight element.
  - 8. The interface according to claim 1, wherein the speaker highlight element comprises a beam-shaped highlight element.
  - 9. The interface according to claim 1, wherein the audio signals are from one or more simultaneous recordings of an earlier speech session.
  - 10. The interface according to claim 9, wherein the speaker highlight element is controllable by the human transcriptionist to select different portions of audio space for examination during different replays of one or more recordings.
  - 11. The interface according to claim 1, wherein the audio signals are from a real time speech session in progress.
  - 12. The interface according to claim 1, wherein the session transcription processor includes an audio playback module for providing to the human transcriptionist an audio playback of the signal processed audio signal including spatial direction information to suggest to the human transcriptionist relative positions of the speech session speakers.
  - 13. The interface according to claim 12, wherein the audio playback is a stereo audio playback.
  - 14. The interface according to claim 1, wherein the speaker avatars are stationary and the speaker highlight element rotates on the interface display for visually highlighting a specific speaker avatar corresponding to an active speaker.
  - 15. The interface according to claim 1, wherein the speaker highlight element is stationary and the speaker avatars rotate on the interface display for visually highlighting a specific speaker avatar corresponding to an active speaker.
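
The audio energy heat map of claim 6 and the time-based energy display of claim 7 could be fed by something like the following Python sketch. It is a hypothetical illustration, not a disclosed implementation: the names `frame_energies` and `energy_heat_map`, the fixed frame length, the one-signal-per-speaker input, and the normalization to the range 0 to 1 are all assumptions.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(x * x for x in samples[start:start + frame_len]) / frame_len
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def energy_heat_map(signals, frame_len):
    """Per-speaker frame energies scaled to 0..1 for display.

    Returns a dict mapping each speaker name to a list of intensities, one
    per frame. A single speaker's list is a time-based energy trace; the
    values across speakers at one frame index form one heat map snapshot.
    """
    energies = {name: frame_energies(s, frame_len) for name, s in signals.items()}
    peak = max((e for row in energies.values() for e in row), default=0.0)
    if peak == 0.0:
        # Silence everywhere: avoid dividing by zero.
        return {name: [0.0] * len(row) for name, row in energies.items()}
    return {name: [e / peak for e in row] for name, row in energies.items()}
```

Normalizing against the global peak keeps intensities comparable across speakers, so the brightest cell always marks the loudest source in the analyzed span.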

16. A graphical user interface for human guided audio source separation in a multi-speaker automated transcription system receiving a plurality of different audio signals representing a plurality of different speakers participating together in a speech session, the system comprising:
- a speaker avatar for each speaker distributed about a user interface display to suggest speaker positions relative to each other during the speech session;
  
  a speech signal processor for performing signal processing of the audio signals to isolate an audio signal corresponding to the highlighted speaker avatar;
  
  an audio playback module for providing to the human transcriptionist an audio playback of the signal processed audio signal including spatial direction information to suggest to the human transcriptionist relative positions of the speech session speakers; and
  
  a session transcription processor for performing automatic speech recognition (ASR) of the signal processed audio signal for the speech session as supervised by the human transcriptionist.
- Dependent Claims (17, 18, 19, 20)
  - 17. The interface according to claim 16, wherein the audio playback is a stereo audio playback.
  - 18. The interface according to claim 16, wherein the audio signals are from one or more simultaneous recordings of an earlier speech session.
  - 19. The interface according to claim 18, wherein the user interface is controllable by the human transcriptionist to select different portions of audio space for examination during different replays of one or more recordings.
  - 20. The interface according to claim 16, wherein the audio signals are from a real time speech session in progress.
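
Claims 12, 13, 16, and 17 describe audio playback that carries spatial direction information, optionally in stereo. One common way to place a mono signal in a stereo field is constant-power panning, sketched below in Python. The claims do not name a panning technique, so this choice is an assumption, as are the function names `pan_gains` and `render_stereo` and the convention that a bearing of -90 degrees is fully left and +90 degrees is fully right.

```python
import math


def pan_gains(bearing_deg):
    """Constant-power (left, right) gains for a bearing in [-90, 90] degrees.

    Bearings outside the range are clamped. The squared gains always sum
    to 1, so perceived loudness stays roughly constant across the field.
    """
    clamped = max(-90.0, min(90.0, bearing_deg))
    theta = (clamped + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)


def render_stereo(samples, bearing_deg):
    """Turn mono samples into (left, right) pairs panned toward the bearing."""
    left, right = pan_gains(bearing_deg)
    return [(left * x, right * x) for x in samples]
```

For example, `render_stereo(isolated_samples, -45.0)` would place a hypothetical isolated speaker signal to the listener's left of center, hinting at where that speaker sat relative to the others.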

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Daborn, Andrew Johnathon; Jost, Uwe Helmut

Granted Patent: US 9,679,564 B2
US Class Current: 704/235
CPC Class Codes:
- G10L 15/26: Speech to text systems G10L...
- G10L 17/22: Interactive procedures; Man...
- G10L 21/0272: Voice signal separating
