Human Transcriptionist Directed Posterior Audio Source Separation
First Claim
1. A graphical user interface for human guided audio source separation in a multi-speaker automated transcription system receiving a plurality of different audio signals representing a plurality of different speakers participating together in a speech session, the system comprising:
a speaker avatar for each speaker distributed about a user interface display to suggest speaker positions relative to each other during the speech session;
a speaker highlight element on the interface display for visually highlighting a specific speaker avatar corresponding to an active speaker in the speech session to aid a human transcriptionist listening to the speech session to identify the active speaker;
a speech signal processor for performing signal processing of the audio signals to isolate an audio signal corresponding to the highlighted speaker avatar; and
a session transcription processor for performing automatic speech recognition (ASR) of the signal processed audio signal for the speech session as supervised by the human transcriptionist and reflecting position of the speaker highlight element.
Abstract
A graphical user interface is described for human guided audio source separation in a multi-speaker automated transcription system receiving audio signals representing speakers participating together in a speech session. A speaker avatar for each speaker is distributed about a user interface display to suggest speaker positions relative to each other during the speech session. A speaker highlight element on the interface display visually highlights a specific speaker avatar corresponding to an active speaker in the speech session, to help a human transcriptionist listening to the speech session identify the active speaker. A speech signal processor performs signal processing of the audio signals to isolate an audio signal corresponding to the highlighted speaker avatar. A session transcription processor performs automatic speech recognition (ASR) of the signal processed audio signal for the speech session as supervised by the human transcriptionist and reflecting the position of the speaker highlight element.
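As a non-limiting illustration of how the speaker avatars, the highlight element, and the speech signal processor could fit together, the following Python sketch may help. It assumes one audio channel per speaker, and it replaces true source separation with simple gain selection on the highlighted speaker's channel. The names `SpeakerAvatar`, `layout_avatars`, `isolate`, and `leak_gain` are hypothetical and do not come from the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerAvatar:
    speaker_id: str
    angle_deg: float  # hypothetical: position of the avatar around the display


def layout_avatars(speaker_ids):
    """Spread avatars evenly around a circle (one possible layout, assumed here)."""
    count = len(speaker_ids)
    return [SpeakerAvatar(sid, 360.0 * i / count) for i, sid in enumerate(speaker_ids)]


def isolate(channels, highlighted_id, leak_gain=0.0):
    """Mix per-speaker channels, keeping the highlighted speaker at full gain.

    This is plain gain selection, a stand-in for whatever signal processing
    an actual speech signal processor would apply.
    """
    length = len(channels[highlighted_id])
    out = [0.0] * length
    for speaker_id, samples in channels.items():
        gain = 1.0 if speaker_id == highlighted_id else leak_gain
        for i, sample in enumerate(samples[:length]):
            out[i] += gain * sample
    return out


# Usage: the transcriptionist moves the highlight to speaker "B".
avatars = layout_avatars(["A", "B", "C", "D"])
channels = {"A": [1.0, 2.0], "B": [10.0, 20.0]}
isolated = isolate(channels, highlighted_id="B", leak_gain=0.5)
```

The isolated signal would then be handed to the ASR stage together with the identifier of the highlighted speaker, so that the transcript reflects the position of the highlight element.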
20 Claims
1. A graphical user interface for human guided audio source separation in a multi-speaker automated transcription system receiving a plurality of different audio signals representing a plurality of different speakers participating together in a speech session, the system comprising:
a speaker avatar for each speaker distributed about a user interface display to suggest speaker positions relative to each other during the speech session;
a speaker highlight element on the interface display for visually highlighting a specific speaker avatar corresponding to an active speaker in the speech session to aid a human transcriptionist listening to the speech session to identify the active speaker;
a speech signal processor for performing signal processing of the audio signals to isolate an audio signal corresponding to the highlighted speaker avatar; and
a session transcription processor for performing automatic speech recognition (ASR) of the signal processed audio signal for the speech session as supervised by the human transcriptionist and reflecting position of the speaker highlight element.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A graphical user interface for human guided audio source separation in a multi-speaker automated transcription system receiving a plurality of different audio signals representing a plurality of different speakers participating together in a speech session, the system comprising:
a speaker avatar for each speaker distributed about a user interface display to suggest speaker positions relative to each other during the speech session;
a speech signal processor for performing signal processing of the audio signals to isolate an audio signal corresponding to the highlighted speaker avatar;
an audio playback module for providing to the human transcriptionist an audio playback of the signal processed audio signal including spatial direction information to suggest to the human transcriptionist relative positions of the speech session speakers; and
a session transcription processor for performing automatic speech recognition (ASR) of the signal processed audio signal for the speech session as supervised by the human transcriptionist.
Dependent claims: 17, 18, 19, 20
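As a non-limiting illustration of playback "including spatial direction information", the following Python sketch applies constant-power stereo panning so that a speaker's audio appears to come from the side on which that speaker's avatar is displayed. Stereo panning is only one assumed way to convey direction; the claim does not specify a technique. The names `pan_gains` and `spatialize`, and the azimuth convention (-90 degrees is full left, +90 degrees is full right), are hypothetical.

```python
import math


def pan_gains(azimuth_deg):
    """Return (left, right) constant-power gains for an azimuth in degrees.

    Azimuths outside [-90, 90] are clamped to that range.
    """
    azimuth = max(-90.0, min(90.0, azimuth_deg))
    theta = (azimuth + 90.0) / 180.0 * (math.pi / 2)  # map to [0, pi/2]
    return math.cos(theta), math.sin(theta)


def spatialize(samples, azimuth_deg):
    """Turn mono samples into (left, right) pairs panned toward azimuth_deg."""
    left, right = pan_gains(azimuth_deg)
    return [(left * s, right * s) for s in samples]


# Usage: a speaker whose avatar sits at the far left of the display.
stereo = spatialize([2.0, -1.0], azimuth_deg=-90.0)
```

Constant-power panning keeps the sum of the squared left and right gains equal to one, so perceived loudness stays roughly the same as the highlight moves between speakers.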
Specification