Methods for presenting speech blocks from a plurality of audio input data streams to a user in an interface
First Claim
1. A method comprising:
a) using a computer, identifying qualified audio on each of a plurality of audio input data streams by;
i) identifying any unique signals on any of the plurality of audio input data streams which exceed an amplitude threshold as qualified audio; and
ii) when similar signals exceeding the amplitude threshold are detected on multiple audio input data streams, identifying only the loudest of the similar signals as qualified audio;
b) for each of the audio input data streams, identifying a set of speech blocks, each of which has a status and a start time, by, for each frame in the audio input data stream;
i) executing code configured to add the current frame to a most recently created speech block if and only if;
A) the most recent preceding frame corresponding to qualified audio has a time which differs from a time for the current frame by less than a first intervening duration threshold, and the status of the most recently created speech block is pending; or
B) the most recent preceding frame corresponding to qualified audio has a time which differs from the time for the current frame by less than a second intervening duration threshold, and the status of the most recently created speech block is committed;
ii) if the current frame is not added to the most recently created speech block, and the current frame corresponds to qualified audio, executing code configured to create a new speech block, wherein;
A) the start time for the new speech block is the time for the current frame; and
B) the status for the new speech block is pending;
iii) if the status of the most recently created speech block is pending, executing code configured to change the status of the most recently created speech block to discarded if and only if;
A) the current frame does not correspond to qualified audio;
B) the most recent preceding frame corresponding to qualified audio has a time which differs from the time for the current frame by more than the first intervening duration threshold; and
C) the status of the most recently created speech block is pending;
iv) if the status of the most recently created speech block is pending, executing code configured to change the status of the most recently created speech block to committed if and only if;
A) the current frame corresponds to qualified audio; and
B) the start time for the most recently created speech block precedes the time for the current frame by more than a minimum block duration threshold;
c) presenting a speech block interface to a user, wherein;
i) the speech block interface displays, for each audio input data stream, a timeline of speech blocks for the audio input data stream, the timeline being updated in real time as the qualified audio for the audio input data streams is identified;
ii) the speech block interface is configured to allow the user to play a portion of an audio input data stream corresponding to a speech block by selecting the speech block to be played;
iii) the speech block interface is configured to allow the user to skip from each displayed speech block to a previous or subsequent displayed speech block; and
iv) the speech block interface is configured not to display discarded speech blocks, to display pending speech blocks semitransparently, and to display committed speech blocks opaquely.
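The per-frame logic of step b) of claim 1 can be read as a small state machine. The following is a minimal sketch of one possible reading, not the patented implementation: the names `SpeechBlock` and `build_speech_blocks`, the parameter names, and the representation of a frame as a `(time, is_qualified)` pair are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

PENDING, COMMITTED, DISCARDED = "pending", "committed", "discarded"


@dataclass
class SpeechBlock:
    """Hypothetical container for one speech block."""
    start: float
    status: str = PENDING
    frames: List[float] = field(default_factory=list)


def build_speech_blocks(
    frames: Iterable[Tuple[float, bool]],
    first_gap: float,      # first intervening duration threshold (pending blocks)
    second_gap: float,     # second intervening duration threshold (committed blocks)
    min_duration: float,   # minimum block duration threshold
) -> List[SpeechBlock]:
    """Walk the frames of one stream in time order, applying steps i) to iv)."""
    blocks: List[SpeechBlock] = []
    last_qualified: Optional[float] = None  # time of the preceding qualified frame

    for time, qualified in frames:
        block = blocks[-1] if blocks else None
        gap = None if last_qualified is None else time - last_qualified

        # Step i): extend the most recent block if the gap is short enough
        # for that block's current status.
        added = False
        if block is not None and gap is not None:
            if (block.status == PENDING and gap < first_gap) or (
                block.status == COMMITTED and gap < second_gap
            ):
                block.frames.append(time)
                added = True

        # Step ii): otherwise, a qualified frame starts a new pending block.
        if not added and qualified:
            block = SpeechBlock(start=time, frames=[time])
            blocks.append(block)

        if block is not None and block.status == PENDING:
            # Step iii): discard a pending block once silence outlasts first_gap.
            if not qualified and gap is not None and gap > first_gap:
                block.status = DISCARDED
            # Step iv): commit a pending block that has lasted long enough.
            elif qualified and time - block.start > min_duration:
                block.status = COMMITTED

        if qualified:
            last_qualified = time

    return blocks
```

As a usage illustration with times in milliseconds, 100 ms frames, `first_gap=300`, `second_gap=1000` and `min_duration=500`: qualified audio at 0 and 100 ms followed by silence yields a block that is discarded at 500 ms, while qualified audio from 1000 to 2000 ms yields a block that is committed at 1600 ms and then survives a 600 ms pause before a further qualified frame at 2600 ms, because committed blocks are held to the longer second threshold.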
Abstract
A clear picture of who is speaking in a setting where there are multiple input sources (e.g., a conference room with multiple microphones) can be obtained by comparing input channels against each other. The data from each channel can not only be compared, but can also be organized into portions which logically correspond to statements by a user. These statements, along with information regarding who is speaking, can be presented in a user friendly format via an interactive timeline which can be updated in real time as new audio input data is received.
2 Claims
2. A method comprising:
a) filtering a plurality of audio input data streams by, for any overlapping period, wherein an overlapping period is a period in which signals differing only in volume are included in two or more audio input data streams from the plurality of audio input data streams, excluding the signal from each of the two or more audio input data streams except for the loudest of the two or more audio input data streams for the overlapping period;
b) for each of the filtered audio input data streams, defining a set of speech blocks using a computer, wherein, for each speech block for the filtered audio input data stream,
i) the speech block comprises a base period, wherein the base period has a duration longer than a first duration threshold, and wherein the filtered audio input data stream's volume exceeds a volume threshold throughout the base period except for lapses corresponding to normal pauses between words in speech;
ii) the speech block comprises a set of additional periods, wherein, for each of the additional periods;
A) the filtered audio input data stream's volume exceeds the volume threshold throughout the additional period except for lapses corresponding to normal pauses between words in speech; and
B) if the additional period has a duration longer than the first duration threshold, there is no intervening period between the additional period's end and the base period's start which has a duration longer than a second duration threshold and which has a volume which is lower than the volume threshold during the intervening period;
C) if the additional period has a duration which is not longer than the first duration threshold, there is no intervening period between the additional period's end and the base period's start which has a duration longer than a third duration threshold and which has a volume which is lower than the volume threshold during the intervening period, wherein the third duration threshold is shorter than the second duration threshold;
iii) the start of the earliest period comprised by the speech block is defined as the speech block's start; and
iv) the end of the latest period comprised by the speech block is defined as the speech block's end;
c) presenting a speech block interface to a user, wherein;
i) the speech block interface displays, for each audio input data stream, a timeline of each speech block from the audio input data stream, the timeline being updated in real time as the audio input data streams are received;
ii) the speech block interface is configured to allow a user to play a portion of an audio input data stream corresponding to a speech block from the audio input data stream by selecting the speech block to be played; and
iii) the speech block interface is configured to allow the user to skip from each block to a previous or subsequent speech block.
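The filtering in step a) of claim 2, which keeps only the loudest copy of a signal picked up by several inputs, can be illustrated for a single frame. The sketch below rests on assumptions the claims do not state: loudness is measured as RMS amplitude, "differing only in volume" is approximated by a normalized correlation at or above a hypothetical `similarity_threshold`, and any time delay between microphones is ignored. The function names are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized correlation; close to 1.0 when b is a scaled copy of a."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def filter_frame(
    frames: Dict[str, Sequence[float]],
    amplitude_threshold: float,
    similarity_threshold: float = 0.9,  # assumed value, not from the claims
) -> Dict[str, bool]:
    """Return, per stream, whether its signal is kept for this frame."""
    loudness = {name: rms(samples) for name, samples in frames.items()}
    kept: List[str] = []
    # Visit streams from loudest to quietest so that, among similar signals,
    # the loudest one is always the copy that survives.
    for name in sorted(frames, key=lambda n: loudness[n], reverse=True):
        if loudness[name] <= amplitude_threshold:
            continue  # too quiet to count as speech at all
        duplicate = any(
            similarity(frames[name], frames[other]) >= similarity_threshold
            for other in kept
        )
        if not duplicate:
            kept.append(name)
    return {name: name in kept for name in frames}
```

For example, if a second microphone carries the same waveform as the first at one fifth of the volume, only the first is kept, while a third microphone carrying an unrelated signal above the threshold is kept as well. A practical system would likely also need to align the streams in time before comparing them.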
Specification