Methods and systems for participant sourcing indication in multi-party conferencing and for audio source discrimination
Abstract
Indications of which participant is providing information during a multi-party conference. Each participant has equipment to display information being transferred during the conference. A sourcing signaler residing in the participant equipment provides a signal that indicates the identity of its participant when that participant is providing information to the conference. The source indicators of the other participant equipment receive the signal and cause a UI to indicate that the participant identified by the received signal is providing information (e.g., the UI can cause the identifier to change appearance). An audio discriminator is used to distinguish an acoustic signal generated by a person speaking from one generated in a band-limited manner. The audio discriminator analyzes the spectrum of detected audio signals and generates several parameters from the spectrum and from past determinations to determine the source of an audio signal on a frame-by-frame basis.
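The sourcing-signaler and source-indicator pairing described in the abstract can be sketched as follows. This is a minimal illustration; all class and method names are hypothetical, not the patent's implementation.

```python
# Hypothetical sketch of the abstract's sourcing signaler: one participant's
# equipment signals its identity while that participant is providing
# information, and each receiving equipment's UI marks that identity active.

class ParticipantUI:
    """Tracks which participant identifiers are shown as actively sourcing."""

    def __init__(self):
        self.active = set()

    def on_sourcing_signal(self, identity, sourcing):
        # Change the identifier's on-screen appearance when sourcing starts/stops.
        if sourcing:
            self.active.add(identity)
        else:
            self.active.discard(identity)


class SourcingSignaler:
    """Resides in one participant's equipment; signals the other equipment."""

    def __init__(self, identity, peer_uis):
        self.identity = identity
        self.peer_uis = peer_uis  # UIs of the other participants' equipment

    def set_sourcing(self, sourcing):
        for ui in self.peer_uis:
            ui.on_sourcing_signal(self.identity, sourcing)
```

In use, a participant's signaler broadcasts on speech onset and offset, and every peer UI updates the corresponding identifier's appearance.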
33 Citations
37 Claims
1. A method, comprising:
via a first participant equipment:
detecting an acoustic signal;
determining whether the detected acoustic signal was generated by a person speaking by receiving a frame of audio data derived from the detected acoustic signal;
classifying the received frame based on spectral data of the received frame, the spectral data obtained by performing a modulated complex lapped transform (MCLT) on the frame of audio data, the classifying comprising classifying the received frame as one of a plurality of predetermined frame types comprising a live-type frame, a phone-type frame, and an unsure-type frame, wherein live-type frames represent frames determined to be derived from acoustic signals generated by a person speaking, and phone-type frames represent frames determined to be derived from acoustic signals generated by an audio transducer device; and
providing a signal indicating to a second participant equipment that the detected acoustic signal was generated by the person. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
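The three-way live/phone/unsure classification in claim 1 can be sketched as a band-energy test. The claim obtains spectral data via an MCLT; this sketch instead takes a precomputed magnitude spectrum as input, and its split bin and ratio thresholds are illustrative assumptions rather than values from the patent. The intuition is that telephone audio is band-limited (roughly 300–3400 Hz), so phone-type frames carry little high-band energy.

```python
# Illustrative three-way frame classifier in the spirit of claim 1.
# Spectral transform (the claim's MCLT) is assumed done elsewhere; the
# input is a magnitude spectrum. Split bin and thresholds are placeholders.

LIVE, PHONE, UNSURE = "live", "phone", "unsure"

def classify_frame(spectrum, split_bin, live_ratio=0.2, phone_ratio=0.02):
    """Classify one frame from the ratio of high-band to low-band energy."""
    low = sum(m * m for m in spectrum[:split_bin])    # frame low band energy
    high = sum(m * m for m in spectrum[split_bin:])   # frame high band energy
    if low <= 0.0:
        return UNSURE                                 # no usable low-band energy
    ratio = high / low
    if ratio >= live_ratio:
        return LIVE      # wideband: consistent with a person speaking live
    if ratio <= phone_ratio:
        return PHONE     # band-limited: consistent with an audio transducer
    return UNSURE        # ambiguous energy distribution
```

A frame whose high/low energy ratio falls between the two thresholds is deliberately left unsure, matching the claim's third frame type.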
22. A computer-readable tangible medium having computer-executable instructions that, upon execution, facilitate a computing device in performing operations comprising:
detecting an acoustic signal;
determining whether the detected acoustic signal was generated by a person speaking by receiving a frame of audio data derived from the detected acoustic signal;
determining a source of the detected acoustic signal to be unsure, if:
a prior determination of the source of the detected acoustic signal is that the source of the detected acoustic signal was an audio transducer device;
the frame of audio data is classified as a live-type frame;
a predetermined number of most recent frames do not include enough live-type frames to exceed a predetermined live-type frame count threshold; and
an elapsed time since receiving a previous frame derived from speech exceeds a predetermined first time threshold;
or:
the prior determination of the source of the detected acoustic signal is that the source was the acoustic signal generated by a person speaking;
the frame of audio data is classified as a phone-type frame;
an elapsed time since receiving a previous live-type frame exceeds a predetermined second time threshold; and
a counter value is below a predetermined count threshold, the counter value to track a number of consecutive non-live-type frames received after receiving the live-type frame;
determining the source of the detected acoustic signal to be the acoustic signal generated by the person speaking, if:
the prior determination of a source of a detected acoustic signal is unsure;
the frame of audio data is classified as the live-type frame; and
the predetermined number of most recent prior frames includes live-type frames that exceed in number the predetermined live-type frame count threshold;
determining the source of the detected acoustic signal to be the audio transducer device, if:
the prior determination of the source of the detected acoustic signal is unsure;
the frame of audio data is classified as the phone-type frame; and
the predetermined number of most recent prior frames includes phone-type frames that exceed in number a predetermined phone-type frame count threshold;
classifying the received frame of audio data based on spectral data of the received frame of audio data, the classifying comprising classifying the received frame of audio data as one of a plurality of predetermined frame types comprising the live-type frame, the phone-type frame, and an unsure-type frame, wherein live-type frames represent frames determined to be derived from acoustic signals generated by the person speaking, and phone-type frames represent frames determined to be derived from acoustic signals generated by the audio transducer device, wherein parameters used to classify frames include high band noise floor energy, low band noise floor energy, frame high band energy, frame low band energy, and a ratio of the frame high band energy to the frame low band energy;
classifying the received frame of audio data from non-spectral data of the received frame based on parameters comprising an energy ratio threshold for live speech and an energy ratio for phone speech; and
providing a signal indicating to a second computing device that the detected acoustic signal was generated by a person. - View Dependent Claims (23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37)
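Claim 22's four conditions amount to a hysteresis state machine over per-frame classifications: the source determination only flips through the unsure state, gated by frame counts over a recent window, a consecutive non-live counter, and elapsed-time thresholds. A minimal sketch follows; all window sizes, count thresholds, and timings are illustrative placeholders, not the patent's values.

```python
# Sketch of claim 22's source-determination hysteresis. Every numeric
# parameter below is an illustrative placeholder.
from collections import deque
from enum import Enum

class FrameType(Enum):
    LIVE = "live"
    PHONE = "phone"
    UNSURE = "unsure"

class Source(Enum):
    PERSON = "person"
    TRANSDUCER = "transducer"
    UNSURE = "unsure"

class SourceStateMachine:
    """Applies claim 22's four transition conditions frame by frame."""

    def __init__(self, window=20, live_count=12, phone_count=12,
                 t1=2.0, t2=1.0, nonlive_limit=50):
        self.source = Source.UNSURE
        self.recent = deque(maxlen=window)   # predetermined number of most recent frames
        self.live_count = live_count         # live-type frame count threshold
        self.phone_count = phone_count       # phone-type frame count threshold
        self.t1 = t1                         # first time threshold (seconds)
        self.t2 = t2                         # second time threshold (seconds)
        self.nonlive_limit = nonlive_limit   # consecutive non-live count threshold
        self.last_live_time = None
        self.nonlive_run = 0                 # consecutive non-live frames since last live

    def update(self, frame, now):
        self.recent.append(frame)
        lives = sum(1 for f in self.recent if f is FrameType.LIVE)
        phones = sum(1 for f in self.recent if f is FrameType.PHONE)
        self.nonlive_run = 0 if frame is FrameType.LIVE else self.nonlive_run + 1
        since_live = (float("inf") if self.last_live_time is None
                      else now - self.last_live_time)

        if (self.source is Source.TRANSDUCER and frame is FrameType.LIVE
                and lives <= self.live_count and since_live > self.t1):
            self.source = Source.UNSURE       # transducer -> unsure
        elif (self.source is Source.PERSON and frame is FrameType.PHONE
                and since_live > self.t2 and self.nonlive_run < self.nonlive_limit):
            self.source = Source.UNSURE       # person -> unsure
        elif (self.source is Source.UNSURE and frame is FrameType.LIVE
                and lives > self.live_count):
            self.source = Source.PERSON       # unsure -> person
        elif (self.source is Source.UNSURE and frame is FrameType.PHONE
                and phones > self.phone_count):
            self.source = Source.TRANSDUCER   # unsure -> transducer

        if frame is FrameType.LIVE:
            self.last_live_time = now
        return self.source
```

Note the design the claim implies: a single contradictory frame never flips the determination directly from person to transducer or back; it first demotes the state to unsure, and only a sustained run of consistent frames in the recent window promotes it again.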
Specification