Method and apparatus of voice mixing for conferencing amongst diverse networks
Abstract
A conferencing system is provided that utilizes both time domain signal mixing and direct signal fast transcoding. An exemplary embodiment of the present invention utilizes both time domain signal mixing and direct signal fast transcoding to process a bit-stream from a same channel during a conference.
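The routing idea in the abstract can be illustrated with a small sketch. The names `Route` and `choose_route`, and the "idle" case for when nobody else is talking, are assumptions made for illustration and do not come from the patent text. The sketch picks fast transcoding when exactly one other channel is voice active and time domain mixing when two or more are.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Route:
    """Hypothetical description of how one output channel is produced."""
    mode: str           # "transcode", "mix", or "idle" (the "idle" case is an assumption)
    sources: List[int]  # input channels that feed this output


def choose_route(output_channel: int, active: Dict[int, bool]) -> Route:
    """Pick a processing path for one output channel from per-channel voice activity flags."""
    # A participant never hears their own channel, so it is excluded.
    talkers = sorted(
        ch for ch, is_active in active.items()
        if is_active and ch != output_channel
    )
    if len(talkers) == 1:
        # Only one other talker: convert that bit-stream directly.
        return Route("transcode", talkers)
    if len(talkers) >= 2:
        # Several talkers: decode, mix in the time domain, re-encode.
        return Route("mix", talkers)
    return Route("idle", [])
```

With channels 0 and 2 active and channel 1 silent, the output for channel 1 is a mix of channels 0 and 2, while the output for channel 2 is a direct transcode of channel 0. The bit-stream from channel 0 is therefore handled by both paths in the same conference, which is the situation the abstract describes.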
39 Claims
1. An apparatus for performing voice mixing of multiple inputs from multiple source bit-streams representing frames of data from a plurality of source channels, each of the plurality of source channels being connected to a conference and encoded according to a codec employed by each of the plurality of source channels, the apparatus comprising:
a bit-stream un-packer for each of the plurality of source channels, each of the plurality of source channels being connected to a mixing system;
a voice activity detection module for each of the plurality of source channels, wherein the voice activity detection module is adapted to determine if an input channel is active;
a decision module adapted to determine if an output on a first channel of the plurality of source channels connected to the conference should be obtained through time domain mixing of time domain signals associated with other channels of the plurality of source channels or through fast transcoding of one of the other channels of the plurality of source channels;
a switch module adapted to connect an input from one of the plurality of source channels to at least one of an interpolator module or a time domain mixing module based on the determined output;
an interpolator module between each of the plurality of source channels and adapted to allow speech compression parameters produced by one speech compression algorithm to cover a given time period and to represent a time period that another speech compression algorithm utilizes;
a time domain mixing module for each of the plurality of source channels, wherein the time domain mixing module is adapted to produce a time domain signal that represents a combination of the time domain signals associated with other channels of the plurality of source channels; and
a pack module for each of the plurality of source channels, wherein the pack module is adapted to provide a resultant conference signal in a format associated with an output of at least one of the plurality of source channels.
Dependent claims: 2, 3, 4, 5, 6, 7
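The interpolator module of claim 1 maps parameters that cover one codec's frame period onto the frame period another codec uses. The claim does not say how this is done. The sketch below assumes plain linear interpolation of a single scalar parameter track between frame centres, and the function name `interpolate_params` and its arguments are hypothetical.

```python
from typing import List, Sequence


def interpolate_params(src: Sequence[float], src_ms: float, dst_ms: float) -> List[float]:
    """Resample one per-frame parameter track from src_ms frames to dst_ms frames.

    Assumption: each value describes the centre of its frame, and values in
    between are obtained by linear interpolation.
    """
    if not src:
        return []
    total_ms = len(src) * src_ms
    n_dst = int(total_ms // dst_ms)  # whole destination frames that fit
    out: List[float] = []
    for k in range(n_dst):
        t = (k + 0.5) * dst_ms       # centre of destination frame k, in ms
        pos = t / src_ms - 0.5       # fractional index into the source track
        pos = min(max(pos, 0.0), len(src) - 1)  # clamp at the track edges
        i = int(pos)
        j = min(i + 1, len(src) - 1)
        frac = pos - i
        out.append((1.0 - frac) * src[i] + frac * src[j])
    return out
```

For example, three 20 ms source frames span 60 ms and yield two 30 ms destination frames. A real codec pair would need per-parameter handling (for instance, spectral parameters and gains are usually not interpolated the same way), which this sketch leaves out.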
8. A method for performing voice mixing of multiple inputs from multiple source bit-streams representing frames of data from a plurality of source channels, each of the plurality of source channels being connected to a conference and encoded according to a codec employed by each of the plurality of source channels, the method comprising:
un-packing input compression codes from the multiple source bit-streams, wherein the multiple source bit-streams represent encoded signals;
detecting a voice activity present on each of the plurality of source channels for a pre-set time period in an adaptable manner;
reconstructing time domain signals from voice active input source bit-streams that are from source channels other than a first output channel of the plurality of source channels;
mixing the time domain signals into a mixed output signal;
generating output compression codes representing the mixed output signal;
interpolating input compression codes from a single voice active bit-stream from a first source channel to the output compression codes to be placed on a second channel of the plurality of source channels connected to the conference when only a single source channel, other than the second, is detected to have voice activity; and
packing the output compression codes in an output bit-stream formatted to represent frames of data to be placed on a channel of the plurality of source channels.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
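The reconstructing and mixing steps of claim 8 can be sketched as follows. The sketch assumes the decoded signals are 16-bit PCM sample lists of one frame each and that mixing is a sample-wise sum with saturation. Neither detail is stated in the claim, and `mix_for_output` is a hypothetical name.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_for_output(
    output_channel: int,
    frames: Dict[int, List[int]],
    active: Dict[int, bool],
) -> List[int]:
    """Sum the decoded frames of every voice active channel except the output's own."""
    sources = [
        pcm for ch, pcm in frames.items()
        if ch != output_channel and active.get(ch, False)
    ]
    if not sources:
        return []
    length = min(len(pcm) for pcm in sources)
    mixed: List[int] = []
    for n in range(length):
        total = sum(pcm[n] for pcm in sources)
        # Saturate rather than wrap so that loud overlaps clip instead of inverting.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

The mixed frame would then be passed to the output channel's encoder to generate the output compression codes. Excluding the listener's own channel from the sum is what makes a separate mix necessary for each output.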
19. An apparatus for performing audio mixing in a conference call among at least a first participant, a second participant, and a third participant, wherein the first participant is associated with a first input channel formatted according to a first codec, the second participant is associated with a second input channel formatted according to a second codec, and the third participant is associated with a third output channel formatted according to a third codec, the apparatus comprising:
a first bitstream un-packer coupled to the first input channel, the first bitstream un-packer being adapted to extract one or more first audio compression parameters of the first input channel;
a second bitstream un-packer coupled to the second input channel, the second bitstream un-packer being adapted to extract one or more second audio compression parameters of the second input channel;
a first voice activity detection module coupled to the first bitstream un-packer, the first voice activity detection module being adapted to determine if the first input channel is active;
a second voice activity detection module coupled to the second bitstream un-packer, the second voice activity detection module being adapted to determine if the second input channel is active;
a decision module coupled to the first voice activity detection module and the second voice activity detection module, the decision module being associated with the third output channel, the decision module being adapted to determine if the third output channel should be obtained through a time domain mixing of time domain signals associated with the first input channel and the second input channel, or through a first transcoding process from the first input channel to the third output channel, or through a second transcoding process from the second input channel to the third output channel;
an interpolator module coupled to the decision module, the interpolator module being adapted to get one or more interpolated audio compression parameters by utilizing either the first transcoding process or the second transcoding process, wherein the one or more interpolated audio compression parameters are associated with the third output channel;
a time domain mixing module coupled to the decision module, the time domain mixing module being adapted to produce a time domain signal associated with the third output channel; and
a pack module coupled to the decision module, the interpolator module, the time domain mixing module, and the third output channel, the pack module being adapted to provide a resultant conferencing signal in a format according to the third codec.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35
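The voice activity detection modules in the claims above only need to report whether an input channel is active. The claims do not fix a detection method. The sketch below assumes a simple frame-energy detector operating on decoded PCM, with a slowly adapting noise floor and a hangover counter that keeps a channel marked active for a pre-set number of frames after speech stops. The class name `EnergyVad` and all parameter values are hypothetical.

```python
from typing import Optional, Sequence


class EnergyVad:
    """Frame-energy voice activity detector with adaptive noise floor and hangover."""

    def __init__(self, margin: float = 4.0, hangover_frames: int = 5, adapt: float = 0.1):
        self.margin = margin                    # energy must exceed margin * noise floor
        self.hangover_frames = hangover_frames  # frames kept active after speech ends
        self.adapt = adapt                      # smoothing factor for the noise floor
        self.noise_floor: Optional[float] = None
        self.hang = 0

    def is_active(self, pcm: Sequence[int]) -> bool:
        energy = sum(s * s for s in pcm) / max(len(pcm), 1)
        if self.noise_floor is None:
            # Assumption: the very first frame is background noise.
            self.noise_floor = energy
            return False
        if energy > self.margin * max(self.noise_floor, 1.0):
            self.hang = self.hangover_frames
            return True
        # Quiet frame: let the noise floor follow the background level.
        self.noise_floor += self.adapt * (energy - self.noise_floor)
        if self.hang > 0:
            self.hang -= 1
            return True
        return False
```

One detector instance would be kept per input channel, and its per-frame result would feed the decision module. The hangover avoids flipping between the transcoding and mixing paths on every short pause.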
36. A method for performing audio mixing in a conference call among at least a first participant, a second participant, and a third participant, wherein the first participant is associated with a first input channel formatted according to a first codec, the second participant is associated with a second input channel formatted according to a second codec, and the third participant is associated with a third output channel formatted according to a third codec, the method comprising:
processing an input bit-stream received through the first input channel to produce one or more first audio compression parameters, and an input bit-stream received through the second input channel to produce one or more second audio compression parameters;
detecting a first voice activity status on the first input channel and a second voice activity status on the second input channel;
determining if the third output channel should be obtained through a time domain audio mixing of time domain signals associated with the first input channel and the second input channel, or through a transcoding process when only one of the first input channel and the second input channel is detected to have voice activity;
providing one or more interpolated audio compression parameters from either interpolating the one or more first audio compression parameters, or interpolating the one or more second audio compression parameters;
reconstructing a first time domain signal of the first input channel and a second time domain signal of the second input channel;
mixing the first time domain signal and the second time domain signal into a mixed time domain signal;
generating one or more mixed audio compression parameters from the mixed time domain signal; and
packing the one or more interpolated audio compression parameters or the one or more mixed audio compression parameters to an output bit-stream in a format of the third codec.
Dependent claims: 37, 38, 39
Specification