METHOD AND APPARATUS OF VOICE MIXING FOR CONFERENCING AMONGST DIVERSE NETWORKS

US 20070299661A1
Filed: 11/29/2006
Published: 12/27/2007
Est. Priority Date: 11/29/2005
Status: Active Grant

Abstract

A conferencing system is provided that utilizes both time domain signal mixing and direct signal fast transcoding. An exemplary embodiment of the present invention utilizes both time domain signal mixing and direct signal fast transcoding to process a bit-stream from a same channel during a conference.

39 Claims

1. An apparatus for performing voice mixing of multiple inputs from multiple source bit-streams representing frames of data from a plurality of source channels, each of the plurality of source channels being connected to a conference and encoded according to a codec employed by each of the plurality of source channels, the apparatus comprising:
- a bit-stream un-packer for each of the plurality of source channels, each of the plurality of source channels being connected to a mixing system;
  
  a voice activity detection module for each of the plurality of source channels, wherein the voice activity detection module is adapted to determine if an input channel is active;
  
  a decision module adapted to determine if an output on a first channel of the plurality of source channels connected to the conference should be obtained through time domain mixing of time domain signals associated with other channels of the plurality of source channels or through fast transcoding of one of the other channels of the plurality of source channels;
  
  a switch module adapted to connect an input from one of the plurality of source channels to at least one of an interpolator module or a time domain mixing module based on the determined output;
  
  an interpolator module between each of the plurality of source channels and adapted to allow speech compression parameters produced by one speech compression algorithm to cover a given time period and to represent a time period that another speech compression algorithm utilizes;
  
  a time domain mixing module for each of the plurality of source channels, wherein the time domain mixing module is adapted to produce a time domain signal that represents a combination of the time domain signals associated with other channels of the plurality of source channels; and
  
  a pack module for each of the plurality of source channels, wherein the pack module is adapted to provide a resultant conference signal in a format associated with an output of at least one of the plurality of source channels.
- Dependent claims (2 to 22):
  - 2. The apparatus of claim 1 wherein the bit-stream un-packer includes:
    - a bit-stream data type identifier adapted to receive an input from a bit-stream frame of data encoded by a voice codec according to a voice compression standard and adapted to output a data type of the packet; and
      
      a source bit-stream payload data unquantizer adapted to dequantize codes representing one or more speech compression parameters.
  - 3. The apparatus of claim 2 wherein the source bit-stream payload data unquantizer comprises:
    - a code separator, the code separator being operative to receive input from a bit-stream frame of data encoded at a data rate according to a voice compression standard and to interpret the codes representing the one or more speech parameters;
      
      at least one dequantizer module operative to dequantize the codes representing the one or more speech compression parameters; and
      
      a code index pass-through module operative to pass input codes representing the one or more speech compression parameters to following stages.
  - 4. The apparatus of claim 1 wherein the voice activity detection module includes:
    - a silence frame detection state machine adapted to store a voice activity status of several past frames;
      
      a silence frame indicator adapted to indicate a silence status of a current frame from one or more speech compression parameters carried by one of the multiple source bit-streams; and
      
      a voice activity detector adapted to perform a voice activity computation from unpacked speech parameters and output the voice activity status.
  - 5. The apparatus of claim 1 wherein the decision module comprises:
    - an activity weighting module operative to weight a voice activity status of one or more source channels of the plurality of source channels according to a set of one or more tuning weights assigned by a system hosting the conference or a participant in the conference;
      
      a weighted activity filter operative to combine the weighted voice activity status of the one or more source channels of the plurality of source channels;
      
      a decision extractor operative to produce a flag indicating a conference method to be used to produce the output on the first channel of the plurality of source channels; and
      
      a source channel allocator operative to use the filtered weighted voice activity status of the one or more source channels of the plurality of source channels and a source allocation scheme to determine which of the plurality of source channels will contribute to the output of the first channel of the plurality of source channels.
  - 6. The apparatus of claim 5 wherein the set of one or more tuning weights are assigned automatically.
  - 7. The apparatus of claim 1 wherein the time domain mixing module includes:
    - a plurality of signal reconstruction modules, each of the signal reconstruction modules being associated with each of the plurality of source channels and adapted to provide a time domain digital speech signal using a set of parameters describing a compression method used on the first channel of the plurality of source channels and another set of parameters obtained from the bit-stream un-packer for the first channel of the plurality of source channels; and
      
      a mixer module adapted to combine time domain digital speech signals produced by the plurality of signal reconstruction modules.
  - 8. The apparatus of claim 7 further comprising:
    - an optional scaling module adapted to normalize the time domain digital speech signal to avoid overflow; and
      
      an optional signal adjustment module adapted to allow the time domain digital speech signals to be modified before being combined.
  - 9. The apparatus of claim 7 wherein each of the plurality of signal reconstruction modules further includes a re-sampling module adapted to convert wide-band digital speech signals to narrow-band digital speech signals or to convert narrow-band digital speech signals to wide-band digital speech signals.
  - 10. The apparatus of claim 1 wherein the interpolator module includes:
    - a CELP parameters interpolation module adapted to interpolate LSPs, adaptive codebook parameters, and fixed codebook parameters to represent different length speech frames or to define speech frames using a different combination of these parameters to that presented by CELP parameters operated on;
      
      a bandwidth adjustment module adapted to convert narrow-band parameters to wide-band parameters and wide-band parameters to narrow-band parameters;
      
      a pass-through module if a source channel speech compression method and an output channel speech compression method are the same;
      
      a non-CELP to CELP parameter interpolation module adapted to convert non-CELP compression parameters into a set of CELP parameters if the source channel compression method is a non-CELP type compression method and the output channel compression method is a CELP type compression method;
      
      a CELP to non-CELP parameter interpolation module adapted to convert CELP parameters to non-CELP parameters if the source channel compression method is a CELP type compression method and the output channel compression method is a non-CELP compression method; and
      
      a CELP parameter buffer adapted to:
      
      store one or more CELP parameters that are not interpolated; and
      
      hold the one or more CELP parameters that are not interpolated until there is a difference between the source channel compression method and the output channel compression method.
  - 11. The apparatus of claim 10 wherein the bandwidth adjustment module includes:
    - an LPC conversion module adapted to extend narrow-band LPC to wide-band LPC;
      
      an up-sampling module adapted to convert time sampled parameters from narrow-band to wide-band; and
      
      a CELP parameter equivalent conversion module adapted to interpolate other CELP parameters from narrow-band to wide-band.
  - 12. The apparatus of claim 10 wherein the bandwidth adjustment module includes:
    - an LPC conversion module adapted to convert wide-band LPC to narrow-band LPC;
      
      a down-sampling module adapted to convert time sampled parameters from wide-band to narrow-band; and
      
      a CELP parameter equivalent conversion module adapted to interpolate other CELP parameters from wide-band to narrow-band.
  - 13. The apparatus of claim 1 wherein the pack module includes a tuning module comprising:
    - a decision module adapted to select a destination compression method parameter mapping and a tuning strategy based upon a plurality of strategies;
      
      a tuning module adapted to output one or more destination CELP parameters if an output channel compression method is a CELP type speech compression method; and
      
      a non-CELP type tuning module adapted to output the one or more destination CELP compression parameters if the output channel compression method is a non-CELP type speech compression method.
  - 14. The apparatus of claim 1 wherein the pack module comprises a plurality of frame packing facilities, each of the plurality of frame packing facilities being capable of adapting to a pre-selected application from a plurality of applications for a selected destination voice coder, the selected destination voice coder being one of a plurality of voice coders.
  - 15. The apparatus of claim 1 wherein the source bit-stream represents CELP parameters.
  - 16. The apparatus of claim 1 wherein the source bit-stream represents narrow-band speech.
  - 17. The apparatus of claim 1 wherein the source bit-stream represents wide-band speech.
  - 18. The apparatus of claim 1 wherein the apparatus is configurable to allow more than two bit-streams to be accepted as source bit-streams.
  - 19. The apparatus of claim 1 wherein a compression method used on any of the plurality of source channels comprises at least one of a parametric speech compression method, a waveform-approximating speech compression method, or a waveform compression method.
  - 20. The apparatus of claim 1 wherein the compression method used on any of the plurality of source channels comprises any speech or audio compression method.
  - 21. The apparatus of claim 1 wherein a source channel compression method and an output channel compression method include wide-band and narrow-band methods.
  - 22. The apparatus of claim 1 wherein the format associated with the output comprises a compression algorithm.
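
One way to read the decision module of claims 1 and 5 is sketched below in Python. It is an illustration under stated assumptions, not the patented implementation: the first-order smoothing filter, the `threshold`, the "keep the N strongest" allocation scheme, and every name (`DecisionModule`, `update`, `decide`) are hypothetical choices that the claims do not specify.

```python
from dataclasses import dataclass, field

TRANSCODE = "fast_transcode"
MIX = "time_domain_mix"
SILENCE = "silence"


@dataclass
class DecisionModule:
    """Illustrative decision module; all names and defaults are assumptions."""
    weights: dict            # channel id -> tuning weight
    alpha: float = 0.5       # smoothing factor of the weighted activity filter
    threshold: float = 0.3   # filtered activity above this counts as active
    max_sources: int = 3     # assumed allocation scheme: keep the N strongest
    filtered: dict = field(default_factory=dict)

    def update(self, vad):
        """Feed one frame of per-channel VAD results (channel id -> bool)."""
        for ch, active in vad.items():
            weighted = self.weights.get(ch, 1.0) * (1.0 if active else 0.0)
            prev = self.filtered.get(ch, 0.0)
            # First-order recursive filter over the weighted activity.
            self.filtered[ch] = self.alpha * weighted + (1.0 - self.alpha) * prev

    def decide(self, out_ch):
        """Return (method flag, contributing sources) for one output channel."""
        candidates = [
            (level, ch) for ch, level in self.filtered.items()
            if ch != out_ch and level > self.threshold
        ]
        candidates.sort(reverse=True)  # strongest filtered activity first
        sources = [ch for _, ch in candidates[: self.max_sources]]
        if not sources:
            return SILENCE, []
        if len(sources) == 1:
            # A single active talker: its parameters can be transcoded directly.
            return TRANSCODE, sources
        # Several active talkers: decode, mix in the time domain, re-encode.
        return MIX, sources


dm = DecisionModule(weights={"A": 1.0, "B": 1.0, "C": 2.0})
dm.update({"A": True, "B": False, "C": False})
first = dm.decide("B")
dm.update({"A": True, "B": False, "C": True})
second = dm.decide("B")
```

In this sketch the output for channel B is fast transcoded from A while only A is active, and switches to time domain mixing once C also becomes active. The output channel is always excluded from its own source list.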

23. A method for performing voice mixing of multiple inputs from multiple source bit-streams representing frames of data from a plurality of source channels, each of the plurality of source channels being connected to a conference and encoded according to a codec employed by each of the plurality of source channels, the method comprising:
- un-packing input compression codes from the multiple source bit-streams, wherein the multiple source bit-streams represent encoded signals;
  
  detecting a voice activity present on each of the plurality of source channels for a pre-set time period in an adaptable manner;
  
  reconstructing time domain signals from voice active input source bit-streams that are from source channels other than a first output channel of the plurality of source channels;
  
  mixing the reconstructed time domain signals into a mixed output signal;
  
  generating compression codes representing the mixed output signal;
  
  interpolating input compression codes from a single voice active bit-stream from a first source channel to output compression codes to be placed on a second channel of the plurality of source channels connected to the conference when only a single source channel, other than the second, is detected to have voice activity; and
  
  packing the output compression codes in an output bit-stream formatted to represent frames of data to be placed on a channel of the plurality of source channels.
- Dependent claims (24 to 33):
  - 24. The method of claim 23 wherein un-packing the input compression codes comprises:
    - converting an input bit-stream frame into information associated with one or more speech parameters;
      
      decoding the information into one or more speech parameters; and
      
      reconstructing time domain speech samples and parameters based on the one or more speech parameters.
  - 25. The method of claim 24 wherein the speech parameters are CELP parameters if the codec employs a CELP voice compression method.
  - 26. The method of claim 24 wherein the speech parameters are non-CELP parameters if the codec employs a non-CELP voice compression method.
  - 27. The method of claim 23 wherein detecting a voice activity comprises:
    - determining if an input bit-stream carries a voice activity indicator for each time frame represented by the input bit-stream;
      
      reconstructing a time domain signal from the un-packed input compression codes if the input bit-stream does not carry a voice activity indicator;
      
      processing the time domain signal, if reconstructed, to determine if the time domain signal has voice activity for the time frame and generating the voice activity indicator for the time frame; and
      
      using voice activity indicators of multiple consecutive time frames to set or clear an active flag indicating if each of the plurality of source channels has voice activity.
  - 28. The method of claim 23 wherein interpolating input compression codes comprises:
    - interpolating CELP parameters including LSPs, adaptive codebook, and fixed codebook parameters according to an output channel CELP format and frame size;
      
      converting CELP parameters from narrow-band to wide-band if the input compression codes represent a narrow-band signal and the output compression codes are to represent a wide-band signal;
      
      converting CELP parameters from wide-band to narrow-band if the input compression codes represent a wide-band signal and the output compression codes are to represent a narrow-band signal;
      
      converting the input compression codes to CELP compression codes if the input compression codes are not CELP compression codes and the output compression codes are to be formatted as CELP compression codes;
      
      converting the input compression codes from CELP compression codes to non-CELP compression codes if the output compression codes are to be formatted as non-CELP compression codes;
      
      directly passing through the input compression codes as the output compression codes if the output channel carries the same type of compression codes as the input compression codes; and
      
      storing speech parameters used for interpolation in a next time frame into a buffer.
  - 29. The method of claim 28 wherein converting CELP parameters from wide-band to narrow-band comprises:
    - converting LPC coefficients from a wide-band representation to a narrow-band representation;
      
      bandwidth limiting and down-sampling time sampled parameters from wide-band to narrow-band; and
      
      interpolating all other CELP parameters in wide-band form to narrow-band form.
  - 30. The method of claim 28 wherein converting CELP parameters from narrow-band to wide-band comprises:
    - converting LPC coefficients from a narrow-band representation to a wide-band representation;
      
      band-limiting and up-sampling time sampled parameters from narrow-band to wide-band; and
      
      interpolating all other CELP parameters in narrow-band form to wide-band form.
  - 31. The method of claim 23 wherein mixing the reconstructed time domain signals comprises:
    - reconstructing time sampled speech parameters from the un-packed input compression codes;
      
      modifying the reconstructed speech parameters according to a control input;
      
      regenerating speech signals from the unpacked and reconstructed parameters if required; and
      
      mixing sample-based speech parameters from multiple source inputs to produce a combined time-sampled set of parameters.
  - 32. The method of claim 23 wherein generating compression codes comprises:
    - quantizing all destination speech codec parameters in a target code space; and
      
      generating silence description frames that use fewer bits than normal coded speech frames when only silence is to be transmitted to the output bit-stream.
  - 33. The method of claim 23 wherein packing the output compression codes comprises:
    - determining a format to be used for a first channel of the plurality of channels connected to the conference; and
      
      formatting the generated compression codes according to the determined format.
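
The mixing steps of claim 23 (reconstruct, then mix every voice active source other than the output channel) can be sketched as follows. This is a minimal Python illustration that assumes the frames are already decoded to 16-bit PCM sample lists of equal length. The function name `mix_minus`, the `limit` parameter, and the peak-based scaling used to avoid overflow are assumptions made for the example; the claims do not prescribe a particular normalization.

```python
def mix_minus(frames, active, out_ch, limit=32767):
    """Mix decoded PCM frames of every active channel except out_ch.

    frames: channel id -> list of int samples (equal length, already decoded)
    active: set of channel ids currently flagged as voice active
    """
    length = len(next(iter(frames.values()))) if frames else 0
    sources = [ch for ch in frames if ch in active and ch != out_ch]
    if not sources:
        # No active talker for this listener: return a frame of zeros.
        return [0] * length
    # Sample-by-sample sum of all contributing sources.
    mixed = [sum(frames[ch][i] for ch in sources) for i in range(length)]
    peak = max(abs(s) for s in mixed)
    if peak > limit:
        # Scale the whole frame so that its peak fits the 16-bit range.
        scale = limit / peak
        mixed = [int(s * scale) for s in mixed]
    return mixed


frames = {
    "A": [1000, -2000, 30000],
    "B": [500, 500, 500],
    "C": [3000, -3000, 30000],
}
to_b = mix_minus(frames, active={"A", "C"}, out_ch="B")
to_a = mix_minus(frames, active={"A", "C"}, out_ch="A")
```

Here `to_b` combines A and C and is scaled down because the third sample would otherwise exceed the 16-bit range, while `to_a` contains only C, so in a system following claim 23 that output could be produced by interpolating C's compression codes instead of mixing.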

34. A conferencing system adapted to conference a number of channels such that no restrictions are placed on the type of compression used by any of the channels in that the system includes modules that can unpack bit-streams of numerous compression standards.
- Dependent claims (35, 36):
  - 35. The conferencing system of claim 34 wherein the type of compression comprises parametric speech compression methods, waveform-approximating methods, waveform compression methods, and audio compression methods.
  - 36. The conferencing system of claim 34 wherein the type of compression comprises narrow-band compression and wide-band compression.

37. A conferencing system that utilizes both time domain signal mixing and direct signal fast transcoding.
- Dependent claim (38):
  - 38. The conferencing system of claim 37 wherein both time domain signal mixing and direct signal fast transcoding are utilized to process a bit-stream from a same channel during a conference.
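
To illustrate how the same channel's bit-stream might be handled by both paths over the course of one conference, the Python sketch below tracks a per-channel active flag that is set or cleared from runs of consecutive frame-level voice activity results, and then reports the processing mode for one listener frame by frame. The run lengths (`set_after`, `clear_after`), the class `ActiveFlag`, and the function `modes_for_listener` are hypothetical; the source does not give these values or names.

```python
class ActiveFlag:
    """Per-channel flag set or cleared from runs of consecutive VAD results."""

    def __init__(self, set_after=2, clear_after=3):
        self.set_after = set_after      # voiced frames needed to set the flag
        self.clear_after = clear_after  # silent frames needed to clear it
        self.active = False
        self.last = False
        self.run = 0                    # length of the current run

    def update(self, voiced):
        self.run = self.run + 1 if voiced == self.last else 1
        self.last = voiced
        if voiced and self.run >= self.set_after:
            self.active = True
        elif not voiced and self.run >= self.clear_after:
            self.active = False
        return self.active


def modes_for_listener(vad_by_channel, listener):
    """Return the per-frame processing mode for one listener channel."""
    flags = {ch: ActiveFlag() for ch in vad_by_channel if ch != listener}
    n_frames = len(next(iter(vad_by_channel.values())))
    modes = []
    for t in range(n_frames):
        # Every flag is updated on every frame, then active talkers are kept.
        talkers = [ch for ch, flag in flags.items()
                   if flag.update(vad_by_channel[ch][t])]
        if len(talkers) == 1:
            modes.append("transcode:" + talkers[0])
        elif talkers:
            modes.append("mix")
        else:
            modes.append("silence")
    return modes


vad = {
    "A": [True] * 8,
    "B": [False, False, True, True, False, False, False, False],
    "C": [False] * 8,
}
timeline = modes_for_listener(vad, listener="C")
```

With these inputs, channel A's stream is first fast transcoded toward C, then mixed with B while both flags are set, and then fast transcoded again once B's flag clears.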

39. A conferencing system that allows a session which performs transcoding in code space to become a conferencing session and vice versa without the need for the conferencing and transcoding functionalities to be split between different systems.

Current Assignee: Onmobile Global Limited
Original Assignee: Dilithium Networks
Inventors: Wang, Jianwei; Jabri, Marwan; Raad, Mohammed
Granted Patent: US 7,599,834 B2
US Class Current: 704/221
CPC Class Codes: H04M 3/568 audio processing specific t...
