Spatialization arrangement for conference call

US 20070025538A1
Filed: 07/11/2005
Published: 02/01/2007
Est. Priority Date: 07/11/2005
Status: Active Grant

Abstract

A method for distinguishing speakers in a conference call of a plurality of participants, in which method speech frames of the conference call are received in a receiving unit, which speech frames include encoded speech parameters. At least one parameter of the received speech frames is examined in an audio codec of the receiving unit, and the speech frames are classified to belong to one of the participants, the classification being carried out according to differences in the examined at least one speech parameter. These functions may be carried out in a speaker identification block, which is applicable in various positions of a teleconferencing processing chain. Finally, a spatialization effect is created in a terminal reproducing the audio signal according to notified differences by placing the participants at distinct positions in an acoustical space of the audio signal.

32 Claims

1. A method for distinguishing speakers in a conference call of a plurality of participants, the method comprising:
- receiving speech frames of the conference call, said speech frames including encoded speech parameters;
  
  examining at least one speech parameter of the received speech frames; and
  
  classifying the speech frames to belong to one of the participants, the classification being carried out according to differences in the examined at least one speech parameter.
- - 2. The method according to claim 1, the method further comprising:
    - creating a spatialization effect to an audio signal to be reproduced by placing the participants at distinct positions in an acoustical space of the audio signal based on the speech frame classification of the participants.
  - 3. The method according to claim 1, the method further comprising:
    - determining a control word for each participant according to differences in the examined at least one speech parameter; and
      
      attaching control words to speech frames, the control word of each speech frame being characteristic to the participant speaking in the particular speech frame.
  - 4. The method according to claim 3, the method further comprising:
    - creating a spatialization effect on the basis of the control words attached to speech frames.
  - 5. The method according to claim 4, the method further comprising:
    - determining the control word for each participant according to differences in the examined only one speech parameter; and
      
      controlling spatial positions of audio channels of the audio signal to be reproduced according to the control words.
  - 6. The method according to claim 4, the method further comprising:
    - clustering the speech frames according to differences in a plurality of examined speech parameters;
      
      determining the control word for each participant according to differences in the speech parameters of the clustered speech frames; and
      
      controlling spatial positions of audio channels of the audio signal to be reproduced according to the control words.
  - 7. The method according to claim 1, wherein the examined speech parameters include at least one of the following:
    - the pitch of the voice;
      
      a voicing classification of a speech frame;
      
      any LPC parameter of a speech frame.
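The classification and clustering steps of claims 1, 3, 6 and 7 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes each decoded frame yields a feature vector of pitch (Hz) and a voicing score, and it swaps in a simple online nearest-centroid clustering for whatever clustering the specification actually uses. The class names, the distance threshold and the feature scales are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SpeakerCluster:
    """Running mean of the feature vectors assigned to one speaker."""
    centroid: list
    count: int = 1

    def update(self, features):
        # Incremental mean update of the centroid.
        self.count += 1
        self.centroid = [
            c + (f - c) / self.count for c, f in zip(self.centroid, features)
        ]


@dataclass
class FrameClassifier:
    # Hypothetical threshold: a frame farther than this from every
    # existing cluster is treated as a new speaker.
    new_speaker_threshold: float = 1.0
    # Hypothetical per-feature scales (pitch in Hz, voicing score 0..1).
    scales: tuple = (30.0, 1.0)
    clusters: list = field(default_factory=list)

    def _distance(self, a, b):
        return math.sqrt(
            sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, self.scales))
        )

    def classify(self, features):
        """Return a control word (here: the cluster index) for one frame."""
        if self.clusters:
            dists = [self._distance(features, c.centroid) for c in self.clusters]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] <= self.new_speaker_threshold:
                self.clusters[best].update(features)
                return best
        # No cluster is close enough: assume a new participant.
        self.clusters.append(SpeakerCluster(list(features)))
        return len(self.clusters) - 1


clf = FrameClassifier()
frames = [(110, 0.9), (115, 0.8), (210, 0.9), (205, 0.85), (112, 0.9)]
control_words = [clf.classify(list(f)) for f in frames]
print(control_words)
```

In this sketch the control word is simply the index of the cluster a frame falls into, so frames with similar pitch and voicing receive the same control word.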

8. A system for distinguishing speakers in a conference call with a plurality of participants, the system comprising:
- means for receiving speech frames of the conference call, said speech frames including encoded speech parameters;
  
  means for examining at least one parameter of the received speech frames; and
  
  means for classifying the speech frames to belong to one of the participants, the classification being based on differences in the examined at least one speech parameter.
- - 9. The system according to claim 8, further comprising:
    - a spatialization means for creating a spatialization effect to the audio signal to be reproduced by placing the participants at distinct positions in an acoustical space of the audio signal.
  - 10. The system according to claim 8, further comprising:
    - means for determining a control word for each participant according to differences in the examined at least one speech parameter; and
      
      means for attaching control words to speech frames, the control word of each speech frame being characteristic to the participant speaking in the particular speech frame.
  - 11. The system according to claim 9, wherein the spatialization means are arranged to create the spatialization effect on the basis of the control words attached to speech frames.
  - 12. The system according to claim 11, wherein the means for determining the control word for each participant are arranged to examine only one speech parameter and to define the control word according to linear differences of said speech parameter;
    - and wherein the system further comprises means for controlling spatial positions of audio channels of the audio signal to be reproduced according to the control words.
  - 13. The system according to claim 11, further comprising:
    - means for clustering the speech frames according to differences in a plurality of examined speech parameters;
      
      means for determining the control word for each participant according to differences in the speech parameters of the clustered speech frames; and
      
      means for controlling spatial positions of audio channels of the audio signal to be reproduced according to the control words.
  - 14. The system according to claim 8, wherein the examined speech parameters include at least one of the following:
    - the pitch of the voice;
      
      a voicing classification of a speech frame;
      
      any LPC parameter of a speech frame.
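Claim 12 describes deriving the control word from linear differences of a single speech parameter. One way this could look, as a sketch only, is a linear quantization of pitch into a small set of control words. The function name, the pitch range of 80 to 300 Hz and the number of levels are assumptions made for illustration and do not come from the patent.

```python
def pitch_to_control_word(pitch_hz, min_hz=80.0, max_hz=300.0, levels=8):
    """Quantize pitch linearly into `levels` control words (0 .. levels - 1).

    The range and level count are hypothetical illustration values.
    """
    clamped = min(max(pitch_hz, min_hz), max_hz)
    position = (clamped - min_hz) / (max_hz - min_hz)  # 0.0 .. 1.0
    # The upper edge (position == 1.0) is folded into the last level.
    return min(int(position * levels), levels - 1)


for pitch in (50.0, 80.0, 190.0, 300.0):
    print(pitch, pitch_to_control_word(pitch))
```

Because the mapping is linear, speakers whose pitch differs more end up with control words that are further apart, which a renderer can translate directly into a larger spatial separation.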

15. A terminal device for a three-dimensional spatialization of an audio signal of a conference call with a plurality of participants, the device comprising:
- means for receiving speech frames of the conference call, said speech frames including encoded speech parameters;
  
  means for examining at least one parameter of the received speech frames;
  
  means for classifying the speech frames to belong to one of the participants, the classification being based on differences in the examined at least one speech parameter; and
  
  a spatialization means for creating a spatialization effect to the audio signal to be reproduced by placing the participants at distinct positions in an acoustical space of the audio signal.
- - 16. The terminal device according to claim 15, further comprising:
    - a stereo or a multi-channel audio reproduction means.
  - 17. The terminal device according to claim 15, further comprising:
    - means for displaying a speaker identification of the participant to whom the concurrent speech frames are classified to belong.
  - 20. The computer program product according to claim 17, wherein the computer program code section for creating a spatialization effect further comprises a computer program code section for creating the spatialization effect on the basis of the control words attached to speech frames.
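The spatialization means of claims 15 and 16 place each participant at a distinct position in the acoustical space. The sketch below substitutes plain constant-power stereo amplitude panning for true three-dimensional rendering, which would typically need head-related filtering that the claims do not detail. The mapping from control word to azimuth and all function names are hypothetical.

```python
import math


def control_word_to_azimuth(control_word, levels=8):
    """Spread control words evenly from -90 (left) to +90 (right) degrees."""
    if levels < 2:
        return 0.0
    return -90.0 + 180.0 * control_word / (levels - 1)


def pan_gains(azimuth_deg):
    """Constant-power pan law: left**2 + right**2 == 1."""
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2)  # 0 .. pi/2
    return math.cos(theta), math.sin(theta)


def spatialize(samples, control_word, levels=8):
    """Turn mono samples into (left, right) pairs for one speaker position."""
    left_gain, right_gain = pan_gains(control_word_to_azimuth(control_word, levels))
    return [(s * left_gain, s * right_gain) for s in samples]


stereo = spatialize([0.5, -0.25], control_word=0)
print(stereo)
```

A constant-power law keeps the perceived loudness roughly the same wherever a speaker is placed, which is why it is a common choice for a two-channel reproduction means.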

18. A computer program product, stored on a computer readable medium and executable in a data processing device, for a three-dimensional spatialization of an audio signal of a conference call with a plurality of participants, the computer program product comprising:
- a computer program code section for receiving speech frames of the conference call, said speech frames including encoded speech parameters;
  
  a computer program code section for examining at least one parameter of the received speech frames;
  
  a computer program code section for classifying the speech frames to belong to one of the participants, the classification being based on differences in the examined at least one speech parameter; and
  
  a computer program code section for creating a spatialization effect to the audio signal to be reproduced by placing the participants at distinct positions in an acoustical space of the audio signal.
- - 19. The computer program product according to claim 18, further comprising:
    - a computer program code section for determining a control word for each participant according to differences in the examined at least one speech parameter; and
      
      a computer program code section for attaching control words to speech frames, the control word of each speech frame being characteristic to the participant speaking in the particular speech frame.

21. A conference bridge for a teleconferencing system, the bridge comprising:
- means for receiving speech frames of the conference call with a plurality of participants, said speech frames including encoded speech parameters;
  
  means for examining at least one parameter of the received speech frames;
  
  means for classifying the speech frames to belong to one of the participants, the classification being based on differences in the examined at least one speech parameter; and
  
  means for including information based on the speech frame classification of the participants in an audio signal for a further spatialization processing of the audio signal.
- - 22. The conference bridge according to claim 21, wherein said means for including information based on the speech frame classification of the participants are arranged to determine a control word for each participant according to differences in the examined at least one speech parameter.
  - 23. The conference bridge according to claim 22, further comprising:
    - a spatialization means for creating a spatialization effect to the audio signal to be transmitted to the participants by placing the participants at distinct positions in an acoustical space of the audio signal according to the control words, and an encoder for encoding the spatialized audio signal prior to the transmission.
  - 24. The conference bridge according to claim 22, further comprising:
    - means for attaching the control words into the audio signal to be transmitted as an additional control information for a further spatialization processing of the audio signal in a receiving terminal.
  - 25. The conference bridge according to claim 24, wherein said additional control information is attached into the audio signal according to one of the following methods:
    - embedding the control words into the audio signal;
      
      stealing particular bits of a speech frame of the audio signal for indicating the control word;
      
      inserting the control words into unused control fields of a transport protocol used for transmitting the audio signal;
      
      or transmitting the control words in a separate control signal along with the audio signal.
  - 26. The conference bridge according to claim 22, further comprising:
    - means for creating separate audio signals, each signal representing speech of a participant;
      
      means for directing a speech frame of an actively speaking participant, indicated by the control word of said speech frame, to a separated audio signal of said participant;
      
      means for generating a silent frame to the separated audio signals of other participant for the duration of said speech frame; and
      
      means for transmitting said separate audio signals to each of the participants.
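Claim 26 describes splitting the combined signal into one audio signal per participant, with silent frames filling the signals of the participants who are not speaking. A minimal sketch of that demultiplexing follows. It assumes each frame arrives as a pair of control word and payload and that the control word is the participant index. `SILENT_FRAME` is a hypothetical placeholder, since the real silence or no-data frame depends on the codec in use.

```python
# Hypothetical placeholder for a codec-specific silence (no-data) frame.
SILENT_FRAME = b""


def split_by_speaker(frames, num_participants):
    """Demultiplex (control_word, payload) pairs into per-participant streams.

    Every stream receives exactly one frame per input frame, so all
    streams stay time-aligned.
    """
    streams = [[] for _ in range(num_participants)]
    for control_word, payload in frames:
        for participant, stream in enumerate(streams):
            # The active speaker's stream gets the speech frame,
            # every other stream gets a silent frame of the same duration.
            stream.append(payload if participant == control_word else SILENT_FRAME)
    return streams


mixed = [(0, b"a"), (1, b"b"), (0, b"c")]
print(split_by_speaker(mixed, num_participants=2))
```

Keeping the streams the same length means a receiving terminal can spatialize each one independently without any extra synchronization information.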

27. A computer program product, stored on a computer readable medium and executable in a data processing device, for distinguishing speakers in a conference call with a plurality of participants, the computer program product comprising:
- a computer program code section for receiving speech frames of the conference call, said speech frames including encoded speech parameters;
  
  a computer program code section for examining at least one parameter of the received speech frames;
  
  a computer program code section for classifying the speech frames to belong to one of the participants, the classification being based on differences in the examined at least one speech parameter; and
  
  a computer program code section for including information based on the speech frame classification of the participants in an audio signal for a further spatialization processing of the audio signal.

28. A terminal device for operating as a master terminal connecting a plurality of slave terminals to a conference bridge, the terminal device comprising:
- means for receiving speech frames of the conference call with a plurality of participants, said speech frames including encoded speech parameters;
  
  an audio codec for examining at least one parameter of the received speech frames;
  
  means for classifying the speech frames to belong to one of the participants, the classification being based on differences in the examined at least one speech parameter; and
  
  means for including information based on the speech frame classification of the participants in an audio signal for a further spatialization processing of the audio signal.
- - 29. The terminal device according to claim 28, wherein said means for including information based on the speech frame classification of the participants are arranged to determine a control word for each participant according to differences in the examined at least one speech parameter.
  - 30. The terminal device according to claim 28, further comprising:
    - means for attaching the control words into the audio signal to be transmitted as an additional control information for a further spatialization processing of the audio signal in slave terminals.
  - 31. The terminal device according to claim 28, further comprising:
    - means for creating separate audio signals, each signal representing speech of a participant;
      
      means for directing a speech frame of an actively speaking participant, indicated by the control word of said speech frame, to a separated audio signal of said participant;
      
      means for generating a silent frame to the separated audio signals of other participant for the duration of said speech frame; and
      
      means for transmitting said separate audio signals to each slave terminal.
  - 32. The terminal device according to claim 28, further comprising:
    - a low power RF means for establishing connections to said slave terminals.
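Claims 24, 25 and 30 cover attaching the control words to the transmitted audio signal, and claim 25 lists stealing particular bits of a speech frame as one option. The sketch below shows the general idea under a hypothetical layout in which the control word overwrites the lowest bits of the last byte of a frame. Which bits can actually be stolen without audible damage depends on the codec, and the patent text reproduced here does not say.

```python
def embed_control_word(frame, control_word, bits=3):
    """Overwrite the lowest `bits` bits of the last byte with the control word.

    The bit position is a hypothetical choice for illustration.
    """
    if not frame:
        raise ValueError("frame must not be empty")
    if not 0 <= control_word < (1 << bits):
        raise ValueError("control word does not fit in the stolen bits")
    mask = (1 << bits) - 1
    last = (frame[-1] & ~mask & 0xFF) | control_word
    return frame[:-1] + bytes([last])


def extract_control_word(frame, bits=3):
    """Read the control word back from the stolen bits."""
    return frame[-1] & ((1 << bits) - 1)


tagged = embed_control_word(bytes([0x12, 0xFF]), control_word=5)
print(tagged.hex(), extract_control_word(tagged))
```

The other options in claim 25, such as unused transport protocol fields or a separate control signal, avoid altering the speech payload at the cost of extra signalling.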

Current Assignee
Nokia Corporation
Original Assignee
Nokia Corporation
Inventors
Virolainen, Jussi; Jarske, Petri

Granted Patent

US 7,724,885 B2
US Class Current

379/202.10
CPC Class Codes

H04M 1/6016   in the receiver circuit

H04M 2201/41   using speaker recognition s...

H04M 3/56   Arrangements for connecting...
