Spatialization arrangement for conference call

US 7,724,885 B2
Filed: 07/11/2005
Issued: 05/25/2010
Est. Priority Date: 07/11/2005
Status: Expired due to Fees

Abstract

A method for distinguishing speakers in a conference call of a plurality of participants, in which method speech frames of the conference call are received in a receiving unit, which speech frames include encoded speech parameters. At least one parameter of the received speech frames is examined in an audio codec of the receiving unit, and the speech frames are classified to belong to one of the participants, the classification being carried out according to differences in the examined at least one speech parameter. These functions may be carried out in a speaker identification block, which is applicable in various positions of a teleconferencing processing chain. Finally, a spatialization effect is created in a terminal reproducing the audio signal according to notified differences by placing the participants at distinct positions in an acoustical space of the audio signal.
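
The classification step described in the abstract can be illustrated with a minimal sketch. It assumes a single examined parameter (pitch) has already been read from each encoded frame. The `SpeechFrame` type, the `classify_frames` helper, and the 25 Hz tolerance are hypothetical names and values chosen for illustration; they are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeechFrame:
    pitch_hz: float                     # hypothetical: pitch read from the encoded parameters
    control_word: Optional[int] = None  # filled in by classify_frames


def classify_frames(frames: List[SpeechFrame], tolerance_hz: float = 25.0) -> List[float]:
    """Assign each frame to the nearest pitch centroid within the tolerance.

    A new participant (centroid) is created when no existing centroid is
    close enough. The index of the centroid serves as the control word.
    """
    centroids: List[float] = []
    counts: List[int] = []
    for frame in frames:
        nearest = None
        for index, centroid in enumerate(centroids):
            distance = abs(frame.pitch_hz - centroid)
            if distance <= tolerance_hz and (
                nearest is None or distance < abs(frame.pitch_hz - centroids[nearest])
            ):
                nearest = index
        if nearest is None:
            # No known participant matches: register a new one.
            centroids.append(frame.pitch_hz)
            counts.append(1)
            nearest = len(centroids) - 1
        else:
            # Update the running mean pitch of the matched participant.
            counts[nearest] += 1
            centroids[nearest] += (frame.pitch_hz - centroids[nearest]) / counts[nearest]
        frame.control_word = nearest
    return centroids


frames = [SpeechFrame(p) for p in (110.0, 210.0, 115.0, 205.0, 112.0)]
centroids = classify_frames(frames)
control_words = [f.control_word for f in frames]
```

In this toy input the frames alternate between two pitch ranges, so the sketch separates them into two participants. A real receiving unit would work on codec parameters rather than plain floats.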

27 Claims

1. A method comprising:
- receiving speech frames of a conference call, said speech frames including encoded speech parameters;
  
  examining at least one speech parameter of the received speech frames;
  
  classifying the speech frames to belong to one of a plurality of participants in said conference call, the classification being carried out according to differences in the examined at least one speech parameter;
  
  determining a control word for each participant according to differences in the examined at least one speech parameter; and
  
  attaching control words to speech frames, the control word of each speech frame being characteristic to the participant speaking in the particular speech frame.
- View Dependent Claims (2, 3, 4, 5, 6)
- - 2. The method according to claim 1, the method further comprising:
    - creating a spatialization effect to an audio signal to be reproduced by placing the participants at distinct positions in an acoustical space of the audio signal based on the speech frame classification of the participants.
  - 3. The method according to claim 1, the method further comprising:
    - creating a spatialization effect on the basis of the control words attached to speech frames.
  - 4. The method according to claim 3, the method further comprising:
    - determining the control word for each participant according to differences in the examined only one speech parameter; and
      
      controlling spatial positions of audio channels of the audio signal to be reproduced according to the control words.
  - 5. The method according to claim 3, the method further comprising:
    - clustering the speech frames according to differences in a plurality of examined speech parameters;
      
      determining the control word for each participant according to differences in the speech parameters of the clustered speech frames; and
      
      controlling spatial positions of audio channels of the audio signal to be reproduced according to the control words.
  - 6. The method according to claim 1, wherein the examined speech parameters include at least one of the following:
    - the pitch of the voice;
      
      a voicing classification of a speech frame;
      
      any LPC parameter of a speech frame.
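
The clustering of claim 5 can be sketched with a small k-means routine, under stated assumptions. The claims do not name a clustering algorithm, so k-means is a swapped-in technique. The two-dimensional feature vectors (a normalized pitch and a voicing ratio) and the choice of initial centroids are hypothetical. The cluster label of each frame plays the role of its control word.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], initial_centroids: Sequence[Vector],
           iterations: int = 10) -> Tuple[List[int], List[Vector]]:
    """Plain k-means: returns one cluster label per point and the centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each frame joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical per-frame features: (normalized pitch, voicing ratio).
features = [(0.20, 0.90), (0.25, 0.85), (0.80, 0.30), (0.78, 0.35)]
labels, centroids = kmeans(features, initial_centroids=[features[0], features[2]])
```

Features are normalized here so that no single parameter dominates the distance; an unscaled pitch in Hz would otherwise outweigh the other parameters.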

7. A system comprising:
- a receiving unit configured to receive speech frames of a conference call, said speech frames including encoded speech parameters;
  
  a decoder configured to examine at least one parameter of the received speech frames;
  
  a recognition block configured to classify the speech frames to belong to one of a plurality of participants in said conference call, the classification being based on differences in the examined at least one speech parameter and to determine a control word for each participant according to differences in the examined at least one speech parameter; and
  
  a spatialization processing module configured to attach control words to speech frames, the control word of each speech frame being characteristic to the participant speaking in the particular speech frame.
- View Dependent Claims (8, 9, 10, 11, 12)
  - 8. The system according to claim 7, wherein the spatialization processing module is configured to create a spatialization effect to the audio signal to be reproduced by placing the participants at distinct positions in an acoustical space of the audio signal.
  - 9. The system according to claim 8, wherein the spatialization processing module is further configured to create the spatialization effect on the basis of control words attached to speech frames by said spatialization processing module, wherein said control words are configured to be determined by said recognition block.
  - 10. The system according to claim 9, wherein said recognition block configured to determine the control word for each participant is configured to examine only one speech parameter and to define the control word according to linear differences of said speech parameter;
    - and wherein the system further comprises said spatialization processing module further configured to control spatial positions of audio channels of the audio signal to be reproduced according to the control words.
  - 11. The system according to claim 9, wherein said recognition block is configured to cluster the speech frames according to differences in a plurality of examined speech parameters and to determine the control word for each participant according to differences in the speech parameters of the clustered speech frames;
    - and said spatialization processing module is configured to control spatial positions of audio channels of the audio signal to be reproduced according to the control words.
  - 12. The system according to claim 7, wherein the examined speech parameters include at least one of the following:
    - the pitch of the voice;
      
      a voicing classification of a speech frame;
      
      any LPC parameter of a speech frame.
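
Claim 10 defines the control word from linear differences of a single parameter and uses it to control the spatial positions of audio channels. A minimal sketch of that idea follows. The pitch range of 80 to 300 Hz, the 256 control word levels, and the constant-power stereo panning law are all assumptions made for illustration. Stereo panning is a simplification of the three-dimensional effect mentioned elsewhere in the claims.

```python
import math
from typing import Tuple


def control_word_from_pitch(pitch_hz: float, low_hz: float = 80.0,
                            high_hz: float = 300.0, levels: int = 256) -> int:
    """Map pitch linearly onto an integer control word in [0, levels - 1]."""
    clamped = min(max(pitch_hz, low_hz), high_hz)
    fraction = (clamped - low_hz) / (high_hz - low_hz)
    return round(fraction * (levels - 1))


def stereo_gains(control_word: int, levels: int = 256) -> Tuple[float, float]:
    """Constant-power pan: control word 0 is hard left, levels - 1 is hard right."""
    position = control_word / (levels - 1)
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)


low_voice = control_word_from_pitch(80.0)
high_voice = control_word_from_pitch(300.0)
left_gain, right_gain = stereo_gains(control_word_from_pitch(150.0))
```

With this mapping, participants with clearly different pitch land at different pan positions, while the total signal power stays constant across positions.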

13. A terminal device comprising:
- a receiving unit configured to receive speech frames of a conference call, said speech frames including encoded speech parameters;
  
  a decoder configured to examine at least one parameter of the received speech frames;
  
  a recognition block configured to classify the speech frames to belong to one of a plurality of participants in said conference call, the classification being based on differences in the examined at least one speech parameter and to determine a control word for each participant according to differences in the examined at least one speech parameter; and
  
  a spatialization processing module configured to attach control words to speech frames, the control word of each speech frame being characteristic to the participant speaking in the particular speech frame, and configured to create a three-dimensional spatialization effect to an audio signal to be reproduced by placing the participants at distinct positions in an acoustical space of the audio signal.
- View Dependent Claims (14, 15)
- - 14. The terminal device according to claim 13, further comprising:
    - a stereo or a multi-channel audio reproduction system.
  - 15. The terminal device according to claim 13, further comprising:
    - a display screen configured to display a speaker identification of the participant to whom the concurrent speech frames are classified to belong.

16. A computer readable medium stored with instructions, which when executed by a data processing device, performs:
- receiving speech frames of a conference call, said speech frames including encoded speech parameters;
  
  examining at least one parameter of the received speech frames;
  
  classifying the speech frames to belong to one of a plurality of participants in said conference call, the classification being based on differences in the examined at least one speech parameter;
  
  determining a control word for each participant according to differences in the examined at least one speech parameter;
  
  attaching control words to speech frames, the control word of each speech frame being characteristic to the participant speaking in the particular speech frame, and creating a three-dimensional spatialization effect to the audio signal to be reproduced by placing the participants at distinct positions in an acoustical space of the audio signal.
- View Dependent Claims (17)
- - 17. The computer readable medium according to claim 16, wherein creating a spatialization effect is on the basis of the control words attached to speech frames.

18. A conference bridge for a teleconferencing system, the bridge comprising:
- a receiving unit configured to receive speech frames of a conference call with a plurality of participants, said speech frames including encoded speech parameters;
  
  a decoder configured to examine at least one parameter of the received speech frames; and
  
  a recognition block configured to classify the speech frames to belong to one of the participants, the classification being based on differences in the examined at least one speech parameter and to determine a control word for each participant according to differences in the examined at least one speech parameter; and
  
  to include information based on the speech frame classification of the participants in an audio signal for a spatialization processing of the audio signal.
- View Dependent Claims (19, 20, 21, 22)
- - 19. The conference bridge according to claim 18, further comprising:
    - a spatialization processing module configured to create a spatialization effect to the audio signal to be transmitted to the participants by placing the participants at distinct positions in an acoustical space of the audio signal according to the control words, and an encoder configured to encode the spatialized audio signal prior to the transmission.
  - 20. The conference bridge according to claim 18, wherein said spatialization processing module is further configured to attach the control words into the audio signal to be transmitted as an additional control information for a further spatialization processing of the audio signal in a receiving terminal.
  - 21. The conference bridge according to claim 20, wherein said additional control information is attached into the audio signal according to one of the following methods:
    - embedding the control words into the audio signal;
      
      stealing particular bits of a speech frame of the audio signal for indicating the control word;
      
      inserting the control words into unused control fields of a transport protocol used for transmitting the audio signal;
      
      or transmitting the control words in a separate control signal along with the audio signal.
  - 22. The conference bridge according to claim 18, further comprising:
    - said recognition block is configured to create separate audio signals, each signal representing speech of a participant;
      
      said spatialization processing module configured to direct a speech frame of an actively speaking participant, indicated by the control word of said speech frame, to a separated audio signal of said participant;
      
      a demultiplexer configured to generate a silent frame to the separated audio signals of other participant for the duration of said speech frame; and
      
      a transceiver configured to transmit said separate audio signals to each of the participants.
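
One of the attachment methods listed in claim 21 is stealing particular bits of a speech frame to indicate the control word. A minimal sketch of bit stealing follows, under the assumption that the least significant bits of the last payload byte can be overwritten. Which bits can actually be stolen depends on the codec and is not specified here; the function names and the two-bit width are hypothetical.

```python
def steal_bits(frame: bytes, control_word: int, n_bits: int = 2) -> bytes:
    """Overwrite the n_bits least significant bits of the last byte with the control word."""
    if not frame:
        raise ValueError("frame must not be empty")
    if not 1 <= n_bits <= 8:
        raise ValueError("n_bits must be between 1 and 8")
    if not 0 <= control_word < (1 << n_bits):
        raise ValueError("control word does not fit in n_bits")
    mask = (1 << n_bits) - 1
    last = (frame[-1] & ~mask & 0xFF) | control_word
    return frame[:-1] + bytes([last])


def recover_control_word(frame: bytes, n_bits: int = 2) -> int:
    """Read back the control word from the stolen bits."""
    return frame[-1] & ((1 << n_bits) - 1)


original = bytes([0x12, 0x34, 0xFF])
marked = steal_bits(original, control_word=2)
recovered = recover_control_word(marked)
```

The frame length is unchanged, so the marked frame can travel through the same transport as an unmarked one, at the cost of the overwritten bits.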

23. A computer readable medium stored with instructions, which when executed by a data processing device, performs:
- receiving speech frames of a conference call, said speech frames including encoded speech parameters;
  
  examining at least one parameter of the received speech frames;
  
  classifying the speech frames to belong to one of a plurality of participants in said conference call, the classification being based on differences in the examined at least one speech parameter;
  
  determining a control word for each participant according to differences in the examined at least one speech parameter; and
  
  including information based on the speech frame classification of the participants in an audio signal for a further spatialization processing of the audio signal.

24. A terminal device comprising:
- a receiving unit configured to receive speech frames of a conference call with a plurality of participants, said speech frames including encoded speech parameters;
  
  an audio decoder configured to examine at least one parameter of the received speech frames; and
  
  a recognition block configured to classify the speech frames to belong to one of the participants, the classification being based on differences in the examined at least one speech parameter;
  
  to include information based on the speech frame classification of the participants is configured to determine a control word for each participant according to differences in the examined at least one speech parameter; and
  
  to include information based on the speech frame classification of the participants in an audio signal for a further spatialization processing of the audio signal, wherein said terminal device operates as a master terminal connecting a plurality of slave terminals to a conference bridge.
- View Dependent Claims (25, 26, 27)
- - 25. The terminal device according to claim 24, further comprising:
    - a processing module configured to attach the control words into the audio signal to be transmitted as an additional control information for a further spatialization processing of the audio signal in slave terminals.
  - 26. The terminal device according to claim 24, further comprising:
    - said recognition block is configured to separate audio signals, each signal representing speech of a participant;
      
      a processing module is configured to direct a speech frame of an actively speaking participant, indicated by the control word of said speech frame, to a separated audio signal of said participant;
      
      a demultiplexer configured to generate a silent frame to the separated audio signals of other participant for the duration of said speech frame; and
      
      a transmitter configured to transmit said separate audio signals to each slave terminal.
  - 27. The terminal device according to claim 24, further comprising:
    - a transceiver for establishing connections to said slave terminals.
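
Claims 22 and 26 describe directing each speech frame to the separated audio signal of the actively speaking participant and generating a silent frame for the other participants. A minimal sketch of that demultiplexing follows. The `SILENT_FRAME` placeholder and the representation of a frame as a `(control_word, payload)` pair are assumptions for illustration; a real codec would define its own silence or no-data frame.

```python
from typing import Dict, List, Sequence, Tuple

SILENT_FRAME = b"\x00"  # hypothetical placeholder for a codec-specific silent frame


def demultiplex(frames: Sequence[Tuple[int, bytes]],
                participants: Sequence[int]) -> Dict[int, List[bytes]]:
    """Split one combined stream into per-participant streams of equal length."""
    streams: Dict[int, List[bytes]] = {p: [] for p in participants}
    for control_word, payload in frames:
        for participant in participants:
            # The speaker's stream gets the payload; all others get silence.
            streams[participant].append(
                payload if participant == control_word else SILENT_FRAME
            )
    return streams


combined = [(0, b"a"), (1, b"b"), (0, b"c")]
separated = demultiplex(combined, participants=[0, 1])
```

Keeping every separated stream the same length preserves the timing of the original call, so a receiving terminal can spatialize each stream independently.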

Current Assignee
Nokia Corporation
Original Assignee
Nokia Corporation
Inventors
Jarske, Petri; Virolainen, Jussi
Primary Examiner(s)
DEANE JR, WILLIAM J

Application Number

US11/179,347
Publication Number

US 20070025538A1
Time in Patent Office

1,779 Days
Field of Search

379/202.01, 379/158, 379/207.01, 379/201.01
US Class Current

379/202.01
CPC Class Codes

H04M 1/6016   in the receiver circuit

H04M 2201/41   using speaker recognition

H04M 3/56   Arrangements for connecting...
