Method and apparatus for active speaker selection using microphone arrays and speaker recognition

US 20090220065A1
Filed: 03/03/2008
Published: 09/03/2009
Est. Priority Date: 03/03/2008
Status: Active Grant



Abstract

A method and apparatus for performing active speaker selection in teleconferencing applications illustratively comprises a microphone array module, a speaker recognition system, a user interface, and a speech signal selection module. The microphone array module separates the speech signal from each active speaker from those of other active speakers, providing a plurality of individual speaker's speech signals. The speaker recognition system identifies each currently active speaker using conventional speaker recognition/identification techniques. These identities are then transmitted to a remote teleconferencing location for display to remote participants via a user interface. The remote participants may then select one of the identified speakers, and the speech signal selection module then selects for transmission the speech signal associated with the selected identified speaker, thereby enabling the participants at the remote location to listen to the selected speaker and neglect the speech from other active speakers.
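
The role of the speech signal selection module described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `EstimatedStream` type and the functions `identities_to_transmit` and `select_for_transmission` are hypothetical names, and the behavior before any selection arrives (sending every stream) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EstimatedStream:
    """One separated speech signal and the identity assigned to it (hypothetical type)."""
    identity: str         # label produced by the speaker recognition system
    samples: List[float]  # placeholder for the separated audio


def identities_to_transmit(streams: List[EstimatedStream]) -> List[str]:
    """Identities sent to the remote location for display."""
    return [s.identity for s in streams]


def select_for_transmission(
    streams: List[EstimatedStream], selected: Optional[str]
) -> List[EstimatedStream]:
    """Pick the streams to transmit.

    Assumption: before any selection arrives (selected is None), every
    stream is sent; afterwards only the selected speaker's stream is sent.
    """
    if selected is None:
        return list(streams)
    return [s for s in streams if s.identity == selected]


streams = [
    EstimatedStream("Speaker A", [0.1, 0.2]),
    EstimatedStream("Speaker B", [0.3, 0.4]),
]
chosen = select_for_transmission(streams, "Speaker B")
```

Here `chosen` holds only the stream labeled "Speaker B", which is the signal the remote participants would hear after making that selection.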


20 Claims

1. A method for enabling active speaker selection during a teleconference, the active speaker to be selected from a plurality of active speakers participating in said teleconference and co-located at a given originating physical location, the selection of an active speaker to be made by one or more participants in the teleconference located at a remote physical location, the method comprising the steps of:
- generating a plurality of estimated speech signals, each estimated speech signal comprising speech representative of a single one of said plurality of active speakers co-located at the given originating physical location;
  
  performing speaker recognition on each of said estimated speech signals to generate corresponding speaker identities associated with the active speakers represented thereby;
  
  transmitting a plurality of said speaker identities, each speaker identity corresponding to one of said estimated speech signals, to said remote physical location; and
  
  transmitting one or more of said estimated speech signals to said remote physical location.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1 further comprising the step of receiving from said remote physical location, in response to said step of transmitting said plurality of speaker identities thereto, a selection of one of said speaker identities, and wherein the step of transmitting said one or more estimated speech signals to said remote physical location comprises transmitting only the estimated speech signal which corresponds to said selected speaker identity thereto.
  - 3. The method of claim 1 wherein the step of transmitting said one or more estimated speech signals to said remote physical location comprises transmitting all of the estimated speech signals corresponding to all of said transmitted plurality of speaker identities thereto.
  - 4. The method of claim 1 wherein the step of generating said plurality of estimated speech signals is performed with use of a plurality of microphones located at said given originating physical location and further with use of a multiple beam-forming microphone array processing technique.
  - 5. The method of claim 1 wherein the step of generating said plurality of estimated speech signals is performed with use of a plurality of microphones located at said given originating physical location and further with use of a beam scanning microphone array processing technique.
  - 6. The method of claim 1 wherein the step of performing speaker recognition on each of said estimated speech signals is performed with use of a pre-populated speaker database which comprises voice feature information and speaker identity information associated with each of a plurality of possible speakers, and wherein the speaker identities associated with the active speakers are generated by comparing voice feature information extracted from said estimated speech signals to said voice feature information comprised in said speaker database and associated with a plurality of said possible speakers included therein.
  - 7. The method of claim 6 wherein the voice feature information comprises one or more of short-term spectra information, long-term spectra information, signal energy information and fundamental frequency information.
  - 8. The method of claim 6 wherein the speaker identity information comprises one or more of names, titles, photos, positions and affiliations.
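
The database comparison recited in claim 6 can be sketched as a nearest-match lookup. This is only an illustration under stated assumptions: the claims do not name a similarity measure, so cosine similarity, the `threshold` value, the fixed-length feature vectors, and the function names `cosine_similarity` and `identify_speaker` are all choices made for this example.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    features: Sequence[float],
    database: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed cut-off, not specified in the claims
) -> Optional[str]:
    """Return the identity whose stored features best match, or None if no match is close enough."""
    best_identity, best_score = None, -1.0
    for identity, reference in database.items():
        score = cosine_similarity(features, reference)
        if score > best_score:
            best_identity, best_score = identity, score
    return best_identity if best_score >= threshold else None


# Hypothetical pre-populated database: identity -> stored voice features.
speaker_db = {"Alice": [1.0, 0.0, 0.0], "Bob": [0.0, 1.0, 0.0]}
match = identify_speaker([0.9, 0.1, 0.0], speaker_db)
```

In a fuller system the database values would hold the voice feature information listed in claim 7 and the keys would link to the identity information listed in claim 8; the sketch reduces both to plain vectors and names.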

9. A method for performing active speaker selection during a teleconference, the active speaker to be selected from a plurality of active speakers participating in said teleconference and co-located at a given originating physical location, the active speaker selection performed by one or more participants in the teleconference located at a remote physical location, the method comprising the steps of:
- receiving, from said given originating physical location, a plurality of speaker identities, each speaker identity corresponding to one of said plurality of active speakers located at said given originating physical location;
  
  selecting one of said received speaker identities;
  
  receiving, from said given originating physical location, one or more estimated speech signals, each received estimated speech signal corresponding to one of said plurality of active speakers located at said given originating physical location, said one or more received estimated speech signals including the estimated speech signal corresponding to the active speaker which corresponds to the selected one of said received speaker identities; and
  
  outputting through a loudspeaker, at the remote physical location, the estimated speech signal corresponding to the active speaker which corresponds to the selected one of said received speaker identities.
- Dependent claims (10, 11, 12, 13, 14):
  - 10. The method of claim 9 further comprising the step of transmitting the selected one of said received speaker identities back to the given originating physical location, and wherein the step of receiving one or more estimated speech signals comprises receiving only the estimated speech signal corresponding to the active speaker which corresponds to the selected one of said received speaker identities.
  - 11. The method of claim 9 wherein the step of receiving one or more estimated speech signals comprises receiving all of said estimated speech signals corresponding to active speakers which correspond to all of said received speaker identities, and wherein the step of selecting one of said received speaker identities further comprises selecting the estimated speech signal corresponding to the active speaker which corresponds to the selected one of said received speaker identities for said outputting through said loudspeaker.
  - 12. The method of claim 9 wherein the received speaker identities comprise one or more of names, titles, photos, positions and affiliations.
  - 13. The method of claim 9 further comprising the step of displaying said received speaker identities on a visual display at said remote physical location.
  - 14. The method of claim 13 wherein the visual display is associated with a personal computer and wherein the step of selecting one of said received speaker identities is performed with use of a computer input device also associated with said personal computer.
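
The remote-side steps of claims 9 to 11 can be sketched as a small state holder. This is a hedged illustration, not the patented implementation: the `RemoteEndpoint` class, its method names, and the use of a dictionary keyed by identity for received streams are assumptions introduced for the example.

```python
from typing import Dict, List, Optional


class RemoteEndpoint:
    """Hypothetical remote-location endpoint: shows identities, records a selection, picks playback."""

    def __init__(self) -> None:
        self.identities: List[str] = []
        self.selected: Optional[str] = None

    def on_identities(self, identities: List[str]) -> None:
        """Store the identities received from the originating location (for display)."""
        self.identities = list(identities)
        # Drop a stale selection if that speaker is no longer listed.
        if self.selected not in self.identities:
            self.selected = None

    def select(self, identity: str) -> str:
        """Record the participant's choice; the return value is what claim 10 would send back."""
        if identity not in self.identities:
            raise ValueError(f"unknown speaker identity: {identity!r}")
        self.selected = identity
        return identity

    def playback(self, streams: Dict[str, List[float]]) -> Optional[List[float]]:
        """Return the selected speaker's signal for the loudspeaker, or None if unavailable.

        Works whether `streams` holds only the selected signal (claim 10)
        or every transmitted signal (claim 11).
        """
        if self.selected is None:
            return None
        return streams.get(self.selected)


endpoint = RemoteEndpoint()
endpoint.on_identities(["Alice", "Bob"])
endpoint.select("Bob")
audio = endpoint.playback({"Alice": [0.1], "Bob": [0.2, 0.3]})
```

The same `playback` call covers both variants, which is why the selection is kept as local state rather than being tied to a particular transport.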

15. An apparatus for enabling active speaker selection during a teleconference, the active speaker to be selected from a plurality of active speakers participating in said teleconference and co-located at a given originating physical location, the selection of an active speaker to be made by one or more participants in the teleconference located at a remote physical location, the apparatus comprising:
- a plurality of microphones;
  
  a microphone array processor operable to generate, based on signals from said plurality of microphones, a plurality of estimated speech signals, each estimated speech signal comprising speech representative of a single one of said plurality of active speakers co-located at the given originating physical location;
  
  a speaker recognition system operable to perform speaker recognition on each of said estimated speech signals to generate a corresponding speaker identity associated with the active speaker represented thereby; and
  
  a transmitter operable to transmit a plurality of said speaker identities, each speaker identity corresponding to one of said estimated speech signals, to said remote physical location, the transmitter further operable to transmit one or more of said estimated speech signals to said remote physical location.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The apparatus of claim 15 further comprising a receiver operable to receive from said remote physical location, in response to said transmitting of said plurality of speaker identities thereto, a selection of one of said speaker identities, and wherein the transmitter is operable to transmit to said remote physical location the estimated speech signal corresponding to said selected speaker identity.
  - 17. The apparatus of claim 15 wherein the transmitter is operable to transmit all of the estimated speech signals corresponding to all of said plurality of speaker identities to said remote physical location.
  - 18. The apparatus of claim 15 wherein the microphone array processor is operable to generate said plurality of estimated speech signals with use of a multiple beam-forming microphone array processing technique.
  - 19. The apparatus of claim 15 wherein the microphone array processor is operable to generate said plurality of estimated speech signals with use of a beam scanning microphone array processing technique.
  - 20. The apparatus of claim 15 wherein the speaker recognition system is operable to perform speaker recognition on each of said estimated speech signals with use of a pre-populated speaker database which comprises voice feature information and speaker identity information associated with each of a plurality of possible speakers, and wherein the speaker recognition system is further operable to generate the speaker identities associated with the active speakers by comparing voice feature information extracted from said estimated speech signals to said voice feature information comprised in said speaker database and associated with a plurality of said possible speakers included therein.
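
One way a microphone array processor might produce several estimated speech signals is to form one beam per speaker. The sketch below uses delay-and-sum beamforming with integer sample delays, which is a classic technique chosen here for illustration; the claims do not state which beamformer is used, and the function names `delay_and_sum` and `form_beams` as well as the per-speaker delay sets are assumptions.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Align each microphone channel by its arrival delay (in samples) and average.

    delays[m] is the assumed number of samples by which the target speaker's
    sound reaches microphone m later than it reaches the earliest microphone.
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    if not channels:
        return []
    # Only keep samples for which every delayed channel still has data.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    if length <= 0:
        return []
    count = len(channels)
    return [
        sum(ch[d + i] for ch, d in zip(channels, delays)) / count
        for i in range(length)
    ]


def form_beams(
    channels: Sequence[Sequence[float]], delay_sets: Sequence[Sequence[int]]
) -> List[List[float]]:
    """Form one beam per speaker; each delay set steers toward one assumed speaker position."""
    return [delay_and_sum(channels, delays) for delays in delay_sets]


# Toy data: the same source reaches microphone 1 one sample after microphone 0.
source = [1.0, 2.0, 3.0, 4.0]
mic0 = source + [0.0]
mic1 = [0.0] + source
beam = delay_and_sum([mic0, mic1], [0, 1])
```

With the delays matched to the source, the aligned channels add coherently and `beam` reproduces `source`; signals arriving with other delay patterns would be averaged out of alignment and attenuated.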


Current Assignee: WSOU Investments, LLC (WSOU Holdings, LLC)
Original Assignee: Alcatel-Lucent SA (Nokia Corporation)
Inventors: Liu, Dong; Ahuja, Sudhir Raman; Zhou, Qiru; Chen, Jingdong; Huang, Yiteng Arden
Granted Patent: US 8,503,653 B2
US Class Current: 379/202.10
CPC Class Codes:
- G10L 17/00   Speaker identification or v...
- G10L 2021/02166   Microphone arrays; Beamforming
- H04M 2201/41   using speaker recognition s...
- H04M 2203/5072   Multiple active speakers co...
- H04M 3/569   using the instant speaker's...
