Audio user interaction recognition and context refinement
First Claim
1. A system to track social interactions between a plurality of participants, comprising:
- a fixed beamformer configured to:
  - receive a plurality of second spatially filtered beam outputs from a plurality of steerable beamformers, each steerable beamformer configured to output a respective one of the second spatially filtered beam outputs and associated with a different participant of the plurality of participants; and
  - generate a plurality of first spatially filtered beam outputs corresponding to a plurality of active speakers of the plurality of participants, the plurality of first spatially filtered beam outputs indicating a number of active speakers of the plurality of active speakers; and
- a processor configured to:
  - determine similarities between the plurality of first spatially filtered beam outputs and the plurality of second spatially filtered beam outputs;
  - based on the similarities, output a plurality of speaker identifiers (IDs), each speaker ID of the plurality of speaker IDs corresponding to a different active speaker of the plurality of active speakers;
  - based on the similarities, determine the social interactions between the plurality of participants; and
  - identify a participation status associated with each steerable beamformer based on the social interactions.
Abstract
A system that tracks a social interaction between a plurality of participants includes a fixed beamformer that is adapted to output a first spatially filtered output and configured to receive a plurality of second spatially filtered outputs from a plurality of steerable beamformers. Each steerable beamformer outputs a respective one of the second spatially filtered outputs, associated with a different one of the participants. The system also includes a processor capable of determining a similarity between the first spatially filtered output and each of the second spatially filtered outputs. The processor determines the social interaction between the participants based on those similarities.
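The similarity step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure is used, so the zero-lag normalized cross-correlation below is an assumption chosen only for illustration. The function names, the participant IDs, and the 0.8 threshold are likewise hypothetical and do not come from the patent.

```python
import math
from typing import Dict, List, Sequence


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag normalized cross-correlation in [-1, 1].

    Assumed similarity measure (not specified by the source).
    Returns 0.0 when either signal has no energy.
    """
    n = min(len(a), len(b))
    dot = sum(x * y for x, y in zip(a[:n], b[:n]))
    norm_a = math.sqrt(sum(x * x for x in a[:n]))
    norm_b = math.sqrt(sum(y * y for y in b[:n]))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_active_speakers(
    fixed_beams: Sequence[Sequence[float]],
    steerable_beams: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical cut-off, not from the source
) -> List[str]:
    """Map each fixed-beamformer output to the most similar steerable output.

    Each key of steerable_beams is a participant ID (one steerable
    beamformer per participant). A participant is reported as an active
    speaker when the best similarity reaches the threshold.
    """
    speaker_ids: List[str] = []
    if not steerable_beams:
        return speaker_ids
    for beam in fixed_beams:
        scores = {
            pid: normalized_correlation(beam, out)
            for pid, out in steerable_beams.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] >= threshold and best not in speaker_ids:
            speaker_ids.append(best)
    return speaker_ids


# Toy signals: one fixed beam that closely follows "alice".
steerable = {
    "alice": [0.0, 1.0, 0.0, -1.0],
    "bob": [1.0, 0.0, -1.0, 0.0],
    "carol": [0.0, 0.0, 0.0, 0.0],
}
fixed = [[0.0, 0.9, 0.1, -1.0]]
active = match_active_speakers(fixed, steerable)
```

In this toy case the number of fixed beams stands in for the number of active speakers, and the returned list plays the role of the speaker IDs. A practical system would presumably compare framed, time-aligned signals rather than whole short vectors.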
78 Citations
91 Claims
1. A system to track social interactions between a plurality of participants, comprising:
- a fixed beamformer configured to:
  - receive a plurality of second spatially filtered beam outputs from a plurality of steerable beamformers, each steerable beamformer configured to output a respective one of the second spatially filtered beam outputs and associated with a different participant of the plurality of participants; and
  - generate a plurality of first spatially filtered beam outputs corresponding to a plurality of active speakers of the plurality of participants, the plurality of first spatially filtered beam outputs indicating a number of active speakers of the plurality of active speakers; and
- a processor configured to:
  - determine similarities between the plurality of first spatially filtered beam outputs and the plurality of second spatially filtered beam outputs;
  - based on the similarities, output a plurality of speaker identifiers (IDs), each speaker ID of the plurality of speaker IDs corresponding to a different active speaker of the plurality of active speakers;
  - based on the similarities, determine the social interactions between the plurality of participants; and
  - identify a participation status associated with each steerable beamformer based on the social interactions.
Dependent claims: 2-14.
15. A system to determine a similarity between an output of a fixed microphone array and outputs of a plurality of steerable microphone arrays, comprising:
- a processor configured to:
  - receive first spatially filtered beam outputs from the fixed microphone array and second spatially filtered beam outputs from the steerable microphone arrays, wherein the first spatially filtered beam outputs are associated with a plurality of active speakers of a plurality of participants and the second spatially filtered beam outputs are associated with the plurality of participants, and wherein the first spatially filtered beam outputs indicate a number of active speakers of the plurality of participants; and
  - determine similarities between the first spatially filtered beam outputs and the second spatially filtered beam outputs; and
- an output device that is configured to output, based on the similarities, a plurality of speaker identifiers (IDs), each speaker ID of the plurality of speaker IDs corresponding to a different active speaker of the plurality of active speakers, wherein the output device is further configured to output, based on the similarities, social interactions between the plurality of participants.
Dependent claims: 16-24.
25. A method for tracking social interactions between a plurality of participants, comprising:
- receiving, from a fixed beamformer, a plurality of first spatially filtered beam outputs corresponding to a plurality of active speakers of a plurality of participants, the plurality of first spatially filtered beam outputs indicating a number of active speakers;
- receiving, from a plurality of steerable beamformers, a plurality of second spatially filtered beam outputs, each steerable beamformer outputting a respective one of the second spatially filtered beam outputs and associated with a different one of the participants;
- determining similarities between the plurality of first spatially filtered beam outputs and each of the plurality of second spatially filtered beam outputs;
- determining, utilizing a processor, the social interactions between the participants based on the similarities;
- identifying a participation status associated with each steerable beamformer based on the social interactions; and
- outputting, based on the similarities, a plurality of speaker identifiers (IDs), each speaker ID of the plurality of speaker IDs corresponding to a different active speaker of the plurality of active speakers.
Dependent claims: 26-37.
38. A method for determining a similarity between an output of a fixed microphone array and outputs of a plurality of steerable microphone arrays, comprising:
- receiving first spatially filtered beam outputs from the fixed microphone array and receiving second spatially filtered beam outputs from the steerable microphone arrays, wherein the first spatially filtered beam outputs are associated with a plurality of active speakers of a plurality of participants, and the second spatially filtered beam outputs are associated with the plurality of participants, and wherein the first spatially filtered beam outputs indicate a number of active speakers of the plurality of participants;
- determining similarities between the first spatially filtered beam outputs of the fixed microphone array and the second spatially filtered beam outputs of the steerable microphone arrays;
- determining, based on the similarities, social interactions between the plurality of participants; and
- outputting, based on the similarities, a plurality of speaker identifiers (IDs), each speaker ID of the plurality of speaker IDs corresponding to a different active speaker of the plurality of active speakers.
Dependent claims: 39-47.
48. An apparatus to track social interactions between a plurality of participants, comprising:
- means for generating a plurality of first spatially filtered beam outputs corresponding to a plurality of active speakers of a plurality of participants, the plurality of first spatially filtered beam outputs indicating a number of active speakers of the plurality of active speakers;
- means for receiving a plurality of second spatially filtered beam outputs, each of the second spatially filtered beam outputs associated with a different participant of the plurality of participants;
- means for determining similarities between the plurality of first spatially filtered beam outputs and each of the plurality of second spatially filtered beam outputs;
- means for outputting, based on the similarities, a plurality of speaker identifiers (IDs), each speaker ID of the plurality of speaker IDs corresponding to a different active speaker of the plurality of active speakers;
- means for determining the social interactions between the plurality of participants based on the similarities; and
- means for identifying a participation status associated with each steerable beamformer based on the social interactions.
Dependent claims: 49-57.
58. An apparatus to determine a similarity between an output of a fixed microphone array and outputs of a plurality of steerable microphone arrays, comprising:
- means for receiving first spatially filtered beam outputs from the fixed microphone array and second spatially filtered beam outputs from the steerable microphone arrays, wherein the first spatially filtered beam outputs are associated with a plurality of active speakers of a plurality of participants and the second spatially filtered beam outputs are associated with the plurality of participants, and wherein the first spatially filtered beam outputs indicate a number of active speakers of the plurality of participants;
- means for performing a comparison between the first spatially filtered beam outputs and the second spatially filtered beam outputs at least one for each of the steerable microphone arrays;
- means for determining, based on the comparison, similarities between the first spatially filtered beam outputs of the fixed microphone array and the second spatially filtered beam outputs of the steerable microphone array;
- means for determining, based on the similarities, social interactions between the plurality of participants; and
- means for outputting, based on the similarities, a plurality of speaker identifiers (IDs), each speaker ID of the plurality of speaker IDs corresponding to a different active speaker of the plurality of active speakers.
Dependent claims: 59-67.
68. A non-transitory computer-readable medium comprising computer-readable instructions for causing a processor to:
- receive, from a plurality of steerable beamformers, a plurality of second spatially filtered beam outputs, each steerable beamformer outputting a respective one of the second spatially filtered beam outputs and each of the plurality of steerable beamformers associated with a different participant of a plurality of participants;
- generate and output a plurality of first spatially filtered beam outputs corresponding to a plurality of active speakers of the plurality of participants, the plurality of first spatially filtered beam outputs indicating a number of active speakers of the plurality of active speakers;
- determine similarities between the plurality of first spatially filtered beam outputs and each of the plurality of second spatially filtered beam outputs;
- based on the similarities, output a plurality of speaker identifiers (IDs), each speaker ID of the plurality of speaker IDs corresponding to a different active speaker of the plurality of active speakers;
- determine, utilizing the processor, social interactions between the participants based on the similarities; and
- identify a participation status associated with each steerable beamformer based on the social interactions.
Dependent claims: 69-81.
82. A non-transitory computer-readable medium comprising computer-readable instructions for causing a processor to:
- receive first spatially filtered beam outputs from a fixed microphone array and second spatially filtered beam outputs from a plurality of steerable microphone arrays, wherein the first spatially filtered beam outputs are associated with a plurality of active speakers of a plurality of participants, and the second spatially filtered beam outputs are associated with the plurality of participants, and wherein the first spatially filtered beam outputs indicate a number of active speakers of the plurality of participants;
- perform a comparison between the first spatially filtered beam outputs and the second spatially filtered beam outputs;
- determine, based on the comparison, similarities between the first spatially filtered beam outputs of the fixed microphone array and the second spatially filtered beam outputs of the steerable microphone arrays;
- determine, based on the similarities, social interactions between the plurality of participants; and
- output, based on the similarities, a plurality of speaker identifiers (IDs), each speaker ID of the plurality of speaker IDs corresponding to a different active speaker of the plurality of active speakers.
Dependent claims: 83-91.
Specification