Listen to people you recognize

US 9,282,399 B2
Filed: 02/26/2014
Issued: 03/08/2016
Est. Priority Date: 02/26/2014
Status: Active Grant

Abstract

Systems, devices, and methods are described for recognizing and focusing on at least one source of an audio communication as part of a communication including a video image and an audio communication derived from two or more microphones when a relative position between the microphones is known. In certain embodiments, linked audio and video focus areas providing location information for one or more sound sources may each be associated with different user inputs, and an input to adjust a focus in either the audio or video domain may automatically adjust the focus in the other domain.

30 Claims

1. A method comprising:
- processing, at a first mobile computing device, a video image and an audio communication associated with the video image, wherein the audio communication comprises at least two raw electronic audio signals created from at least two separate microphones, and wherein a relative position of the at least two separate microphones is known;
  
  identifying at least one source of the audio communication from the processing of the video image as part of a visual identification of at least one source of the audio communication;
  
  determining, based on the identifying of the at least one source of the audio communication, an angle from the first mobile computing device to the at least one source of the audio communication; and
  
  contemporaneously displaying, on a display output of the first mobile computing device, (1) first location information associated with the visual identification of the at least one source of the audio communication overlaid on the video image and (2) second location information comprising the angle from the first mobile computing device to the at least one source of the audio communication.
  - 2. The method of claim 1 wherein the first location information comprises information identifying lips of a person that is the at least one source of the audio communication.
  - 3. The method of claim 1 wherein the video image and the audio communication associated with the video image are received at the first mobile computing device via a network from at least one far-side mobile device that is different from the first mobile computing device, wherein the at least one far-side mobile device comprises at least one of the two separate microphones.
  - 4. The method of claim 1 wherein the first location information comprises information identifying a mouth of at least one person that is identified as the at least one source of the audio communication; and wherein the angle from the first mobile computing device is determined from a point associated with the at least two separate microphones.
  - 5. The method of claim 4 further comprising:
    - identifying a second mouth of a second speaker in the video image;
      
      determining a second angle associated with a second direction from the point associated with the at least two separate microphones to a second source of the audio communication; and
      
      processing the at least two raw electronic audio signals from the at least two separate microphones to simultaneously filter sounds received from outside the angle and the second angle and/or to emphasize sounds received from the angle and the second angle.
  - 6. The method of claim 4 wherein the angle is defined from the point associated with the at least two separate microphones to corners of the mouth of the at least one person that is identified as the at least one source of the audio communication.
  - 7. The method of claim 6 wherein the first location information comprises a shape drawn around the mouth of the at least one person that is identified as the at least one source of the audio communication.
  - 8. The method of claim 7 further comprising processing the at least two raw electronic audio signals to (a) filter sounds received from outside the angle and/or (b) to emphasize the sounds received from the angle.
  - 9. The method of claim 8 further comprising:
    - tracking a relative movement of the mouth in the video image over time; and
      
      adjusting the angle to match the relative movement of the mouth in the video image.
  - 10. The method of claim 9 further comprising:
    - ending the processing of the at least two raw electronic audio signals to filter the sounds received from outside the angle and/or to emphasize the sounds received from the angle when the mouth of the at least one person that is identified as the at least one source of the audio communication moves outside the video image.
  - 11. The method of claim 9 wherein the first location information and the second location information each comprise part of a user interface.
  - 12. The method of claim 11 further comprising:
    - receiving a first user input adjusting the first location information using a first portion of the user interface associated with the first location information; and
      
      automatically adjusting the second location information and a second portion of the user interface associated with the second location information in response to the adjusting the first portion of the user interface.
  - 13. The method of claim 12 wherein automatically adjusting the second location information comprises:
    - changing the angle; and
      
      updating the display output.
  - 14. The method of claim 12 wherein adjusting the first portion of the user interface associated with the first location information comprises adjusting the shape drawn around the mouth of the at least one person that is identified as the at least one source of the audio communication; and wherein automatically adjusting second location information comprises updating the angle based on the second portion of the user interface associated with the shape drawn around the mouth.
  - 15. The method of claim 12 wherein the first user input adjusting the second portion of the user interface associated with the second location information automatically adjusts the first portion of the user interface.
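
As an illustration of how the angle in claims 1, 4, and 6 might be derived from the video image, the sketch below maps the pixel columns of the two mouth corners to an angular sector. It is not taken from the patent: it assumes an ideal pinhole camera with a known horizontal field of view whose optical axis passes through the point associated with the microphones, and the names `pixel_to_azimuth` and `mouth_sector` are hypothetical.

```python
import math


def pixel_to_azimuth(x_px, image_width_px, hfov_deg):
    """Horizontal angle in degrees of pixel column x_px from the optical axis.

    Assumes an ideal pinhole camera. Negative values are left of center,
    positive values are right of center.
    """
    half_width = image_width_px / 2.0
    # Focal length in pixels, derived from the assumed horizontal field of view.
    focal_px = half_width / math.tan(math.radians(hfov_deg) / 2.0)
    return math.degrees(math.atan((x_px - half_width) / focal_px))


def mouth_sector(left_corner_x, right_corner_x, image_width_px, hfov_deg):
    """Angular sector (start_deg, end_deg) spanned by the two mouth corners."""
    a = pixel_to_azimuth(left_corner_x, image_width_px, hfov_deg)
    b = pixel_to_azimuth(right_corner_x, image_width_px, hfov_deg)
    return (min(a, b), max(a, b))
```

With a 1920-pixel-wide frame and an assumed 60-degree field of view, a mouth centered in the frame yields a sector that is symmetric about 0 degrees, and the right edge of the frame maps to roughly 30 degrees.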

16. A mobile computing device comprising:
- a processor;
  
  a display output for outputting video image, wherein the display output is coupled to the processor;
  
  at least two separate microphones, wherein the at least two separate microphones are coupled to the processor; and
  
  a memory coupled to the processor, wherein the memory comprises instructions that, when executed by the processor, cause the processor to:
  
  process the video image and an audio communication associated with the video image, wherein the audio communication comprises at least two raw electronic audio signals created from the at least two separate microphones, and wherein a relative position of the at least two separate microphones is known;
  
  identify at least one source of the audio communication from the processing of the video image as part of a visual identification of the at least one source of the audio communication;
  
  determine, based on the identifying of the at least one source of the audio communication, an angle from the mobile computing device to the at least one source of the audio communication; and
  
  contemporaneously display, on the display output (1) first location information associated with the visual identification of the at least one source of the audio communication overlaid on the video image and (2) second location information comprising the angle from the mobile computing device to the at least one source of the audio communication.
  - 17. The mobile computing device of claim 16 wherein the first location information comprises information identifying a person that is identified as the at least one source of the audio communication; and wherein the angle from the mobile computing device is determined from a point associated with the at least two separate microphones.
  - 18. The mobile computing device of claim 17 wherein identifying the person that is identified as the at least one source of the audio communication comprises:
    - identifying a first person as a first source of the audio communication;
      
      identifying a second person as a second source of the audio communication;
      
      wherein the first person is associated with (1) a first portion of the first location information associated with a visual identification of the first person overlaid on the video image and (2) a first portion of the second location information comprising the angle from the mobile computing device to the first person; and
      
      wherein the second person is associated with (1) a second portion of the first location information associated with a visual identification of the second person overlaid on the video image and (2) a second portion of the second location information comprising a second angle from the mobile computing device to the second person.
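
Claim 18 pairs each identified person with both an overlay on the video image and an angle. One way to picture that pairing is a small record per source, sketched below. The `TrackedSource` type, its field names, and the `is_emphasized` helper are hypothetical and are not described in the patent; the sketch only shows how two sources could each carry their own overlay rectangle and angular sector.

```python
from dataclasses import dataclass


@dataclass
class TrackedSource:
    label: str         # hypothetical display label, e.g. "speaker 1"
    box: tuple         # overlay rectangle (x0, y0, x1, y1) in pixels
    sector_deg: tuple  # (start_deg, end_deg) measured from the microphone point


def is_emphasized(direction_deg, sources):
    """True if a sound direction falls inside any tracked source's sector."""
    return any(
        s.sector_deg[0] <= direction_deg <= s.sector_deg[1] for s in sources
    )
```

A list holding two such records corresponds to the first and second person of the claim; a direction that lies between the two sectors belongs to neither.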

19. A mobile computing device comprising:
- means for processing video image and an audio communication associated with the video image, wherein the audio communication comprises at least two raw electronic audio signals created from at least two separate microphones, and wherein a relative position of the at least two separate microphones is known;
  
  means for identifying at least one source of the audio communication from the processing of the video image as part of a visual identification of the at least one source of the audio communication;
  
  means for determining, based on the identifying of the at least one source of the audio communication, an angle from the mobile computing device to the at least one source of the audio communication; and
  
  means for contemporaneously displaying, on a display output of the mobile computing device (1) first location information associated with the visual identification of the at least one source of the audio communication overlaid on the video image and (2) second location information comprising the angle from the first mobile computing device to the at least one source of the audio communication.
  - 20. The mobile computing device of claim 19 further comprising:
    - means for receiving a first user input adjusting the first location information using a first portion of a user interface associated with the first location information;
      
      means for receiving a second user input adjusting the second location information using a second portion of the user interface; and
      
      means for automatically adjusting the second location information when the first user input is received and for automatically adjusting the first location information when the second user input is received.
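
The two-way link in claim 20, where an input on either portion of the user interface updates the other, can be sketched as a small state object. This is an assumption-laden illustration rather than the claimed implementation: the `LinkedFocus` class and its methods are hypothetical, only the horizontal extent of the overlay is modeled, and the pixel-to-angle mapping again assumes an ideal pinhole camera with a known horizontal field of view.

```python
import math


class LinkedFocus:
    """Keeps a video overlay's horizontal extent and an audio sector in sync."""

    def __init__(self, image_width_px, hfov_deg):
        self.half_width = image_width_px / 2.0
        # Focal length in pixels under the assumed pinhole model.
        self.focal_px = self.half_width / math.tan(math.radians(hfov_deg) / 2.0)
        self.box_x = (self.half_width, self.half_width)
        self.sector_deg = (0.0, 0.0)

    def _to_angle(self, x_px):
        return math.degrees(math.atan((x_px - self.half_width) / self.focal_px))

    def _to_pixel(self, angle_deg):
        return self.half_width + self.focal_px * math.tan(math.radians(angle_deg))

    def set_box(self, left_px, right_px):
        """Input on the video portion: the audio sector follows."""
        self.box_x = (left_px, right_px)
        self.sector_deg = (self._to_angle(left_px), self._to_angle(right_px))

    def set_sector(self, start_deg, end_deg):
        """Input on the audio portion: the video overlay follows."""
        self.sector_deg = (start_deg, end_deg)
        self.box_x = (self._to_pixel(start_deg), self._to_pixel(end_deg))
```

Routing both inputs through one object means neither view can drift out of step with the other, which is the behavior the claim describes.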

21. A method of visual and audio identification of a sound source comprising:
- capturing, by a far-side mobile device, a far-side video image and a far-side audio communication, wherein the far-side audio communication comprises at least two raw electronic audio signals created from at least two separate microphones integrated as part of the far-side mobile device, and wherein a relative position of the at least two separate microphones is known;
  
  communicating the far-side video image and the far-side audio communication from the far-side mobile device to a near-side mobile device via a network;
  
  processing the far-side video image and the far-side audio communication to identify at least one source of the far-side audio communication as part of a visual identification of the at least one source of the far-side audio communication;
  
  determining, based on the identifying of the at least one source of the far-side audio communication, at least one angle from the far-side mobile device to the at least one source of the far-side audio communication;
  
  processing the at least two raw electronic audio signals to (a) filter sounds received from outside the at least one angle from the far-side mobile device to the at least one source of the far-side audio communication and/or (b) to emphasize sounds received from the at least one angle from the far-side mobile device to the at least one source of the far-side audio communication; and
  
  creating an output comprising (1) first far-side location information associated with the visual identification of the at least one source of the far-side audio communication overlaid on the far-side video image and (2) second far-side location information comprising the at least one angle from the far-side mobile device to the at least one source of the far-side audio communication.
  - 22. The method of claim 21 wherein the determining of the at least one angle from the far-side mobile device to the at least one source of the far-side audio communication is performed by the far-side mobile device, and wherein the at least one angle from the far-side mobile device to the at least one source of the far-side audio communication is communicated from the far-side mobile device to the near-side mobile device with the far-side video image and the far-side audio communication.
  - 23. The method of claim 21 wherein (1) processing the far-side video image and the far-side audio communication to identify at least one source of the far-side audio communication as part of a visual identification of the at least one source of the far-side audio communication is performed by the near-side mobile device after the near-side mobile device receives the far-side video image and the far-side audio communication.
  - 24. The method of claim 23 wherein the near-side mobile device receives the relative position of the at least two separate microphones along with reception of the far-side audio communication.
  - 25. The method of claim 24 wherein the first far-side location information and the second far-side location information each comprise part of a user interface presented on a display output of the near-side mobile device.
  - 26. The method of claim 25 further comprising:
    - receiving a first near-side user input adjusting the first far-side location information using a first portion of the user interface associated with the first far-side location information.
  - 27. The method of claim 26 further comprising:
    - automatically adjusting the second far-side location information and a second portion of the user interface associated with the second far-side location information in response to the adjusting the first portion of the user interface;
      
      determining an updated at least one angle from the far-side mobile device to the at least one source of the far-side audio communication; and
      
      automatically adjusting processing the at least two raw electronic audio signals based on the updated at least one angle from the far-side mobile device to the at least one source of the far-side audio communication.
  - 28. The method of claim 21 further comprising:
    - capturing, by the near-side mobile device, a near-side video image and a near-side audio communication, wherein the near-side audio communication comprises an additional at least two raw electronic audio signals created from an additional at least two separate microphones integrated as part of the near-side mobile device, and wherein a second relative position of the additional at least two separate microphones is known;
      
      processing the near-side video image and the near-side audio communication to identify at least one source of the near-side audio communication as part of a visual identification of the at least one source of the near-side audio communication;
      
      determining, based on the identifying of the at least one source of the near-side audio communication, the at least one angle from the near-side mobile device to the at least one source of the near-side audio communication; and
      
      creating a second output for the near-side mobile device comprising (1) first near-side location information associated with the visual identification of the at least one source of the near-side audio communication overlaid on the near-side video image and (2) second near-side location information comprising the at least one angle from the near-side mobile device to the at least one source of the near-side audio communication.
  - 29. The method of claim 28 further comprising:
    - displaying the first near-side location information, the second near-side location information, the first far-side location information, and the second far-side location information in a display output of the near-side mobile device as part of a user interface of the near-side mobile device, wherein the at least one source of the far-side audio communication comprises a user of the far-side mobile device and wherein the at least one source of the near-side audio communication comprises a user of the near-side mobile device.
  - 30. The method of claim 21 further comprising:
    - processing the at least two raw electronic audio signals prior to communicating the far-side audio communication from the far-side mobile device to the near-side mobile device;
      
      receiving, at the far-side mobile device, a first far-side user input adjusting the first far-side location information using a first portion of a user interface associated with the first far-side location information; and
      
      adjusting the processing of the at least two raw electronic audio signals based on the first far-side user input.
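
Claim 21 recites emphasizing sounds received from the determined angle but does not prescribe a particular technique. The sketch below swaps in delay-and-sum beamforming for two microphones as one common option. Everything in it is an assumption made for illustration: a far-field source, a speed of sound of 343 m/s, whole-sample delays, an angle measured from broadside with positive values toward microphone 1, and the hypothetical function names.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature air


def steering_delay_samples(angle_deg, mic_spacing_m, sample_rate_hz):
    """Whole-sample lag of mic 2 behind mic 1 for a far-field source.

    angle_deg is measured from broadside (perpendicular to the line joining
    the microphones). Positive angles are toward mic 1, so mic 2 hears later.
    """
    tau = mic_spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    return round(tau * sample_rate_hz)


def shift(signal, n):
    """Delay a signal by n samples (advance it if n is negative), zero-padded."""
    length = len(signal)
    if n >= 0:
        return [0.0] * min(n, length) + list(signal[:max(length - n, 0)])
    return list(signal[-n:]) + [0.0] * min(-n, length)


def delay_and_sum(mic1, mic2, angle_deg, mic_spacing_m, sample_rate_hz):
    """Average the two channels after aligning them for the steering angle."""
    n = steering_delay_samples(angle_deg, mic_spacing_m, sample_rate_hz)
    # Delay mic 1 by n samples so it lines up with the later copy at mic 2.
    aligned1 = shift(mic1, n)
    return [(a + b) / 2.0 for a, b in zip(aligned1, mic2)]
```

Sound arriving from the steering angle adds coherently after alignment, while sound from other directions is partly cancelled. A practical system would likely need fractional delays and frequency-dependent weighting, which this sketch omits.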

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Kim, Lae-Hoon; Ton, Phuong Lam; Visser, Erik; Toman, Jeremy P.; MacDougall, Francis Bernard
Primary Examiner(s): Huber, Paul
Application Number: US14/191,321
Publication Number: US 20150245133A1
Time in Patent Office: 741 Days
Field of Search: None
US Class Current: 1/1

CPC Class Codes
G06V 40/16   Human faces, e.g. facial pa...
G06V 40/176   Dynamic expression
G10L 15/25   using position of the lips,...
G10L 2021/02166   Microphone arrays; Beamforming
G10L 21/02   Speech enhancement, e.g. no...
G10L 21/0208   Noise filtering
H04N 7/147   Communication arrangements,...
H04R 1/326   for microphones H04R1/34 an...
H04R 1/406   microphones
H04R 3/005   for combining the signals o...
