PRIMARY SPEAKER IDENTIFICATION FROM AUDIO AND VIDEO DATA

US 20150088515A1
Filed: 09/25/2013
Published: 03/26/2015
Est. Priority Date: 09/25/2013
Status: Abandoned Application

Abstract

An aspect provides a method, including: receiving image data from a visual sensor of an information handling device; receiving audio data from one or more microphones of the information handling device; identifying, using one or more processors, human speech in the audio data; identifying, using the one or more processors, a pattern of visual features in the image data associated with speaking; matching, using the one or more processors, the human speech in the audio data with the pattern of visual features in the image data associated with speaking; selecting, using the one or more processors, a primary speaker from among matched human speech; assigning control to the primary speaker; and performing one or more actions based on audio input of the primary speaker. Other aspects are described and claimed.
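
The matching and selection steps in the abstract can be illustrated with a minimal sketch. The application does not specify a matching metric, so everything here is an assumption for illustration: speech activity and per-face mouth movement are modeled as per-frame booleans, the hypothetical `agreement` score is the fraction of frames on which the two agree, and `min_score` is an arbitrary threshold.

```python
from typing import Dict, List, Optional


def agreement(speech: List[bool], mouth: List[bool]) -> float:
    """Fraction of frames in which speech activity and mouth movement agree."""
    if not speech or len(speech) != len(mouth):
        raise ValueError("sequences must be non-empty and equally long")
    return sum(s == m for s, m in zip(speech, mouth)) / len(speech)


def select_primary_speaker(
    speech: List[bool],
    faces: Dict[str, List[bool]],
    min_score: float = 0.75,  # assumed threshold, not taken from the claims
) -> Optional[str]:
    """Return the face whose mouth movement best matches the speech, if any."""
    scores = {face_id: agreement(speech, mouth) for face_id, mouth in faces.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # No face is selected when even the best match is weak.
    return best if scores[best] >= min_score else None
```

A real system would likely work on continuous features rather than booleans; the sketch only shows the shape of "match, then select".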

22 Claims

1. A method, comprising:
- receiving image data from a visual sensor of an information handling device;
  
  receiving audio data from one or more microphones of the information handling device;
  
  identifying, using one or more processors, human speech in the audio data;
  
  identifying, using the one or more processors, a pattern of visual features in the image data associated with speaking;
  
  matching, using the one or more processors, the human speech in the audio data with the pattern of visual features in the image data associated with speaking;
  
  selecting, using the one or more processors, a primary speaker from among matched human speech;
  
  assigning control to the primary speaker; and
  
  performing one or more actions based on audio input of the primary speaker.
  - 2. The method of claim 1, wherein the one or more actions based on the primary speaker identified comprise providing a visual indication of the primary speaker identified.
  - 3. The method of claim 1, further comprising:
    - processing the matched human speech in a virtual assistant application;
      
      wherein the one or more actions based on the primary speaker identified comprise performing an action via the virtual assistant.
  - 4. The method of claim 3, wherein the action performed via the virtual assistant comprises execution of a command derived from processing the matched human speech.
  - 5. The method of claim 1, further comprising:
    - activating a virtual assistant of the information handling device responsive to identifying a primary speaker;
      
      wherein the one or more actions based on the primary speaker identified comprises thereafter performing an action via the virtual assistant.
  - 6. The method of claim 1, further comprising:
    - identifying, using the one or more processors, newly matched human speech as a new primary speaker; and
      
      performing one or more actions based on the new primary speaker identified.
  - 7. The method of claim 1, wherein the receiving audio data from one or more microphones of the information handling device comprises receiving audio data from two or more microphones of the information handling device;
    - and wherein the identifying a pattern of visual features in the image data associated with speaking comprises utilizing directional information in the audio data received to identify the pattern of visual features associated with speaking.
  - 8. The method of claim 1, wherein the identifying a pattern of visual features in the image data associated with speaking comprises utilizing pattern recognition to identify the pattern of visual features associated with speaking.
  - 9. The method of claim 8, wherein the pattern of visual features in the image data associated with speaking comprise facial movement patterns.
  - 10. The method of claim 9, wherein the identifying a pattern of visual features in the image data associated with speaking comprises filtering out facial movement patterns not associated with speaking.
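
Claim 7 uses directional information from two or more microphones to help locate the visual speaking pattern. The claims do not say how direction is derived. One common approach, assumed here purely for illustration, is to estimate the time difference of arrival between two microphones by cross-correlation and convert it to a bearing. The function names, the sign convention (a positive lag means the right microphone hears the sound later), and the tolerance are all hypothetical.

```python
import math
from typing import Dict, List

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def best_lag(left: List[float], right: List[float], max_lag: int) -> int:
    """Lag in samples of `right` relative to `left` that maximizes correlation."""
    def corr(lag: int) -> float:
        return sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )
    return max(range(-max_lag, max_lag + 1), key=corr)


def bearing_degrees(lag: int, sample_rate: float, mic_spacing_m: float) -> float:
    """Convert a sample lag to an angle off the array's broadside axis."""
    delay_s = lag / sample_rate
    # Clamp so noisy lags cannot push asin outside its domain.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * delay_s / mic_spacing_m))
    return math.degrees(math.asin(ratio))


def faces_near_bearing(
    face_bearings: Dict[str, float], bearing: float, tolerance_deg: float = 15.0
) -> List[str]:
    """Keep only faces whose (assumed known) bearing is close to the sound."""
    return [f for f, b in face_bearings.items() if abs(b - bearing) <= tolerance_deg]
```

Narrowing the candidate faces this way means the visual pattern search only has to examine the region the sound appears to come from.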

11. An information handling device, comprising:
- a visual sensor;
  
  one or more microphones;
  
  one or more processors; and
  
  a memory storing code executable by the one or more processors to:
  
  receive image data from the visual sensor;
  
  receive audio data from the one or more microphones;
  
  identify human speech in the audio data;
  
  identify a pattern of visual features in the image data associated with speaking;
  
  match the human speech in the audio data with the pattern of visual features in the image data associated with speaking;
  
  select a primary speaker from among matched human speech;
  
  assign control to the primary speaker; and
  
  perform one or more actions based on audio input of the primary speaker.
  - 12. The information handling device of claim 11, wherein the one or more actions based on the primary speaker identified comprise providing a visual indication of the primary speaker identified.
  - 13. The information handling device of claim 11, wherein the code is further executable by the one or more processors to:
    - process the matched human speech in a virtual assistant application;
      
      wherein the one or more actions based on the primary speaker identified comprise performing an action via the virtual assistant.
  - 14. The information handling device of claim 13, wherein the action performed via the virtual assistant comprises execution of a command derived from processing the matched human speech.
  - 15. The information handling device of claim 11, wherein the code is further executable by the one or more processors to:
    - activate a virtual assistant of the information handling device responsive to identifying a primary speaker;
      
      wherein the one or more actions based on the primary speaker identified comprises thereafter performing an action via the virtual assistant.
  - 16. The information handling device of claim 11, wherein the code is further executable by the one or more processors to:
    - identify newly matched human speech as a new primary speaker; and
      
      perform one or more actions based on the new primary speaker identified.
  - 17. The information handling device of claim 11, wherein to receive audio data from one or more microphones of the information handling device comprises receiving audio data from two or more microphones of the information handling device;
    - and wherein to identify a pattern of visual features in the image data associated with speaking comprises utilizing directional information in the audio data received to identify the pattern of visual features associated with speaking.
  - 18. The information handling device of claim 11, wherein to identify a pattern of visual features in the image data associated with speaking comprises utilizing pattern recognition to identify the pattern of visual features associated with speaking.
  - 19. The information handling device of claim 18, wherein the pattern of visual features in the image data associated with speaking comprise facial movement patterns.
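
The claims refer to facial movement patterns associated with speaking, and the method claims add filtering out facial movement that is not speech. The application does not give a filter, so the following is only a heuristic sketch under stated assumptions: each face yields a per-frame mouth-opening measurement (hypothetical, in arbitrary units), and speech is assumed to open and close the mouth a few times per second, whereas a yawn or a still face does not. The 2 to 7 Hz band and `min_range` are illustrative guesses.

```python
from typing import List


def mean_crossings(openings: List[float]) -> int:
    """Count how often the signal crosses its own mean."""
    mean = sum(openings) / len(openings)
    above = [o > mean for o in openings]
    return sum(a != b for a, b in zip(above, above[1:]))


def looks_like_speaking(
    openings: List[float],
    fps: float,
    min_hz: float = 2.0,      # assumed lower bound on open/close rate
    max_hz: float = 7.0,      # assumed upper bound on open/close rate
    min_range: float = 0.05,  # assumed minimum movement to count at all
) -> bool:
    """Heuristic: mouth opens and closes at a speech-like rate."""
    if len(openings) < 2:
        return False
    if max(openings) - min(openings) < min_range:
        return False  # essentially a still mouth
    duration_s = (len(openings) - 1) / fps
    # Two mean crossings make one open/close cycle.
    cycles_per_second = mean_crossings(openings) / 2 / duration_s
    return min_hz <= cycles_per_second <= max_hz
```

A trained pattern recognizer would replace this rule in practice; the sketch only shows what "filtering out" a non-speaking movement could look like.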

20. A program product, comprising:
- a computer readable storage medium storing instructions executable by one or more processors, the instructions comprising:
  
  computer readable program code configured to receive image data from a visual sensor of an information handling device;
  
  computer readable program code configured to receive audio data from one or more microphones of the information handling device;
  
  computer readable program code configured to identify, using one or more processors, human speech in the audio data;
  
  computer readable program code configured to identify, using the one or more processors, a pattern of visual features in the image data associated with speaking;
  
  computer readable program code configured to match, using the one or more processors, the human speech in the audio data with the pattern of visual features in the image data associated with speaking;
  
  computer readable program code configured to select, using the one or more processors, a primary speaker from among matched human speech;
  
  computer readable program code configured to assign control to the primary speaker; and
  
  computer readable program code configured to perform one or more actions based on audio input of the primary speaker.

21. An information handling device, comprising:
- a visual sensor;
  
  two or more microphones;
  
  one or more processors; and
  
  a memory storing code executable by the one or more processors to:
  
  receive image data from the visual sensor;
  
  receive audio data from the two or more microphones;
  
  identify human speech in the audio data;
  
  identify a pattern of visual features in the image data associated with speaking utilizing directional information in the audio data received to identify the pattern of visual features associated with speaking;
  
  match the human speech in the audio data with the pattern of visual features in the video data associated with speaking;
  
  identify matched human speech as a primary speaker; and
  
  perform one or more actions based on the primary speaker identified.
  - 22. The information handling device of claim 21, wherein the code is further executable by the one or more processors to:
    - identify newly matched human speech as a new primary speaker; and
      
      perform one or more actions based on the new primary speaker identified.
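
Assigning control to a primary speaker, and later identifying newly matched speech as a new primary speaker, can be sketched as a small state holder. The class below is hypothetical. In particular, the requirement that a new speaker be matched for `switch_after` consecutive windows before taking control is an added assumption to avoid flip-flopping; the claims themselves state no such condition.

```python
from typing import Optional


class ControlSession:
    """Tracks which matched speaker currently holds control (illustrative)."""

    def __init__(self, switch_after: int = 3) -> None:
        self.primary: Optional[str] = None
        self.switch_after = switch_after  # assumed hysteresis, not in the claims
        self._candidate: Optional[str] = None
        self._streak = 0

    def observe(self, matched_speaker: Optional[str]) -> Optional[str]:
        """Feed the matched speaker for one window; return the current primary."""
        if matched_speaker is None or matched_speaker == self.primary:
            self._candidate, self._streak = None, 0
        elif self.primary is None:
            self.primary = matched_speaker
        else:
            if matched_speaker == self._candidate:
                self._streak += 1
            else:
                self._candidate, self._streak = matched_speaker, 1
            if self._streak >= self.switch_after:
                self.primary = matched_speaker
                self._candidate, self._streak = None, 0
        return self.primary

    def handle_command(self, speaker: str, command: str) -> Optional[str]:
        """Act only on audio input attributed to the primary speaker."""
        if self.primary is None or speaker != self.primary:
            return None
        return f"execute: {command}"
```

For example, with `switch_after=2`, a second person must be matched in two consecutive windows before their commands are acted on.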

Current Assignee
Lenovo Singapore Pte Limited (Lenovo Group Ltd.)
Original Assignee
Lenovo Singapore Pte Limited (Lenovo Group Ltd.)
Inventors
Hunt, James Anthony; Kapinos, Robert James; Ramirez Flores, Axel; Waltermann, Rod D.; Beaumont, Suzanne Marion

Application Number

US14/036,728
Publication Number

US 20150088515A1
Field of Search
US Class Current

704/251
CPC Class Codes

G10L 15/25 using position of the lips,...

G10L 17/06 Decision making techniques;...
