Identification of people using multiple types of input

US 8,024,189 B2
Filed: 06/22/2006
Issued: 09/20/2011
Est. Priority Date: 06/22/2006
Status: Active Grant

First Claim

1. A method comprising:

identifying a pool of features comprising at least one feature from a first type of input and at least one feature from a second type of input where the second type of input is different from the first type of input; and

generating a classifier for speaker detection using a learning algorithm wherein nodes of the classifier are selected using the pool of features and a preferable feature is weighted higher than a less preferable feature such that the preferable feature is located in the classifier before the less preferable feature.

2 Assignments

0 Petitions

Abstract

Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
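
As a rough illustration of what the abstract describes, the sketch below pools one audio-derived and one video-derived feature and lets a discrete AdaBoost loop (the learning algorithm named in claim 8) pick threshold stumps from that pool. The feature names `audio_energy` and `motion`, the toy samples, and the stump form of each node are assumptions made for illustration; none of them come from the patent text shown here.

```python
import math

def train_adaboost(samples, labels, pool, rounds=3):
    """Discrete AdaBoost over threshold stumps drawn from a mixed feature pool.

    samples: list of raw inputs; labels: +1 (speaker) or -1 (not a speaker);
    pool: list of (name, feature_fn) pairs. All names are hypothetical.
    """
    n = len(samples)
    weights = [1.0 / n] * n
    # Evaluate every pooled feature once per sample.
    values = {name: [fn(s) for s in samples] for name, fn in pool}
    nodes = []
    for _ in range(rounds):
        best = None
        for name, vals in values.items():
            for thr in sorted(set(vals)):
                for sign in (1, -1):
                    preds = [sign if v >= thr else -sign for v in vals]
                    err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                    if best is None or err < best[0]:
                        best = (err, name, thr, sign, preds)
        err, name, thr, sign, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        nodes.append({"feature": name, "threshold": thr, "sign": sign, "alpha": alpha})
        # Re-weight so misclassified samples count more in the next round.
        weights = [w * math.exp(-alpha * y * p) for w, p, y in zip(weights, preds, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return nodes

def classify(nodes, sample, pool):
    """Sign of the alpha-weighted vote of all nodes."""
    fns = dict(pool)
    score = 0.0
    for node in nodes:
        value = fns[node["feature"]](sample)
        vote = node["sign"] if value >= node["threshold"] else -node["sign"]
        score += node["alpha"] * vote
    return 1 if score >= 0 else -1

# Hypothetical pool: one feature per input type.
pool = [
    ("audio_energy", lambda s: s["audio"]),
    ("motion", lambda s: s["video"]),
]
samples = [
    {"audio": 0.9, "video": 0.8},  # talking and moving
    {"audio": 0.8, "video": 0.7},  # talking and moving
    {"audio": 0.2, "video": 0.9},  # moving but silent
    {"audio": 0.1, "video": 0.1},  # idle
]
labels = [1, 1, -1, -1]

nodes = train_adaboost(samples, labels, pool)
predictions = [classify(nodes, s, pool) for s in samples]
```

On this toy data the audio stump alone separates the classes, so it is selected in every round; real audio and video features would normally need several different nodes.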

18 Claims

1. A method comprising:
- identifying a pool of features comprising at least one feature from a first type of input and at least one feature from a second type of input where the second type of input is different from the first type of input; and
  
  generating a classifier for speaker detection using a learning algorithm wherein nodes of the classifier are selected using the pool of features and a preferable feature is weighted higher than a less preferable feature such that the preferable feature is located in the classifier before the less preferable feature.
  - 2. The method of claim 1 further comprising:
    - evaluating the classifier to detect a person.
  - 3. The method of claim 1 further comprising:
    - sorting the nodes of the classifier after the generating step such that a preferable feature is located in the classifier before a less preferable feature.
  - 4. The method of claim 3 wherein the preferable feature requires less computation than the less preferable feature.
  - 5. The method of claim 3 wherein the preferable feature is more highly correlated with speaker detection than the less preferable feature.
  - 6. The method of claim 1 wherein the first type of input or the second type of input includes an audio input and the pool of features includes an audio feature associated with a sound source localization input.
  - 7. The method of claim 1 wherein the first type of input or the second type of input includes a video input and the pool of features includes a video feature defined by a rectangle.
  - 8. The method of claim 1 wherein the learning algorithm comprises the AdaBoost algorithm.
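
Claim 7 refers to a video feature defined by a rectangle. One common way to compute such features cheaply is a summed-area table (integral image), as used with Haar-like filters; the sketch below assumes that approach, which the claim text itself does not state. The function names and the toy frame are hypothetical.

```python
def integral_image(frame):
    """Summed-area table with an extra leading row and column of zeros."""
    height, width = len(frame), len(frame[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += frame[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table

def rect_sum(table, top, left, height, width):
    """Sum of pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])

def two_rect_feature(table, top, left, height, width):
    """Left half minus right half of the rectangle (a simple contrast feature)."""
    half = width // 2
    return (rect_sum(table, top, left, height, half)
            - rect_sum(table, top, left + half, height, half))

# Toy 3x4 grayscale frame: bright on the left, dark on the right.
frame = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [0, 0, 0, 0],
]
table = integral_image(frame)
whole_frame = rect_sum(table, 0, 0, 3, 4)
contrast = two_rect_feature(table, 0, 0, 2, 4)
```

Once the table is built, each rectangle costs the same four lookups regardless of its size, which is why rectangle features tend to be among the cheaper entries in a feature pool.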

9. A method comprising:
- accepting input data comprising a first type of input data and a second type of input data that is different from the first type of input data; and
  
  evaluating a person detection classifier to detect a person wherein the classifier has been created by:
  
  identifying a pool of features comprising at least one feature associated with the first type of input data and at least one feature associated with the second type of input data; and
  
  generating the classifier using a learning algorithm by selecting nodes of the classifier using the pool of features and weighting a preferable feature higher than a less preferable feature such that the preferable feature is located in the classifier before the less preferable feature.
  - 10. The method of claim 9 wherein the person is a speaker.
  - 11. The method of claim 9 wherein the classifier is further created by sorting the nodes of the classifier after the generating step such that a preferable feature is located in the classifier before a less preferable feature.
  - 12. The method of claim 11 wherein the preferable feature requires less computation than the less preferable feature.
  - 13. The method of claim 11 wherein the preferable feature is more highly correlated with person detection than the less preferable feature.
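
Claims 11 to 13 describe sorting the classifier's nodes after training so that a preferable feature comes first, where "preferable" may mean cheaper to compute (claim 12) or more highly correlated with person detection (claim 13). A minimal sketch of both orderings follows. The node layout, the feature names, the cost table, and the use of the node weight `alpha` as a stand-in for correlation are all assumptions for illustration.

```python
def sort_by_cost(nodes, cost):
    """Cheaper features first; the stable sort keeps training order on ties."""
    return sorted(nodes, key=lambda node: cost[node["feature"]])

def sort_by_weight(nodes):
    """Higher-weighted nodes first (weight used here as a proxy for correlation)."""
    return sorted(nodes, key=lambda node: -node["alpha"])

# Hypothetical trained nodes, listed in the order training selected them.
nodes = [
    {"feature": "face_rect", "threshold": 10, "alpha": 0.9},
    {"feature": "ssl_peak", "threshold": 0.5, "alpha": 0.7},
    {"feature": "motion_rect", "threshold": 3, "alpha": 0.4},
]
# Hypothetical relative computation cost per feature.
cost = {"ssl_peak": 1, "motion_rect": 4, "face_rect": 4}

cheap_first = [node["feature"] for node in sort_by_cost(nodes, cost)]
strong_first = [node["feature"] for node in sort_by_weight(nodes)]
```

Here `cheap_first` starts with the audio feature and `strong_first` starts with the face rectangle, which shows that the two notions of "preferable" can produce different node orders.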

14. A system comprising:
- a video input device that produces video data;
  
  an audio input device that produces audio data; and
  
  a detector device including a detector configured to accept the video data and the audio data and evaluate a person detection classifier to detect a person where the classifier has been created by:
  
  identifying a pool of features comprising at least one feature associated with the video data and at least one feature associated with the audio data; and
  
  generating the classifier using a learning algorithm by selecting nodes of the classifier using the pool of features and weighting a preferable feature higher than a less preferable feature such that the preferable feature is located in the classifier before the less preferable feature.
  - 15. The system of claim 14 further comprising:
    - an auxiliary device that provides storage for at least a portion of the video data or at least a portion of the audio data.

16. A method comprising:
- identifying a pool of features comprising at least one feature from a first type of input and at least one feature from a second type of input where the second type of input is different from the first type of input;
  
  generating a classifier for speaker detection using a learning algorithm wherein nodes of the classifier are selected using the pool of features; and
  
  evaluating the classifier to detect a person, wherein at least one of the at least one feature from the first type of input or the at least one feature from the second type of input operates so that a false positive result is associated with a second person that is different from the person.

17. A method comprising:
- identifying a pool of features comprising at least one feature from a first type of input and at least one feature from a second type of input where the second type of input is different from the first type of input, wherein the first type of input or the second type of input includes an audio input, the pool of features includes an audio feature associated with a sound source localization input, and the audio feature is associated with a function selected from the following functions;

18. A system comprising:
- a video input device that produces video data;
  
  an audio input device that produces audio data, the audio data including sound source localization data; and
  
  a detector device including a detector configured to accept the video data and the audio data and evaluate a person detection classifier to detect a person where the classifier has been created by:
  
  identifying a pool of features comprising at least one feature associated with the video data and at least one feature associated with the audio data, the pool of features including an audio feature associated with a function selected from the following functions;

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Rui, Yong; Cutler, Ross G.; Viola, Paul A.; Yin, Pei; Sun, Xinding; Zhang, Cha
Primary Examiner(s)
Abebe; Daniel D

Application Number
US11/425,967
Publication Number
US 20070297682A1
Time in Patent Office
1,916 Days
Field of Search
704/246, 704/250, 382/116, 382/118
US Class Current
704/246
CPC Class Codes
G06F 18/214   Generating training pattern...
G06V 10/446   using Haar-like filters, e....
G06V 10/774   Generating sets of training...
G10L 2021/02166   Microphone arrays; Beamforming
G10L 25/78   Detection of presence or ab...
H04N 21/42203   sound input device, e.g. mi...
H04N 21/4223   Cameras H04N23/00 takes pre...
H04N 21/4394   involving operations for an...
H04N 21/44008   involving operations for an...
H04N 21/44213   Monitoring of end-user rela...
H04N 21/4788   communicating with other us...
H04N 7/147   Communication arrangements,...
H04N 7/15   Conference systems
