Speaker segmentation and recognition based on list of speakers

US 9,058,806 B2
Filed: 09/10/2012
Issued: 06/16/2015
Est. Priority Date: 09/10/2012
Status: Active Grant

Abstract

A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.
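
The estimating step can be illustrated with a minimal sketch, assuming each application (for example, the calendaring software of claims 2 and 3) exposes a plain list of attendee names. The function name `merge_attendee_lists` and the case-insensitive name matching are illustrative assumptions, not details taken from the patent.

```python
def merge_attendee_lists(*attendee_lists):
    """Merge attendee lists from several applications into one approximate
    list of potential speakers, dropping duplicates in first-seen order."""
    seen = set()
    merged = []
    for attendees in attendee_lists:
        for name in attendees:
            cleaned = name.strip()
            key = cleaned.lower()  # assumed normalization: ignore case and padding
            if key and key not in seen:
                seen.add(key)
                merged.append(cleaned)
    return merged


# Hypothetical attendee lists extracted from two different applications.
calendar_a = ["Alice Wong", "Bob Patel"]
calendar_b = ["bob patel ", "Carol Diaz"]

approximate_speakers = merge_attendee_lists(calendar_a, calendar_b)
print(approximate_speakers)
```

The merged list is only a starting point: the claims treat it as approximate and revise it later using social graphs.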

20 Claims

1. A method, comprising:
- estimating, by a processor, an approximate list of potential speakers in a video file from one or more applications, wherein the video file includes an audio recording of a plurality of speakers;
  
  segmenting, by the processor, the video file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker;
  
  recognizing, by the processor, particular speakers in the video file based on the approximate list of potential speakers;
  
  generating, by the processor, social graphs of the potential speakers from speaker information comprising previous meetings attended by the potential speakers, and information about other speakers and attendees at the previous meetings; and
  
  revising, by the processor, the approximate list of potential speakers by iteratively running a speaker segmentation and recognition (SSR) algorithm using information from the social graphs of the potential speakers.
  - 2. The method of claim 1, wherein each one of the one or more applications comprise calendaring software having a respective list of attendees at a meeting, and wherein a recording of the meeting is included in the file.
  - 3. The method of claim 2, wherein a first list of attendees estimated from a portion of the applications is extracted and merged with a second list of attendees estimated from other portions of the applications, and wherein the approximate list of potential speakers is compiled from the merged list of attendees.
  - 4. The method of claim 1, further comprising:
    - segmenting the file according to the revised list; and
      
      recognizing the particular speakers in the file based on the revised list.
  - 5. The method of claim 4, wherein the social graph corresponding to at least one speaker is created from information associated with the at least one speaker stored in a speaker database.
  - 6. The method of claim 5, wherein the information includes one or more of meetings attended by the at least one speaker, other speakers at the meetings, attendees at the meeting, topic of the meeting, subject matter of the meeting, venue of the meeting, emails sent and received by the at least one speaker, recipients of the emails, and subject matters of the emails.
  - 7. The method of claim 1, wherein the revising of the approximate list comprises:
    - identifying each speaker in the approximate list;
      
      reviewing the social graph of each identified speaker;
      
      identifying a potential speaker from members in the social graph; and
      
      adding the potential speaker to the revised list of potential speakers.
  - 8. The method of claim 1, further comprising:
    - aggregating the social graphs of the speakers.
  - 9. The method of claim 1, wherein the speakers include speakers in the approximate list and members of at least a portion of the plurality of social graphs.
  - 10. The method of claim 1, wherein revising the approximate list comprises:
    - creating a first list of speakers from the approximate list of potential speakers;
      
      running the SSR algorithm using the first list;
      
      deriving a second list from the SSR;
      
      generating a third list comprising selecting one or more speakers from the second list based on relative contribution of the speakers and a system confidence;
      
      generating a fourth list comprising augmenting the first list based on the third list and an updated social graph;
      
      running the SSR algorithm using the fourth list; and
      
      iteratively executing the deriving the second list, the generating the third list, the generating the fourth list and the running the SSR algorithm using fourth list until the fourth list is unchanged from a previous iteration.
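
The iterative revision recited in claim 10 can be sketched as a fixed-point loop. This is a rough illustration under stated assumptions, not the patented implementation: `run_ssr` is a hypothetical callable standing in for the SSR algorithm and is assumed to return a mapping from each detected speaker to a `(relative_contribution, confidence)` pair; the thresholds `min_share` and `min_conf`, the `max_iters` cap, and the use of a static social graph (the claim refers to an updated one) are all simplifications.

```python
def revise_speaker_list(approximate, run_ssr, social_graph,
                        min_share=0.1, min_conf=0.5, max_iters=20):
    """Iterate SSR and list augmentation until the candidate list stabilizes."""
    first = list(dict.fromkeys(approximate))   # first list, de-duplicated
    result = run_ssr(first)                    # run SSR using the first list
    fourth = list(first)
    previous_fourth = None
    for _ in range(max_iters):
        second = list(result)                  # second list: speakers SSR reported
        # Third list: speakers with enough contribution and system confidence.
        third = [s for s in second
                 if result[s][0] >= min_share and result[s][1] >= min_conf]
        # Fourth list: first list augmented with social-graph contacts
        # of the speakers in the third list.
        fourth = list(first)
        for speaker in third:
            for contact in social_graph.get(speaker, ()):
                if contact not in fourth:
                    fourth.append(contact)
        if fourth == previous_fourth:          # unchanged from previous iteration
            break
        result = run_ssr(fourth)               # run SSR using the fourth list
        previous_fourth = fourth
    return fourth, result


def toy_ssr(candidates):
    """Hypothetical SSR stub: only recognizes speakers who are candidates."""
    truth = {"alice": (0.6, 0.9), "bob": (0.3, 0.8), "dave": (0.1, 0.7)}
    return {s: truth[s] for s in candidates if s in truth}


graph = {"alice": ["bob", "dave"], "bob": ["alice"], "dave": ["erin"]}
final_list, final_result = revise_speaker_list(["alice", "carol"], toy_ssr, graph)
print(final_list)
print(sorted(final_result))
```

In this toy run the loop stops once a pass produces the same fourth list as the previous pass, which mirrors the termination condition in the last step of the claim.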

11. A non-transitory computer readable medium storing program instructions for execution by a processor to perform operations comprising:
- estimating an approximate list of potential speakers in a video file from one or more applications, wherein the video file includes an audio recording of a plurality of speakers;
  
  segmenting the video file according to the approximate list of potential speakers, such that each segment corresponds to at least one speaker;
  
  recognizing particular speakers in the video file based on the approximate list of potential speakers;
  
  generating social graphs of the potential speakers from speaker information comprising previous meetings attended by the potential speakers, and information about other speakers and attendees at the previous meetings; and
  
  revising the approximate list of potential speakers by iteratively running a speaker segmentation and recognition (SSR) algorithm using information from the social graphs of the potential speakers.
  - 12. The medium of claim 11, the operations further comprising:
    - segmenting the video file according to the revised list; and
      
      recognizing the particular speakers in the video file based on the revised list.
  - 13. The medium of claim 12, wherein the social graph corresponding to at least one speaker is created from information about the at least one speaker stored in a speaker database.
  - 14. The medium of claim 12, wherein the revising of the approximate list comprises:
    - identifying the at least one speaker in the approximate list;
      
      reviewing the social graph of the identified speaker;
      
      identifying a potential speaker from members in the social graph; and
      
      adding the potential speaker to the revised list of potential speakers.
  - 15. The medium of claim 11, wherein revising the approximate list comprises:
    - creating a first list of speakers from the approximate list of potential speakers;
      
      running a speaker segmentation and recognition (SSR) algorithm using the first list;
      
      deriving a second list from the SSR;
      
      generating a third list comprising selecting one or more speakers from the second list based on relative contribution of the speakers and a system confidence;
      
      generating a fourth list comprising augmenting the first list based on the third list and an updated social graph;
      
      running the SSR algorithm using the fourth list; and
      
      iteratively executing the deriving the second list, the generating the third list, the generating the fourth list and the running the SSR algorithm using fourth list until the fourth list is unchanged from a previous iteration.

16. An apparatus, comprising:
- a memory element for storing data; and
  
  a processor that executes instructions associated with the data, wherein the processor and the memory element cooperate such that the apparatus is configured to;
  
  estimate an approximate list of potential speakers in a video file from one or more applications, wherein the video file includes a recording of a plurality of speakers;
  
  segment the video file according to the approximate list of potential speakers, such that each segment corresponds to at least one speaker;
  
  recognize particular speakers in the video file based on the approximate list of potential speakers;
  
  generating social graphs of the potential speakers from speaker information comprising previous meetings attended by the potential speakers, and information about other speakers and attendees at the previous meetings; and
  
  revising the approximate list of potential speakers by iteratively running a speaker segmentation and recognition (SSR) algorithm using information from the social graphs of the potential speakers.
  - 17. The apparatus of claim 16, wherein the apparatus is further configured to:
    - segment the video file according to the revised list; and
      
      recognize the particular speakers in the video file based on the revised list.
  - 18. The apparatus of claim 17, wherein the social graph corresponding to at least one speaker is created from information about the at least one speaker stored in a speaker database.
  - 19. The apparatus of claim 17, wherein the revising of the approximate list comprises:
    - identifying each speaker in the approximate list;
      
      reviewing the social graph of each identified speaker;
      
      identifying a potential speaker from members in the social graph; and
      
      adding the potential speaker to the revised list of potential speakers.
  - 20. The apparatus of claim 16, wherein the speakers include speakers in the approximate list and members of at least a portion of the social graphs.
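
The social-graph steps that recur across the independent claims, and the revising steps of claims 7, 14 and 19, can be sketched as follows. The record layout (a dict with `speakers` and `attendees` keys), the function names, and the choice to link everyone who shared a meeting are assumptions made for illustration; the claims do not prescribe a data model.

```python
from collections import defaultdict


def build_social_graphs(previous_meetings):
    """Build one social graph per person: the set of people they shared
    a previous meeting with, either as speakers or as attendees."""
    graphs = defaultdict(set)
    for meeting in previous_meetings:
        people = set(meeting["speakers"]) | set(meeting["attendees"])
        for person in people:
            graphs[person] |= people - {person}
    return dict(graphs)


def expand_candidates(approximate, graphs):
    """Review each listed speaker's graph and add its members as
    potential speakers to a revised list."""
    revised = list(approximate)
    for speaker in approximate:
        for member in sorted(graphs.get(speaker, ())):
            if member not in revised:
                revised.append(member)
    return revised


# Hypothetical speaker-database records of previous meetings.
meetings = [
    {"speakers": ["alice"], "attendees": ["bob", "carol"]},
    {"speakers": ["bob"], "attendees": ["dave"]},
]

graphs = build_social_graphs(meetings)
revised_list = expand_candidates(["alice"], graphs)
print(revised_list)
```

A fuller version might weight graph members by how often they co-occur, or draw on the email information listed in claim 6, but that goes beyond what this sketch attempts.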

Current Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Original Assignee: Cisco Technology, Inc. (Cisco Systems, Inc.)
Inventors: Sankar, Ananth; Kajarekar, Sachin; Gannu, Satish K.
Primary Examiner(s): Lerner, Martin
Application Number: US13/608,420
Publication Number: US 20140074471A1
Time in Patent Office: 1,009 Days
Field of Search: 704/246, 704/247, 704/249, 379/88.02, 379/158
US Class Current: 1/1
CPC Class Codes: G10L 17/02 Preprocessing operations, e...
