SYSTEM AND METHOD FOR IMPROVING SPEAKER SEGMENTATION AND RECOGNITION ACCURACY IN A MEDIA PROCESSING ENVIRONMENT

US 20140074471A1
Filed: 09/10/2012
Published: 03/13/2014
Est. Priority Date: 09/10/2012
Status: Active Grant

First Claim

1. A method, comprising:

estimating an approximate list of potential speakers in a file from one or more applications, wherein the file includes a recording of a plurality of speakers;

segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and

recognizing particular speakers in the file based on the approximate list of potential speakers.

1 Assignment

0 Petitions

Abstract

A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.
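
The abstract's estimate-then-recognize flow can be sketched in Python. This is a hypothetical illustration, not the patented implementation. It assumes that each application exposes a plain list of attendee names (as the calendaring software of claims 2 and 3 does), that each enrolled speaker has a fixed-length voice embedding, and that segmentation has already produced one embedding per segment. The function names, the case-insensitive name matching, and the cosine-similarity scoring are all illustrative choices.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def estimate_speakers(app_attendee_lists: List[List[str]]) -> List[str]:
    """Merge attendee lists from several applications into one approximate
    speaker list, dropping duplicates and keeping first-seen order."""
    merged: List[str] = []
    seen = set()
    for attendees in app_attendee_lists:
        for name in attendees:
            key = name.strip().lower()  # assumed name-matching rule
            if key and key not in seen:
                seen.add(key)
                merged.append(name.strip())
    return merged


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize_segment(segment: Vector, candidates: List[str],
                      voiceprints: Dict[str, Vector]) -> str:
    """Label one segment with the best-scoring speaker, scoring only the
    candidates on the approximate list rather than every enrolled speaker."""
    scored = [(cosine(segment, voiceprints[name]), name)
              for name in candidates if name in voiceprints]
    return max(scored)[1] if scored else "unknown"
```

For example, calendars `["Alice", "Bob"]` and `["bob", "Carol"]` merge to `["Alice", "Bob", "Carol"]`. A segment embedded as `[0.9, 0.1]` is then labeled `"Alice"` even if an off-list enrolled speaker ("Dave", say) has an identical voiceprint, because only listed candidates are scored. Shrinking the candidate set in this way is one plausible reading of how the approximate list improves recognition accuracy.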

20 Claims

1. A method, comprising:
- estimating an approximate list of potential speakers in a file from one or more applications, wherein the file includes a recording of a plurality of speakers;
  
  segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and
  
  recognizing particular speakers in the file based on the approximate list of potential speakers.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1, wherein each one of the one or more applications comprise calendaring software having a respective list of attendees at a meeting, and wherein a recording of the meeting is included in the file.
  - 3. The method of claim 2, wherein a first list of attendees estimated from a portion of the applications is extracted and merged with a second list of attendees estimated from other portions of the applications, and wherein the approximate list of potential speakers is compiled from the merged list of attendees.
  - 4. The method of claim 1, further comprising:
    - revising the approximate list based on a social graph of at least one speaker in the approximate list;
      
      segmenting the file according to the revised list; and
      
      recognizing the particular speakers in the file based on the revised list.
  - 5. The method of claim 4, wherein the social graph is created from information associated with the at least one speaker stored in a speaker database.
  - 6. The method of claim 5, wherein the information includes one or more of meetings attended by the at least one speaker, other speakers at the meetings, attendees at the meeting, topic of the meeting, subject matter of the meeting, venue of the meeting, emails sent and received by the at least one speaker, recipients of the emails, and subject matters of the emails.
  - 7. The method of claim 4, wherein the revising of the approximate list comprises:
    - identifying the at least one speaker in the approximate list;
      
      reviewing the social graph of the identified speaker;
      
      identifying a potential speaker from members in the social graph; and
      
      adding the potential speaker to the revised list of potential speakers.
  - 8. The method of claim 1, further comprising:
    - revising the approximate list based on a plurality of social graphs of a corresponding plurality of speakers.
  - 9. The method of claim 8, wherein the corresponding plurality of speakers includes speakers in the approximate list and members of at least a portion of the plurality of social graphs.
  - 10. The method of claim 1, further comprising:
    - creating a first list of speakers from the approximate list of potential speakers;
      
      running a speaker segmentation and recognition (SSR) algorithm using the first list;
      
      deriving a second list from the SSR;
      
      generating a third list comprising selecting one or more speakers from the second list based on relative contribution of the speakers and a system confidence;
      
      generating a fourth list comprising augmenting the first list based on the third list and an updated social graph;
      
      running the SSR algorithm using the fourth list; and
      
      iteratively executing the deriving the second list, the generating the third list, the generating the fourth list and the running the SSR algorithm using fourth list until the fourth list is unchanged from a previous iteration.
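
The iterative refinement of claim 10 (repeated in claim 15) is essentially a fixed-point loop. The sketch below is one hypothetical reading of it. The SSR engine is passed in as a callable that returns, for each speaker it labels, a `(relative_contribution, confidence)` pair. The third list keeps speakers above two assumed thresholds, and the fourth list is taken to be the first list plus the third list plus the third-list speakers' social-graph neighbors. `max_iterations` is an added safeguard the claim does not mention, and the social graph is treated as fixed, whereas the claim refers to an updated graph.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical SSR interface: takes the candidate speaker list and returns,
# for each speaker it labeled, (relative_contribution, confidence).
SSRResult = Dict[str, Tuple[float, float]]
RunSSR = Callable[[List[str]], SSRResult]


def refine_speaker_list(
    approx_list: List[str],
    run_ssr: RunSSR,
    social_graph: Dict[str, Set[str]],
    min_contribution: float = 0.05,  # assumed threshold
    min_confidence: float = 0.5,     # assumed threshold
    max_iterations: int = 10,        # safeguard, not part of the claim
) -> Tuple[List[str], SSRResult]:
    first = sorted(set(approx_list))             # first list
    result = run_ssr(first)                      # SSR using the first list
    previous_fourth: Optional[List[str]] = None
    for _ in range(max_iterations):
        second = result                          # second list: SSR output
        # Third list: speakers with enough talk time and system confidence.
        third = {spk for spk, (contrib, conf) in second.items()
                 if contrib >= min_contribution and conf >= min_confidence}
        # Fourth list: the first list augmented with the third list and the
        # social-graph neighbors of the third-list speakers.
        neighbors: Set[str] = set()
        for spk in third:
            neighbors |= social_graph.get(spk, set())
        fourth = sorted(set(first) | third | neighbors)
        if fourth == previous_fourth:            # unchanged: stop iterating
            break
        result = run_ssr(fourth)                 # SSR using the fourth list
        previous_fourth = fourth
    final = previous_fourth if previous_fourth is not None else first
    return final, result
```

With a stub SSR that can only label real speakers who appear on its candidate list, an approximate list of Alice, Bob, and Carol plus a social-graph edge from Alice to Erin converges after two SSR runs. The final list is Alice, Bob, Carol, and Erin, and Erin's speech is now attributed to her.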

11. Logic encoded in non-transitory media that includes instructions for execution and when executed by a processor, is operable to perform operations comprising:
- estimating an approximate list of potential speakers in a video/audio file from one or more applications, wherein the video/audio file includes a recording of a plurality of speakers;
  
  segmenting the video/audio file according to the approximate list of potential speakers, such that each segment corresponds to at least one speaker; and
  
  recognizing particular speakers in the video/audio file based on the approximate list of potential speakers.
- Dependent Claims (12, 13, 14, 15):
  - 12. The logic of claim 11, the operations further comprising:
    - revising the approximate list based on a social graph of at least one speaker in the approximate list;
      
      segmenting the video/audio file according to the revised list; and
      
      recognizing the particular speakers in the video/audio file based on the revised list.
  - 13. The logic of claim 12, wherein the social graph is created from information about the at least one speaker stored in a speaker database.
  - 14. The logic of claim 12, wherein the revising of the approximate list comprises:
    - identifying the at least one speaker in the approximate list;
      
      reviewing the social graph of the identified speaker;
      
      identifying a potential speaker from members in the social graph; and
      
      adding the potential speaker to the revised list of potential speakers.
  - 15. The logic of claim 11, the operations further comprising:
    - creating a first list of speakers from the approximate list of potential speakers;
      
      running a speaker segmentation and recognition (SSR) algorithm using the first list;
      
      deriving a second list from the SSR;
      
      generating a third list comprising selecting one or more speakers from the second list based on relative contribution of the speakers and a system confidence;
      
      generating a fourth list comprising augmenting the first list based on the third list and an updated social graph;
      
      running the SSR algorithm using the fourth list; and
      
      iteratively executing the deriving the second list, the generating the third list, the generating the fourth list and the running the SSR algorithm using fourth list until the fourth list is unchanged from a previous iteration.
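
The four revision steps recited in claims 7, 14, and 19 can be sketched as follows. The sketch assumes a weighted social graph that maps each speaker to the people they interact with, together with an interaction count. The claims do not say which graph members count as potential speakers, so the `min_weight` cut-off and the strongest-ties-first ordering are assumptions. The `identified` argument is also hypothetical: it is taken to be the listed speakers already recognized, for example by a first SSR pass.

```python
from typing import Dict, List

# Hypothetical weighted social graph: speaker -> {member: interaction count}
SocialGraph = Dict[str, Dict[str, int]]


def revise_list(approx_list: List[str], graph: SocialGraph,
                identified: List[str], min_weight: int = 2) -> List[str]:
    """Identify speakers in the approximate list, review each one's social
    graph, and add sufficiently connected members as potential speakers."""
    revised = list(approx_list)
    for speaker in identified:
        if speaker not in approx_list:
            continue                      # step 1: only speakers on the list
        members = graph.get(speaker, {})  # step 2: review the social graph
        # Step 3: pick members, strongest ties first (assumed heuristic).
        for member, weight in sorted(members.items(), key=lambda kv: -kv[1]):
            if weight >= min_weight and member not in revised:
                revised.append(member)    # step 4: add to the revised list
    return revised
```

For instance, if Alice's graph is `{"Erin": 5, "Frank": 1, "Bob": 3}`, revising `["Alice", "Bob"]` around Alice adds Erin but not Frank, whose single interaction falls below the default cut-off.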

16. An apparatus, comprising:
- a memory element for storing data; and
  
  a processor that executes instructions associated with the data, wherein the processor and the memory element cooperate such that the apparatus is configured to:
  
  estimate an approximate list of potential speakers in a video/audio file from one or more applications, wherein the video/audio file includes a recording of a plurality of speakers;
  
  segment the video/audio file according to the approximate list of potential speakers, such that each segment corresponds to at least one speaker; and
  
  recognize particular speakers in the video/audio file based on the approximate list of potential speakers.
- Dependent Claims (17, 18, 19, 20):
  - 17. The apparatus of claim 16, wherein the apparatus is further configured to:
    - revise the approximate list based on a social graph of at least one speaker in the approximate list;
      
      segment the video/audio file according to the revised list; and
      
      recognize the particular speakers in the video/audio file based on the revised list.
  - 18. The apparatus of claim 17, wherein the social graph is created from information about the at least one speaker stored in a speaker database.
  - 19. The apparatus of claim 17, wherein the revising of the approximate list comprises:
    - identifying the at least one speaker in the approximate list;
      
      reviewing the social graph of the identified speaker;
      
      identifying a potential speaker from members in the social graph; and
      
      adding the potential speaker to the revised list of potential speakers.
  - 20. The apparatus of claim 16, wherein the apparatus is further configured to:
    - revise the approximate list based on a plurality of social graphs of a corresponding plurality of speakers, wherein the corresponding plurality of speakers includes speakers in the approximate list and members of at least a portion of the plurality of social graphs.
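
Claims 6, 13, and 18 say the social graph is built from speaker-database information such as meetings attended and emails exchanged. Claim 20 revises the list using several speakers' graphs at once. A hypothetical sketch of both follows. The record layouts are assumed, and edge weights simply count shared meetings and emails. `expand_from_graphs` adds a person only when they appear in the graphs of at least `min_support` listed speakers. This is a simplified reading of claim 20 that uses only the graphs of speakers already on the list, not the patented rule.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical speaker-database records (field layout is assumed):
#   meetings: (meeting_id, [attendees])
#   emails:   (sender, [recipients])
Meeting = Tuple[str, List[str]]
Email = Tuple[str, List[str]]


def build_social_graphs(meetings: Iterable[Meeting],
                        emails: Iterable[Email]) -> Dict[str, Counter]:
    """Weighted graphs: graph[a][b] counts meetings a and b both attended
    plus emails exchanged between them in either direction."""
    graphs: Dict[str, Counter] = defaultdict(Counter)
    for _meeting_id, attendees in meetings:
        for a in attendees:
            for b in attendees:
                if a != b:
                    graphs[a][b] += 1
    for sender, recipients in emails:
        for r in recipients:
            if r != sender:
                graphs[sender][r] += 1
                graphs[r][sender] += 1
    return graphs


def expand_from_graphs(approx_list: List[str], graphs: Dict[str, Counter],
                       min_support: int = 2) -> List[str]:
    """Add people who appear in the social graphs of at least `min_support`
    speakers already on the approximate list."""
    support: Counter = Counter()
    for speaker in approx_list:
        for member in graphs.get(speaker, Counter()):
            if member not in approx_list:
                support[member] += 1
    added = sorted(m for m, n in support.items() if n >= min_support)
    return list(approx_list) + added
```

For example, if Alice, Bob, and Erin share one meeting and Bob also emails Erin, Erin appears in both Alice's and Bob's graphs. With the default `min_support=2`, she is added to the list `["Alice", "Bob"]`, while someone linked to only one of them is not.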

Current Assignee
Cisco Technology, Inc. (Cisco Systems, Inc.)
Original Assignee
Cisco Technology, Inc. (Cisco Systems, Inc.)
Inventors
Sankar, Ananth, Kajarekar, Sachin, Gannu, Satish K.

Granted Patent

US 9,058,806 B2
US Class Current

704/246
CPC Class Codes

G10L 17/02 Preprocessing operations, e...