Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering

US 6,424,946 B1
Filed: 11/05/1999
Issued: 07/23/2002
Est. Priority Date: 04/09/1999
Status: Expired due to Term

Abstract

A method and apparatus are disclosed for identifying speakers participating in an audio-video source, whether or not such speakers have been previously registered or enrolled. The speaker identification system uses an enrolled speaker database that includes background models for unenrolled speakers, such as “unenrolled male” or “unenrolled female,” to assign a speaker label to each identified segment. Speaker labels are identified for each speech segment by comparing the segment utterances to the enrolled speaker database and finding the “closest” speaker, if any. A speech segment having an unknown speaker is initially assigned a general speaker label from the set of background models. The “unenrolled” segment is assigned a segment number and receives a cluster identifier assigned by the clustering system. If a given segment is assigned a temporary speaker label associated with an unenrolled speaker, the user can be prompted by the present invention to identify the speaker. Once the user assigns a speaker label to an audio segment having an unknown speaker, the same speaker name can be automatically assigned to any segments that are assigned to the same cluster and the enrolled speaker database can be automatically updated to enroll the previously unknown speaker.
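
The labeling step described in the abstract can be illustrated with a minimal sketch. It assumes each speaker model and each segment is summarized as a fixed-length feature vector and that "closest" means smallest Euclidean distance. The text here does not specify the model type, the distance measure, or any threshold, so those choices, along with the names `ENROLLED`, `BACKGROUND`, `closest`, and `assign_label`, are hypothetical.

```python
import math

# Hypothetical speaker database: enrolled speakers plus background models
# for unenrolled speakers. Vectors are toy values, not real voice features.
ENROLLED = {
    "alice": [1.0, 0.0],
    "bob": [0.0, 1.0],
}
BACKGROUND = {
    "unenrolled male": [-1.0, -1.0],
    "unenrolled female": [2.0, 2.0],
}


def closest(vector, models):
    """Return (label, distance) of the model nearest to `vector`."""
    return min(
        ((name, math.dist(vector, ref)) for name, ref in models.items()),
        key=lambda pair: pair[1],
    )


def assign_label(vector, max_enrolled_distance=0.5):
    """Label a segment with the closest enrolled speaker if one is close
    enough (assumed threshold); otherwise fall back to the closest
    background model for unenrolled speakers."""
    name, dist = closest(vector, ENROLLED)
    if dist <= max_enrolled_distance:
        return name
    return closest(vector, BACKGROUND)[0]
```

With these toy values, `assign_label([0.9, 0.1])` returns `"alice"`, while `assign_label([1.8, 2.1])` is too far from every enrolled speaker and returns the general label `"unenrolled female"`.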

25 Claims

1. A method for identifying a speaker in an audio source, said method comprising the steps of:
- transcribing said audio source to create a textual version of said audio information;
- identifying potential segment boundaries in said audio source; and
- assigning a speaker label to each identified segment, said speaker label being selected from a speaker database that includes at least one model for an unenrolled speaker.

2. The method of claim 1, wherein said assigning step further comprises the step of assigning a score indicating the confidence of said assigned speaker label.

3. The method of claim 1, further comprising the step of prompting a user for a name of an unenrolled speaker.

4. The method of claim 1, further comprising the step of clustering homogeneous segments into a cluster.

5. The method of claim 4, further comprising the step of assigning an identity to all segments in the same cluster.

6. The method of claim 1, wherein said assigning step utilizes an enrolled speaker database to assign a speaker label to each identified segment.

7. The method of claim 4, further comprising the step of using segments in the same cluster as speaker training files to update an enrolled speaker database with said unenrolled speaker.
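
The clustering of homogeneous segments recited in claim 4 can be sketched as follows. The claims do not say how clustering is performed, so this uses a simple greedy centroid scheme chosen only for illustration; the function name `cluster_segments`, the per-segment feature vectors, and the distance threshold are all assumptions.

```python
import math


def cluster_segments(vectors, threshold=1.0):
    """Greedily group segment vectors: a segment joins the first cluster
    whose centroid lies within `threshold`, otherwise it starts a new
    cluster. Returns one cluster id per input segment."""
    centroids = []    # running mean vector of each cluster
    counts = []       # number of segments in each cluster
    assignments = []
    for vec in vectors:
        for cid, centroid in enumerate(centroids):
            if math.dist(vec, centroid) <= threshold:
                n = counts[cid]
                # Update the running mean with the new member.
                centroids[cid] = [
                    (c * n + v) / (n + 1) for c, v in zip(centroid, vec)
                ]
                counts[cid] = n + 1
                assignments.append(cid)
                break
        else:
            # No existing cluster was close enough: open a new one.
            centroids.append(list(vec))
            counts.append(1)
            assignments.append(len(centroids) - 1)
    return assignments
```

For example, `cluster_segments([[0, 0], [0.2, 0.1], [5, 5], [0.1, 0.0], [5.2, 4.9]])` returns `[0, 0, 1, 0, 1]`, so the first, second, and fourth segments share one cluster identifier and the remaining two share another.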

8. A method for identifying a speaker in an audio source, said method comprising the steps of:
- computing feature vectors from said audio source; and
- applying said feature vectors to parallel processing branches to:
  - transcribe said audio source to create a textual version of said audio information;
  - identify potential segment boundaries in said audio source; and
  - assign a speaker label to each identified segment, said speaker label being selected from a speaker database that includes at least one model for an unenrolled speaker.

9. The method of claim 8, wherein said feature vectors are applied to said parallel branches using a shared memory architecture.

10. The method of claim 9, wherein said shared memory architecture distributes the computed feature vectors to a channel corresponding to each of said parallel processing branches.
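
The arrangement in claims 8 to 10, where feature vectors are computed once and distributed to a channel per parallel branch, can be approximated in a few lines. This is a sketch under stated assumptions, not the patented architecture: threads within one process stand in for the shared memory architecture, a `queue.Queue` stands in for each channel, and `run_branches` and the branch functions are hypothetical names.

```python
import queue
import threading


def run_branches(feature_vectors, branches):
    """Fan each feature vector out to one queue ("channel") per branch and
    run every branch in its own thread. Returns {branch name: result}."""
    channels = {name: queue.Queue() for name in branches}
    results = {}

    def worker(name, func):
        received = []
        while True:
            item = channels[name].get()
            if item is None:       # sentinel: no more feature vectors
                break
            received.append(item)
        results[name] = func(received)

    threads = [
        threading.Thread(target=worker, args=(name, func))
        for name, func in branches.items()
    ]
    for t in threads:
        t.start()
    for vec in feature_vectors:    # computed once, delivered to all branches
        for ch in channels.values():
            ch.put(vec)
    for ch in channels.values():
        ch.put(None)
    for t in threads:
        t.join()
    return results
```

A call such as `run_branches([[0.1], [0.2]], {"count": len})` returns `{"count": 2}`. In a fuller system the branch functions would be the transcription, segmentation, and speaker labeling stages, each consuming the same stream of feature vectors.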

11. A system for identifying a speaker in an audio source, comprising:
- a memory that stores computer-readable code; and
- a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
  - transcribe said audio source to create a textual version of said audio information;
  - identify potential segment boundaries in said audio source substantially concurrently with said transcribing step; and
  - assign a speaker label to each identified segment, said speaker label being selected from a speaker database that includes at least one model for an unenrolled speaker.

12. An article of manufacture, comprising:
- a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising:
  - a step to transcribe said audio source to create a textual version of said audio information;
  - a step to identify potential segment boundaries in said audio source substantially concurrently with said transcribing step; and
  - a step to assign a speaker label to each identified segment, said speaker label being selected from a speaker database that includes at least one model for an unenrolled speaker.

13. A method for identifying a speaker in an audio source, said method comprising the steps of:
- transcribing said audio source to create a textual version of said audio information;
- identifying potential segment boundaries in said audio source;
- assigning a speaker label to each identified segment, said speaker label being selected from a speaker database that includes at least one model for an unenrolled speaker; and
- presenting said textual version together with said assigned speaker labels.

14. The method of claim 13, further comprising the step of prompting a user for a name of an unenrolled speaker.

15. The method of claim 13, further comprising the step of clustering homogeneous segments into a cluster.

16. The method of claim 15, further comprising the step of assigning an identity to all segments in the same cluster.

17. The method of claim 13, wherein said assigning step utilizes an enrolled speaker database to assign a speaker label to each identified segment.

18. The method of claim 15, further comprising the step of using segments in the same cluster as speaker training files to update an enrolled speaker database with said unenrolled speaker.
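
The presenting step of claim 13, which shows the textual version together with the assigned speaker labels, might look like the sketch below. It assumes the transcriber yields time-stamped words and the segmenter yields time-stamped, labeled segments; these data layouts and the function name `present` are hypothetical, since the claims do not fix an output format.

```python
def present(words, segments):
    """Render one line per speaker turn.

    words:    list of (start_time_seconds, word), sorted by time
    segments: list of (start_time, end_time, speaker_label),
              sorted and non-overlapping
    """
    lines = []
    for start, end, label in segments:
        # Collect the words whose start time falls inside this segment.
        text = " ".join(w for t, w in words if start <= t < end)
        if text:
            lines.append(f"{label}: {text}")
    return "\n".join(lines)
```

Given the words `[(0.0, "good"), (0.4, "morning"), (1.2, "hello"), (1.5, "everyone")]` and the segments `[(0.0, 1.0, "alice"), (1.0, 2.0, "unenrolled male")]`, the function returns two lines: `alice: good morning` and `unenrolled male: hello everyone`.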

19. A method for identifying a speaker in an audio source, said method comprising the steps of:
- assigning a speaker label to each segment in said audio source, said speaker label being selected from a speaker database that includes at least one model for an unenrolled speaker;
- presenting said user with said assigned speaker labels; and
- prompting a user for an identity of an unenrolled speaker.

20. A system for identifying a speaker in an audio source, comprising:
- a memory that stores computer-readable code; and
- a processor operatively coupled to said memory, said processor configured to implement said computer-readable code, said computer-readable code configured to:
  - assigning a speaker label to each segment in said audio source, said speaker label being selected from a speaker database that includes at least one model for an unenrolled speaker;
  - presenting said user with said assigned speaker labels; and
  - prompting a user for the identity of an unenrolled speaker.

21. A method for identifying a speaker in an audio source, said method comprising the steps of:
- transcribing said audio source to create a textual version of said audio information;
- identifying potential segment boundaries in said audio source;
- clustering homogeneous segments into a cluster; and
- assigning a speaker label to each identified segment, said speaker label being selected from a speaker database that includes at least one model for an unenrolled speaker.

22. The method of claim 21, further comprising the step of prompting a user for a name of an unenrolled speaker.

23. The method of claim 22, further comprising the step of assigning said name to all segments in the same cluster.

24. The method of claim 21, wherein said assigning step utilizes an enrolled speaker database to assign a speaker label to each identified segment.

25. The method of claim 21, further comprising the step of using segments in the same cluster as speaker training files to update an enrolled speaker database with said unenrolled speaker.
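
Claims 22, 23, and 25 together describe taking a user-supplied name, applying it to every segment in the same cluster, and using those segments to enroll the speaker. A minimal sketch of that bookkeeping follows. The dictionary layout of a segment, the shape of the enrolled database, and the name `apply_user_name` are assumptions made for illustration; in practice the name would come from a user prompt rather than a function argument, and enrollment would involve training a speaker model rather than storing raw vectors.

```python
def apply_user_name(segments, enrolled_db, cluster_id, name):
    """Give `name` to every segment in `cluster_id` and enroll the speaker.

    segments:    list of dicts with keys "cluster", "label", "features"
    enrolled_db: dict mapping speaker name -> list of training vectors
    """
    training = []
    for seg in segments:
        if seg["cluster"] == cluster_id:
            seg["label"] = name                # propagate the user's answer
            training.append(seg["features"])   # reuse the segment as training data
    if training:
        enrolled_db.setdefault(name, []).extend(training)
    return segments
```

After a call such as `apply_user_name(segments, db, cluster_id=1, name="carol")`, every segment in cluster 1 carries the label `"carol"` and `db["carol"]` holds their feature vectors, while segments in other clusters keep their previous labels.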

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Viswanathan, Mahesh; Tritschler, Alain Charles Louis
Primary Examiner(s): Dorvil, Richemond

Application Number: US09/434,604
Time in Patent Office: 991 Days
Field of Search: 704/231, 704/245, 704/256, 704/255, 704/500, 704/240, 704/241, 704/239, 704/270, 704/257, 704/251, 704/235, 704/250, 704/253, 704/272, 704/275, 704/236, 704/238, 704/260, 704/200
US Class Current: 704/272
CPC Class Codes:
- G06F 16/60   of audio data
- G06F 16/65   Clustering; Classification
- G06F 18/2321   using statistics or functio...
- G10L 15/26   Speech to text systems G10L...
- G10L 17/00   Speaker identification or v...
