Segmentation of audio data for indexing of conversational speech for real-time or postprocessing applications

US 5,655,058 A
Filed: 04/12/1994
Issued: 08/05/1997
Est. Priority Date: 04/12/1994
Status: Expired due to Term

Abstract

A method for segmenting audio data, comprising speech from a plurality of individual speakers, according to speaker is provided. The method comprises providing individual HMMs for each individual speaker, each individual HMM including at least one state, and constructing a speaker network HMM by connecting the individual HMMs in parallel. The audio data is then divided into segments by determining a most likely sequence of states through the speaker network HMM, each of the segments being associated with one of the individual HMMs. Afterward, the speaker of each of the segments is identified. The segmented data may be used to form an index into the audio data according to speaker.
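
The segmentation step described in the abstract can be illustrated with a small Viterbi decoder. This is only a minimal sketch under simplifying assumptions that are not taken from the patent: each speaker HMM has a single state with a one-dimensional Gaussian output, each frame is a single toy feature value, and moving between speaker models costs a fixed `switch_penalty`. The names `log_gauss`, `segment_by_speaker`, and `switch_penalty` are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (toy output distribution)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def segment_by_speaker(frames, models, switch_penalty=5.0):
    """Viterbi decode through single-state speaker models connected in parallel.

    frames: list of floats, one toy feature per frame (assumption).
    models: dict mapping speaker name -> (mean, var).
    Returns a list of (start, end_exclusive, speaker) segments.
    """
    names = list(models)
    # score[s]: best log score of any state sequence ending in speaker s
    score = {s: log_gauss(frames[0], *models[s]) for s in names}
    back = []  # back[t][s]: best predecessor of s when moving to frame t + 1
    for x in frames[1:]:
        new_score, ptr = {}, {}
        for s in names:
            # Staying in the same model is free; leaving it pays the penalty.
            def arrive(p, s=s):
                return score[p] - (0.0 if p == s else switch_penalty)
            best_prev = max(names, key=arrive)
            ptr[s] = best_prev
            new_score[s] = arrive(best_prev) + log_gauss(x, *models[s])
        back.append(ptr)
        score = new_score
    # Trace the most likely state sequence backwards from the best final state.
    state = max(names, key=score.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    # Merge runs of identical labels into segments.
    segments, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((start, t, path[start]))
            start = t
    return segments

frames = [0.1, -0.2, 0.0, 0.3, 5.1, 4.8, 5.2, 4.9, 0.2, -0.1]
models = {"A": (0.0, 1.0), "B": (5.0, 1.0)}
print(segment_by_speaker(frames, models, switch_penalty=2.0))
# [(0, 4, 'A'), (4, 8, 'B'), (8, 10, 'A')]
```

Because every segment comes from exactly one speaker model, the label on each segment directly names the speaker, which is what makes the result usable as an index into the audio.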

102 Citations

20 Claims

1. A method for segmenting audio data according to speaker, said audio data comprising conversational speech from a plurality of individual speakers, comprising the steps of:
- providing an individual HMM for each individual speaker of the plurality of individual speakers of the audio data, each Hidden Markov Model (HMM) having at least one state;
  
  constructing a speaker network HMM by connecting said individual HMMs in parallel;
  
  segmenting said audio data into segments by determining a most likely sequence of states through the speaker network HMM, each segment being associated with a one of said individual HMMs; and
  
  determining an individual speaker of the plurality of individual speakers of each segment of the path.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method of claim 1, wherein the step of constructing a speaker network HMM further comprises determining an exit penalty for exiting a particular individual HMM in said speaker network.
  - 3. The method of claim 1, wherein each of said individual HMMs comprises a plurality of states.
  - 4. The method of claim 3, wherein each of said individual HMMs further includes a silence state.
  - 5. The method of claim 1, wherein said step of constructing a speaker network HMM further comprises providing a silence model HMM, and connecting said silence model in parallel with said individual HMMs.
  - 6. The method of claim 5, wherein said silence model comprises a plurality of states, each with a tied output distribution.
  - 7. The method of claim 6, wherein said silence model comprises a plurality of states, each with a tied Gaussian distribution.
  - 8. The method of claim 7, wherein each of said individual HMMs further include a silence state, the output distribution of said individual HMM silence states are tied to output distributions of said silence states in said silence model.
  - 9. The method of claim 1, wherein said step of constructing a speaker network HMM further comprises providing a garbage model, and connecting said garbage model in parallel with said individual HMMs.
  - 10. The method of claim 9, wherein said garbage model is provided from portions of said audio data containing speech from at least each of said plurality of individual speakers.
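
Claims 2 and 5 to 8 can be pictured as a data-structure question: how the parallel sub-HMMs, the exit penalty, and the tied silence distributions might be represented. The following is a rough sketch only, not the patent's implementation. It assumes one-dimensional Gaussian outputs, treats "tied" as "several states referencing the same object", and uses unnormalized toy transition scores. `Gaussian`, `State`, `build_network`, and `log_transition` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Gaussian:
    mean: float
    var: float

@dataclass
class State:
    model: str        # name of the sub-HMM this state belongs to
    label: str        # state label inside that sub-HMM
    output: Gaussian  # sharing one Gaussian object ties the distributions

def build_network(speaker_params, silence, n_silence_states=3):
    """Flatten parallel sub-HMMs into one list of states.

    speaker_params: dict name -> list of (mean, var), one per speech state.
    silence: the single Gaussian tied across every silence state.
    """
    states = []
    for name, params in speaker_params.items():
        for i, (mean, var) in enumerate(params):
            states.append(State(name, f"speech{i}", Gaussian(mean, var)))
        # Per-speaker silence state, tied to the silence model (cf. claim 8).
        states.append(State(name, "sil", silence))
    # Separate silence model connected in parallel (cf. claims 5 to 7).
    for i in range(n_silence_states):
        states.append(State("silence", f"sil{i}", silence))
    return states

def log_transition(src, dst, exit_penalty):
    """Toy score: free inside a sub-HMM, -exit_penalty when leaving it."""
    return 0.0 if src.model == dst.model else -exit_penalty

tied = Gaussian(-10.0, 0.5)
network = build_network({"A": [(0.0, 1.0), (1.0, 1.0)], "B": [(5.0, 1.0)]}, tied)
tied.mean = -12.0  # re-estimating the tied Gaussian updates every silence state
print(sorted({s.output.mean for s in network if s.label.startswith("sil")}))
```

Sharing a single `Gaussian` instance means any re-estimation of the silence distribution is seen by all silence states at once. A larger `exit_penalty` discourages the decoder from leaving a sub-HMM and so tends to produce fewer, longer segments.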

11. A method of indexing audio data according to speaker, said audio data comprising conversational speech from a plurality of individual speakers, comprising the steps of:
- providing an individual Hidden Markov Model (HMM) for each individual speaker of the plurality of individual speakers of the audio data, each individual HMM including at least one state;
  
  constructing a speaker network HMM by connecting said individual HMMs in parallel;
  
  segmenting said audio data into segments by finding a most likely sequence of states through the speaker network HMM, each segment being associated with a one of the individual HMMs;
  
  determining for each segment an individual speaker of the plurality of individual speakers according to said individual HMMs;
  
  collecting segments from each individual; and
  
  outputting the results of the collected segments.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 12. The method of claim 11, further comprising:
    - providing new individual HMMs for each individual speaker using said collected segments from each individual;
      
      constructing a second speaker network HMM by connecting said new individual HMMs in parallel;
      
      determining for said audio data an optimal path through said second speaker network HMM, identifying segments of said audio data associated with each new individual HMM;
      
      determining the individual speaker of each segment of the path according to said new individual HMMs;
      
      collecting segments from each individual; and
      
      outputting the results of the collected segments.
  - 13. The method of claim 11, wherein the step of providing individual HMMs for each individual speaker comprises training each of said individual HMMs on speech data from a speaker.
  - 14. The method of claim 11, wherein each of said individual HMMs comprise a plurality of states.
  - 15. The method of claim 14, wherein each of said individual HMMs further includes a silence state.
  - 16. The method of claim 11, wherein the step of constructing said speaker network HMM further comprises providing a silence model HMM, and connecting said silence model in parallel with said individual HMMs.
  - 17. The method of claim 16, wherein said silence model comprises a plurality of states, each with a tied output distribution.
  - 18. The method of claim 17, wherein said silence model comprises a plurality of states, each with a tied Gaussian distribution.
  - 19. The method of claim 18, wherein each of said individual HMMs further include a silence state, the output distribution of said individual HMM silence states are tied to output distributions of said silence states in said silence model.
  - 20. The method of claim 11, wherein the step of constructing said speaker network HMM further comprises determining a penalty for exiting any particular individual HMM in said speaker network HMM.
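
The collecting and re-training steps of claims 11 and 12 might be sketched as below. This is an illustrative sketch under stated assumptions, not the claimed method itself: segments are taken to be `(start, end_exclusive, speaker)` tuples over a list of one-dimensional toy frame features, and each "new individual HMM" is reduced to a single re-estimated Gaussian. `build_index`, `retrain`, and `min_var` are hypothetical names.

```python
from collections import defaultdict

def build_index(segments):
    """Collect segments by speaker: speaker -> list of (start, end) spans."""
    index = defaultdict(list)
    for start, end, speaker in segments:
        index[speaker].append((start, end))
    return dict(index)

def retrain(frames, index, min_var=1e-3):
    """Re-estimate one (mean, var) pair per speaker from its collected frames.

    min_var is a floor that keeps the variance usable when a speaker's
    frames happen to be identical (an assumption of this sketch).
    """
    models = {}
    for speaker, spans in index.items():
        data = [x for start, end in spans for x in frames[start:end]]
        mean = sum(data) / len(data)
        var = sum((x - mean) ** 2 for x in data) / len(data)
        models[speaker] = (mean, max(var, min_var))
    return models

frames = [0.0, 2.0, 10.0, 10.0, 1.0, 1.0]
segments = [(0, 2, "A"), (2, 4, "B"), (4, 6, "A")]
index = build_index(segments)
print(index)                   # the speaker index that would be output
print(retrain(frames, index))  # inputs for a second segmentation pass
```

The re-estimated models would then feed a second speaker network, and the audio would be segmented again, mirroring the second pass recited in claim 12.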

Current Assignee: Xerox Corporation (Xerox Holdings Corp.)
Original Assignee: Xerox Corporation (Xerox Holdings Corp.)
Inventors: Poon, Alex D.; Weber, Karon A.; Chou, Philip A.; Kimber, Donald G.; Chen, Francine R.; Balasubramanian, Vijay; Wilcox, Lynn D.
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Dorvil, Richemond
Application Number: US08/226,519
Time in Patent Office: 1,211 Days
Field of Search: 395/2.65, 395/2.49, 395/2.6, 395/2.61, 395/2.45, 395/2.64, 395/2.42, 395/2.59, 395/2.82, 395/2.66
US Class Current: 704/255
CPC Class Codes:
- G10L 15/04 Segmentation; Word boundary...
- G10L 15/142 Hidden Markov Models [HMMs]
- G10L 17/16 Hidden Markov models [HMM]
