Method and apparatus for the automatic separating and indexing of multi-speaker conversations

US 20020091517A1
Filed: 11/30/2001
Published: 07/11/2002
Est. Priority Date: 11/30/2000
Status: Active Grant

First Claim


1. A method of processing a continuous audio stream containing human speech related to at least one particular transaction, comprising the steps of:

digitizing the continuous audio stream;

detecting a speaker change in the digitized audio stream;

performing a speaker recognition if a speaker change is detected;

transcribing at least part of the continuous audio stream if a predetermined speaker is recognized.
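The control flow of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes the stream has already been digitized and cut into short `Segment`s. The three callables (`is_speaker_change`, `recognize_speaker`, `transcribe`) are hypothetical placeholders for whatever change detector, speaker recognizer and speech recognizer a real system would plug in.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Segment:
    """A short chunk of the already digitized stream (hypothetical type)."""
    start: float                  # seconds from the beginning of the stream
    end: float
    speaker: Optional[str] = None
    text: Optional[str] = None


def process_stream(
    segments: Sequence[Segment],
    is_speaker_change: Callable[[Segment, Segment], bool],  # hypothetical detector
    recognize_speaker: Callable[[Segment], str],            # hypothetical recognizer
    transcribe: Callable[[Segment], str],                   # hypothetical speech recognizer
    target_speaker: str,
) -> List[Segment]:
    """Label every segment with a speaker and transcribe only the target's segments."""
    transcribed: List[Segment] = []
    previous: Optional[Segment] = None
    current: Optional[str] = None
    for seg in segments:
        # Run speaker recognition only for the first segment or after a detected change.
        if previous is None or is_speaker_change(previous, seg):
            current = recognize_speaker(seg)
        seg.speaker = current
        # Transcribe only while the predetermined speaker is talking.
        if current == target_speaker:
            seg.text = transcribe(seg)
            transcribed.append(seg)
        previous = seg
    return transcribed
```

Between detected changes the last recognized label is carried forward, so speaker recognition is not re-run on every segment.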


2 Assignments

0 Petitions

Abstract

Disclosed are a method and apparatus for processing a continuous audio stream containing human speech in order to locate a particular speech-based transaction in the audio stream, applying both known speaker recognition and speech recognition techniques. This makes it possible to transcribe only the utterances of a particular predetermined speaker, thus providing an index and a summary of the underlying dialogue(s).

In a first scenario, an incoming audio stream, e.g. a speech call from outside, is scanned in order to detect audio segments of the predetermined speaker. These audio segments are then indexed, and only the indexed segments are transcribed into spoken or written language. Thus a specific transaction that has already taken place can be found on an endless storage medium such as magnetic tape. The proposed mechanism therefore makes locating the audio log of a specific transaction much less laborious.
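For the first scenario, an index can be as simple as a list of time intervals during which the predetermined speaker is talking. The sketch below assumes a hypothetical, time-ordered log of `(change_time, recognized_speaker)` pairs and turns it into index entries for one target speaker. A lookup by time window then narrows down which parts of a long recording need to be transcribed or replayed. All names here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class IndexEntry:
    start: float   # seconds from the beginning of the recording
    end: float
    speaker: str


def build_index(
    changes: Sequence[Tuple[float, str]],  # hypothetical time-ordered change log
    stream_end: float,
    target: str,
) -> List[IndexEntry]:
    """Keep only the turns of `target`; each turn lasts until the next logged change."""
    index: List[IndexEntry] = []
    for i, (t, speaker) in enumerate(changes):
        end = changes[i + 1][0] if i + 1 < len(changes) else stream_end
        if speaker == target:
            index.append(IndexEntry(t, end, speaker))
    return index


def entries_overlapping(index: Sequence[IndexEntry], t0: float, t1: float) -> List[IndexEntry]:
    """Return the indexed turns that overlap the half-open window [t0, t1)."""
    return [e for e in index if e.start < t1 and e.end > t0]
```

For example, a log `[(0.0, "caller"), (4.5, "agent"), (9.0, "caller"), (12.0, "agent")]` over a 20-second recording yields two `agent` entries, 4.5 to 9.0 s and 12.0 to 20.0 s; only those spans would be handed to the speech recognizer.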

In a second scenario, two or more speakers located in one room use a multi-user speech recognition system (SRS). For each user there exists a different speaker model and, optionally, a different dictionary or vocabulary of words already known to or trained by the speech or voice recognition system. In such an environment, the invention makes it possible to switch between dictionaries when a first user has stopped speaking and a second user is about to start speaking.
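A minimal sketch of the dictionary switching in the second scenario follows. It assumes each enrolled user's vocabulary is a plain word set and that speaker recognition has already produced a user name. The class `MultiUserRecognizer` is hypothetical; a real SRS would swap speaker models and recognition vocabularies rather than Python sets.

```python
from typing import Dict, FrozenSet, Optional


class MultiUserRecognizer:
    """Hypothetical stand-in for a multi-user SRS with one vocabulary per user."""

    def __init__(self, dictionaries: Dict[str, FrozenSet[str]]) -> None:
        self.dictionaries = dictionaries
        self.active_user: Optional[str] = None
        self.switches = 0

    def on_speaker_change(self, recognized_user: str) -> None:
        """Called after a detected speaker change with the recognized user's name."""
        # A false-alarm change that re-recognizes the same user causes no switch.
        if recognized_user == self.active_user:
            return
        if recognized_user not in self.dictionaries:
            raise KeyError(f"no dictionary enrolled for {recognized_user!r}")
        self.active_user = recognized_user
        self.switches += 1

    def in_vocabulary(self, word: str) -> bool:
        """Check a word against the currently active user's dictionary only."""
        if self.active_user is None:
            return False
        return word in self.dictionaries[self.active_user]
```

Keeping the active dictionary as explicit state means the switch happens once per recognized turn, not once per word.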

32 Citations

18 Claims

1. A method of processing a continuous audio stream containing human speech related to at least one particular transaction, comprising the steps of:
- digitizing the continuous audio stream;
  
  detecting a speaker change in the digitized audio stream;
  
  performing a speaker recognition if a speaker change is detected;
  
  transcribing at least part of the continuous audio stream if a predetermined speaker is recognized.
- Dependent claims (3, 4, 5, 6, 7, 12, 17, 18):
  - 3. Method according to claim 1 or 2, comprising the further step of protocolling time information for detected speaker changes.
  - 4. Method according to any of the preceding claims, wherein the step of detecting a speaker change and/or the step of performing a speaker recognition is/are preceded by the further step of detecting non-speech boundaries between continuous speech segments.
  - 5. Method according to any of the preceding claims, wherein the step of detecting a speaker change is accomplished by use of at least one characteristic audio feature, in particular features derived from the spectrum of the audio signal.
  - 6. Method according to claim 1 or 2, wherein the step of performing a speaker recognition involves the particular steps of calculating a speaker signature from the audio stream and comparing the calculated speaker signature with at least one known speaker signature.
  - 7. Method according to any of the preceding claims for use in a speech recognition or voice control system comprising at least two speaker-specific speaker models and/or dictionaries, wherein interchanging between the at least two speaker-specific dictionaries dependent on the detected speaker change and the corresponding recognized speaker.
  - 12. Apparatus according to any of claims 8 to 11, further comprising means for continuously monitoring a real-time continuous audio stream and performing the steps of claim 1 or 2.
  - 17. A data processing program for execution in a data processing system comprising software code portions for performing a method according to any of claims 1 to 7 when said program is run on said computer.
  - 18. A computer program product stored on a computer usable medium, comprising computer readable program means for causing a computer to perform a method according to any claims 1 to 7 when said program is run on said computer.
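Claim 5 leaves the choice of spectral features open. One simple possibility, assumed here purely for illustration, is to compare normalized magnitude spectra of neighboring frames and flag a change when their L1 distance exceeds a threshold. The naive DFT keeps the sketch dependency-free; practical systems would use an FFT, richer (e.g. cepstral) features and longer comparison windows. The threshold value is a hypothetical tuning parameter.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N/2 via a naive O(N^2) DFT (fine for a sketch)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def normalize(values: List[float]) -> List[float]:
    """Scale to unit sum so loudness differences alone do not look like speaker changes."""
    total = sum(values)
    return [v / total for v in values] if total > 0 else values


def spectral_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between the normalized magnitude spectra of two frames."""
    sa = normalize(magnitude_spectrum(a))
    sb = normalize(magnitude_spectrum(b))
    return sum(abs(x - y) for x, y in zip(sa, sb))


def detect_changes(frames: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Indices i at which frame i differs from frame i-1 by more than `threshold`."""
    return [
        i for i in range(1, len(frames))
        if spectral_distance(frames[i - 1], frames[i]) > threshold
    ]
```

With two synthetic "voices" modeled as sinusoids at different frequencies, the distance stays near 0 within one voice and approaches 2 at the boundary between them.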

2. A method of processing a continuous audio stream containing human speech related to at least one particular transaction, comprising the steps of:
- digitizing the continuous audio stream;
  
  detecting a speaker change in the digitized audio stream;
  
  performing a speaker recognition if a speaker change is detected;
  
  indexing the audio stream with respect to the detected speaker change if a predetermined speaker is recognized.

8. Apparatus for processing a continuous audio stream containing human speech related to at least one particular transaction, comprising:
- means for predetermining at least one speaker;
  
  means for detecting speaker changes in the audio stream;
  
  means for recognizing the predetermined speaker in the audio stream;
  
  means for initiating transcription of at least part of the audio stream in case of a detected speaker change and a recognized predetermined speaker.
- Dependent claims (10, 11, 13, 14, 15):
  - 10. Apparatus according to claim 8 or 9, further comprising means for detecting non-speech boundaries between continuous speech segments.
  - 11. Apparatus according to any of claims 8 to 10, further comprising means for automatically scanning a continuous audio record, in particular a continuous audio stream recorded on a data or a signal carrier, and for detecting speaker changes in the continuous audio record.
  - 13. Apparatus according to any of claims 8 to 12, further comprising log means for protocolling time information for the at least one detected speaker change.
  - 14. Apparatus according to any of claims 8 to 13, comprising means for marking at least the beginning of a detected speech segment related to a predetermined speaker.
  - 15. Apparatus according to any of claims 8 to 14, comprising data base means for storing speech signatures for at least two speakers.
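Non-speech boundaries (claims 4 and 10) are commonly found with a short-time energy detector; the claims do not prescribe one. The sketch below assumes mono float samples, a fixed frame length and a hand-picked energy threshold, all hypothetical parameters. It reports speech runs as `(start_sample, end_sample)` pairs separated by at least `min_silence_frames` low-energy frames, so a short pause inside an utterance does not split it.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_segments(
    samples: Sequence[float],
    frame_len: int,
    energy_threshold: float,      # hypothetical, tuned per channel
    min_silence_frames: int = 2,  # pauses shorter than this stay inside a segment
) -> List[Tuple[int, int]]:
    energies = frame_energies(samples, frame_len)
    segments: List[Tuple[int, int]] = []
    start = None      # frame index where the current speech run began
    silent_run = 0    # consecutive low-energy frames since the last speech frame
    for idx, e in enumerate(energies):
        if e >= energy_threshold:
            if start is None:
                start = idx
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                end = idx - silent_run + 1  # first frame of the silence
                segments.append((start * frame_len, end * frame_len))
                start, silent_run = None, 0
    if start is not None:  # close a run that reaches the end of the stream
        end = len(energies) - silent_run
        segments.append((start * frame_len, end * frame_len))
    return segments
```

The returned boundaries are natural candidates for the points at which speaker change detection and speaker recognition are run.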

9. Apparatus for processing a continuous audio stream containing human speech related to at least one particular transaction, comprising:
- means for predetermining at least one speaker;
  
  means for detecting speaker changes in the audio stream;
  
  means for recognizing the predetermined speaker in the audio stream;
  
  means for indexing the audio stream dependent on a detected speaker change and a recognized predetermined speaker.

16. Speech recognition or voice control system processing an incoming audio stream and having at least two speaker models and/or speaker-specific dictionaries, comprising means for detecting a speaker change in the incoming audio stream;
- means for gathering speaker-specific information and for comparing the gathered speaker-specific information with corresponding speaker-specific information of at least one predetermined speaker thus recognizing the at least one predetermined speaker;
  
  means for interchanging between the at least two speaker-specific dictionaries dependent on the detected speaker change and the corresponding recognized speaker.
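Claims 6, 15 and 16 compare speaker-specific information computed from the audio with stored information for predetermined speakers. As a simplified stand-in chosen for illustration only, the sketch below takes the mean of per-frame feature vectors as the signature and uses cosine similarity with a hypothetical acceptance threshold. A `None` result means that no predetermined speaker was recognized.

```python
import math
from typing import Dict, List, Optional, Sequence


def signature(feature_frames: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical signature: the mean of per-frame feature vectors."""
    if not feature_frames:
        raise ValueError("need at least one feature frame")
    n, dim = len(feature_frames), len(feature_frames[0])
    return [sum(f[d] for f in feature_frames) / n for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recognize(
    feature_frames: Sequence[Sequence[float]],
    known: Dict[str, Sequence[float]],   # stored signatures, one per speaker
    min_similarity: float = 0.9,         # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the best-matching known speaker, or None if nobody is close enough."""
    sig = signature(feature_frames)
    best_name, best_score = None, -1.0
    for name, ref in known.items():
        score = cosine(sig, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_similarity else None
```

Rejecting low-similarity matches instead of always returning the nearest speaker keeps unknown callers from being mistaken for a predetermined speaker.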

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
International Business Machines Corporation
Inventors
Stenzel, Gerhard; Kriechbaum, Werner; Frank, Joachim

Granted Patent

US 7,496,510 B2
Time in Patent Office

Days
Field of Search
US Class Current

704/231
CPC Class Codes

G10L 17/00 Speaker identification or v...

G10L 21/028 using properties of sound s...
