Method and apparatus for the automatic separating and indexing of multi-speaker conversations
First Claim
1. A method of processing a continuous audio stream containing human speech from a plurality of speakers related to at least one particular transaction, comprising the steps of:
digitizing the continuous audio stream;
detecting a speaker change in the digitized audio stream;
performing a speaker recognition if a speaker change is detected;
determining whether a recognized speaker is a predetermined speaker; and
transcribing at least part of the continuous audio stream only if the recognized speaker is the predetermined speaker;
wherein said transcribing is processed using a dictionary of speaker-trained data trained by the speaker being transcribed.
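The claimed sequence of steps can be read as a gating loop: speaker recognition runs only when a speaker change is detected, and frames are passed to transcription only while the recognized speaker is the predetermined one. The following is a minimal sketch of that control flow, not the patented implementation. The function names, the toy frame type (a voice tag plus a word, standing in for digitized audio), and the three toy helpers are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

# Hypothetical toy frame: (voice tag, spoken word). Real frames would be
# digitized audio samples; this stand-in keeps the sketch runnable.
Frame = Tuple[str, str]


def transcribe_predetermined(
    frames: Iterable[Frame],
    detect_change: Callable[[Frame, Frame], bool],
    recognize: Callable[[Frame], str],
    transcribe: Callable[[List[Frame], Set[str]], str],
    dictionary: Set[str],
    predetermined: str,
) -> List[str]:
    """Transcribe only the segments spoken by the predetermined speaker."""
    transcripts: List[str] = []
    buffer: List[Frame] = []        # frames of the predetermined speaker only
    current: Optional[str] = None   # most recently recognized speaker
    previous: Optional[Frame] = None
    for frame in frames:
        # Recognition runs only at the start of the stream or on a change.
        if previous is None or detect_change(previous, frame):
            if buffer:  # close the predetermined speaker's running segment
                transcripts.append(transcribe(buffer, dictionary))
                buffer = []
            current = recognize(frame)
        if current == predetermined:
            buffer.append(frame)
        previous = frame
    if buffer:  # flush a segment still open at the end of the stream
        transcripts.append(transcribe(buffer, dictionary))
    return transcripts


# Toy stand-ins (assumptions) for the detector, recognizer and transcriber.
def toy_change(prev: Frame, cur: Frame) -> bool:
    return prev[0] != cur[0]


def toy_recognize(frame: Frame) -> str:
    return frame[0]


def toy_transcribe(segment: List[Frame], dictionary: Set[str]) -> str:
    # Words outside the speaker-trained dictionary are marked as unknown.
    return " ".join(w if w in dictionary else "<unk>" for _, w in segment)


stream = [("agent", "hello"), ("agent", "order"),
          ("caller", "hi"), ("agent", "confirmed")]
agent_dictionary = {"hello", "order", "confirmed"}
result = transcribe_predetermined(
    stream, toy_change, toy_recognize, toy_transcribe,
    agent_dictionary, predetermined="agent",
)
```

In this toy run the caller's frame is never buffered, so it never reaches the transcriber. The dictionary passed in is the one belonging to the predetermined speaker, mirroring the "wherein" clause.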
Abstract
Disclosed are a method and apparatus for processing a continuous audio stream containing human speech in order to locate a particular speech-based transaction in the audio stream, applying both known speaker recognition and speech recognition techniques. Only the utterances of a particular predetermined speaker are transcribed, thus providing an index and a summary of the underlying dialogue(s). In a first scenario, an incoming audio stream, e.g. a speech call from outside, is scanned in order to detect audio segments of the predetermined speaker. These audio segments are then indexed and only the indexed segments are transcribed into spoken or written language. In a second scenario, two or more speakers located in one room are using a multi-user speech recognition system (SRS). For each user there exists a different speaker model and optionally a different dictionary or vocabulary of words already known or trained by the speech or voice recognition system.
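The second scenario pairs each user with a speaker model and, optionally, a user-specific dictionary. A minimal sketch of that per-user lookup is given below. The abstract does not say how speaker models are represented, so the single pitch value used as a "model", the nearest-match rule, the shared default vocabulary, and all names here are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class UserProfile:
    model_pitch: float                     # toy stand-in for a speaker model
    dictionary: Optional[Set[str]] = None  # optional per-user vocabulary


# Assumed fallback vocabulary for users without a trained dictionary.
DEFAULT_VOCAB: Set[str] = {"yes", "no"}


def identify(pitch: float, profiles: Dict[str, UserProfile]) -> str:
    """Return the user whose toy model lies closest to the observed pitch."""
    return min(profiles, key=lambda name: abs(profiles[name].model_pitch - pitch))


def decode(words: List[str], profile: UserProfile) -> List[str]:
    """Keep words found in the user's dictionary; mark the rest as unknown."""
    vocab = profile.dictionary if profile.dictionary is not None else DEFAULT_VOCAB
    return [w if w in vocab else "<unk>" for w in words]


profiles = {
    "alice": UserProfile(model_pitch=210.0, dictionary={"stent", "yes"}),
    "bob": UserProfile(model_pitch=120.0),  # no trained dictionary
}
speaker = identify(200.0, profiles)
decoded = decode(["stent", "yes"], profiles[speaker])
```

The same utterance decodes differently depending on who is recognized: with these toy profiles, a word trained only by one user stays unknown when attributed to the other.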
30 Claims
1. A method of processing a continuous audio stream containing human speech from a plurality of speakers related to at least one particular transaction, comprising the steps of:
digitizing the continuous audio stream;
detecting a speaker change in the digitized audio stream;
performing a speaker recognition if a speaker change is detected;
determining whether a recognized speaker is a predetermined speaker; and
transcribing at least part of the continuous audio stream only if the recognized speaker is the predetermined speaker;
wherein said transcribing is processed using a dictionary of speaker-trained data trained by the speaker being transcribed.
Dependent claims: 2, 3, 4, 5, 6

7. A method of processing a continuous audio stream containing human speech of a plurality of speakers related to at least one particular transaction, comprising the steps of:
digitizing the continuous audio stream;
detecting a speaker change in the digitized audio stream;
performing a speaker recognition if a speaker change is detected;
determining whether a recognized speaker is a predetermined speaker;
indexing the audio stream with respect to the detected speaker change only if the recognized speaker is the predetermined speaker;
wherein said indexing is processed using a dictionary of speaker-trained data trained by the speaker being transcribed.
Dependent claims: 8, 9, 10, 11, 12

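Indexing with respect to speaker changes can be pictured as recording a time span each time the predetermined speaker takes and then yields the floor. The sketch below illustrates only that bookkeeping under stated assumptions: the input is a hypothetical list of (timestamp, recognized speaker) pairs, the `IndexEntry` type and `index_stream` function are invented names, and the dictionary of speaker-trained data named in the claim is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class IndexEntry:
    start: float  # time at which the predetermined speaker starts
    end: float    # time of the next speaker change (or end of stream)


def index_stream(
    labelled: Sequence[Tuple[float, str]],
    predetermined: str,
    stream_end: float,
) -> List[IndexEntry]:
    """Index the spans of the predetermined speaker.

    `labelled` holds (timestamp, recognized speaker) pairs in time order.
    """
    entries: List[IndexEntry] = []
    open_start: Optional[float] = None
    previous: Optional[str] = None
    for timestamp, speaker in labelled:
        if speaker != previous:  # a speaker change was detected
            if open_start is not None:
                entries.append(IndexEntry(open_start, timestamp))
                open_start = None
            if speaker == predetermined:
                open_start = timestamp  # open an entry for this speaker only
        previous = speaker
    if open_start is not None:  # close an entry still open at stream end
        entries.append(IndexEntry(open_start, stream_end))
    return entries


labelled = [(0.0, "agent"), (1.0, "agent"), (2.0, "caller"), (3.0, "agent")]
index = index_stream(labelled, predetermined="agent", stream_end=4.0)
```

Entries are opened and closed only at speaker changes, so segments from other speakers leave no trace in the index.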
13. An apparatus for processing a continuous audio stream containing human speech from a plurality of speakers related to at least one particular transaction, comprising:
a digitizer which digitizes the continuous audio stream;
a detector which detects speaker changes in the digitized audio stream;
a recognizer which recognizes the predetermined speaker in the audio stream;
a determiner which determines whether a recognized speaker is a predetermined speaker; and
an initiator which initiates transcription of at least part of the continuous audio stream only if the recognized speaker is the predetermined known speaker;
wherein said transcription is processed using a dictionary of speaker-trained data trained by the speaker being transcribed.
Dependent claims: 14, 15, 16, 17, 18, 19, 20

21. An apparatus for processing a continuous audio stream containing human speech from a plurality of speakers related to at least one particular transaction, comprising:
a detector which detects speaker changes in the audio stream;
a digitizer which digitizes the continuous audio stream;
a recognizer which recognizes the predetermined speaker in the digitized audio stream;
a determiner which determines whether a recognized speaker is a predetermined speaker; and
an indexer for indexing at least part of the continuous audio stream only if the recognized speaker is the predetermined known speaker;
wherein said indexing is processed using a dictionary of speaker-trained data trained by the speaker being transcribed.
Dependent claims: 22, 23, 24, 25, 26, 27, 28

29. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for processing a continuous audio stream containing human speech from a plurality of speakers related to at least one particular transaction, said method comprising the steps of:
digitizing the continuous audio stream;
detecting a speaker change in the digitized audio stream;
performing a speaker recognition if a speaker change is detected;
determining whether a recognized speaker is a predetermined speaker; and
transcribing at least part of the continuous audio stream only if the recognized speaker is the predetermined speaker;
wherein said transcribing is processed using a dictionary of speaker-trained data trained by the speaker being transcribed.

30. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for processing a continuous audio stream containing human speech from a plurality of speakers related to at least one particular transaction, said method comprising the steps of:
digitizing the continuous audio stream;
detecting a speaker change in the digitized audio stream;
performing a speaker recognition if a speaker change is detected;
determining whether a recognized speaker is a predetermined speaker;
indexing the audio stream with respect to the detected speaker change only if the recognized speaker is the predetermined speaker;
wherein said indexing is processed using a dictionary of speaker-trained data trained by the speaker being transcribed.

Specification