Speaker recognition in the call center
Abstract
Utterances of at least two speakers in a speech signal may be distinguished, and the associated speakers identified, by using diarization together with automatic speech recognition of identifying words and phrases commonly found in the speech signal. The diarization process clusters turns of the conversation, while recognized special-form phrases and entity names identify the speakers. A trained probabilistic model deduces which entity name(s) correspond to the clusters.
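As a rough illustration of the last step, the sketch below maps diarization clusters to entity names by accumulating weighted evidence and normalizing it per cluster. It is a simple weighted-vote stand-in, not the trained probabilistic model the patent describes; the function names, the `(cluster, name, weight)` evidence format, and the example weights are all assumptions made for illustration.

```python
from collections import defaultdict


def name_posteriors(evidence):
    """Turn (cluster_id, name, weight) evidence into per-cluster name distributions.

    Hypothetical helper: weights are assumed to be positive scores, for
    example trigger-phrase priors attached to each detected entity name.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for cluster, name, weight in evidence:
        totals[cluster][name] += weight

    posteriors = {}
    for cluster, scores in totals.items():
        z = sum(scores.values())
        if z > 0:
            # Normalize so the scores for each cluster sum to 1.
            posteriors[cluster] = {name: s / z for name, s in scores.items()}
    return posteriors


def best_names(evidence):
    """Pick the highest-scoring entity name for each diarization cluster."""
    return {c: max(p, key=p.get) for c, p in name_posteriors(evidence).items()}


if __name__ == "__main__":
    # Illustrative evidence only: cluster "A" is probably Alice, "B" is probably Bob.
    evidence = [("A", "alice", 0.9), ("A", "bob", 0.3), ("B", "bob", 0.7)]
    print(best_names(evidence))
```

A trained model would presumably learn how much each kind of evidence should count; here the weights are simply summed.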
9 Claims
1. A method for distinguishing and identifying at least one of multiple speakers in a speech signal, the method comprising:
obtaining, by a computer, the speech signal, the speech signal including utterances respectively from at least a first speaker and a second speaker;
extracting, by the computer, speech portions from the speech signal;
performing, by the computer, speaker diarization on the speech portions, the speaker diarization identifying speech portions respectively associated with at least one of the first speaker and the second speaker;
detecting, by the computer, at least one trigger phrase in the respective speech portions by using automatic speech recognition, each possible trigger phrase being associated with a respective prior probability that a current, next or previous utterance in the respective speech portion is an entity name;
detecting, by the computer, at least one entity name in the respective speech portions by using automatic speech recognition; and
identifying, by the computer, at least one of the first speaker and the second speaker by executing a probabilistic model trained to associate at least one of the first speaker and the second speaker with a respective one of the at least one detected entity names, the probabilistic model having been trained based on at least the diarized speech portions, the detected at least one trigger phrase, and the detected at least one entity name.
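The trigger-phrase step in claim 1 can be pictured as a lookup table that attaches a prior to the token before or after each phrase. The following is a minimal sketch under stated assumptions: the phrases, the prior values, the `TRIGGER_PRIORS` table, and the `find_name_candidates` function are hypothetical and do not come from the patent, "utterance" is simplified to a single ASR token, and the "current utterance" case is omitted for brevity.

```python
# Hypothetical trigger phrases with illustrative priors (not taken from the patent).
# Each entry gives the assumed probability that the token immediately
# "previous" to or "next" after the phrase is an entity name.
TRIGGER_PRIORS = {
    ("my", "name", "is"): {"next": 0.9},
    ("this", "is"): {"next": 0.6},
    ("speaking",): {"previous": 0.7},
}


def find_name_candidates(tokens):
    """Return (token, prior) pairs flagged by trigger phrases in one speech portion."""
    tokens = [t.lower() for t in tokens]
    candidates = []
    for phrase, positions in TRIGGER_PRIORS.items():
        n = len(phrase)
        for start in range(len(tokens) - n + 1):
            if tuple(tokens[start:start + n]) != phrase:
                continue
            for position, prior in positions.items():
                # "previous" is the token just before the phrase,
                # "next" is the token just after it.
                idx = start - 1 if position == "previous" else start + n
                if 0 <= idx < len(tokens):
                    candidates.append((tokens[idx], prior))
    return candidates


if __name__ == "__main__":
    print(find_name_candidates("hello my name is alice".split()))
    print(find_name_candidates("bob speaking".split()))
```

Candidates produced this way would still need to be tied to a diarized speaker; the claim leaves that association to the trained probabilistic model.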
6. An apparatus for distinguishing and identifying at least one of multiple speakers in a speech signal, the apparatus comprising:
a computer configured to:
receive a speech signal having utterances from at least a first speaker and a second speaker;
extract speech portions of the speech signal;
segregate the extracted speech portions into speech portions associated with at least one of the first speaker and the second speaker;
recognize one or more predetermined trigger phrases in each speech portion, each possible trigger phrase being associated with a respective prior probability that a current, next or previous utterance in the respective speech portion is a named entity, and recognize one or more entity names in each respective speech portion; and
identify at least one of the first speaker and the second speaker by executing a probabilistic model trained to associate a recognized entity name with speech portions associated with at least one of the first speaker and the second speaker.
Specification