Text-independent speaker recognition system and method based on acoustic segment matching
Abstract
The invention provides a method and system for speaker enrollment, as well as for speaker recognition. Speaker enrollment creates for each candidate speaker a set of short acoustic segments, or templates, of phonemic duration. An equal number of templates is derived from every candidate speaker's training utterance. A speaker's template set serves as a model for that speaker. Recognition is accomplished by employing a continuous speech recognition (CSR) system to match the recognition utterance with each speaker's template set in turn. The system selects the speaker whose templates match the recognition utterance most closely, that is, the speaker whose CSR match score is lowest. The method of the invention incorporates the entire training utterance in each speaker model, and explains the entire test utterance. The method of the invention models individual short segments of the speech utterances as well as their long-term statistics. Both static and dynamic speaker characteristics are captured in the speaker models.
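The selection rule described in the abstract (score the recognition utterance against every speaker's template set, then pick the lowest score) can be sketched as follows. This is an illustrative sketch only, not the patented matcher: the names `frame_distance`, `match_score`, and `identify` are hypothetical, the feature vectors are made up, and the scorer is a simple nearest-frame distance standing in for the CSR match, which would align whole multi-frame templates.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two parameter vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_score(utterance, templates):
    """Average distance from each utterance frame to its closest template frame.

    Assumption: frames are scored independently. A CSR matcher would instead
    align multi-frame templates against the utterance.
    """
    total = 0.0
    for frame in utterance:
        total += min(frame_distance(frame, t) for t in templates)
    return total / len(utterance)

def identify(utterance, speaker_models):
    """Return the speaker whose template set gives the lowest score, plus all scores."""
    scores = {name: match_score(utterance, tpl) for name, tpl in speaker_models.items()}
    return min(scores, key=scores.get), scores

# Hypothetical two-speaker example with 2-dimensional frame parameters.
models = {
    "alice": [[0.0, 0.0], [1.0, 1.0]],
    "bob": [[5.0, 5.0], [6.0, 6.0]],
}
test_utterance = [[0.1, 0.0], [0.9, 1.1], [0.0, 0.2]]
best, scores = identify(test_utterance, models)
print(best)
```

Because every frame of the test utterance contributes to the score, the whole utterance is accounted for, which mirrors the abstract's point that the method explains the entire test utterance.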
3 Claims
1. A speaker recognition system for automatically recognizing a given speaker from a group of enrolled speakers where said system can select a given enrolled speaker from said other enrolled speakers, comprising:
enrollment means including first acoustic analysis means for enabling each speaker to provide an input speech training utterance for converting said input speech utterance into frames of equal duration by providing at an output a parametric representation of each frame,
covering analysis means coupled to said output of said acoustic analysis means for dividing said parametric representation to shorter, equal length segments indicative of sub-word units and providing at an output a subset of said segments that represent said training utterance, with said subset of segments representing an initial template set for each enrolled speaker,
template storage means for storing said initial template set,
aligning means coupled to said template set storing means for aligning each template frame with at least one frame of said input speech utterance to provide a label for each utterance frame as aligned with a template frame,
frame averaging means coupled to said aligning means for averaging all input speech utterance frames that were aligned with each template frame,
template update means coupled to said frame averaging means and said template set storing means to replace each template set as stored with the corresponding average of said utterance frames to provide a new set of stored templates,
recognition means including second acoustic analysis means for enabling a speaker to be recognized to speak and for dividing said speech into said equal duration frames by providing at an output a parametric representation of each frame, and
means for matching said new set of stored templates for each enrolled speaker with said parametric representation of each frame to provide at an output a match score for each enrolled speaker and means responsive to the minimum match score to identify said one of said enrolled speakers who is speaking.
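As an illustration of the aligning, frame averaging, and template update elements of claim 1, one pass of that loop might look like the sketch below. It rests on stated assumptions: `nearest` and `update_templates` are hypothetical names, alignment is approximated by labeling each utterance frame with its closest template frame (the claim does not specify the alignment procedure), and a template frame that attracts no utterance frames is simply left unchanged.

```python
def nearest(frame, template_frames):
    """Index of the template frame closest to `frame` (squared Euclidean distance)."""
    dists = [sum((x - y) ** 2 for x, y in zip(frame, t)) for t in template_frames]
    return dists.index(min(dists))

def update_templates(utterance, template_frames):
    """One align / average / replace pass over a training utterance."""
    # Align: label every utterance frame with a template frame index.
    labels = [nearest(f, template_frames) for f in utterance]
    updated = []
    for i, old in enumerate(template_frames):
        # Average: collect all utterance frames that were aligned with template frame i.
        members = [f for f, lab in zip(utterance, labels) if lab == i]
        if not members:
            # Assumption: a template frame with no aligned frames is kept as-is.
            updated.append(list(old))
            continue
        dim = len(old)
        # Update: replace the template frame with the mean of its aligned frames.
        updated.append([sum(f[d] for f in members) / len(members) for d in range(dim)])
    return updated, labels

# Hypothetical 2-dimensional frames and an initial template set of two frames.
utterance = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]]
initial = [[0.0, 0.0], [4.0, 4.0]]
new_templates, labels = update_templates(utterance, initial)
print(labels)
```

Since every utterance frame receives a label and contributes to an average, the updated template set reflects the whole training utterance rather than only the segments chosen initially.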
2. A method for recognizing one speaker in a group of enrolled speakers, comprising the steps of:
enrolling each speaker by having each speaker speak a training utterance,
converting said training utterance into speech frames of equal duration each indicative of a segment of said utterance,
dividing said speech frames into shorter equal duration segments each indicative of a segment of said utterance,
forming a set of templates from said shorter equal duration segments,
storing said template set,
aligning each of said stored templates with at least one frame of said training utterance for each speaker,
averaging said aligned frames,
forming a new set of templates from said averaged frames,
storing said new set of templates for enabling a comparison of the speech of any arbitrary enrolled speaker with said new set of templates as stored to recognize said arbitrary speaker from any other enrolled speaker.
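The first two steps of claim 2, converting an utterance into equal-duration frames and grouping those frames into equal-duration segments, can be illustrated with the sketch below. Everything specific here is an assumption made for illustration: the function names are hypothetical, frames and segments are taken as non-overlapping, a trailing partial frame or segment is dropped, and the two-value parameter vector is a stand-in for whatever parametric representation the acoustic analysis actually produces.

```python
def split_into_frames(samples, frame_len):
    """Cut samples into consecutive frames of frame_len samples each.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    """
    count = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(count)]

def frame_parameters(frame):
    """Stand-in parametric representation: [mean amplitude, mean energy]."""
    n = len(frame)
    return [sum(frame) / n, sum(x * x for x in frame) / n]

def split_into_segments(frames, seg_len):
    """Group parameterized frames into consecutive segments of seg_len frames.

    Assumption: segments do not overlap and a trailing partial segment is dropped.
    """
    count = len(frames) // seg_len
    return [frames[i * seg_len:(i + 1) * seg_len] for i in range(count)]

# Hypothetical 13-sample "utterance": 2 samples per frame, 3 frames per segment.
samples = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
frames = [frame_parameters(f) for f in split_into_frames(samples, 2)]
segments = split_into_segments(frames, 3)
print(len(frames), len(segments))
```

Each resulting segment is a short run of parameterized frames, which is the form a candidate template takes before the template set is selected and refined.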
Specification