SPEAKER-CLUSTER DEPENDENT SPEAKER RECOGNITION (SPEAKER-TYPE AUTOMATED SPEECH RECOGNITION)
Abstract
In an example embodiment, there is disclosed herein an automatic speech recognition (ASR) system that employs speaker clustering (or speaker type) for transcribing audio. A large corpus of audio with corresponding transcripts is analyzed to determine a plurality of speaker types (e.g., dialects). The ASR system is trained for each speaker type. Upon encountering a new user, the ASR system attempts to map the user to a speaker type. After the new user is mapped to a speaker type, the ASR system employs that speaker type for transcribing audio from the new user.
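The abstract does not say how speaker types are found or how a new user is mapped to one. As a minimal sketch only, assuming each corpus speaker has already been reduced to a fixed-length feature vector (a hypothetical embedding; the document does not specify a representation), the Python below uses plain k-means as a stand-in clustering method, groups the corpus speakers by type, and maps a new user to the nearest type. Per-type training is represented only by the grouping step. All function names are illustrative and not taken from the patent.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def nearest(v: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid (speaker type) closest to v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))


def cluster_speakers(embeddings: Dict[str, Vector], k: int, iters: int = 20) -> List[Vector]:
    """Stand-in clustering: k-means over hypothetical per-speaker embeddings,
    seeded deterministically with farthest-first picks (assumes k <= speakers)."""
    points = list(embeddings.values())
    centroids: List[Vector] = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids


def partition_corpus(embeddings: Dict[str, Vector], centroids: List[Vector]) -> Dict[int, List[str]]:
    """Group corpus speakers by speaker type. A real system would then train or
    adapt one recognizer per group on that group's audio and transcripts."""
    groups: Dict[int, List[str]] = {i: [] for i in range(len(centroids))}
    for speaker, emb in embeddings.items():
        groups[nearest(emb, centroids)].append(speaker)
    return groups


# Hypothetical 2-D embeddings for six corpus speakers in two obvious groups.
corpus = {
    "a1": (0.0, 0.1), "a2": (0.2, 0.0), "a3": (0.1, 0.2),
    "b1": (5.0, 5.1), "b2": (5.2, 4.9), "b3": (4.9, 5.0),
}
centroids = cluster_speakers(corpus, k=2)
groups = partition_corpus(corpus, centroids)
new_type = nearest((0.3, 0.1), centroids)  # map a new user to a speaker type
```

Farthest-first seeding keeps the example deterministic. The patent does not commit to k-means or to any particular distance measure; both are substitutions chosen for brevity.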
20 Claims
1. An apparatus, comprising:
an audio corpus;
transcription data corresponding to the audio corpus;
an automatic speech recognition logic coupled to an interface for receiving the audio corpus and an interface for receiving the transcription data;
wherein the automatic speech recognition logic is operable to determine a plurality of speaker clusters corresponding to a plurality of speaker types from the audio corpus and the transcription data corresponding to the audio corpus;
wherein the automatic speech recognition logic is trained for each speaker cluster belonging to the plurality of speaker clusters;
wherein the automatic speech recognition logic determines a selected speaker cluster, selected from the plurality of speaker clusters, for a source responsive to receiving audio data and a transcription of the audio data from the source; and
wherein the automatic speech recognition logic employs the selected speaker cluster for transcribing audio from the source.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.

10. A method, comprising:
determining a plurality of speaker types by an automatic speech recognition system from a corpus of audio data and corresponding transcription data;
training the automatic speech recognition system for each speaker type;
receiving audio data from a new user;
determining, by the automatic speech recognition system, a selected one of the plurality of speaker types based on the audio data and a transcription of the audio received from the new user; and
transcribing, by the automatic speech recognition system, audio data from the new user based on the selected one of the plurality of speaker types.
Dependent claims: 11, 12, 13, 14, 15, 16, 17.

18. Logic encoded on at least one tangible medium for execution by a processor and, when executed, operable to:
determine a plurality of speaker types from a corpus of audio data and corresponding transcription data;
train for automatic speech recognition of each speaker type;
receive audio data from a source;
determine a selected one of the plurality of speaker types based on the audio data and a transcription of the audio received from the source; and
transcribe audio data from the source based on the selected one of the plurality of speaker types.
Dependent claims: 19, 20.
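Claims 10 and 18 select the speaker type using both the new user's audio and a transcription of that audio. One possible reading, sketched here under stated assumptions, is supervised scoring: the transcription fixes which phone each audio frame should be, so every speaker type is judged on the same known content. The sketch assumes each speaker type has per-phone diagonal Gaussian models and that frames have already been aligned to phone labels from the transcript (for example by forced alignment, not shown). The phone labels, model values, and function names are hypothetical, not the patent's method.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical speaker-type model: phone label -> (mean vector, variance vector).
PhoneModel = Tuple[Sequence[float], Sequence[float]]
TypeModel = Dict[str, PhoneModel]
AlignedFrames = List[Tuple[Sequence[float], str]]  # (feature frame, phone label)


def diag_gauss_logpdf(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def score_type(model: TypeModel, aligned: AlignedFrames) -> float:
    """Total log-likelihood of transcript-aligned frames under one speaker type."""
    return sum(diag_gauss_logpdf(frame, *model[phone]) for frame, phone in aligned)


def select_speaker_type(models: Dict[str, TypeModel], aligned: AlignedFrames) -> str:
    """Pick the speaker type whose phone models best explain the aligned frames."""
    return max(models, key=lambda name: score_type(models[name], aligned))


# Hypothetical two-type, two-phone models for illustration only.
models = {
    "type_a": {"ah": ((1.0, 0.0), (0.5, 0.5)), "iy": ((0.0, 1.0), (0.5, 0.5))},
    "type_b": {"ah": ((3.0, 0.0), (0.5, 0.5)), "iy": ((0.0, 3.0), (0.5, 0.5))},
}
# Enrollment frames already labelled with phones taken from the transcription.
aligned = [((1.1, 0.1), "ah"), ((0.9, -0.1), "ah"), ((0.1, 1.2), "iy")]
chosen = select_speaker_type(models, aligned)
```

Because every type is scored against the same aligned phones, a difference in score reflects how well each type's acoustics fit the speaker rather than uncertainty about what was said, which is one plausible reason to require the transcription at selection time.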
Specification