Speaker-cluster dependent speaker recognition (speaker-type automated speech recognition)
First Claim
1. An apparatus, comprising:
an audio corpus;
transcription data corresponding to the audio corpus;
automatic speech recognition logic coupled with an associated interface for receiving the audio corpus and the associated interface for receiving transcription data;
wherein the automatic speech recognition logic is operable to analyze the audio corpus and the transcription data corresponding to the audio corpus to determine a plurality of speaker clusters corresponding to a plurality of speaker types from the audio corpus and the transcription data corresponding to the audio corpus;
wherein the automatic speech recognition logic is trained for each speaker cluster belonging to the plurality of speaker clusters;
wherein the automatic speech recognition logic is operable to determine a selected speaker cluster selected from the plurality of speaker clusters for an associated source responsive to receiving audio data and a transcription of the audio data from the associated source; and
wherein the automatic speech recognition logic employs the selected speaker cluster for transcribing the audio from the associated source.
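The routing the claim describes (one recognizer trained per speaker cluster, with the selected cluster deciding which recognizer transcribes a source's audio) could be sketched as below. `ClusterDependentASR` and the stand-in lambda "recognizer" are illustrative names, not from the patent:

```python
class ClusterDependentASR:
    """Minimal sketch of cluster-dependent ASR routing: one recognizer
    per speaker cluster, selected per audio source."""

    def __init__(self):
        self.models = {}  # cluster id -> recognizer trained on that cluster

    def train_cluster(self, cluster_id, recognizer):
        # Stand-in for training the ASR logic on one speaker cluster.
        self.models[cluster_id] = recognizer

    def transcribe(self, cluster_id, audio):
        # Route the audio to the recognizer for the selected cluster.
        return self.models[cluster_id](audio)


asr = ClusterDependentASR()
# A real system would train acoustic/language models per cluster; a
# lambda stands in for a trained recognizer here.
asr.train_cluster("dialect_a", lambda audio: audio.upper())
result = asr.transcribe("dialect_a", "hello")  # -> "HELLO"
```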
Abstract
In an example embodiment, there is disclosed herein an automatic speech recognition (ASR) system that employs speaker clustering (or speaker type) for transcribing audio. A large corpus of audio with corresponding transcripts is analyzed to determine a plurality of speaker types (e.g., dialects). The ASR system is trained for each speaker type. Upon encountering a new user, the ASR system attempts to map the user to a speaker type. After the new user is mapped to a speaker type, the ASR employs the speaker type for transcribing audio from the new user.
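The clustering step in the abstract could look like the following sketch, assuming each speaker in the corpus is summarized by a fixed-length acoustic feature vector and using plain k-means; the patent does not prescribe a particular clustering algorithm or feature representation:

```python
import numpy as np


def cluster_speakers(speaker_features, k, iters=20, seed=0):
    """Group per-speaker feature vectors into k speaker-type clusters
    via plain k-means (one possible realization, not mandated by the patent)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(speaker_features, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each speaker to the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Recompute each centroid from its assigned speakers.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids


# Two well-separated synthetic "speaker types".
feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
labels, centroids = cluster_speakers(feats, k=2)
```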
Claims (20)
1. An apparatus, comprising: (independent claim, set out in full under "First Claim" above)
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method, comprising:
analyzing, by an automatic speech recognition system, a corpus of audio data and transcription data corresponding to the corpus of audio data to determine a plurality of speaker types from the corpus of audio data and the corresponding transcription data;
training the automatic speech recognition system for each of the plurality of speaker types;
receiving audio data from an associated new user;
determining a selected one of the plurality of speaker types based on the audio data received from the associated new user and a transcription of the audio received from the associated new user by the automatic speech recognition system; and
transcribing, by the automatic speech recognition system, the audio data received from the associated new user based on the selected one of the plurality of speaker types.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
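The "determining a selected one of the plurality of speaker types" step of the method claim admits a simple nearest-centroid realization, assuming the new user's audio (and its transcription) have been reduced to the same feature space used for clustering; all names here are illustrative:

```python
import numpy as np


def select_speaker_type(user_features, centroids):
    """Pick the speaker-type cluster whose centroid is closest to the
    new user's feature vector (one simple way to map a new user to a
    previously determined speaker type)."""
    u = np.asarray(user_features, dtype=float)
    dists = np.linalg.norm(np.asarray(centroids, dtype=float) - u, axis=1)
    return int(np.argmin(dists))


centroids = [[0.0, 0.0], [5.0, 5.0]]
selected = select_speaker_type([4.8, 5.2], centroids)  # -> 1
```

The returned index would then select the recognizer trained for that speaker type in the transcribing step.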
18. Logic encoded on tangible non-transitory media for execution by a processor, and when executed operable to:
analyze a corpus of audio data and transcription data corresponding to the audio data to determine a plurality of speaker types from the corpus of audio data and the corresponding transcription data;
train for automatic speech recognition of each of the plurality of speaker types;
receive audio data from an associated source;
determine a selected one of the plurality of speaker types based on the audio data received from the associated source and a transcription of the audio received from the associated source; and
selectively transcribe the audio data received from the associated source based on the selected one of the plurality of speaker types.
Dependent claims: 19, 20
Specification