SYSTEMS AND METHODS FOR A TWO PASS DIARIZATION, AUTOMATIC SPEECH RECOGNITION, AND TRANSCRIPT GENERATION
Abstract
In one embodiment, a method for transcript generation includes receiving an audio file and dividing it into a plurality of chunks. The method further includes sending each instance of the plurality of chunks to a speech service module. The method further includes converting speech to text for each instance of the plurality of chunks and returning the text for each instance of the plurality of chunks. The method further includes merging the text for each instance of the plurality of chunks to yield an audio file transcript and sending the audio file and the plurality of chunks to a diarization module. The method further includes performing first pass diarization on the plurality of chunks to yield a plurality of diarized chunks and performing second pass diarization on the plurality of diarized chunks and the audio file to yield a diarized audio file. The method further includes merging the audio file transcript and the diarized audio file to yield a final transcript.
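The chunk-and-merge portion of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how chunks are sized or how the speech service is called, so fixed-length chunking, the `Chunk` type, and the `fake_speech_service` stand-in are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    index: int      # position of the chunk within the audio file
    start_s: float  # chunk start time, in seconds
    end_s: float    # chunk end time, in seconds


def divide_into_chunks(duration_s: float, chunk_s: float) -> List[Chunk]:
    """Divide an audio file of the given duration into consecutive chunks.

    Fixed-length chunking is an assumption made for this sketch.
    """
    if duration_s < 0 or chunk_s <= 0:
        raise ValueError("duration_s must be >= 0 and chunk_s must be > 0")
    chunks: List[Chunk] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append(Chunk(index=len(chunks), start_s=start, end_s=end))
        start = end
    return chunks


def transcribe_and_merge(chunks: List[Chunk],
                         speech_service: Callable[[Chunk], str]) -> str:
    """Send each chunk to the speech service, then merge the text in order."""
    texts = {c.index: speech_service(c) for c in chunks}
    # Merge by chunk index so the result is correct even if the
    # per-chunk results were to arrive out of order.
    return " ".join(texts[i] for i in sorted(texts))


def fake_speech_service(chunk: Chunk) -> str:
    """Hypothetical stand-in for a real speech-to-text service."""
    return f"[text {chunk.start_s:.0f}-{chunk.end_s:.0f}s]"


if __name__ == "__main__":
    chunks = divide_into_chunks(duration_s=25.0, chunk_s=10.0)
    print(transcribe_and_merge(chunks, fake_speech_service))
```

Keeping the chunk index alongside each result is one way to make the merge independent of the order in which chunk transcriptions return.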
21 Claims
1. A method for transcript generation including ASR and diarization, the method comprising:
receiving an audio file at a platform module;
dividing the audio file into a plurality of chunks;
sending each instance of the plurality of chunks to a speech service module;
converting speech to text for each instance of the plurality of chunks;
returning the text for each instance of the plurality of chunks to the platform module;
merging the text for each instance of the plurality of chunks at the platform module to yield an audio file transcript;
sending the audio file and the plurality of chunks to a diarization module;
performing first pass diarization on the plurality of chunks to yield a plurality of diarized chunks;
performing second pass diarization on the plurality of diarized chunks and the audio file to yield a diarized audio file;
merging the audio file transcript and the diarized audio file to yield a final transcript.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 14, 15, 16, 17, 18, 19
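The final merging step of claim 1 can be illustrated by aligning timestamped words with speaker turns. The sketch below rests on assumptions that the claim does not state: that the audio file transcript carries per-word timestamps, that the diarized audio file reduces to a list of speaker turns, and that each word is assigned to the turn it overlaps most. `Word`, `Turn`, and `merge_transcript_with_diarization` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start_s: float
    end_s: float


@dataclass
class Turn:
    speaker: str
    start_s: float
    end_s: float


def overlap_s(word: Word, turn: Turn) -> float:
    """Length in seconds of the time overlap between a word and a turn."""
    return max(0.0, min(word.end_s, turn.end_s) - max(word.start_s, turn.start_s))


def merge_transcript_with_diarization(words: List[Word],
                                      turns: List[Turn]) -> List[Tuple[str, str]]:
    """Label each word with the speaker whose turn overlaps it most.

    Consecutive words from the same speaker are joined into one line.
    Words that overlap no turn are labeled "unknown".
    """
    merged: List[Tuple[str, str]] = []
    for w in words:
        best_speaker, best_overlap = "unknown", 0.0
        for t in turns:
            ov = overlap_s(w, t)
            if ov > best_overlap:
                best_speaker, best_overlap = t.speaker, ov
        if merged and merged[-1][0] == best_speaker:
            merged[-1] = (best_speaker, merged[-1][1] + " " + w.text)
        else:
            merged.append((best_speaker, w.text))
    return merged


if __name__ == "__main__":
    words = [Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9), Word("hi", 1.1, 1.3)]
    turns = [Turn("S1", 0.0, 1.0), Turn("S2", 1.0, 2.0)]
    for speaker, text in merge_transcript_with_diarization(words, turns):
        print(f"{speaker}: {text}")
```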
11. A system for transcript generation including ASR and diarization, the system comprising:
a platform module;
a speech service module in communication with the platform module;
a diarization module in communication with the platform module,
wherein the platform module, the speech service module, and the diarization module are configured to:
receive an audio file at the platform module;
divide the audio file into a plurality of chunks;
send each instance of the plurality of chunks to the speech service module;
convert speech to text for each instance of the plurality of chunks;
return the text for each instance of the plurality of chunks to the platform module;
merge the text for each instance of the plurality of chunks at the platform module to yield an audio file transcript;
send the audio file and the plurality of chunks to the diarization module;
perform first pass diarization on the plurality of chunks to yield a plurality of diarized chunks;
perform second pass diarization on the plurality of diarized chunks and the audio file to yield a diarized audio file;
merge the audio file transcript and the diarized audio file to yield a final transcript.
Dependent claims: 12, 13
20. A method of performing diarization on a sound recording, the method comprising:
receiving a sound recording;
breaking the sound recording into a plurality of chunks;
performing a first diarization on the plurality of chunks, wherein the performing includes
breaking each of the plurality of chunks into a plurality of segments,
for each of the plurality of segments generating statistical speaker information descriptive of the sound characteristics in that segment, and
clustering, within each chunk of the plurality of chunks, segments having similar statistical speaker information to generate within each chunk of the plurality of chunks groups of segments grouped according to the similar statistical speaker information;
performing a second diarization by clustering between the plurality of chunks, the groups of segments according to grouped similar statistical speaker information, the grouped similar statistical speaker information being characteristics of speech of each group for the groups of segments.
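The two clustering passes of claim 20 can be sketched as below. The claim does not name a feature type or a clustering algorithm, so everything concrete here is an assumption made for illustration: the "statistical speaker information" of a segment is modeled as a plain numeric vector, each group is summarized by its mean vector, and a simple greedy distance-threshold rule stands in for whatever clustering method an actual system would use. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def greedy_cluster(vectors: List[Vector], threshold: float) -> List[int]:
    """Assign each vector to the first cluster whose centroid lies within
    `threshold` (Euclidean distance); otherwise open a new cluster.
    This is an assumed stand-in, not a method named in the claim."""
    members: List[List[Vector]] = []
    labels: List[int] = []
    for v in vectors:
        for k, group in enumerate(members):
            if math.dist(v, mean_vector(group)) <= threshold:
                group.append(v)
                labels.append(k)
                break
        else:
            members.append([v])
            labels.append(len(members) - 1)
    return labels


def two_pass_diarization(chunks: List[List[Vector]],
                         threshold: float) -> List[List[int]]:
    """Return, for every segment of every chunk, a recording-wide speaker label."""
    # First pass: cluster segments within each chunk independently.
    local_labels = [greedy_cluster(segments, threshold) for segments in chunks]

    # Summarize each within-chunk group by the mean of its segment vectors.
    centroids: List[List[float]] = []
    owners: List[Tuple[int, int]] = []  # (chunk index, local group label)
    for c, (segments, labels) in enumerate(zip(chunks, local_labels)):
        for k in sorted(set(labels)):
            group = [s for s, label in zip(segments, labels) if label == k]
            centroids.append(mean_vector(group))
            owners.append((c, k))

    # Second pass: cluster the group summaries across chunks.
    global_of: Dict[Tuple[int, int], int] = dict(
        zip(owners, greedy_cluster(centroids, threshold)))
    return [[global_of[(c, label)] for label in labels]
            for c, labels in enumerate(local_labels)]


if __name__ == "__main__":
    chunks = [
        [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)],  # chunk 0: three segments
        [(5.1, 5.0), (0.0, 0.1)],              # chunk 1: two segments
    ]
    print(two_pass_diarization(chunks, threshold=1.0))
```

In this example the second pass reconciles labels across chunks: the group that was locally "first" in chunk 1 is matched to the second speaker found in chunk 0.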
21. A fixed tangible medium, which when executed by a computing system, executes steps comprising:
receiving an audio file at a platform module;
dividing the audio file into a plurality of chunks;
sending each instance of the plurality of chunks to a speech service module;
converting speech to text for each instance of the plurality of chunks;
returning the text for each instance of the plurality of chunks to the platform module;
merging the text for each instance of the plurality of chunks at the platform module to yield an audio file transcript;
sending the audio file and the plurality of chunks to a diarization module;
performing first pass diarization on the plurality of chunks to yield a plurality of diarized chunks;
performing second pass diarization on the plurality of diarized chunks and the audio file to yield a diarized audio file;
merging the audio file transcript and the diarized audio file to yield a final transcript.
Specification