Human-curated glossary for rapid hybrid-based transcription of audio
Abstract
Described herein are curation of a glossary and its utilization for automatic speech recognition (ASR). In one embodiment, a server receives an audio recording of speech, taken over a period spanning at least two hours. During the first hour, the server generates, utilizing an ASR system, a transcription of a segment of the audio, recorded during the first twenty minutes. The server receives, from a transcriber, a phrase that does not appear in the transcription, but was spoken in the segment, and adds the phrase to a glossary. After the first hour of the period, the server generates, utilizing the ASR system, a second transcription of a second segment of the audio, provides the second transcription and the glossary to a second transcriber, and receives a corrected transcription, in which the second transcriber substituted a second phrase in the second transcription, which was not in the glossary, with the phrase.
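The curation and correction steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Glossary` class, the `apply_correction` helper, the case-insensitive substring check, and the sample phrases are all assumptions introduced here, and the ASR output is represented by plain strings rather than a real recognizer.

```python
from dataclasses import dataclass, field


@dataclass
class Glossary:
    """Hypothetical container for phrases supplied by transcribers."""
    phrases: list[str] = field(default_factory=list)

    def add_if_missing_from(self, phrase: str, transcription: str) -> bool:
        """Add `phrase` only if the ASR transcription does not contain it."""
        if phrase.lower() in transcription.lower():
            return False  # the ASR output already has it; nothing to curate
        if phrase not in self.phrases:
            self.phrases.append(phrase)
        return True


def apply_correction(transcription: str, wrong: str, glossary_phrase: str,
                     glossary: Glossary) -> str:
    """Model a second transcriber replacing a misrecognized phrase
    (one that is not in the glossary) with a phrase from the glossary."""
    if glossary_phrase not in glossary.phrases:
        raise ValueError("replacement must come from the glossary")
    if wrong in glossary.phrases:
        raise ValueError("the replaced phrase must not be a glossary entry")
    return transcription.replace(wrong, glossary_phrase)


# During the first hour: a transcriber reports a phrase the ASR output missed.
glossary = Glossary()
first_asr = "the patient was given a seat of minnow fen"  # invented sample text
added = glossary.add_if_missing_from("acetaminophen", first_asr)

# After the first hour: a second transcriber uses the glossary on new ASR output.
second_asr = "continue a seat of minnow fen twice daily"  # invented sample text
corrected = apply_correction(second_asr, "a seat of minnow fen",
                             "acetaminophen", glossary)
```

In this sketch the glossary is a plain list shared between the two phases, which mirrors the idea that a phrase captured early in the recording is available to a different transcriber later on.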
20 Claims
1. A system configured to curate a glossary and utilize the glossary for rapid transcription of audio, comprising:
a frontend server configured to transmit, to a backend server, an audio recording comprising speech of multiple people in a room over a period spanning at least two hours; and
the backend server is configured to perform the following:
during the first hour of the period:
segment at least a portion of the audio recording, which was recorded during the first twenty minutes of the period, to segments;
generate, utilizing an automatic speech recognition (ASR) system, a first transcription of a first segment from among the segments;
receive, from a first transcriber, a first phrase that does not appear in the first transcription, but was spoken in the first segment; and
add the first phrase to a glossary;
after the first hour of the period:
generate, utilizing the ASR system, a second transcription of a second segment of the audio recording;
provide the second transcription and the glossary to a second transcriber; and
receive a corrected transcription, in which the second transcriber substituted a second phrase in the second transcription, which was not in the glossary, with the first phrase.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A method for curating and utilizing a glossary for rapid transcription of audio, comprising:
receiving an audio recording comprising speech of multiple people in a room over a period spanning at least two hours;
segmenting at least a portion of the audio recording, which was recorded during the first twenty minutes of the period, to segments;
generating, utilizing an automatic speech recognition (ASR) system, a first transcription of a first segment from among the segments;
receiving, from a first transcriber, a first phrase that does not appear in the first transcription, but was spoken in the first segment;
adding the first phrase to a glossary;
generating, utilizing the ASR system, a second transcription of a second segment of the audio recording;
providing the second transcription and the glossary to a second transcriber; and
receiving a corrected transcription, in which the second transcriber substituted a second phrase in the second transcription, which was not in the glossary, with the first phrase.
Dependent claims: 13, 14, 15, 16, 17, 18, 19
20. A non-transitory computer-readable medium having instructions stored thereon that, in response to execution by a system including a processor and memory, causes the system to perform operations comprising:
receiving an audio recording comprising speech of multiple people in a room over a period spanning at least two hours;
segmenting at least a portion of the audio recording, which was recorded during the first twenty minutes of the period, to segments;
generating, utilizing an automatic speech recognition (ASR) system, a first transcription of a first segment from among the segments;
receiving, from a first transcriber, a first phrase that does not appear in the first transcription, but was spoken in the first segment;
adding the first phrase to a glossary;
generating, utilizing the ASR system, a second transcription of a second segment of the audio recording;
providing the second transcription and the glossary to a second transcriber; and
receiving a corrected transcription, in which the second transcriber substituted a second phrase in the second transcription, which was not in the glossary, with the first phrase.
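The segmenting operation recited in the independent claims could, as one possibility, be realized by cutting the first twenty minutes into consecutive time windows. The sketch below is only an illustration under stated assumptions: the claims do not specify a segment length or a segmentation rule, so the fixed 90-second length and the `segment_bounds` function are hypothetical.

```python
def segment_bounds(window_start_s: int, window_end_s: int,
                   segment_len_s: int) -> list[tuple[int, int]]:
    """Split [window_start_s, window_end_s) into consecutive (start, end) pairs.

    The final segment is shortened if the window is not an exact multiple
    of the segment length.
    """
    if segment_len_s <= 0:
        raise ValueError("segment length must be positive")
    bounds = []
    start = window_start_s
    while start < window_end_s:
        end = min(start + segment_len_s, window_end_s)
        bounds.append((start, end))
        start = end
    return bounds


TWENTY_MIN_S = 20 * 60  # the first twenty minutes of the period, in seconds

# 90-second segments are an assumed value chosen purely for illustration.
segments = segment_bounds(0, TWENTY_MIN_S, 90)
```

Each `(start, end)` pair would then identify the slice of audio passed to the ASR system to produce a transcription of that segment.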
Specification