System and method for the secure, real-time, high accuracy conversion of general quality speech into text
First Claim
1. A system, comprising:
a receiving element to receive a) audio streams and b) text that has been generated by a speech recognition element acting upon the audio streams, the receiving element to create a) audio segments from the audio streams, and b) text segments, corresponding to the audio segments, from the text;
a mixing element to receive the audio segments and to randomize the order of the audio segments and the corresponding text segments;
a transmitting element to send the randomized audio segments and the randomized corresponding text segments to a plurality of transcribers; and
a text receiving element to receive corrected text segments created by the plurality of transcribers using a) the transmitted randomized audio segments and b) the transmitted randomized corresponding text segments.
Abstract
Described is a speech-to-text conversion system and method that provides secure, real-time and high-accuracy conversion of general-quality speech into text. The system is designed to interface with external devices and services, providing a simple and convenient way to transcribe audio that may be stored elsewhere, such as a wireless phone's voice mail, or that occurs between two or more parties, such as a conference call. The first step in the system's process ensures secure and private transcription by separating an audio stream into many audio shreds, each of which has a duration of only a few seconds and cannot reveal the context of the conversation. A workforce of geographically distributed transcription agents who transcribe the audio shreds is able to generate the transcription in real time, with many agents working in parallel on a single conversation. No one agent (or group of agents) receives a sufficient number of audio shreds to reconstruct the context of any conversation. The use of human transcribers allows the system to overcome limitations typical of computer-based speech recognition and permits accurate transcription of general-quality speech even in acoustically hostile environments.
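The shredding and distribution idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `Shred`, `shred_stream`, and `assign_shreds`, the 3-second default shred length, and the per-agent cap `max_per_agent` are all assumptions chosen for illustration. The abstract only says that shreds last "a few seconds" and that no agent receives enough of them to reconstruct a conversation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Shred:
    """A short slice of one conversation (hypothetical structure)."""
    conversation_id: str
    index: int       # position of the shred within its conversation
    start_s: float   # start time in seconds
    end_s: float     # end time in seconds


def shred_stream(conversation_id, duration_s, shred_len_s=3.0):
    """Cut a stream into consecutive shreds of at most shred_len_s seconds."""
    shreds = []
    start, index = 0.0, 0
    while start < duration_s:
        end = min(start + shred_len_s, duration_s)
        shreds.append(Shred(conversation_id, index, start, end))
        start, index = end, index + 1
    return shreds


def assign_shreds(shreds, agents, max_per_agent, rng=None):
    """Assign each shred to a random agent.

    Assumption: limiting every agent to max_per_agent shreds of any one
    conversation stands in for "no agent can reconstruct the context".
    """
    rng = rng or random.Random()
    counts = {}      # (agent, conversation_id) -> shreds already assigned
    assignment = {}  # Shred -> agent
    for shred in shreds:
        eligible = [
            a for a in agents
            if counts.get((a, shred.conversation_id), 0) < max_per_agent
        ]
        if not eligible:
            raise ValueError("not enough agents for the requested cap")
        agent = rng.choice(eligible)
        key = (agent, shred.conversation_id)
        counts[key] = counts.get(key, 0) + 1
        assignment[shred] = agent
    return assignment
```

With a 10-second stream and the assumed 3-second shred length, `shred_stream("call-1", 10.0)` yields four shreds, the last one covering seconds 9 to 10. A larger agent pool or a smaller cap reduces how much of any conversation a single agent hears.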
20 Claims
1. A system, comprising:
a receiving element to receive a) audio streams and b) text that has been generated by a speech recognition element acting upon the audio streams, the receiving element to create a) audio segments from the audio streams, and b) text segments, corresponding to the audio segments, from the text;
a mixing element to receive the audio segments and to randomize the order of the audio segments and the corresponding text segments;
a transmitting element to send the randomized audio segments and the randomized corresponding text segments to a plurality of transcribers; and
a text receiving element to receive corrected text segments created by the plurality of transcribers using a) the transmitted randomized audio segments and b) the transmitted randomized corresponding text segments.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method, comprising:
creating audio segments from audio streams;
creating text segments corresponding to the audio segments;
randomizing the order of the audio segments and the corresponding text segments;
sending the randomized audio segments and the randomized corresponding text segments to a plurality of transcribers; and
receiving corrected text segments created by the plurality of transcribers from the sent randomized audio segments and the sent randomized corresponding text segments, wherein each corrected text segment contains a correction made by one of the plurality of transcribers to an inaccuracy in the sent randomized corresponding text segment.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
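The steps of the method in claim 9 can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `Segment` type and the functions `make_segments`, `randomize`, and `dispatch` are hypothetical names, a transcriber is modeled as a plain callable taking audio and draft text, and round-robin dispatch is one arbitrary way of sending segments to "a plurality of transcribers". The claim does not prescribe any of these choices.

```python
import random
from typing import NamedTuple


class Segment(NamedTuple):
    """An audio segment paired with its speech-recognition draft text."""
    seg_id: int       # original position, kept so results can be reordered
    audio: bytes
    draft_text: str


def make_segments(audio_chunks, draft_texts):
    """Create text segments corresponding to the audio segments."""
    if len(audio_chunks) != len(draft_texts):
        raise ValueError("each audio segment needs one draft text segment")
    return [
        Segment(i, audio, text)
        for i, (audio, text) in enumerate(zip(audio_chunks, draft_texts))
    ]


def randomize(segments, rng):
    """Return the audio/text pairs in random order, leaving the input intact."""
    shuffled = list(segments)
    rng.shuffle(shuffled)
    return shuffled


def dispatch(shuffled, transcribers):
    """Send segments round-robin to transcribers and collect corrected text.

    Each transcriber is assumed to be a callable (audio, draft_text) -> str.
    Returns a dict mapping seg_id to the corrected text segment.
    """
    corrected = {}
    for n, seg in enumerate(shuffled):
        correct = transcribers[n % len(transcribers)]
        corrected[seg.seg_id] = correct(seg.audio, seg.draft_text)
    return corrected
```

Because each corrected segment is keyed by `seg_id`, the caller can reassemble the transcript in its original order even though the transcribers saw the segments in random order. Reassembly is an assumption of this sketch and is not recited in the claim.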
17. A system, comprising:
a receiving element to receive a plurality of audio streams and text generated by a speech recognition element, the receiving element creating audio segments from the plurality of audio streams, and text segments corresponding to the audio segments from the text;
a mixing element to receive the audio segments and the corresponding text segments from at least two of the plurality of audio streams and to randomize the order of the audio segments and the corresponding text segments from the at least two audio streams to create randomized audio segments and randomized corresponding text segments;
a transmitting element to send the randomized audio segments and the randomized corresponding text segments to a transcriber; and
a text receiving element to receive corrected text segments created by the transcriber from the transmitted randomized audio segments and the transmitted randomized corresponding text segments.
Dependent claims: 18, 19, 20
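Claim 17 differs from claim 1 in that the mixing element interleaves segments from at least two audio streams before they reach a single transcriber. A minimal sketch of that mixing step follows. The function names `mix_streams` and `unmix`, the `(stream_id, position)` key, and the regrouping of corrected text per stream are assumptions made for illustration and are not taken from the specification.

```python
import random
from collections import defaultdict


def mix_streams(streams, rng):
    """Pool segments from several streams and shuffle them together.

    streams maps a stream id to a list of (audio, draft_text) pairs.
    Each pooled item carries a (stream_id, position) key. In this sketch
    the key stays on the server side and only the audio and draft text
    would be shown to the transcriber.
    """
    pool = [
        ((stream_id, pos), audio, draft)
        for stream_id, segments in streams.items()
        for pos, (audio, draft) in enumerate(segments)
    ]
    rng.shuffle(pool)
    return pool


def unmix(corrected):
    """Regroup (key, corrected_text) pairs by stream, in original order."""
    by_stream = defaultdict(list)
    for (stream_id, pos), text in corrected:
        by_stream[stream_id].append((pos, text))
    return {
        stream_id: [text for _, text in sorted(items)]
        for stream_id, items in by_stream.items()
    }
```

Mixing across streams means consecutive items seen by the transcriber usually come from unrelated conversations, so even one transcriber handling many segments gets little continuous context from any single stream.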
Specification