System and method for the secure, real-time, high accuracy conversion of general quality speech into text

US 8,738,374 B2
Filed: 05/22/2009
Issued: 05/27/2014
Est. Priority Date: 10/23/2002
Status: Active Grant

Abstract

Described is a speech-to-text conversion system and method that provides secure, real-time and high-accuracy conversion of general-quality speech into text. The system is designed to interface with external devices and services, providing a simple and convenient manner to transcribe audio that may be stored elsewhere such as a wireless phone's voice mail, or occurring between two or more parties such as a conference call. The first step in the system's process ensures secure and private transcription by separating an audio stream into many audio shreds, each of which has duration of only a few seconds and cannot reveal the context of the conversation. A workforce of geographically distributed transcription agents who transcribe the audio shreds is able to generate transcription in real time, with many agents working in parallel on a single conversation. No one agent (or group of agents) receives a sufficient number of audio shreds to reconstruct the context of any conversation. The use of human transcribers allows the system to overcome limitations typical of computer-based speech recognition and permits accurate transcription of general-quality speech even in acoustically hostile environments.
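
The shredding and distribution idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `shred` and `assign`, the 3-second shred length, the 8 kHz sample rate, and the per-agent cap are all assumptions chosen for illustration. The sketch cuts one stream into short, indexed shreds and hands each shred to a randomly chosen agent that has not yet reached its cap, so that no single agent sees much of one conversation.

```python
import random
from collections import defaultdict


def shred(samples, sample_rate, shred_seconds=3.0):
    """Split one audio stream (a sequence of samples) into fixed-length shreds.

    Returns (index, chunk) pairs; the index records the original order.
    """
    size = int(sample_rate * shred_seconds)
    if size <= 0:
        raise ValueError("shred length must cover at least one sample")
    return [(i, samples[start:start + size])
            for i, start in enumerate(range(0, len(samples), size))]


def assign(shreds, agents, max_per_agent, rng=None):
    """Give each shred to a random agent that is still under its cap.

    The cap is a hypothetical policy standing in for "no one agent
    receives a sufficient number of audio shreds".
    """
    rng = rng or random.Random()
    load = defaultdict(int)
    assignment = {}
    for index, _chunk in shreds:
        eligible = [a for a in agents if load[a] < max_per_agent]
        if not eligible:
            raise RuntimeError("not enough agents for this cap")
        agent = rng.choice(eligible)
        load[agent] += 1
        assignment[index] = agent
    return assignment


# 30 seconds of placeholder audio at 8 kHz, cut into 3-second shreds.
audio = [0] * (8000 * 30)
shreds = shred(audio, sample_rate=8000, shred_seconds=3.0)
plan = assign(shreds, agents=["a1", "a2", "a3", "a4", "a5"],
              max_per_agent=2, rng=random.Random(0))
```

Here `plan` maps each shred index to an agent name. A production system would also need to spread shreds across conversations and time, which this sketch does not attempt.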

20 Claims

1. A system, comprising:
- a receiving element to receive a) audio streams and b) text that has been generated by a speech recognition element acting upon the audio streams, the receiving element to create a) audio segments from the audio streams, and b) text segments, corresponding to the audio segments, from the text;
  
  a mixing element to receive the audio segments and to randomize the order of the audio segments and the corresponding text segments;
  
  a transmitting element to send the randomized audio segments and the randomized corresponding text segments to a plurality of transcribers; and
  
  a text receiving element to receive corrected text segments created by the plurality of transcribers using a) the transmitted randomized audio segments and b) the transmitted randomized corresponding text segments.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The system of claim 1, further comprising:
    - a reassembling element to receive, reorder, and combine the text segments generated by the speech recognition element and the text segments corrected by the transcribers, to create a text file corresponding to one of the audio streams.
  - 3. The system of claim 2, wherein the generated text segments and the corrected text segments include indexing information, the reassembling element to use the indexing information to create the text file.
  - 4. The system of claim 1, further comprising:
    - a database element which stores the text segments corresponding to each of the audio segments, the database element further storing indexing information for the text segments.
  - 5. The system of claim 1, wherein the transmitting element has a connection with an audio channel and a data channel to each of the transcribers for sending the randomized audio segments and the randomized corresponding text segments.
  - 6. The system of claim 1, wherein the transmitting element sends the randomized audio segments and the randomized corresponding text segments to the transcribers simultaneously.
  - 7. The system of claim 1, wherein any one of the text segments sent to the transcribers that is accurate is to be indicated as acceptable by the transcribers, and any one of the text segments that is inaccurate is to be corrected by the transcribers.
  - 8. The system of claim 1, further comprising:
    - the speech recognition element to receive the audio streams and to transcribe the audio streams into text.
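
The elements recited in claims 1 through 4 (receiving, mixing, transmitting, reassembling with indexing information) can be sketched roughly as follows. This is a hedged illustration, not the claimed system: the `Segment` class, its field names, the round-robin `dispatch` policy, and the sample data are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    stream_id: str  # which audio stream the segment came from
    index: int      # position within the stream (indexing information)
    audio: bytes    # audio payload for this segment
    text: str       # speech-recognition draft for the same span


def make_segments(stream_id, audio_chunks, draft_texts):
    """Receiving element: pair each audio chunk with its draft text."""
    if len(audio_chunks) != len(draft_texts):
        raise ValueError("each audio segment needs a corresponding text segment")
    return [Segment(stream_id, i, a, t)
            for i, (a, t) in enumerate(zip(audio_chunks, draft_texts))]


def mix(segments, rng=None):
    """Mixing element: randomize order; audio and text stay paired."""
    shuffled = list(segments)
    (rng or random.Random()).shuffle(shuffled)
    return shuffled


def dispatch(mixed, transcribers):
    """Transmitting element: deal segments out round-robin (assumed policy)."""
    queues = {t: [] for t in transcribers}
    for n, seg in enumerate(mixed):
        queues[transcribers[n % len(transcribers)]].append(seg)
    return queues


def reassemble(segments):
    """Reassembling element: restore order from the stored index."""
    return " ".join(s.text for s in sorted(segments, key=lambda s: s.index))


segments = make_segments(
    "call-1",
    [b"a0", b"a1", b"a2", b"a3"],
    ["please call", "me back", "after three", "thanks"],
)
queues = dispatch(mix(segments, random.Random(1)), ["t1", "t2"])
returned = [seg for queue in queues.values() for seg in queue]
text = reassemble(returned)
```

Because each segment carries its own index, the reassembled text does not depend on the order in which transcribers return their work.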

9. A method, comprising:
- creating audio segments from audio streams;
  
  creating text segments corresponding to the audio segments;
  
  randomizing the order of the audio segments and the corresponding text segments;
  
  sending the randomized audio segments and the randomized corresponding text segments to a plurality of transcribers; and
  
  receiving corrected text segments created by the plurality of transcribers from the sent randomized audio segments and the sent randomized corresponding text segments, wherein each corrected text segment contains a correction made by one of the plurality of transcribers to an inaccuracy in the sent randomized corresponding text segment.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
  - 10. The method of claim 9, further comprising:
    - storing the created text segments corresponding to the audio segments in a database; and
      
      storing indexing information for the created text segments.
  - 11. The method of claim 9, further comprising:
    - receiving indexing information for the corrected text segment from the transcriber.
  - 12. The method of claim 9, further comprising:
    - reassembling and reordering the text segments and any corrected text segments to create a text file corresponding to each of the audio streams.
  - 13. The method of claim 12, wherein reassembling and reordering the text segments and any corrected text segments to create a text file comprises:
    - using indexing information for the created text segments and any corrected text segments to create the text file.
  - 14. The method of claim 9, wherein sending the randomized audio segments and the randomized corresponding text segments to the plurality of transcribers comprises:
    - sending the randomized audio segments and the randomized corresponding text segments to the plurality of transcribers simultaneously.
  - 15. The method of claim 9, wherein sending the randomized audio segments and the randomized corresponding text segments to the plurality of transcribers comprises:
    - sending the randomized audio segments to a plurality of transcribers via an audio channel of a connection; and
      
      sending the randomized corresponding text segments to the plurality of transcribers via a data channel of the connection.
  - 16. The method of claim 9, further comprising:
    - transcribing the audio streams into text, wherein the text segments are created from the transcribed text.
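
The correction and reassembly steps of claims 9, 12 and 13 can be sketched as below, under stated assumptions. The `review` function is a stand-in for a human transcriber (it simply compares the draft with what was actually said), and `merge`, the dictionaries, and the sample sentences are hypothetical names and data introduced for illustration only.

```python
from typing import Dict, Optional


def review(draft_text: str, heard_text: str) -> Optional[str]:
    """Stand-in for a transcriber: None accepts the draft,
    otherwise the corrected text is returned."""
    return None if draft_text == heard_text else heard_text


def merge(drafts: Dict[int, str], corrections: Dict[int, str]) -> str:
    """Rebuild the text in index order, preferring a correction
    over the speech-recognition draft when one exists."""
    unknown = set(corrections) - set(drafts)
    if unknown:
        raise KeyError(f"corrections for unknown segments: {sorted(unknown)}")
    return " ".join(corrections.get(i, drafts[i]) for i in sorted(drafts))


# Draft text segments keyed by indexing information.
drafts = {0: "your order", 1: "has bean shipped", 2: "today"}
# What the transcriber hears in the matching audio segments.
heard = {0: "your order", 1: "has been shipped", 2: "today"}

corrections = {}
for i in [2, 0, 1]:  # segments are reviewed in randomized order
    fixed = review(drafts[i], heard[i])
    if fixed is not None:
        corrections[i] = fixed

final_text = merge(drafts, corrections)
```

Only segment 1 produces a correction in this example; the accepted drafts pass through unchanged, which mirrors the "any corrected text segments" wording of claim 12.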

17. A system, comprising:
- a receiving element to receive a plurality of audio streams and text generated by a speech recognition element, the receiving element creating audio segments from the plurality of audio streams, and text segments corresponding to the audio segments from the text;
  
  a mixing element to receive the audio segments and the corresponding text segments from at least two of the plurality of audio streams and to randomize the order of the audio segments and the corresponding text segments from the at least two audio streams to create randomized audio segments and randomized corresponding text segments;
  
  a transmitting element to send the randomized audio segments and the randomized corresponding text segments to a transcriber; and
  
  a text receiving element to receive corrected text segments created by the transcriber from the transmitted randomized audio segments and the transmitted randomized corresponding text segments.
- Dependent claims (18, 19, 20):
  - 18. The system of claim 17, further comprising:
    - the speech recognition element to receive the plurality of audio streams and to transcribe the plurality of audio streams into text.
  - 19. The system of claim 17, further comprising:
    - a reassembling element to receive and combine the text segments generated by the receiving element and the corrected text segments created by the transcriber, to create at least two text files corresponding to the at least two of the plurality of audio streams.
  - 20. The system of claim 17, wherein the receiving element creates audio segments having a short duration that removes context from each of the audio segments.
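
Claims 17 and 19 describe mixing segments from at least two audio streams and later producing one text file per stream. A minimal sketch of that idea follows. It carries only the text segments (audio payloads are omitted for brevity), and the names `mix_streams`, `reassemble_by_stream`, and the two sample streams are assumptions, not terms from the patent.

```python
import random
from collections import defaultdict


def mix_streams(streams, rng=None):
    """Pool segments from several streams and shuffle them together.

    streams maps a stream id to its ordered list of text segments.
    Each pooled item keeps (stream_id, index, text) so it can be traced back.
    """
    pool = [(sid, i, text)
            for sid, segs in streams.items()
            for i, text in enumerate(segs)]
    (rng or random.Random()).shuffle(pool)
    return pool


def reassemble_by_stream(returned):
    """Group returned segments by stream and restore their order."""
    grouped = defaultdict(list)
    for sid, i, text in returned:
        grouped[sid].append((i, text))
    return {sid: " ".join(text for _, text in sorted(items))
            for sid, items in grouped.items()}


streams = {
    "voicemail": ["hi it is", "sam call me"],
    "conference": ["the budget", "is approved", "for march"],
}
pool = mix_streams(streams, random.Random(7))
files = reassemble_by_stream(pool)
```

A transcriber working through `pool` sees short fragments of two unrelated conversations interleaved, while the stream id and index are enough to rebuild both transcripts afterwards.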

Current Assignee: Advanced Messaging Technologies, Inc. (J2 Global, Inc.)
Original Assignee: J2 Global, Inc.
Inventors: Jaroker, Jon
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Kovacek, David M
Application Number: US12/471,283
Publication Number: US 20090292539A1
Time in Patent Office: 1,831 Days
Field of Search: 704/1-8, 704/200, 704/201, 704/203, 704/206-210, 704/231-236, 704/243-245, 704/258, 704/260, 704/270, 704/275, 704/278, 369/1-5, 369/24.01-30.01, 369/47.1-47.13, 369/68-69, 369/86-92, 709/201-202, 709/204-207, 709/223-226, 709/238-246, 709/249
US Class Current: 704/235
CPC Class Codes:
- G06Q 10/00: Administration; Management
- G06Q 10/10: Office automation; Time man...
- G06Q 50/18: Legal services
- G10L 15/26: Speech to text systems G10L...
- H04M 2201/40: using speech recognition
- H04M 2201/60: Medium conversion
