System and method for secure real-time high accuracy speech to text conversion of general quality speech

US 6,816,834 B2
Filed: 10/23/2002
Issued: 11/09/2004
Est. Priority Date: 10/23/2002
Status: Active Grant

3 Assignments

0 Petitions

Abstract

A method, comprising the steps of receiving an audio stream; filtering the audio stream to separate identifiable words in the audio stream from unidentifiable words; creating a word text file for the identifiable words and storing the word text file in a database, the word text file including word indexing information; creating audio segments from the audio stream, the audio segments including portions of the audio stream having unidentifiable words; creating audio shreds from the audio segments, the audio shreds including audio shred indexing information to identify each of the audio shreds, and storing the audio shred indexing information in the database; mixing the audio shreds with other audio shreds from other audio streams; delivering the audio shreds to a plurality of transcribers; transcribing each of the audio shreds into a corresponding audio shred text file, the audio shred text file including the audio shred indexing information corresponding to the audio shred from which the audio shred text file was created; and reassembling the audio shred text files and the word text files into a conversation text file corresponding to the audio stream.

21 Claims

1. A system, comprising:
- an audio shredder receiving an audio segment, the audio segment being a portion of an audio stream, the audio shredder creating an audio shred from the audio segment;
- an audio mixer receiving the audio shred and randomizing the audio shred with other audio shreds from other audio streams; and
- a plurality of transcribers, wherein one of the transcribers receives the audio shred and transcribes the audio shred into text.
2. The system of claim 1, further comprising:
3. The system of claim 2, wherein the text and the other text include indexing information, the reassembler using the indexing information to create the text file.
4. The system of claim 1, further comprising:
- a delivery module to deliver the text file corresponding to the audio stream.
5. The system of claim 4, wherein the delivery module is one of a display screen and a storage medium.
6. The system of claim 1, further comprising:
- a filter receiving the audio stream, identifying words within the audio stream and creating a word text file corresponding to each of the identified words, the filter creating the audio segment from a portion of the audio stream having words which are unidentifiable by the filter.
7. The system of claim 6, further comprising:
- a database element which stores the word text file corresponding to each of the identified words, the database element further storing indexing information corresponding to the audio shred.
8. The system of claim 1, wherein the audio stream is one of a voice recording and a real-time conversation.
9. The system of claim 1, wherein the audio shred is a plurality of audio shreds and wherein a portion of a first audio shred overlaps a portion of a second audio shred.
10. The system of claim 9, wherein the first audio shred is transcribed by a first transcriber and the second audio shred is transcribed by a second transcriber and the overlapping portions of the first and second audio shreds are compared for accuracy.
11. The system of claim 1, further comprising:
- a transcriber control element monitoring the availability of each of the transcribers and directing the audio shred to an available transcriber.
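
Claims 1, 9 and 10 describe cutting a segment into shreds whose edges may overlap, then randomizing shreds drawn from many streams. The Python below is a minimal sketch under assumed representations, not the patented implementation: audio is a plain list of PCM samples, every shred has a fixed length, and "randomizing" is modeled as a seeded shuffle of one pooled list. The names `AudioShred`, `shred_segment` and `mix`, and the 4 s length and 0.5 s overlap defaults, are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioShred:
    stream_id: str   # stream the shred came from (hypothetical indexing field)
    index: int       # position of the shred within its segment
    begin: float     # seconds from the start of the stream
    end: float
    samples: tuple   # raw PCM samples; a stand-in for a real audio buffer


def shred_segment(stream_id, samples, sample_rate, seg_begin,
                  shred_len=4.0, overlap=0.5):
    """Cut one audio segment into fixed-length shreds with overlapping edges."""
    if not 0 <= overlap < shred_len:
        raise ValueError("overlap must satisfy 0 <= overlap < shred_len")
    size = int(shred_len * sample_rate)                       # samples per shred
    step = max(1, int((shred_len - overlap) * sample_rate))   # hop between shred starts
    shreds, start, index = [], 0, 0
    while start < len(samples):
        chunk = samples[start:start + size]
        begin = seg_begin + start / sample_rate
        shreds.append(AudioShred(stream_id, index, begin,
                                 begin + len(chunk) / sample_rate, tuple(chunk)))
        if start + size >= len(samples):   # this shred already reaches the segment end
            break
        start += step
        index += 1
    return shreds


def mix(shred_lists, seed=None):
    """Pool shreds from several streams and shuffle them into one queue."""
    pool = [shred for shreds in shred_lists for shred in shreds]
    random.Random(seed).shuffle(pool)
    return pool
```

With a 10 s segment sampled at 10 samples per second, `shred_segment` yields shreds spanning 0 to 4 s, 3.5 to 7.5 s and 7 to 10 s; the 0.5 s overlaps are the regions that claim 10 would compare across two transcribers.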

12. A method, comprising the steps of:
- receiving an audio stream;
- filtering the audio stream to separate identifiable words in the audio stream from unidentifiable words;
- creating a word text file for the identifiable words;
- storing the word text file in a database, the word text file including word indexing information;
- creating audio segments from the audio stream, the audio segments including portions of the audio stream having unidentifiable words;
- creating audio shreds from the audio segments, the audio shreds including audio shred indexing information to identify each of the audio shreds;
- storing the audio shred indexing information in the database;
- mixing the audio shreds with other audio shreds from other audio streams;
- delivering the audio shreds to a plurality of transcribers;
- transcribing each of the audio shreds into a corresponding audio shred text file, the audio shred text file including the audio shred indexing information corresponding to the audio shred from which the audio shred text file was created; and
- reassembling the audio shred text files and the word text files into a conversation text file corresponding to the audio stream.
13. The method according to claim 12, wherein a first boundary of a first audio segment is a first location in the audio stream corresponding to an end of a first identifiable word and a second boundary of the first audio segment is a second location in the audio stream corresponding to a beginning of a second identifiable word.
14. The method of claim 12, wherein there is a 99% degree of confidence for an identifiable word.
15. The method of claim 12, wherein the audio shreds are 3 to 5 seconds long.
16. The method according to claim 12, wherein the boundaries of each of the audio shreds are pauses between words in the audio segments.
17. The method according to claim 12, wherein each transcriber receives audio shreds and other audio shreds, the delivery of audio shreds to the transcribers being controlled to eliminate contextual meaning for the transcribers.
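
Claims 12 through 16 describe the front end: a confidence filter separates identifiable words, the gaps between those words become segments, and each segment is cut at pauses into shreds of roughly 3 to 5 seconds. The sketch below is one possible reading, with assumptions: the recognizer is taken to report text, timing and a confidence score for every word, including the ones it is unsure of; the 0.99 threshold and the 5 s cap come from claims 14 and 15; cutting at the midpoint of each silent gap is a choice made here; and the 3 s lower bound is not enforced. `Hyp`, `filter_and_segment` and `shred_at_pauses` are hypothetical names.

```python
from dataclasses import dataclass

IDENTIFIABLE = 0.99   # claim 14: a word counts as identifiable at 99% confidence


@dataclass
class Hyp:
    """One recognizer hypothesis (hypothetical input format)."""
    text: str
    begin: float       # seconds from the start of the stream
    end: float
    confidence: float


def filter_and_segment(hyps, stream_end):
    """Split hypotheses into identifiable words and the segments between them."""
    words, segments, pending = [], [], []
    cursor = 0.0   # end of the last identifiable word, or the stream start
    for h in sorted(hyps, key=lambda h: h.begin):
        if h.confidence >= IDENTIFIABLE:
            if pending:   # claim 13: segment spans previous word end to this word's begin
                segments.append((cursor, h.begin, pending))
                pending = []
            words.append((h.begin, h.end, h.text))
            cursor = h.end
        else:
            pending.append(h)
    if pending:   # unidentifiable audio after the last identifiable word
        segments.append((cursor, stream_end, pending))
    return words, segments


def shred_at_pauses(segment, max_len=5.0):
    """Cut a segment at pauses, keeping shreds within max_len where a pause allows."""
    begin, end, hyps = segment
    # Candidate cuts: midpoints of silent gaps between consecutive hypotheses.
    cuts = [(a.end + b.begin) / 2
            for a, b in zip(hyps, hyps[1:]) if b.begin > a.end]
    shreds, start, best = [], begin, None
    for cut in cuts + [end]:
        # Once this cut would overshoot, close the shred at the last cut seen.
        if cut - start > max_len and best is not None:
            shreds.append((start, best))
            start = best
        best = cut
    shreds.append((start, end))
    return shreds


hyps = [Hyp("hello", 0.0, 0.5, 0.995), Hyp("?", 0.7, 1.5, 0.6),
        Hyp("?", 2.0, 3.0, 0.5), Hyp("?", 3.4, 5.0, 0.4),
        Hyp("?", 5.5, 7.0, 0.7), Hyp("world", 7.2, 7.6, 0.999)]
words, segments = filter_and_segment(hyps, stream_end=8.0)
print(shred_at_pauses(segments[0]))   # [(0.5, 5.25), (5.25, 7.2)]
```

When a stretch of speech has no pause within 5 s, this greedy rule lets the shred run long rather than cut mid-word, favoring claim 16's pause boundaries over claim 15's length range.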

18. A system, comprising:
- a service platform for receiving, processing and directing streaming audio; and
- a user device connected to the service platform and configured to receive streaming audio from the service platform and transmit streaming audio to the service platform, the user device further configured to signal the service platform to begin a transcription of the streaming audio transmitted and received by the user device, wherein the service platform includes a filter receiving the streaming audio, identifying words within the streaming audio and creating a word text file corresponding to each of the identified words, the filter further creating audio segments from the streaming audio, the audio segments including portions of the audio stream having unidentifiable words, an audio shredder creating a plurality of audio shreds from each of the audio segments, an audio mixer randomizing the audio shreds with other audio shreds from other streaming audio, wherein the service platform delivers the randomized audio shreds to a plurality of transcribers which transcribe the audio shreds into audio shred text files corresponding to the audio shreds, a reassembler creating a conversation text file corresponding to the streaming audio from the audio shred text files and the word text files.
19. The system according to claim 18, wherein the user device is one of an IP phone and a personal computer.
20. The system according to claim 18, wherein the service platform has a data connection to each of the transcribers for delivering the audio shreds.
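
Claims 11 and 17, together with claim 18's service platform that delivers randomized shreds to many transcribers, imply a dispatcher that tracks which transcribers are free and controls who hears what. A minimal sketch, assuming the queue arrives already mixed as `(stream_id, shred_id)` pairs and that availability is a simple flag the platform clears on dispatch and sets again when a transcript returns. The one-shred-per-stream rule is an illustrative interpretation of "eliminate contextual meaning", not a rule stated in the patent; `Transcriber` and `dispatch` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Transcriber:
    name: str
    available: bool = True
    streams_seen: set = field(default_factory=set)   # streams this person has heard


def dispatch(queue, transcribers):
    """Hand (stream_id, shred_id) items to available transcribers.

    Hypothetical policy: a transcriber never receives two shreds from the same
    stream. Items that cannot be placed yet are returned for a later round.
    """
    assignments, deferred = [], []
    for stream_id, shred_id in queue:
        target = next((t for t in transcribers
                       if t.available and stream_id not in t.streams_seen), None)
        if target is None:
            deferred.append((stream_id, shred_id))
            continue
        target.available = False            # busy until its text comes back
        target.streams_seen.add(stream_id)
        assignments.append((shred_id, target.name))
    return assignments, deferred
```

The strict rule needs at least as many transcribers as a stream has shreds; a looser variant would only forbid giving one person consecutive shreds of the same stream.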

21. A system, comprising:
- an audio stream element including information corresponding to an audio stream, the information including a begin time of the audio stream, an end time of the audio stream, a conversation identification of the audio stream and the audio stream file;
- a word element including information corresponding to a word identified in the audio stream by a speech recognition filter, the information including an identification of the audio stream from which the word was identified, a begin time of the word, an end time of the word, an audio file of the word and text corresponding to the word;
- an audio segment element including information corresponding to an audio segment of the audio stream, the audio segment being a portion of the audio stream without identifiable words, the information including the identification of the audio stream from which the audio segment originates, the begin time of the audio segment, the end time of the audio segment and the audio file of the audio segment;
- an audio shred element including information corresponding to an audio shred of the audio segment, the information including an identification of the audio segment from which the audio shred originates, the begin time of the audio shred, the end time of the audio shred and the audio file of the audio shred; and
- a text token element including information corresponding to a textual representation of the audio shred, the information including an identification of the audio shred from which the textual representation originates and the textual representation, wherein the information included in each of the audio stream element, the word element, the audio segment element, the audio shred element and the text token element is processed to generate a text transcription of the audio stream.
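
Claim 21 reads like a record schema. The Python dataclasses below mirror its five elements, with assumptions flagged: `conversation_id` is used as the stream identifier that words and segments point back to; `segment_id` and `shred_id` are added keys, since the claim says shreds and tokens identify their parent but names no key field; and audio is held as opaque `bytes`. `transcribe` is a hypothetical reassembler that orders recognizer words and transcriber tokens by begin time. It assumes non-overlapping shreds, so text from overlapping shreds (claim 9) would need deduplicating first.

```python
from dataclasses import dataclass


@dataclass
class AudioStream:
    conversation_id: str   # assumed to double as the stream identifier
    begin: float
    end: float
    audio: bytes           # the audio stream file


@dataclass
class Word:
    stream_id: str         # stream the recognizer found the word in
    begin: float
    end: float
    audio: bytes
    text: str


@dataclass
class AudioSegment:
    segment_id: str        # hypothetical key, referenced by AudioShred
    stream_id: str
    begin: float
    end: float
    audio: bytes


@dataclass
class AudioShred:
    shred_id: str          # hypothetical key, referenced by TextToken
    segment_id: str
    begin: float
    end: float
    audio: bytes


@dataclass
class TextToken:
    shred_id: str
    text: str              # a transcriber's text for one shred


def transcribe(stream, words, segments, shreds, tokens):
    """Join words and text tokens belonging to one stream in begin-time order."""
    sid = stream.conversation_id
    seg_ids = {s.segment_id for s in segments if s.stream_id == sid}
    shred_begin = {s.shred_id: s.begin for s in shreds if s.segment_id in seg_ids}
    pieces = [(w.begin, w.text) for w in words if w.stream_id == sid]
    pieces += [(shred_begin[t.shred_id], t.text)
               for t in tokens if t.shred_id in shred_begin]
    return " ".join(text for _, text in sorted(pieces))
```

Sorting on begin time is what lets the index data, rather than the order in which transcribers return their work, determine the final word order.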

Current Assignee: Advanced Messaging Technologies, Inc. (J2 Global, Inc.)
Original Assignee: Jon Jaroker
Inventors: Jaroker, Jon
Primary Examiner(s): ABEBE, DANIEL DEMELASH
Application Number: US10/280,302
Publication Number: US 20040083105A1
Time in Patent Office: 748 Days
Field of Search: 704/235, 704/243, 704/245, 704/260, 379/88.01
US Class Current: 704/235
CPC Class Codes:
- G10L 15/26: Speech to text systems G10L...
- H04M 2201/40: using speech recognition
- H04M 2201/60: Medium conversion
