System and method for the secure, real-time, high accuracy conversion of general-quality speech into text

US 20050010407A1
Filed: 08/03/2004
Published: 01/13/2005
Est. Priority Date: 10/23/2002
Status: Active Grant


Abstract

Described is a speech-to-text conversion system and method that provides secure, real-time and high-accuracy conversion of general-quality speech into text. The system is designed to interface with external devices and services, providing a simple and convenient manner to transcribe audio that may be stored elsewhere such as a wireless phone's voice mail, or occurring between two or more parties such as a conference call. The first step in the system's process ensures secure and private transcription by separating an audio stream into many audio shreds, each of which has duration of only a few seconds and cannot reveal the context of the conversation. A workforce of geographically distributed transcription agents who transcribe the audio shreds is able to generate transcription in real time, with many agents working in parallel on a single conversation. No one agent (or group of agents) receives a sufficient number of audio shreds to reconstruct the context of any conversation. The use of human transcribers allows the system to overcome limitations typical of computer-based speech recognition and permits accurate transcription of general-quality speech even in acoustically hostile environments.
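The privacy property described in the abstract (no single agent receives enough shreds of one conversation to reconstruct it) can be illustrated with a minimal Python sketch. The function name `assign_shreds`, the per-agent cap and the random choice among eligible agents are assumptions made for illustration; the abstract does not say how the limit is enforced.

```python
import random
from collections import defaultdict


def assign_shreds(shreds, agents, max_per_agent_per_stream, seed=None):
    """Assign each (stream_id, index) shred to one agent.

    Hypothetical policy: an agent may receive at most
    `max_per_agent_per_stream` shreds from any single stream.
    """
    rng = random.Random(seed)
    counts = defaultdict(int)  # (agent, stream_id) -> shreds handed out so far
    assignment = {}
    for stream_id, index in shreds:
        # Only agents still under the cap for this stream may take the shred.
        eligible = [a for a in agents
                    if counts[(a, stream_id)] < max_per_agent_per_stream]
        if not eligible:
            raise RuntimeError("not enough agents to respect the cap")
        agent = rng.choice(eligible)
        counts[(agent, stream_id)] += 1
        assignment[(stream_id, index)] = agent
    return assignment


if __name__ == "__main__":
    shreds = [("call-1", i) for i in range(6)]
    print(assign_shreds(shreds, ["ann", "bob", "cy"], 2, seed=0))
```

A lower cap means fewer shreds of a conversation per agent, at the cost of needing a larger pool of agents for each stream.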

139 Citations

29 Claims

1. A system, comprising:
- a receiving element to receive audio segments which are portions of audio streams, the receiving element creating sub-segments from the audio segments;
  
  a mixing element receiving the sub-segments from the audio streams and randomizing the sub-segments;
  
  a transmitting element sending the randomized sub-segments to a plurality of transcribers, each of the randomized sub-segments being transcribed into text by the transcriber which received the randomized sub-segment; and
  
  a text receiving element receiving the transcribed text from each of the transcribers.
- Dependent claims 2 to 22:
  - 2. The system of claim 1, further comprising:
    - a reassembling element receiving the text corresponding to the sub-segments and combining the text to create a text file corresponding to each of the audio streams.
  - 3. The system of claim 1, further comprising:
    - an accounting element configured to maintain subscriber accounts, wherein the audio streams are received from subscribers to the system, each subscriber account being debited when the system transcribes audio streams corresponding to each subscriber.
  - 4. The system of claim 3, wherein the accounting element further maintains transcriber accounts, each transcriber account being credited when the transcriber corresponding to the account transcribes the audio sub-segments.
  - 5. The system of claim 4, wherein the transcriber accounts are credited based on one of a number of words transcribed and an amount of time of transcription.
  - 6. The system of claim 1, further comprising:
    - a quality assurance element to monitor one of an accuracy and a speed of each of the transcribers.
  - 7. The system of claim 1, further comprising:
    - a workforce management element to monitor an availability of each of the transcribers, wherein the workforce management element distributes randomized sub-segments to each of the transcribers based on their availability.
  - 8. The system of claim 7, wherein the workforce management element further schedules a work time for each of the transcribers.
  - 9. The system of claim 7, wherein the workforce management element further monitors a skill of each of the transcribers, the distribution of the randomized sub-segments being further based on the skill.
  - 10. The system of claim 9, wherein the skill includes one of a language skill, a professional field skill and a dialect skill.
  - 11. The system of claim 9, wherein the workforce management element further invites additional transcribers to the system when the skill is in short supply among the transcribers.
  - 12. The system of claim 11, wherein the invitation is by one of an electronic mail message, a telephone call and a pager message.
  - 13. The system of claim 7, wherein the workforce management element further sets a price for transcription services based on a supply of transcribers and demand of transcription services.
  - 14. The system of claim 1, wherein the audio streams include one of calls directed to a phone that are redirected to the receiving element, voice communications from a personal computer, voice communications from a handheld dictation device, recordings of a voice mail service connected directly to the system, recordings of an external system and recordings on a server.
  - 15. The system of claim 14, wherein the phone includes one of a POTS phone, a PBX phone and an Internet Protocol phone.
  - 16. The system of claim 14, wherein the redirected calls are through a telephone company switch.
  - 17. The system of claim 16, wherein the telephone company switch is under a direction of CALEA.
  - 18. The system of claim 16, wherein the telephone company switch is under direction of one of CFB, CFNA and CF services.
  - 19. The system of claim 14, wherein the audio streams include a telephone call between at least two parties.
  - 20. The system of claim 19, wherein biometrics are used to identify the at least two parties.
  - 21. The system of claim 14, wherein the system accesses the recordings of the external system by directing a phone call to the external system and navigating a menu of the external system using one of DTMF and speech recognition.
  - 22. The system of claim 21, wherein the external system is one of a voice mail system and a general IVR system.
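The elements of claims 1 and 2 (creating sub-segments, randomizing them, sending them to transcribers, receiving the text and reassembling it per stream) can be sketched as a small Python pipeline. This is an illustration under stated assumptions, not the patented implementation: "audio" is modeled as a list of words, each transcriber is a plain callable, dispatch is round-robin, and the names `make_subsegments` and `transcribe_all` are hypothetical.

```python
import random


def make_subsegments(streams, size):
    """Receiving element: cut each stream into tagged sub-segments."""
    subs = []
    for stream_id, samples in streams.items():
        for pos, start in enumerate(range(0, len(samples), size)):
            subs.append((stream_id, pos, samples[start:start + size]))
    return subs


def transcribe_all(streams, transcribers, size, seed=None):
    subs = make_subsegments(streams, size)
    # Mixing element: shuffle sub-segments across all streams.
    random.Random(seed).shuffle(subs)
    results = []
    for i, (stream_id, pos, audio) in enumerate(subs):
        # Transmitting element: round-robin dispatch (an assumption).
        transcriber = transcribers[i % len(transcribers)]
        # Text receiving element: collect the text with its origin tags.
        results.append((stream_id, pos, transcriber(audio)))
    # Reassembling element (claim 2): restore per-stream order.
    texts = {}
    for stream_id, pos, text in sorted(results, key=lambda r: (r[0], r[1])):
        texts.setdefault(stream_id, []).append(text)
    return {sid: " ".join(parts) for sid, parts in texts.items()}


if __name__ == "__main__":
    streams = {
        "a": "the quick brown fox jumps".split(),
        "b": "hello world again".split(),
    }
    typist = lambda audio: " ".join(audio)  # stand-in for a human transcriber
    print(transcribe_all(streams, [typist, typist, typist], size=2, seed=1))
```

Tagging each sub-segment with its stream and position before shuffling is what lets the text be put back in order, whichever transcriber handled it.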

23. A method, comprising the steps of:
- receiving audio segments which are portions of audio streams;
  
  creating sub-segments from the audio segments;
  
  randomizing the sub-segments;
  
  sending the randomized sub-segments to a plurality of transcribers, each of the randomized sub-segments being transcribed into text by the transcriber which received the randomized sub-segment; and
  
  receiving the transcribed text from each of the transcribers.
- Dependent claims 24 to 27:
  - 24. The method of claim 23, further comprising the steps of:
    - reassembling text corresponding to the sub-segments to create a text file corresponding to each of the audio streams.
  - 25. The method of claim 23, wherein a duration of the sub-segments includes a range of 1-10 seconds.
  - 26. The method of claim 23, wherein a duration of the sub-segments is based on one of a transcription accuracy, a transcription speed and a security level.
  - 27. The method of claim 23, wherein a foreign language of one of the audio segments is determined by sending the one of the audio segments to a plurality of transcribers with different foreign language skills.
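Claims 25 and 26 tie the sub-segment duration to a 1-10 second range and to factors such as the security level. A minimal Python sketch of that step follows. The `SECURITY_TO_SECONDS` table and the function name `sub_segment_bounds` are hypothetical; the claims do not give concrete values or say how a security level maps to a duration.

```python
import math

# Hypothetical mapping, not from the patent: stricter security, shorter pieces.
SECURITY_TO_SECONDS = {"low": 10.0, "medium": 5.0, "high": 2.0}

# Range recited in claim 25.
MIN_SECONDS, MAX_SECONDS = 1.0, 10.0


def sub_segment_bounds(total_seconds, target_seconds):
    """Return (start, end) times of sub-segments covering the audio.

    The requested duration is clamped to the 1-10 second range.
    """
    length = max(MIN_SECONDS, min(MAX_SECONDS, target_seconds))
    count = math.ceil(total_seconds / length)
    return [(i * length, min((i + 1) * length, total_seconds))
            for i in range(count)]


if __name__ == "__main__":
    print(sub_segment_bounds(25, SECURITY_TO_SECONDS["low"]))
    print(sub_segment_bounds(12, SECURITY_TO_SECONDS["high"]))
```

The final sub-segment is simply shorter when the audio length is not a multiple of the chosen duration.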

28. A method, comprising the steps of:
- receiving an original audio file;
  
  receiving a computer-generated speech-to-text file corresponding to the original audio file; and
  
  comparing the computer-generated speech-to-text file with the original audio file.
- Dependent claim 29:
  - 29. The method according to claim 28, further comprising the step of:
    - correcting errors in the computer-generated speech-to-text file identified in the comparing step.
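Claims 28 and 29 compare a computer-generated transcript with the original audio and correct the errors found. The claims do not say how the comparison is carried out. The Python sketch below assumes, purely for illustration, that a human listener has produced a reference transcript of the audio and that the comparison is a word-level diff against it; `find_errors` and `correct` are hypothetical names.

```python
import difflib


def find_errors(asr_text, reference_text):
    """List word-level differences between the machine and reference text."""
    asr, ref = asr_text.split(), reference_text.split()
    matcher = difflib.SequenceMatcher(a=asr, b=ref, autojunk=False)
    return [(tag, asr[i1:i2], ref[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def correct(asr_text, reference_text):
    """Keep matching words and take the reference wording elsewhere."""
    asr, ref = asr_text.split(), reference_text.split()
    matcher = difflib.SequenceMatcher(a=asr, b=ref, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "equal" spans are already right; other spans use the reference.
        out.extend(asr[i1:i2] if tag == "equal" else ref[j1:j2])
    return " ".join(out)


if __name__ == "__main__":
    print(find_errors("call me at noon", "call me at nine"))
    print(correct("call me at noon", "call me at nine"))
```

In this sketch the corrected text always equals the reference. The list returned by `find_errors` is the more useful output, since it shows where the machine transcript went wrong.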

Current Assignee: Advanced Messaging Technologies, Inc. (J2 Global, Inc.)
Original Assignee: J2 Global, Inc.
Inventors: Jaroker, Jon
Granted Patent: US 7,539,086 B2
US Class Current: 704/235
CPC Class Codes:
G06Q 10/00   Administration; Management
G06Q 10/10   Office automation; Time man...
G06Q 50/18   Legal services
G10L 15/26   Speech to text systems G10L...
H04M 2201/40   using speech recognition sp...
H04M 2201/60   Medium conversion
