EFFICIENT CONVERSION OF VOICE MESSAGES INTO TEXT

US 20090052636A1
Filed: 10/29/2008
Published: 02/26/2009
Est. Priority Date: 03/28/2002
Status: Active Grant

Abstract

A system and method for efficiently transcribing verbal messages transmitted over the Internet (or other network) into text. The verbal messages are initially checked to ensure that they are in a valid format and include a return network address, and if so, are processed either as whole verbal messages or split into segments. These whole verbal messages and segments are processed by an automated speech recognition (ASR) program, which produces automatically recognized text. The automatically recognized text messages or segments are assigned to selected workbenches for manual editing and transcription, producing edited text. The segments of edited text are reassembled to produce whole edited text messages, which undergo post processing to correct minor errors and are output as an email, an SMS message, a file, or an input to a program. The automatically recognized text and manual edits thereof are returned as feedback to the ASR program to improve its accuracy.
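
The initial check described in the abstract (valid format plus a return network address, then queuing) might be sketched in Python as follows. This is only an illustration under stated assumptions: the `VerbalMessage` type, the `VALID_FORMATS` set, and the function names are hypothetical and do not come from the patent, which does not say which audio formats count as valid.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List, Optional

# Assumption: the abstract only says "a valid format"; these two are placeholders.
VALID_FORMATS = {"wav", "mp3"}


@dataclass
class VerbalMessage:
    """Hypothetical container for one received verbal message."""
    message_id: str
    audio_format: str
    return_address: Optional[str]


def is_valid(msg: VerbalMessage) -> bool:
    """True when the format is accepted and a return address is present."""
    return msg.audio_format.lower() in VALID_FORMATS and bool(msg.return_address)


def enqueue_valid(messages: Iterable[VerbalMessage],
                  queue: Deque[VerbalMessage]) -> List[VerbalMessage]:
    """Append valid messages to the queue and return the rejected ones."""
    rejected = []
    for msg in messages:
        if is_valid(msg):
            queue.append(msg)
        else:
            rejected.append(msg)
    return rejected
```

Calling `enqueue_valid(messages, deque())` leaves only the messages that pass both checks in the queue, in arrival order, so that later ASR processing never sees a message that cannot be delivered.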

33 Claims

1. A method for transcribing verbal messages into text, comprising the steps of:
- (a) receiving verbal messages over a network and queuing the verbal messages in a queue for processing into text;
  
  (b) automatically processing at least portions of successive verbal messages from the queue with online processors using an automated speech recognition (ASR) program to produce corresponding text;
  
  (c) assigning whole verbal messages or segments of the verbal messages that have been automatically processed to selected workbench stations for further editing and transcription by operators at the workbench stations;
  
  (d) enabling the operators at the workbench stations to which the whole or the segments of the verbal messages have been assigned to listen to the verbal messages, correct errors in the text that was produced by the automatic processing, and transcribe portions of the verbal messages that have not been automatically processed by the ASR program, producing final text messages or segments of final text messages corresponding to the verbal messages that were in the queue; and
  
  (e) assembling segments of the text messages produced by the operators at the workbench stations from the segments of the verbal messages that were processed and using whole text messages corresponding to the whole verbal messages that were processed, producing final output text messages.
- - 2. The method of claim 1, further comprising the step of validating a format of the verbal message and a return address for delivery of an output text message before enabling queuing of each verbal message.
  - 3. The method of claim 1, further comprising the step of assigning verbal messages to specific online processors in accord with predefined assignment rules.
  - 4. The method of claim 1, wherein whole verbal messages are simultaneously sent to the online processors for processing using the ASR program and to a queue for processing by one of the workbench stations.
  - 5. The method of claim 1, further comprising the step of separating an audio content in a verbal message from associated metadata, wherein the associated metadata includes one or more elements selected from the group consisting of:
    - (a) proper nouns;
      
      (b) a caller name, if the verbal message is a voice mail; and
      
      (c) a name of a person being called, if the verbal message is a voice mail.
  - 6. The method of claim 5, wherein the audio content and the metadata of the verbal messages in the queue are input to the online processors for improving accuracy of the ASR program.
  - 7. The method of claim 1, wherein the step of automatically processing includes the steps of:
    - (a) checking for common content patterns in the verbal messages to aid in automated speech recognition; and
      
      (b) checking automatically recognized speech using a pattern matching technique to identify any common message formats.
  - 8. The method of claim 1, further comprising the step of breaking up at least some of the verbal messages into the segments based on predefined rules, including one or more rules selected from the group consisting of:
    - (a) breaking the verbal message into the segments where silence is detected;
      
      (b) breaking the verbal message into the segments so that the segments have a predefined maximum duration; and
      
      (c) breaking the verbal message into the segments so that the segments have between a predefined minimum and a predefined maximum number of words.
  - 9. The method of claim 8, further comprising the steps of:
    - (a) assigning confidence ratings to the segments of the verbal messages that were automatically recognized by the ASR program;
      
      (b) assigning the verbal message, the automatically recognized text, a timeline for the verbal message, and the confidence ratings of the segments to a workbench partial message queue; and
      
      (c) withholding segments that have a confidence rating above a predefined level from the workbench partial message queue, based on a high probability that the automatically recognized text is correct.
  - 10. The method of claim 1, wherein the step of assigning whole verbal messages or segments of verbal messages comprises the steps of:
    - (a) assigning the whole verbal messages or the segments of verbal messages to a specific workbench station used by an operator eligible to process verbal messages of that type; and
      
      (b) assigning segments of verbal messages having a lower quality to workbench stations first to ensure said segments are transcribed with a highest quality, in a time allotted to process each of the verbal messages.
  - 11. The method of claim 1, wherein the operators at the workbench stations edit and control transcription of the verbal messages in a browsing program display, and wherein transcription of the whole verbal messages is selectively carried out in one of three modes, including:
    - (a) a word mode that includes keyboard inputs for specific transcription inputs;
      
      (b) a line mode that facilitates looping through an audible portion of the verbal message to focus on a single line of transcribed text at a time; and
      
      (c) a whole message mode, in which the operator working at the workbench station listens to the whole verbal message to produce the corresponding text.
  - 12. The method of claim 11, wherein transcription of parts of a verbal message is carried out by an operator at a workbench station, and further comprising the step of displaying a graphical representation of an audio waveform for at least a part of the verbal message to the operator, with a segment to be transcribed visually indicated.
  - 13. The method of claim 1, further comprising the step of applying post processing to text corresponding to the verbal messages that were transcribed, for correcting minor errors in the text.
  - 14. The method of claim 13, wherein:
    - (a) if editing the automatically produced text for a whole verbal message by an operator on a workbench station will exceed a required turn-around-time, further comprising the step of immediately post processing the automatically produced text without using any edits provided by an operator at a workbench station; and
      
      (b) if editing parts of the verbal message will exceed the required turn-around-time, further comprising the step of post processing any text of the verbal message that was automatically recognized and has a confidence rating that is greater than a predefined minimum, any segments of the verbal message that have already been edited or transcribed by an operator on a workbench station, and any text of the verbal message that was automatically recognized and was moved into a workbench station queue but has not yet been edited by an operator at a workbench station.
  - 15. The method of claim 1, wherein the step of producing final output text messages comprises the steps of making the final output text messages available to an end user by transmitting the final output text messages to the end user in connection with one of:
    - (a) an email message transmitted over the network;
      
      (b) a short message service message transmitted over the network and through a telephone system;
      
      (c) a file transmitted over the network to a program interface; and
      
      (d) a file transmitted over the network to a web portal.
  - 16. The method of claim 1, further comprising the step of employing edits made to text produced by the ASR program by operators at the workbench stations, as feedback used to improve an accuracy of the ASR program.
  - 17. The method of claim 1, further comprising the steps of:
    - (a) determining a confidence level for portions of the verbal messages recognized by the ASR program, the confidence level being indicative of a likely accuracy of the text output by the ASR program;
      
      (b) giving priority to assigning portions of the verbal messages and text that were automatically recognized having a lower confidence level to operators at the workbench stations for editing over portions of the verbal messages and text that were automatically recognized having a higher confidence level, so that more of the difficult portions of the verbal messages will be edited and transcribed by the operators, compared to easier portions;
      
      (c) assessing a demand for transcribing verbal messages to determine a transcribing load on available operators at the workbench stations; and
      
      (d) varying a percentage of the final output text messages that comprises only automatically recognized text, relative to a remaining percentage that is output by the operators as a function of the load, so that a growing backlog of verbal messages to be transcribed is avoided by using a greater percentage of automatically recognized text for the final output text messages, as the load increases.
  - 31. The system of claim 8, wherein the final output text messages are made available to an end user by transmitting the final output text messages to the end user in connection with one of:
    - (a) an email message transmitted over the network;
      
      (b) a short message service message transmitted over the network and through a telephone system;
      
      (c) a file transmitted over the network to a program interface; and
      
      (d) a file transmitted over the network to a web portal.
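
Two of the segmentation rules recited in claim 8 (breaking at detected silence and enforcing a predefined maximum duration) can be illustrated with a short Python sketch. It assumes silence detection has already produced a list of time points in seconds; the function name and parameters are hypothetical, and the word-count rule of claim 8(c) is not covered here.

```python
from typing import Iterable, List, Tuple


def split_message(duration: float,
                  silence_points: Iterable[float],
                  max_duration: float) -> List[Tuple[float, float]]:
    """Split [0, duration] at silence points, then cap each piece at max_duration.

    Returns (start, end) pairs in seconds, in chronological order.
    """
    if max_duration <= 0:
        raise ValueError("max_duration must be positive")

    # Keep only silence points that fall strictly inside the message.
    cuts = sorted(p for p in set(silence_points) if 0 < p < duration)
    bounds = [0.0] + cuts + [duration]

    segments = []
    for start, end in zip(bounds, bounds[1:]):
        # A stretch without silence may still be too long: cut it at fixed steps.
        while end - start > max_duration:
            segments.append((start, start + max_duration))
            start += max_duration
        segments.append((start, end))
    return segments
```

For a 25-second message with silence detected at 4 s and 9 s and a 10-second cap, `split_message(25.0, [4.0, 9.0], 10.0)` yields four segments: 0 to 4, 4 to 9, 9 to 19, and 19 to 25.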

18. A system for efficiently transcribing verbal messages that are provided to the system over a network, to produce corresponding text, comprising:
- (a) a plurality of processors coupled to the network, for receiving and processing verbal messages to be transcribed to text;
  
  (b) one or more of the plurality of processors processing the verbal messages using an automatic speech recognition (ASR) program to produce automatically recognized text;
  
  (c) one or more of the plurality of processors on corresponding one or more workbench stations each providing a graphical interface on a display to enable operators using the one or more workbench stations to review and edit the automatically recognized text, and to further transcribe the verbal messages to produce edited text; and
  
  (d) one or more of the plurality of processors reassembling text segments comprising the edited text, producing final output text messages that can be conveyed to an end user.
- - 19. The system of claim 18, wherein the one or more of the plurality of processors receive the verbal messages transmitted over the network and assign the verbal messages received to others of the plurality of processors based on predefined assignment rules.
  - 20. The system of claim 19, wherein the one or more of the plurality of processors validate an audio format and check for a return address to a location on the network for each of the verbal messages that have been received, terminate processing of any verbal message that has an invalid audio format or lacks a return address, queue the verbal messages that are found to have a valid audio format in a new verbal message queue, and assign the verbal messages in the new verbal message queue to selected other one or more of the plurality of processors based on at least one of:
    - (a) a content type of the verbal message;
      
      (b) an availability of the other processors; and
      
      (c) a priority level of the verbal message.
  - 21. The system of claim 18, wherein the one or more of the plurality of processors input verbal messages to the ASR program and also add the verbal message to a workbench queue for manual processing by the one or more operators.
  - 22. The system of claim 18, wherein the one or more of the plurality of processors identify patterns in the verbal messages and in the automatically recognized text to determine a confidence rating for segments of the verbal messages.
  - 23. The system of claim 22, wherein if the confidence rating for a segment is above a predefined level, the one or more of the plurality of processors do not submit the segment for further processing by an operator at a workbench station, but instead submit the segment for final assembly into an edited text message.
  - 24. The system of claim 22, wherein the one or more of the plurality of processors break up at least some of the verbal messages into the segments based on predefined rules, including one or more predefined rules selected from the group consisting of:
    - (a) breaking the verbal message into successive segments at points in the verbal message where silence is detected between the successive segments;
      
      (b) breaking the verbal message into the segments so that the segments have a predefined maximum duration; and
      
      (c) breaking the verbal message into the segments so that the segments have between a predefined minimum and a predefined maximum number of words.
  - 25. The system of claim 18, wherein the ASR program is provided input of both audio data and metadata comprising the verbal messages, to improve an accuracy with which the text is automatically recognized when processing the verbal messages with the ASR program.
  - 26. The system of claim 25, wherein the metadata for a verbal message includes at least one or more elements selected from the group consisting of:
    - (a) proper nouns;
      
      (b) a caller name, if the verbal message is a voice mail; and
      
      (c) a name of a person being called, if the verbal message is a voice mail.
  - 27. The system of claim 18, wherein segments of a verbal message having a lower quality are assigned to workbench stations for editing and transcription by the operators before segments having a higher quality, to ensure the segments having lower quality are manually transcribed to achieve greater accuracy, in a time allotted to transcribe each of the verbal messages, and wherein different segments of a verbal message may be assigned to different workbench stations for editing and transcription by a plurality of different operators.
  - 28. The system of claim 18, wherein the workbench station includes a display on which a graphical representation of an audio waveform is displayed for at least a part of the verbal message then being transcribed by an operator of the workbench station, with a segment of the verbal message being transcribed visually indicated.
  - 29. The system of claim 18, wherein the one or more fourth processors apply post processing to the text before producing the output text corresponding to the verbal messages that were transcribed, for correcting minor errors in the text.
  - 30. The system of claim 29, wherein:
    - (a) if editing the automatically produced text for a whole verbal message by an operator on a workbench station will exceed a required turn-around-time, the automatically produced text is submitted for post processing without using any edits provided by an operator at a workbench station; and
      
      (b) if editing parts of a verbal message will exceed the required turn-around-time, then immediately submitting for post processing;
      
      (i) any part of the verbal message that was automatically recognized and has a confidence rating that is greater than a predefined minimum;
      
      (ii) any segments of the verbal message that have already been edited or transcribed by an operator on a workbench station; and
      
      (iii) any automatically recognized text that was moved into a workbench station queue but has not yet been edited by an operator at a workbench station.
  - 32. The system of claim 18, wherein edits made by operators at the workbench stations to the automatically recognized text produced by the ASR program are employed as feedback for use in improving an accuracy of the ASR program.
  - 33. The system of claim 18, wherein:
    - (a) a confidence level is determined for portions of the verbal messages recognized by the ASR program, the confidence level being indicative of a likely accuracy of the text output by the ASR program;
      
      (b) priority is given to assigning portions of the verbal messages and automatically recognized text having a lower confidence level to operators at the workbench stations for editing over portions of the verbal messages and text that were automatically recognized having a higher confidence level, so that more of the difficult portions of the verbal messages will be edited and transcribed by the operators, compared to easier portions;
      
      (c) a current demand for transcribing verbal messages is assessed to determine a transcribing load on available operators at the workbench stations; and
      
      (d) varying a percentage of the final output text messages that comprises only automatically recognized text, relative to a remaining percentage that is output by the operators as a function of the load, so that a growing backlog of verbal messages to be transcribed by the system is avoided by using a greater percentage of automatically recognized text for the final output text messages, as the load increases.
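
The confidence-based routing recited in claims 23 and 33 could be sketched as below. This is a simplified illustration, not the patented implementation: the `Segment` type, the `confidence_cutoff` value, and the use of a single `operator_capacity` number as a stand-in for "transcribing load" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Segment:
    """Hypothetical ASR result for one segment of a verbal message."""
    segment_id: str
    asr_text: str
    confidence: float  # assumed to lie in [0, 1]


def route_segments(segments: Sequence[Segment],
                   confidence_cutoff: float,
                   operator_capacity: int) -> Tuple[List[Segment], List[Segment]]:
    """Return (to_operators, asr_only).

    Segments above the cutoff skip manual review. The rest are offered to
    operators lowest confidence first, up to the available capacity; whatever
    does not fit is output as automatically recognized text only.
    """
    needs_review = sorted(
        (s for s in segments if s.confidence <= confidence_cutoff),
        key=lambda s: s.confidence,
    )
    to_operators = needs_review[:max(operator_capacity, 0)]
    reviewed_ids = {s.segment_id for s in to_operators}
    # Preserve original order for segments that bypass the operators.
    asr_only = [s for s in segments if s.segment_id not in reviewed_ids]
    return to_operators, asr_only
```

As capacity shrinks (that is, as load rises), fewer segments reach the operators and a larger share of the final output consists of ASR-only text, which is the behavior the claim describes for avoiding a growing backlog.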

Current Assignee: Avaya LLC (Avaya Incorporated), Avaya Management L.P. (Avaya Incorporated)
Original Assignee: GotVoice Incorporated (Avaya Incorporated)
Inventors: Webb, Mike O.; Kaseda, Janet S.; Peterson, Bruce J.

Granted Patent: US 8,239,197 B2
US Class Current: 379/88.140
CPC Class Codes

G10L 15/04   Segmentation; Word boundary...

G10L 15/26   Speech to text systems G10L...

H04M 2201/40   using speech recognition

H04M 2201/60   Medium conversion

H04M 2203/2016   Call initiation by network ...

H04M 2203/4536   Voicemail combined with tex...

H04M 3/53333   Message receiving aspects
