USE OF METADATA TO POST PROCESS SPEECH RECOGNITION OUTPUT
5 Assignments
0 Petitions
Abstract
A method of utilizing metadata stored in a computer-readable medium to assist in the conversion of an audio stream to a text stream. The method compares personally identifiable data, such as a user's electronic address book and/or Caller/Recipient ID information (in the case of processing voice mail to text), to the n-best results generated by a speech recognition engine for each word that is output by the engine. A goal of this comparison is to replace a possibly misrecognized spoken proper noun, such as a name or company, with its proper textual form, or a spoken phone number with a correctly formatted phone number in Arabic numerals, thereby improving the overall accuracy of the output of the voice recognition system.
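The comparison described in the abstract can be illustrated with a minimal sketch. It assumes the recognizer exposes a list of n-best candidates per output word and that the metadata is a flat list of contact names. The function names `best_metadata_match` and `post_process`, the use of `difflib` string similarity, and the 0.8 threshold are all hypothetical choices for illustration and are not taken from the patent.

```python
from difflib import SequenceMatcher


def best_metadata_match(candidates, names, threshold=0.8):
    """Return the metadata name most similar to any n-best candidate.

    Returns None when no candidate reaches the (assumed) similarity threshold.
    """
    best_name, best_score = None, 0.0
    for cand in candidates:
        for name in names:
            # Case-insensitive character-level similarity in [0, 1].
            score = SequenceMatcher(None, cand.lower(), name.lower()).ratio()
            if score > best_score:
                best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def post_process(nbest_per_word, names, threshold=0.8):
    """Replace words whose n-best list matches a metadata entry.

    nbest_per_word: one list of candidates per output word; the first
    entry of each list is the recognizer's top hypothesis.
    """
    out = []
    for candidates in nbest_per_word:
        match = best_metadata_match(candidates, names, threshold)
        # Keep the top hypothesis when nothing in the metadata is close enough.
        out.append(match if match is not None else candidates[0])
    return " ".join(out)


# Hypothetical recognizer output and address-book entries.
nbest = [["call"], ["jon", "john"], ["smyth", "smith"]]
address_book = ["Jon", "Smythe"]
print(post_process(nbest, address_book))
```

A real system would likely weigh recognizer confidence scores alongside string similarity, and would treat phone numbers with a separate formatting step; this sketch covers only the proper-noun case.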
227 Citations
44 Claims
1. A method of utilizing metadata stored in a computer-readable medium to assist in the conversion of spoken audio input, received by a hand-held mobile communication device, into a textual representation for display on the hand-held mobile communication device, comprising the steps of:
initializing a hand-held mobile communication device so that the hand-held mobile communication device is capable of communicating with a backend server via a data channel of the hand-held mobile communication device;
upon receipt of an utterance by the hand-held mobile communication device, recording and storing an audio message, representative of the utterance, in the hand-held mobile communication device in the form of binary audio data;
transmitting, via the data channel, the recorded and stored binary audio data, representing the utterance, from the hand-held mobile communication device to a backend server through a client-server communication protocol;
in conjunction with the transmission of the recorded and stored binary audio data, transmitting metadata from the hand-held mobile communication device to the backend server through the client-server communication protocol;
converting the transmitted binary audio data into a textual representation of the utterance in the backend server;
comparing at least one portion of the textual representation to at least one portion of the metadata;
replacing at least one portion of the textual representation with at least one portion of the metadata; and
sending the converted textual representation of the utterance, with metadata replacement, from the server back to the hand-held mobile communication device.
Dependent claims: 2 through 41
42. A method of utilizing metadata stored in a computer-readable medium to assist in the conversion of spoken audio input, received by a hand-held mobile communication device, into a textual representation for display on the hand-held mobile communication device, comprising the steps of:
initializing a hand-held mobile communication device so that the hand-held mobile communication device is capable of communicating with a backend server via a data channel of the hand-held mobile communication device;
upon receipt of an utterance by the hand-held mobile communication device, recording and storing an audio message, representative of the utterance, in the hand-held mobile communication device in the form of binary audio data;
transmitting, via the data channel, the recorded and stored binary audio data, representing the utterance, from the hand-held mobile communication device to a backend server through a client-server communication protocol;
storing metadata, associated with the hand-held mobile communication device, at the backend server;
converting the transmitted binary audio data into a textual representation of the utterance in the backend server;
comparing at least one portion of the textual representation to at least one portion of the metadata;
replacing at least one portion of the textual representation with at least one portion of the metadata; and
sending the converted textual representation of the utterance, with metadata replacement, from the server back to the hand-held mobile communication device.
Dependent claims: 43
44. A system for utilizing metadata stored in a computer-readable medium to assist in the conversion of spoken audio input, received by a hand-held mobile communication device, into a textual representation for display on the hand-held mobile communication device, comprising:
a hand-held mobile communication device;
a backend server; and
software in the hand-held mobile communication device and backend server for causing the hand-held mobile communication device and/or the backend server to perform functions comprising:
initializing the hand-held mobile communication device so that the hand-held mobile communication device is capable of communicating with the backend server via a data channel of the hand-held mobile communication device;
upon receipt of an utterance by the hand-held mobile communication device, recording and storing an audio message, representative of the utterance, in the hand-held mobile communication device in the form of binary audio data;
transmitting, via the data channel, the recorded and stored binary audio data, representing the utterance, from the hand-held mobile communication device to a backend server through a client-server communication protocol;
in conjunction with the transmission of the recorded and stored binary audio data, transmitting metadata from the hand-held mobile communication device to the backend server through the client-server communication protocol;
converting the transmitted binary audio data into a textual representation of the utterance in the backend server;
comparing at least one portion of the textual representation to at least one portion of the metadata;
replacing at least one portion of the textual representation with at least one portion of the metadata; and
sending the converted textual representation of the utterance, with metadata replacement, from the server back to the hand-held mobile communication device.
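The server-side steps recited in the claims (convert, compare, replace, send back) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `TranscriptionRequest` type, the `handle_request` function, the word-by-word fuzzy comparison, and the 0.8 cutoff are hypothetical, and the speech recognizer is passed in as a stand-in callable rather than any real engine. The claims do not specify how the comparison is performed.

```python
from dataclasses import dataclass, field
from difflib import get_close_matches
from typing import Callable, List


@dataclass
class TranscriptionRequest:
    """What the device sends over the data channel (hypothetical shape)."""
    audio: bytes                                       # recorded binary audio data
    metadata: List[str] = field(default_factory=list)  # e.g. contact names


def handle_request(request: TranscriptionRequest,
                   recognize: Callable[[bytes], str],
                   cutoff: float = 0.8) -> str:
    """Convert audio to text, then replace near-matching words with metadata."""
    text = recognize(request.audio)  # converting step
    # Map lowercased metadata entries back to their proper textual form.
    lookup = {entry.lower(): entry for entry in request.metadata}
    words = []
    for word in text.split():
        # Comparing step: closest metadata entry at or above the assumed cutoff.
        hit = get_close_matches(word.lower(), list(lookup), n=1, cutoff=cutoff)
        # Replacing step: substitute the metadata form when a match is found.
        words.append(lookup[hit[0]] if hit else word)
    return " ".join(words)  # text returned to the device


def fake_recognizer(audio: bytes) -> str:
    """Stand-in for a real speech recognition engine (assumption)."""
    return "meet smyth at noon"


request = TranscriptionRequest(audio=b"\x00\x01", metadata=["Smythe"])
print(handle_request(request, fake_recognizer))
```

In the claim 42 variant, the metadata would be looked up from server-side storage associated with the device instead of arriving with the request; the compare and replace steps would be unchanged in this sketch.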
Specification