Use of metadata to post process speech recognition output
First Claim
1. A computer implemented method comprising:
as implemented by one or more computing devices configured with specific executable instructions, receiving, via a data channel, an utterance from a device;
receiving metadata from the device, the metadata including one or more of:
caller identification data, recipient identification data or address book data;
converting the utterance into a textual representation;
determining a plurality of alternative textual representations;
comparing each of the plurality of alternative textual representations to at least one portion of the metadata;
upon determining that there is a statistically significant match between at least one alternative textual representation and the at least one portion of the metadata,
replacing at least one portion of the textual representation with the at least one alternative representation determined to have the statistically significant match to create a converted textual representation, and sending the converted textual representation to the device; and
upon determining that there is no statistically significant match between each of the plurality of alternative textual representations and the at least one portion of the metadata, sending the textual representation to the device.
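The claimed flow can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not say how a "statistically significant match" is determined, so a `difflib` similarity ratio with a hypothetical threshold of 0.85 stands in for it. The word-aligned n-best format (`best`, `n_best`) and the function names are likewise assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off standing in for the claim's "statistically significant
# match"; the claim does not say how significance is measured.
MATCH_THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity score in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def post_process(best: list[str], n_best: list[list[str]], metadata: list[str]) -> str:
    """Return the transcript, swapping in alternatives that match the metadata.

    best[i] is the recognizer's top word at position i and n_best[i] holds the
    alternative words for that position; metadata is a flat list of terms.
    """
    converted = list(best)
    for i, alternatives in enumerate(n_best):
        top_score, top_alt = 0.0, None
        # Compare every alternative at this position against every metadata term.
        for alt in alternatives:
            for term in metadata:
                score = similarity(alt, term)
                if score > top_score:
                    top_score, top_alt = score, alt
        # Replace with the matching alternative (as the claim words it);
        # otherwise keep the original word, so with no match anywhere the
        # unchanged transcript is returned.
        if top_alt is not None and top_score >= MATCH_THRESHOLD:
            converted[i] = top_alt
    return " ".join(converted)
```

With the address-book term "Smith", `post_process(["call", "john", "small", "tomorrow"], [["call"], ["john", "jon"], ["small", "Smith", "smyth"], ["tomorrow"]], ["Smith", "Kowalski", "Acme"])` returns `"call john Smith tomorrow"`. If no alternative clears the threshold (for example "smyth" against "Smith" scores 0.8), the original transcript comes back unchanged, mirroring the claim's final step.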
5 Assignments
0 Petitions
Abstract
A method of utilizing metadata stored in a computer-readable medium to assist in the conversion of an audio stream to a text stream. The method compares personally identifiable data, such as a user's electronic address book and/or Caller/Recipient ID information (in the case of processing voice mail to text), to the n-best results generated by a speech recognition engine for each word that is output by the engine. A goal of this comparison is to correct a possible misrecognition of a spoken proper noun, such as a name or company, by replacing it with its proper textual form, or to convert a spoken phone number into a correctly formatted phone number in Arabic numerals, thereby improving the overall accuracy of the voice recognition system's output.
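The phone-number case in the abstract can be illustrated with a short Python sketch. It assumes a simple spoken-digit vocabulary (including "oh" for zero), reuses the formatting of an exactly matching number from the metadata (such as caller ID) when one exists, and otherwise falls back to a US-style layout. The digit table, the function names, and the formatting rules are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical spoken-digit table; "oh" is treated as zero, as in "two oh six".
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def words_to_digits(spoken: str) -> Optional[str]:
    """Map spoken digit words to a digit string, or None if any word is not a digit."""
    digits = [DIGIT_WORDS.get(word) for word in spoken.lower().split()]
    if not digits or None in digits:
        return None
    return "".join(digits)


def format_phone(spoken: str, metadata_numbers: list[str]) -> str:
    """Render a spoken phone number in Arabic numerals.

    If the digits equal a number in the metadata, that entry's formatting is
    reused; otherwise a US-style layout is assumed for 7 or 10 digits.
    """
    digits = words_to_digits(spoken)
    if digits is None:
        return spoken  # not a pure digit sequence; leave the text unchanged
    for number in metadata_numbers:
        # Strip punctuation and spaces from the metadata number before comparing.
        if re.sub(r"\D", "", number) == digits:
            return number
    if len(digits) == 10:
        return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    if len(digits) == 7:
        return f"{digits[:3]}-{digits[3:]}"
    return digits
```

For example, `format_phone("two oh six five five five zero one two three", ["(206) 555-0123"])` returns `"(206) 555-0123"`, while `format_phone("five five five zero one two three", [])` returns `"555-0123"`.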
148 Citations
49 Claims
1. A computer implemented method comprising:
as implemented by one or more computing devices configured with specific executable instructions, receiving, via a data channel, an utterance from a device;
receiving metadata from the device, the metadata including one or more of:
caller identification data, recipient identification data or address book data;
converting the utterance into a textual representation;
determining a plurality of alternative textual representations;
comparing each of the plurality of alternative textual representations to at least one portion of the metadata;
upon determining that there is a statistically significant match between at least one alternative textual representation and the at least one portion of the metadata,
replacing at least one portion of the textual representation with the at least one alternative representation determined to have the statistically significant match to create a converted textual representation, and sending the converted textual representation to the device; and
upon determining that there is no statistically significant match between each of the plurality of alternative textual representations and the at least one portion of the metadata, sending the textual representation to the device.
Dependent claims: 2-22.
23. A system comprising:
a computing device configured to: initialize the computing device to communicate with a server; receive an utterance; transmit the utterance to the server; transmit metadata to the server, the metadata including one or more of: caller identification data, recipient identification data or address book data; and
the server configured to: convert the utterance into a textual representation; determine a plurality of alternative textual representations; compare each of the plurality of alternative textual representations to at least one portion of the metadata; upon determining that there is a statistically significant match between at least one alternative textual representation and the at least one portion of the metadata, replace at least one portion of the textual representation with the at least one alternative textual representation determined to have the statistically significant match to create a converted textual representation, and send the converted textual representation to the computing device; and upon determining that there is no statistically significant match between each of the plurality of alternative textual representations and the at least one portion of the metadata, send the textual representation to the device.
Dependent claims: 24-36.
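In the system claim, the computing device transmits metadata to the server along with the utterance. One way to picture that exchange is the Python sketch below, which assumes a JSON payload with hypothetical field names (`caller_id`, `recipient_id`, `address_book`) and shows the server flattening it into word-level terms that could then be compared against the n-best alternatives. The claim does not specify a wire format or field names; these are illustrative only.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DeviceMetadata:
    """Hypothetical metadata payload a device might send with the audio."""
    caller_id: str = ""
    recipient_id: str = ""
    address_book: list[str] = field(default_factory=list)


def to_json(meta: DeviceMetadata) -> str:
    """Device side: serialize the metadata for transmission to the server."""
    return json.dumps(asdict(meta))


def metadata_terms(payload: str) -> list[str]:
    """Server side: decode the payload and flatten it into unique word-level terms."""
    data = json.loads(payload)
    names = [data.get("caller_id", ""), data.get("recipient_id", "")]
    names += data.get("address_book", [])
    terms: list[str] = []
    for name in names:
        for token in name.split():
            if token not in terms:  # keep first occurrence, preserve order
                terms.append(token)
    return terms
```

For instance, a payload built from caller ID "John Smith", recipient "Ana Kowalski", and address-book entries "Acme Corp" and "John Smith" flattens to `["John", "Smith", "Ana", "Kowalski", "Acme", "Corp"]`. Splitting into single words keeps the comparison aligned with the per-word n-best results described in the abstract.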
37. A non-transitory computer-readable medium having a computer-executable component, the computer-executable component comprising:
a server component operative to: communicate with a computing device, wherein the computing device receives an utterance and transmits the utterance and metadata to the server component; convert the utterance into a textual representation; determine a plurality of alternative textual representations; compare each of the plurality of alternative textual representations to at least one portion of the metadata; upon determining that there is a statistically significant match between at least one alternative textual representation and the at least one portion of the metadata, replace at least one portion of the textual representation with the at least one alternative textual representation determined to have the statistically significant match to create a converted textual representation, and send the converted textual representation to the computing device; and upon determining that there is no statistically significant match between each of the plurality of alternative textual representations and the at least one portion of the metadata, send the textual representation to the device.
Dependent claims: 38-49.
Specification