Use of metadata to post process speech recognition output

US 8,676,577 B2
Filed: 03/31/2009
Issued: 03/18/2014
Est. Priority Date: 03/31/2008
Status: Active Grant

Abstract

A method of utilizing metadata stored in a computer-readable medium to assist in the conversion of an audio stream to a text stream. The method compares personally identifiable data, such as a user's electronic address book and/or Caller/Recipient ID information (in the case of processing voice mail to text), to the n-best results generated by a speech recognition engine for each word that is output by the engine. A goal of this comparison is to correct a possible misrecognition of a spoken proper noun, such as a name or company, by substituting its proper textual form, or of a spoken phone number, by substituting a correctly formatted phone number in Arabic numerals, thereby improving the overall accuracy of the output of the voice recognition system.
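
The comparison of n-best results against metadata can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`best_metadata_match`, `post_process`), the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.85 threshold standing in for a "statistically significant match" are all assumptions made for illustration; the abstract does not specify how a match is scored. The sketch also emits the metadata spelling on a match, which is one reading of "proper textual form".

```python
from difflib import SequenceMatcher


def best_metadata_match(alternatives, metadata_terms, threshold=0.85):
    """Return the metadata term most similar to any alternative, or None.

    The threshold is a hypothetical stand-in for a "statistically
    significant match"; the source does not define the statistic.
    """
    best_term, best_score = None, 0.0
    for alt in alternatives:
        for term in metadata_terms:
            score = SequenceMatcher(None, alt.lower(), term.lower()).ratio()
            if score > best_score:
                best_term, best_score = term, score
    return best_term if best_score >= threshold else None


def post_process(n_best_per_word, metadata_terms, threshold=0.85):
    """Post-process recognizer output one word position at a time.

    n_best_per_word: list of lists; each inner list holds the engine's
    hypotheses for one word, with the top hypothesis first.
    """
    words = []
    for alternatives in n_best_per_word:
        match = best_metadata_match(alternatives, metadata_terms, threshold)
        # With no match, keep the engine's top hypothesis unchanged.
        words.append(match if match is not None else alternatives[0])
    return " ".join(words)


# Hypothetical n-best lists and address book entries.
n_best = [["call"], ["eager", "igor", "ego"], ["back"]]
address_book = ["Igor", "Marc"]
corrected = post_process(n_best, address_book)
```

With the hypothetical inputs above, the second word position contains the alternative "igor", which matches the address book entry "Igor", so that entry replaces the top hypothesis "eager". If no alternative clears the threshold, the original text passes through unchanged.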


49 Claims

1. A computer implemented method comprising:
- as implemented by one or more computing devices configured with specific executable instructions, receiving, via a data channel, an utterance from a device;
  
  receiving metadata from the device, the metadata including one or more of:
  
  caller identification data, recipient identification data or address book data;
  
  converting the utterance into a textual representation;
  
  determining a plurality of alternative textual representations;
  
  comparing each of the plurality of alternative textual representations to at least one portion of the metadata;
  
  upon determining that there is a statistically significant match between at least one alternative textual representation and the at least one portion of the metadata:
  
  replacing at least one portion of the textual representation with the at least one alternative representation determined to have the statistically significant match to create a converted textual representation, and sending the converted textual representation to the device; and
  
  upon determining that there is no statistically significant match between each of the plurality alternative textual representations and the at least one portion of the metadata, sending the textual representation to the device.
- Dependent claims:
  - 2. The computer-implemented method of claim 1, further comprising processing the converted textual representation of the utterance with a digit filter that substitutes Arabic numerals for words representing numbers.
  - 3. The computer-implemented method of claim 1, further comprising processing the textual representation of the utterance with a digit filter that substitutes Arabic numerals for words representing numbers.
  - 4. The computer-implemented method of claim 1, further comprising processing the converted textual representation of the utterance with a telephone filter that formats the Arabic numerals of a telephone number into a conventional format.
  - 5. The computer-implemented method of claim 1, further comprising processing the textual representation of the utterance with a telephone filter that formats the Arabic numerals of a telephone number into a conventional format.
  - 6. The computer-implemented method of claim 1, further comprising forwarding the converted textual representation to one or more recipients.
  - 7. The computer-implemented method of claim 1, wherein the device is a mobile phone.
  - 8. The computer-implemented method of claim 1, wherein converting the utterance into a textual representation comprises using grammar.
  - 9. The computer-implemented method of claim 1, further comprising using a text-to-speech engine (TTS) to generate an audio message from the converted textual representation.
  - 10. The computer-implemented method of claim 1, wherein the metadata comprises address book information.
  - 11. The computer-implemented method of claim 1, wherein the metadata comprises calendar information.
  - 12. The computer-implemented method of claim 1, wherein the metadata comprises data stored in different locations.
  - 13. The computer-implemented method of claim 1, wherein the metadata comprises information about an incoming phone call.
  - 14. The computer-implemented method of claim 1, wherein the metadata comprises ID data.
  - 15. The computer-implemented method of claim 1, wherein the metadata comprises recipient ID data.
  - 16. The method of claim 1, further comprising sending advertising to the device according to keywords contained in the converted textual representation, wherein the keywords are associated with the advertising.
  - 17. The computer-implemented method of claim 1, further comprising receiving a geospatial position of the device.
  - 18. The computer-implemented method of claim 17, further comprising sending locations, proximate to the position of the device, of a target of interest according to keywords contained in the converted textual representation.
  - 19. The computer-implemented method of claim 1, further comprising receiving login information.
  - 20. The computer-implemented method of claim 1, wherein the metadata is identifying data.
  - 21. The computer-implemented method of claim 1, wherein a meaning of the converted textual representation is different from a meaning of the textual representation.
  - 22. The computer-implemented method of claim 1, wherein the at least one portion of the metadata represents at least one word that is not represented by the at least one portion of the textual representation.
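
The digit filter and telephone filter of claims 2 through 5 can be sketched as two small text passes. This is a rough illustration under stated assumptions, not the filters described in the specification: the function names `digit_filter` and `telephone_filter` are hypothetical, only single-digit number words are handled, and the "conventional format" is assumed to be the ten-digit NNN-NNN-NNNN layout. A production filter would also need context to avoid converting words such as "one" in ordinary prose.

```python
import re

# Single-digit number words only; larger numbers are out of scope here.
WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Exactly ten space-separated single digits, bounded by whitespace
# or the ends of the string.
_TEN_DIGITS = re.compile(r"(?<!\S)(?:\d ){9}\d(?!\S)")


def digit_filter(text):
    """Substitute Arabic numerals for words representing single digits."""
    return " ".join(WORD_TO_DIGIT.get(tok.lower(), tok) for tok in text.split())


def telephone_filter(text):
    """Format a run of ten spoken digits as NNN-NNN-NNNN (assumed format)."""
    def fmt(match):
        digits = match.group(0).replace(" ", "")
        return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    return _TEN_DIGITS.sub(fmt, text)


# Hypothetical transcription of a voicemail fragment.
spoken = "call me at seven zero four five five five zero one two three"
as_digits = digit_filter(spoken)
formatted = telephone_filter(as_digits)
```

Running the digit filter first turns the number words into separate numerals, and the telephone filter then joins them. Either filter could be applied to the converted or the unconverted textual representation, as the paired claims indicate.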

23. A system comprising:
- a computing device configured to:
  
  initialize the computing device to communicate with a server;
  
  receive an utterance;
  
  transmit the utterance to the server;
  
  transmit metadata to the server, the metadata including one or more of:
  
  caller identification data, recipient identification data or address book data; and
  
- the server configured to:
  
  convert the utterance into a textual representation;
  
  determine a plurality of alternative textual representations;
  
  compare each of the plurality of alternative textual representations to at least one portion of the metadata;
  
  upon determining that there is a statistically significant match between at least one alternative textual representation and the at least one portion of the metadata:
  
  replace at least one portion of the textual representation with the at least one alternative textual representation determined to have the statistically significant match to create a converted textual representation, and send the converted textual representation to the computing device; and
  
  upon determining that there is no statistically significant match between each of the plurality alternative textual representations and the at least one portion of the metadata, send the textual representation to the device.
- Dependent claims:
  - 24. The system of claim 23, wherein the server is further configured to process the converted textual representation of the utterance with a digit filter that substitutes Arabic numerals for words representing numbers.
  - 25. The system of claim 23, wherein the server is further configured to process the textual representation of the utterance with a digit filter that substitutes Arabic numerals for words representing numbers.
  - 26. The system of claim 23, wherein the server is further configured to process the converted textual representation of the utterance with a telephone filter that formats the Arabic numerals of a telephone number into a conventional format.
  - 27. The system of claim 23, wherein the server is further configured to process the textual representation of the utterance with a telephone filter that formats the Arabic numerals of a telephone number into a conventional format.
  - 28. The system of claim 23, wherein the server is further configured to forward the converted textual representation to one or more recipients.
  - 29. The system of claim 23, wherein the computing device is a mobile phone.
  - 30. The system of claim 23, wherein the server is configured to convert the utterance into a textual representation using grammar.
  - 31. The system of claim 23, wherein the server is further configured to use a text-to-speech engine (TTS) to generate an audio message from the converted textual representation.
  - 32. The system of claim 23, wherein the metadata comprises at least one of address book information, calendar information, data stored in different locations, information about an incoming phone call, caller ID data, and recipient ID data.
  - 33. The system of claim 23, wherein the server is further configured to receive a geospatial position of the computing device.
  - 34. The system of claim 33, wherein the server is further configured to send locations, proximate to the geospatial position of the device, of a target of interest according to keywords contained in the textual representation.
  - 35. The system of claim 23, wherein the server includes a filter selected from the group consisting of an ad filter, an SMS filter, an obscenity filter, a number filter, a date filter, and a currency filter.
  - 36. The system of claim 23, wherein a meaning of the converted textual representation is different from a meaning of the textual representation.

37. A non-transitory computer-readable medium having a computer-executable component, the computer-executable component comprising:
- a server component operative to:
  
  communicate with a computing device, wherein the computing device receives an utterance and transmits the utterance and metadata to the server component;
  
  convert the utterance into a textual representation;
  
  determine a plurality of alternative textual representations;
  
  compare each of the plurality of alternative textual representations to at least one portion of the metadata;
  
  upon determining that there is a statistically significant match between at least one alternative textual representation and the at least one portion of the metadata:
  
  replace at least one portion of the textual representation with the at least one alternative textual representation determined to have the statistically significant match to create a converted textual representation, and send the converted textual representation to the computing device; and
  
  upon determining that there is no statistically significant match between each of the plurality alternative textual representations and the at least one portion of the metadata, send the textual representation to the device.
- Dependent claims:
  - 38. The non-transitory computer-readable medium of claim 37, wherein the server component is further operative to process the converted textual representation of the utterance with a digit filter that substitutes Arabic numerals for words representing numbers.
  - 39. The non-transitory computer-readable medium of claim 37, wherein the server component is further operative to process the textual representation of the utterance with a digit filter that substitutes Arabic numerals for words representing numbers.
  - 40. The non-transitory computer-readable medium of claim 37, wherein the server component is further operative to process the converted textual representation of the utterance with a telephone filter that formats the Arabic numerals of a telephone number into a conventional format.
  - 41. The non-transitory computer-readable medium of claim 37, wherein the server component is further operative to process the textual representation of the utterance with a telephone filter that formats the Arabic numerals of a telephone number into a conventional format.
  - 42. The non-transitory computer-readable medium of claim 37, wherein the server component is further operative to forward the converted textual representation to one or more recipients.
  - 43. The non-transitory computer-readable medium of claim 37, wherein the server component is operative to convert the utterance into a textual representation using grammar.
  - 44. The non-transitory computer-readable medium of claim 37, wherein the server component is further operative to use a text-to-speech engine (TTS) to generate an audio message from the converted textual representation.
  - 45. The non-transitory computer-readable medium of claim 37, wherein the metadata comprises at least one of address book information, calendar information, data stored in different locations, information about an incoming phone call, caller ID data and recipient ID data.
  - 46. The non-transitory computer-readable medium of claim 37, wherein the server component is further operative to receive a geospatial position of the computing device.
  - 47. The non-transitory computer-readable medium of claim 46, wherein the server component is further operative to send locations, proximate to the geospatial position of the device, of a target of interest according to keywords contained in the textual representation.
  - 48. The non-transitory computer-readable medium of claim 37, wherein the server component includes a filter selected from the group consisting of an ad filter, an SMS filter, an obscenity filter, a number filter, a date filter, and a currency filter.
  - 49. The non-transitory computer-readable medium of claim 37, wherein a meaning of the converted textual representation is different from a meaning of the textual representation.

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Canyon IP Holdings LLC (Intellectual Ventures LLC)
Inventors: Jablokov, Igor Roditis; White, Marc; Jablokov, Victor Roditis; Strohofer, Clifford J. III
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Sirjani, Fariba
Application Number: US12/415,874
Publication Number: US 20090248415A1
Time in Patent Office: 1,813 Days
Field of Search: 704/235, 704/251-254, 704/257, 704/270, 704/275
US Class Current: 704/235
CPC Class Codes:
G10L 15/30 Distributed recognition, e....
G10L 2015/228 of application context
