PREDICTING PRONUNCIATION IN SPEECH RECOGNITION
Abstract
An automatic speech recognition (ASR) device may be configured to predict pronunciations of textual identifiers (for example, song names) by predicting one or more languages of origin of the textual identifier. The one or more languages of origin may be determined based on the textual identifier. The pronunciations may include a pronunciation in one language, a pronunciation in a second language, and a hybrid pronunciation that combines multiple languages. The pronunciations may be added to a lexicon and matched to the content item (e.g., song) and/or textual identifier. The ASR device may receive a spoken utterance from a user requesting the ASR device to access the content item. The ASR device determines whether the spoken utterance matches one of the pronunciations of the content item in the lexicon. The ASR device then accesses the content when the spoken utterance matches one of the potential textual identifier pronunciations.
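As an illustration only, the idea of a lexicon that holds single-language and hybrid pronunciations for one textual identifier might be sketched as follows. The phoneme table `TOY_G2P`, the helper names, and the word-by-word mixing strategy are all assumptions made for this sketch; the abstract does not specify how pronunciations are generated or combined.

```python
from itertools import product

# Hypothetical toy grapheme-to-phoneme table (word -> phonemes per language).
# A real system would likely use trained G2P models; these entries are made up.
TOY_G2P = {
    "en": {"la": "l aa", "vie": "v ay"},
    "fr": {"la": "l a", "vie": "v i"},
}

def pronunciations(title, languages):
    """Map each candidate pronunciation to a language label or 'hybrid'.

    Assumes every word of the title is present in TOY_G2P for each language.
    """
    words = title.lower().split()
    results = {}
    # Pick one language per word; mixed picks give hybrid pronunciations.
    for choice in product(languages, repeat=len(words)):
        phones = " | ".join(TOY_G2P[lang][word] for lang, word in zip(choice, words))
        label = choice[0] if len(set(choice)) == 1 else "hybrid"
        results.setdefault(phones, label)
    return results

def build_lexicon(titles, languages):
    """Associate every candidate pronunciation with its textual identifier."""
    lexicon = {}
    for title in titles:
        for phones in pronunciations(title, languages):
            lexicon.setdefault(phones, title)
    return lexicon

lexicon = build_lexicon(["La Vie"], ["en", "fr"])
```

Looking up a recognized phoneme string in `lexicon` then returns the associated title, whichever language mix the speaker used.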
26 Claims
1. A computer-implemented method for processing a spoken utterance, the method comprising:
determining at least one language of origin of a song title based at least in part on a spelling of the song title;
determining a plurality of potential pronunciations of the song title based at least in part on the at least one language of origin and a language spoken by a user, wherein each of the plurality of potential pronunciations is associated with a score;
storing an association between each of the plurality of potential pronunciations and the song title;
receiving a spoken utterance comprising a request to play a song;
matching a portion of the spoken utterance with one of the plurality of potential pronunciations based at least in part on a score of the one of the plurality of potential pronunciations;
identifying the song based at least in part on the one of the plurality of potential pronunciations; and
causing the song to be played on a computing device.
Dependent claims: 2, 3, 4
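The steps recited in claim 1 can be walked through with a minimal sketch, given here only as a reading aid. Everything concrete in it is an assumption: the spelling cues in `SPELLING_CUES`, the hand-written pronunciations in `TOY_G2P`, the score values, the carrier phrase, and the use of `difflib.SequenceMatcher` as a stand-in for acoustic matching. None of these come from the claim itself.

```python
import re
from difflib import SequenceMatcher

# Hypothetical spelling cues; a real system would likely use a trained classifier.
SPELLING_CUES = {"fr": re.compile(r"eau|oux"), "de": re.compile(r"sch")}

# Hypothetical hand-written pronunciations keyed by (title, language).
TOY_G2P = {
    ("Beau Soir", "fr"): "b o s w a r",
    ("Beau Soir", "en"): "b ow s w aa r",
    ("Yesterday", "en"): "y eh s t er d ey",
}

CARRIER = "p l ey "  # assumed phonemes for the request word "play"

def languages_of_origin(title):
    """Guess languages of origin from spelling alone; default to English."""
    text = title.lower()
    return [lang for lang, cue in SPELLING_CUES.items() if cue.search(text)] or ["en"]

def add_title(lexicon, title, user_language):
    """Store scored (pronunciation, title, score) associations for one title."""
    for lang in dict.fromkeys(languages_of_origin(title) + [user_language]):
        phones = TOY_G2P.get((title, lang))
        if phones is not None:
            # Assumed prior: the user's own language is slightly more likely.
            score = 0.55 if lang == user_language else 0.45
            lexicon.append((phones, title, score))

def identify_song(lexicon, utterance_phones):
    """Match the title portion of the utterance, weighting similarity by score."""
    portion = utterance_phones
    if portion.startswith(CARRIER):
        portion = portion[len(CARRIER):]

    def weighted(entry):
        phones, _title, score = entry
        similarity = SequenceMatcher(None, portion.split(), phones.split()).ratio()
        return similarity * score

    best = max(lexicon, key=weighted, default=None)
    return best[1] if best is not None and weighted(best) > 0 else None

def play(title):
    """Stub for causing the song to be played on a computing device."""
    return f"playing {title}"

lexicon = []
add_title(lexicon, "Beau Soir", user_language="en")
add_title(lexicon, "Yesterday", user_language="en")
result = play(identify_song(lexicon, "p l ey b ow s w aa r"))
```

Because each stored pronunciation carries its own score, a title spoken with either the origin-language or the user-language pronunciation can still resolve to the same song.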
5. A computing system, comprising:
at least one processor;
a memory device including instructions operable to be executed by the at least one processor to perform a set of actions, the instructions configuring the at least one processor:
to determine a potential language of origin for a textual identifier, wherein the potential language of origin is based at least in part on the textual identifier;
to determine a potential pronunciation of the textual identifier, wherein the potential pronunciation is based at least in part on the potential language of origin and a potential spoken language; and
to store an association between the potential pronunciation and the textual identifier.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A non-transitory computer-readable storage medium storing processor-executable instructions for controlling a computing device, comprising:
program code to determine a potential language of origin for a textual identifier, wherein the potential language of origin is based at least in part on the textual identifier;
program code to determine a potential pronunciation of the textual identifier, wherein the potential pronunciation is based at least in part on the potential language of origin and a potential spoken language; and
program code to store an association between the potential pronunciation and the textual identifier.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
Specification