Disambiguation of a spoken query term
First Claim
1. A computer-implemented method for speech recognition comprising:
receiving, by a computing device, audio data that corresponds to a spoken utterance of a user;
generating, by the computing device, multiple candidate transcriptions of the spoken utterance, wherein one or more of the multiple candidate transcriptions include at least one term previously spoken by the user;
selecting, by the computing device and from among the multiple candidate transcriptions of the spoken utterance, a particular candidate transcription;
determining, by the computing device, that the particular candidate transcription includes a term that appears more than a predetermined number of times in transcriptions of utterances previously spoken by the user before speaking the spoken utterance; and
based on determining that the particular candidate transcription includes a term that appears more than the predetermined number of times in the transcriptions of the utterances previously spoken by the user before speaking the spoken utterance, providing, for display on the computing device and as a speech recognition output, the particular candidate transcription as a transcription of the spoken utterance.
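The steps recited in this claim can be illustrated with a minimal sketch. It is not the patented implementation. The function names, whitespace tokenization, the best-first ordering of candidates, the default threshold value, and the behavior when no candidate qualifies are all assumptions made for illustration.

```python
from collections import Counter


def term_counts(previous_transcriptions):
    """Count how often each term appears in the user's earlier transcriptions."""
    counts = Counter()
    for text in previous_transcriptions:
        # Assumption: terms are lowercase, whitespace-separated words.
        counts.update(text.lower().split())
    return counts


def has_frequent_term(candidate, counts, threshold):
    """Return True if any term in the candidate appears more than `threshold` times."""
    return any(counts[term] > threshold for term in candidate.lower().split())


def recognize(candidates, previous_transcriptions, threshold=2):
    """Return the first candidate that contains a frequently spoken term.

    Assumptions: `candidates` is ordered best-first by a recognizer that is
    not shown here, and None is returned when no candidate qualifies (the
    claim does not say what happens in that case).
    """
    counts = term_counts(previous_transcriptions)
    for candidate in candidates:
        if has_frequent_term(candidate, counts, threshold):
            return candidate
    return None
```

For example, with a history of "gym new york", "gym schedule", "gym hours", and "jim carrey", and a threshold of 2, the candidate "gym new york" is returned in preference to "jim new york", because "gym" appears three times in the history and no term of "jim new york" appears more than twice.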
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing spoken query terms. In one aspect, a method includes performing speech recognition on an audio signal to select two or more textual, candidate transcriptions that match a spoken query term, and to establish a speech recognition confidence value for each candidate transcription, obtaining a search history for a user who spoke the spoken query term, where the search history references one or more past search queries that have been submitted by the user, generating one or more n-grams from each candidate transcription, where each n-gram is a subsequence of n phonemes, syllables, letters, characters, words or terms from a respective candidate transcription, and determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries, and a weighting value that is based on the respective frequency.
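The n-gram frequency and weighting steps described in the abstract can be sketched roughly as follows. The abstract does not give a weighting formula or say how weights and confidence values are combined, so the logarithmic `weight` function and the `rescore` step below are hypothetical. Word-level n-grams are used here, although the abstract also allows phonemes, syllables, letters, or characters.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of `text` as a list of tuples."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_frequencies(candidate, past_queries, n=1):
    """Map each n-gram of the candidate to its count in the past search queries."""
    history = Counter()
    for query in past_queries:
        history.update(word_ngrams(query, n))
    return {gram: history[gram] for gram in word_ngrams(candidate, n)}


def weight(frequency):
    """Hypothetical weighting: 1.0 for unseen n-grams, growing with frequency."""
    return 1.0 + math.log1p(frequency)


def rescore(candidates, past_queries, n=1):
    """Rank (text, confidence) pairs by confidence times the largest n-gram weight.

    Assumption: this way of combining values is illustrative only.
    """
    scored = []
    for text, confidence in candidates:
        freqs = ngram_frequencies(text, past_queries, n)
        boost = max((weight(f) for f in freqs.values()), default=1.0)
        scored.append((text, confidence * boost))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With past queries "gym new york", "gym hours", and "gym membership", the candidate ("gym new york", 0.5) ranks above ("jim new york", 0.6) under this sketch, because the unigram "gym" occurs three times in the history and so receives a larger weight.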
24 Claims
1. A computer-implemented method for speech recognition comprising:
receiving, by a computing device, audio data that corresponds to a spoken utterance of a user;
generating, by the computing device, multiple candidate transcriptions of the spoken utterance, wherein one or more of the multiple candidate transcriptions include at least one term previously spoken by the user;
selecting, by the computing device and from among the multiple candidate transcriptions of the spoken utterance, a particular candidate transcription;
determining, by the computing device, that the particular candidate transcription includes a term that appears more than a predetermined number of times in transcriptions of utterances previously spoken by the user before speaking the spoken utterance; and
based on determining that the particular candidate transcription includes a term that appears more than the predetermined number of times in the transcriptions of the utterances previously spoken by the user before speaking the spoken utterance, providing, for display on the computing device and as a speech recognition output, the particular candidate transcription as a transcription of the spoken utterance.
Dependent claims: 2-19
20. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving, by a computing device, audio data that corresponds to a spoken utterance of a user;
generating, by the computing device, multiple candidate transcriptions of the spoken utterance, wherein one or more of the multiple candidate transcriptions include at least one term previously spoken by the user;
selecting, by the computing device and from among the multiple candidate transcriptions of the spoken utterance, a particular candidate transcription;
determining, by the computing device, that the particular candidate transcription includes a term that appears more than a predetermined number of times in transcriptions of utterances previously spoken by the user before speaking the spoken utterance; and
based on determining that the particular candidate transcription includes a term that appears more than the predetermined number of times in the transcriptions of the utterances previously spoken by the user before speaking the spoken utterance, providing, for display on the computing device and as a speech recognition output, the particular candidate transcription as a transcription of the spoken utterance.
Dependent claims: 21-23
24. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving, by a computing device, audio data that corresponds to a spoken utterance of a user;
generating, by the computing device, multiple candidate transcriptions of the spoken utterance, wherein one or more of the multiple candidate transcriptions include at least one term previously spoken by the user;
selecting, by the computing device and from among the multiple candidate transcriptions of the spoken utterance, a particular candidate transcription;
determining, by the computing device, that the particular candidate transcription includes a term that appears more than a predetermined number of times in transcriptions of utterances previously spoken by the user before speaking the spoken utterance; and
based on determining that the particular candidate transcription includes a term that appears more than the predetermined number of times in the transcriptions of the utterances previously spoken by the user before speaking the spoken utterance, providing, for display on the computing device and as a speech recognition output, the particular candidate transcription as a transcription of the spoken utterance.
Specification