Background audio identification for speech disambiguation
First Claim
1. A computer-implemented method comprising:
receiving an audio stream containing first audio data and second audio data, the first audio data corresponding to an utterance spoken by a user and the second audio data corresponding to background audio associated with playback of an item of media content;
processing the second audio data to generate at least one term associated with the background audio;
adjusting a probability or relevance score associated with a speech recognition model recognizing the at least one term associated with the background audio in the first audio data;
after adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio, transcribing the first audio data into a textual representation of the utterance using the speech recognition model; and
transmitting the textual representation of the utterance to a computing device associated with the user, the textual representation of the utterance when received by the computing device causing the computing device to at least one of display the textual representation on a display or perform a particular task based on the textual representation.
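The adjusting and transcribing steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a toy unigram language model, a fixed multiplicative boost, and candidate hypotheses that already carry acoustic log-scores. The names `boost_terms`, `score`, `transcribe`, and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def boost_terms(unigram_probs, background_terms, boost=5.0):
    """Raise the probability of terms derived from background audio, then renormalize.

    The multiplicative boost is an assumed strategy, not one stated in the claim.
    """
    boosted = {
        word: prob * (boost if word in background_terms else 1.0)
        for word, prob in unigram_probs.items()
    }
    total = sum(boosted.values())
    return {word: prob / total for word, prob in boosted.items()}

def score(hypothesis, acoustic_logprob, unigram_probs, floor=1e-6):
    """Combine an acoustic log-score with a unigram language-model log-score."""
    lm_logprob = sum(math.log(unigram_probs.get(w, floor)) for w in hypothesis.split())
    return acoustic_logprob + lm_logprob

def transcribe(candidates, unigram_probs):
    """Return the text of the best-scoring (text, acoustic_logprob) candidate."""
    best_text, _ = max(candidates, key=lambda c: score(c[0], c[1], unigram_probs))
    return best_text

# Hypothetical toy model and candidate hypotheses for one utterance.
base_model = {"play": 0.4, "theme": 0.3, "jars": 0.2, "jaws": 0.1}
candidates = [("play jars theme", -9.8), ("play jaws theme", -10.0)]

# Assume the background audio was identified as a film whose terms include "jaws".
biased_model = boost_terms(base_model, {"jaws"})

print(transcribe(candidates, base_model))    # play jars theme
print(transcribe(candidates, biased_model))  # play jaws theme
```

In this sketch the acoustically weaker hypothesis wins only after the background term is boosted, which mirrors the ordering in the claim: the score is adjusted first, and transcription follows.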
Abstract
Implementations relate to techniques for providing context-dependent search results. A computer-implemented method includes receiving an audio stream at a computing device during a time interval, the audio stream comprising user speech data and background audio, separating the audio stream into a first substream that includes the user speech data and a second substream that includes the background audio, identifying concepts related to the background audio, generating a set of terms related to the identified concepts, influencing a speech recognizer based on at least one of the terms related to the background audio, and obtaining a recognized version of the user speech data using the speech recognizer.
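The step of generating a set of terms from identified concepts could look roughly like the following sketch. It assumes that an earlier stage has already identified the media item in the background audio, and it stands in a small in-memory table for whatever knowledge source a real system would query. The table `RELATED_TERMS`, its contents, and the function `terms_for_concepts` are hypothetical.

```python
# Hypothetical concept-to-terms table; a real system would query a media
# knowledge base rather than a hard-coded dictionary.
RELATED_TERMS = {
    "Jaws": ["shark", "Spielberg", "Amity Island"],
}

def terms_for_concepts(concepts, related=RELATED_TERMS):
    """Expand identified concepts into an ordered, de-duplicated list of terms."""
    terms = []
    seen = set()
    for concept in concepts:
        # Each concept contributes its own name plus any related terms.
        for term in [concept, *related.get(concept, [])]:
            key = term.lower()
            if key not in seen:
                seen.add(key)
                terms.append(term)
    return terms

print(terms_for_concepts(["Jaws"]))
```

The resulting list is what a later stage would use to influence the speech recognizer; concepts missing from the table still contribute their own name as a term.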
20 Claims
8. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising;
receiving an audio stream containing first audio data and second audio data, the first audio data corresponding to an utterance spoken by a user and the second audio data corresponding to background audio associated with playback of an item of media content;
processing the second audio data to generate at least one term associated with the background audio;
adjusting a probability or relevance score associated with a speech recognition model recognizing the at least one term associated with the background audio in the first audio data;
after adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio, transcribing the first audio data into a textual representation of the utterance using the speech recognition model; and
transmitting the textual representation of the utterance to a computing device associated with the user, the textual representation of the utterance when received by the computing device causing the computing device to at least one of display the textual representation on a display or perform a particular task based on the textual representation.
15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving an audio stream containing first audio data and second audio data, the first audio data corresponding to an utterance spoken by a user and the second audio data corresponding to background audio associated with playback of an item of media content;
processing the second audio data to generate at least one term associated with the background audio;
adjusting a probability or relevance score associated with a speech recognition model recognizing the at least one term associated with the background audio in the first audio data;
after adjusting the probability or relevance score associated with the speech recognition model recognizing the at least one term associated with the background audio, transcribing the first audio data into a textual representation of the utterance using the speech recognition model; and
transmitting the textual representation of the utterance to a computing device associated with the user, the textual representation of the utterance when received by the computing device causing the computing device to at least one of display the textual representation on a display or perform a particular task based on the textual representation.
Specification