Recognizing speech in the presence of additional audio
First Claim
1. A computer-implemented method comprising:
- receiving, by a mobile device, an audio signal;
- determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice;
- in response to determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice, suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device;
- after suppressing operation of the speech synthesis module, obtaining, by the mobile device, a transcription corresponding to the audio signal from an automated speech recognizer; and
- providing, by the mobile device, the transcription for output.
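The control flow recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation. The names `SpeechSynthesizer`, `handle_audio`, `synth_model`, `user_model`, and `recognizer` are hypothetical, as is the `threshold` parameter. The sketch assumes each detection model is a callable returning a probability between 0 and 1, and it treats "likely includes" as meaning that probability meets the threshold.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Audio = Sequence[float]  # assumption: audio is a sequence of samples


@dataclass
class SpeechSynthesizer:
    """Hypothetical stand-in for the speech synthesis module.

    It only tracks whether synthesis is currently active.
    """
    active: bool = True

    def suppress(self) -> None:
        self.active = False


def handle_audio(
    audio: Audio,
    synth_model: Callable[[Audio], float],   # hypothetical: P(synthesized voice present)
    user_model: Callable[[Audio], float],    # hypothetical: P(user's voice present)
    synthesizer: SpeechSynthesizer,
    recognizer: Callable[[Audio], str],      # hypothetical automated speech recognizer
    threshold: float = 0.5,                  # assumed decision threshold
) -> Optional[str]:
    """Return a transcription if both voices are likely present, else None."""
    synth_likely = synth_model(audio) >= threshold
    user_likely = user_model(audio) >= threshold
    if not (synth_likely and user_likely):
        return None  # the claimed condition is not met; leave synthesis running

    # Suppress synthesis first, then obtain the transcription.
    synthesizer.suppress()
    return recognizer(audio)
```

In this sketch the caller is responsible for providing the returned transcription for output, for example by displaying it or passing it to a downstream component.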
Abstract
The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additional audio signal corresponds to an utterance of a user. The method further includes initiating a reduction in an audio output level of the speaker device based on determining that the additional audio signal corresponds to the utterance of the user.
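The level-reduction step described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the method of the patent. The function `maybe_duck_output` and the parameters `speaker_output_score`, `match_threshold`, and `duck_factor` are hypothetical. The sketch assumes the trained model returns a score in [0, 1] for how well the received signal is explained by the speaker device's output alone, and that a low score indicates the additional audio is a user utterance.

```python
from typing import Callable, Sequence


def maybe_duck_output(
    signal: Sequence[float],
    speaker_output_score: Callable[[Sequence[float]], float],  # hypothetical model
    current_level: float,
    match_threshold: float = 0.8,  # assumed: below this, extra audio is treated as a user utterance
    duck_factor: float = 0.25,     # assumed: fraction of the level kept after reduction
) -> float:
    """Return the audio output level to use after inspecting the signal."""
    score = speaker_output_score(signal)
    if score < match_threshold:
        # The signal is not well explained by speaker output alone, so
        # initiate a reduction in the speaker's output level.
        return current_level * duck_factor
    return current_level
```

A caller would apply the returned level to the speaker device; how the reduction is actually carried out is not specified in the abstract.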
183 Citations
20 Claims
1. A computer-implemented method comprising:
- receiving, by a mobile device, an audio signal;
- determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice;
- in response to determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice, suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device;
- after suppressing operation of the speech synthesis module, obtaining, by the mobile device, a transcription corresponding to the audio signal from an automated speech recognizer; and
- providing, by the mobile device, the transcription for output.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory computer readable storage device storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
- receiving, by a mobile device, an audio signal;
- determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice;
- in response to determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice, suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device;
- after suppressing operation of the speech synthesis module, obtaining, by the mobile device, a transcription corresponding to the audio signal from an automated speech recognizer; and
- providing, by the mobile device, the transcription for output.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system comprising:
- one or more computers; and
- one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  - receiving, by a mobile device, an audio signal;
  - determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice;
  - in response to determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice, suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device;
  - after suppressing operation of the speech synthesis module, obtaining, by the mobile device, a transcription corresponding to the audio signal from an automated speech recognizer; and
  - providing, by the mobile device, the transcription for output.
Dependent claims: 16, 17, 18, 19, 20
Specification