INPUT SPEECH QUALITY MATCHING
First Claim
1. A computer-implemented method for processing a whispered utterance and responding in whispered synthesized speech, the method comprising:
receiving input audio data comprising an input utterance;
processing the input audio data with at least one trained model to determine that the input utterance was whispered;
performing automatic speech recognition (ASR) on the input audio data to determine input text corresponding to the input utterance;
performing natural language understanding processing on the input text to identify a query;
determining content responding to the query based on the input utterance being whispered; and
causing the content to be output.
2 Assignments
0 Petitions
Abstract
A system matches text-to-speech (TTS) or other output to a quality of an input spoken utterance. The system uses trained models to detect a speech quality and generates an indicator of the speech quality. The speech quality may be determined from audio or non-audio data. The indicator is sent to downstream components of the system such as a command processor or TTS system. The output of the system is then determined using the indicator of speech quality, thus customizing an output of the system to the manner in which the utterance was spoken.
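As an illustration of the indicator described in the abstract, the following is a minimal sketch of how a downstream TTS component might consume a speech-quality indicator. The names `SpeechQuality`, `QualityIndicator`, and `tts_settings`, the confidence threshold, and the volume values are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class SpeechQuality(Enum):
    """Illustrative speech qualities; the patent text names whispering."""
    NORMAL = "normal"
    WHISPERED = "whispered"


@dataclass(frozen=True)
class QualityIndicator:
    """Hypothetical indicator passed to downstream components."""
    quality: SpeechQuality
    confidence: float  # assumed detector score in [0, 1]


def tts_settings(indicator: QualityIndicator, threshold: float = 0.5) -> dict:
    """Choose illustrative TTS settings that match the input speech quality."""
    # Assumption: fall back to normal output when the detector is unsure.
    if indicator.confidence < threshold:
        quality = SpeechQuality.NORMAL
    else:
        quality = indicator.quality
    # Assumed relative volumes, chosen only for demonstration.
    volume = {SpeechQuality.NORMAL: 1.0, SpeechQuality.WHISPERED: 0.3}[quality]
    return {"voice_style": quality.value, "relative_volume": volume}


if __name__ == "__main__":
    print(tts_settings(QualityIndicator(SpeechQuality.WHISPERED, 0.9)))
```

Carrying the indicator as a small immutable value keeps the detector independent of whichever component (command processor or TTS) later acts on it.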
132 Citations
23 Claims
1. A computer-implemented method for processing a whispered utterance and responding in whispered synthesized speech, the method comprising:
receiving input audio data comprising an input utterance;
processing the input audio data with at least one trained model to determine that the input utterance was whispered;
performing automatic speech recognition (ASR) on the input audio data to determine input text corresponding to the input utterance;
performing natural language understanding processing on the input text to identify a query;
determining content responding to the query based on the input utterance being whispered; and
causing the content to be output.
Dependent claims: 2, 3
4. A computer-implemented method comprising:
determining an input speech quality corresponding to input audio data;
performing automatic speech recognition on the input audio data to determine input text;
determining content based on the input text and the input speech quality; and
causing the content to be output.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12, 22, 23
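The four steps recited in claim 4 can be sketched as a simple pipeline. This is only a hedged illustration: `run_pipeline`, `toy_detect_quality`, and `toy_choose_content` are hypothetical names, the RMS-energy threshold is a stand-in for whatever trained model a real system would use, and the recognizer is a stub supplied by the caller.

```python
import math
from typing import Callable, Sequence


def toy_detect_quality(audio: Sequence[float], rms_threshold: float = 0.05) -> str:
    """Stand-in detector (assumption): low RMS energy is treated as a whisper."""
    if not audio:
        return "normal"
    rms = math.sqrt(sum(x * x for x in audio) / len(audio))
    return "whispered" if rms < rms_threshold else "normal"


def toy_choose_content(text: str, quality: str) -> str:
    """Stand-in content selection that depends on both text and quality."""
    return f"[{quality}] response to: {text}"


def run_pipeline(
    audio: Sequence[float],
    detect_quality: Callable[[Sequence[float]], str],
    recognize: Callable[[Sequence[float]], str],
    choose_content: Callable[[str, str], str],
    output: Callable[[str], None],
) -> str:
    quality = detect_quality(audio)          # determine input speech quality
    text = recognize(audio)                  # speech recognition -> input text
    content = choose_content(text, quality)  # content from text and quality
    output(content)                          # cause the content to be output
    return content


if __name__ == "__main__":
    # The recognizer is a fixed-text stub; a real system would call an ASR engine.
    run_pipeline(
        [0.01, -0.02, 0.01],
        toy_detect_quality,
        lambda audio: "what time is it",
        toy_choose_content,
        print,
    )
```

Passing each stage in as a callable keeps the sketch independent of any particular ASR or TTS library.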
13. A computing system comprising:
at least one processor;
a memory including instructions operable to be executed by the at least one processor to cause the system to perform a set of actions comprising:
determining an input speech quality corresponding to input audio data;
performing automatic speech recognition on the input audio data to determine input text;
determining content based on the input text and the input speech quality; and
causing the content to be output.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21
Specification