Adjusting speed of human speech playback
Abstract
A system configured to vary a speech speed of speech represented in input audio data without changing a pitch of the speech. The system may vary the speech speed based on a number of different inputs, including non-audio data, data associated with a command, or data associated with the voice message itself. The non-audio data may correspond to information about an account, device or user, such as user preferences, calendar entries, location information, etc. The system may analyze audio data associated with the command to determine command speech speed, identity of person listening, etc. The system may analyze the input audio data to determine a message speech speed, background noise level, identity of the person speaking, etc. Using all of these inputs, the system may dynamically determine a target speech speed and may generate output audio data having the target speech speed.
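The data flow described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the claims rely on a trained model to produce the target speed, whereas the function below is a hand-written stand-in. The names `PlaybackInputs` and `target_speech_speed`, the 10 dB noise threshold, the 10% slow-down steps, and the 100 to 250 words-per-minute clamp are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlaybackInputs:
    """Hypothetical container for the kinds of inputs the abstract lists."""
    message_wpm: float                     # speech speed of the voice message
    command_wpm: Optional[float] = None    # speech speed of the spoken command
    preferred_wpm: Optional[float] = None  # user preference (non-audio data)
    snr_db: Optional[float] = None         # signal-to-noise ratio of the message
    contains_numbers: bool = False         # e.g. a phone number is spoken


def target_speech_speed(inputs: PlaybackInputs,
                        min_wpm: float = 100.0,
                        max_wpm: float = 250.0) -> float:
    """Rule-based stand-in for the trained model (an assumption, not the patent's model)."""
    # Pick a starting point: explicit preference, else the listener's own
    # speaking rate, else leave the message at its original speed.
    if inputs.preferred_wpm is not None:
        target = inputs.preferred_wpm
    elif inputs.command_wpm is not None:
        target = inputs.command_wpm
    else:
        target = inputs.message_wpm

    # Hypothetical adjustments: slow down noisy or number-heavy messages.
    if inputs.snr_db is not None and inputs.snr_db < 10.0:
        target *= 0.9
    if inputs.contains_numbers:
        target *= 0.9

    # Keep the result inside an assumed intelligible range.
    return max(min_wpm, min(max_wpm, target))
```

A real system would replace the rules with the trained model's output; the sketch only shows how audio-derived and non-audio inputs could feed a single target value.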
23 Citations
20 Claims
1. A computer-implemented method, comprising:
receiving command audio data representing first speech associated with a first user profile;
determining that the command audio data corresponds to a command to play a voice message;
determining a first number of words per minute associated with the first speech;
determining input data representing at least one of playback speed preferences associated with the first user profile, location data associated with the first user profile, or calendar data associated with the first user profile;
receiving input audio data corresponding to the voice message, the input audio data including a representation of second speech;
determining a second number of words per minute associated with the second speech;
determining speech data associated with the input audio data, the speech data representing a signal to noise ratio associated with the input audio data and an indication that numbers are detected in the first speech;
inputting at least one of the first number of words per minute, the input data or the speech data to a trained model, the trained model outputting a third number of words per minute;
determining a speech speed modification factor by dividing the third number of words per minute by the second number of words per minute; and
generating output audio data from the input audio data using the speech speed modification factor, the output audio data representing third speech having the third number of words per minute and corresponding to the second speech.
Dependent claims: 2, 3, 4
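The arithmetic in claim 1 (target words per minute divided by original words per minute) can be illustrated with a short sketch. The function names and the word-count-over-duration way of measuring words per minute are assumptions made for illustration; the claim does not say how the rates are measured or how the factor is applied to the audio.

```python
def words_per_minute(word_count: int, duration_seconds: float) -> float:
    """Speech rate from a word count and a duration (assumed measurement method)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return word_count * 60.0 / duration_seconds


def speed_modification_factor(target_wpm: float, original_wpm: float) -> float:
    """Claim 1's division: third (target) rate over second (original) rate."""
    if original_wpm <= 0:
        raise ValueError("original_wpm must be positive")
    return target_wpm / original_wpm


def output_duration_seconds(input_duration_seconds: float, factor: float) -> float:
    """Playing the same words 'factor' times faster shortens the audio by that factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return input_duration_seconds / factor


# Example: a 90-word message lasting 30 seconds, slowed to a 150 wpm target.
original = words_per_minute(90, 30.0)
factor = speed_modification_factor(150.0, original)
stretched = output_duration_seconds(30.0, factor)
```

In this example the message is spoken at 180 words per minute, the factor is about 0.83, and the output lasts roughly 36 seconds, so the same 90 words are heard at 150 words per minute. A factor below 1 slows playback and a factor above 1 speeds it up.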
5. A computer-implemented method, comprising:
receiving command audio data to play a voice message, the command audio data associated with a first user profile;
determining a command speech speed corresponding to the command audio data;
receiving input audio data representing the voice message;
determining an original speech speed associated with the voice message;
determining speech data associated with the voice message;
determining a target speech speed based on at least one of the command speech speed and the speech data;
determining a speech speed modification variable based on a difference between the original speech speed and the target speech speed; and
generating output audio data from the input audio data using the speech speed modification variable, the output audio data representing a second voice message that corresponds to the voice message and is associated with the target speech speed.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13
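Claim 5 states only that the modification variable is "based on a difference" between the original and target speech speeds. One possible reading, shown below as an assumption rather than the patent's definition, expresses the difference as a relative change and converts it into a playback-rate multiplier. The function name is hypothetical.

```python
def speed_modification_variable(original_wpm: float, target_wpm: float) -> float:
    """Playback-rate multiplier derived from the speed difference (one possible reading)."""
    if original_wpm <= 0:
        raise ValueError("original_wpm must be positive")
    difference = target_wpm - original_wpm      # positive means speed up
    relative_change = difference / original_wpm
    return 1.0 + relative_change


# A 120 wpm message with a 150 wpm target is played 1.25 times faster;
# a 200 wpm message with a 150 wpm target is played at 0.75 times speed.
speed_up = speed_modification_variable(120.0, 150.0)
slow_down = speed_modification_variable(200.0, 150.0)
```

Under this reading the variable equals the ratio of target speed to original speed, so it is numerically the same as the factor recited in claim 1; claim 5 simply leaves the exact formula open.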
14. A first device, comprising:
at least one processor;
a wireless transceiver; and
a memory device including first instructions operable to be executed by the at least one processor to configure the first device to:
receive command audio data to play a voice message, the command audio data associated with a first user profile;
determine a command speech speed corresponding to the command audio data;
receive input audio data representing the voice message;
determine input data representing configuration data;
determine an original speech speed associated with the voice message;
determine speech data associated with the voice message;
determine a target speech speed based on at least one of the command speech speed and the speech data;
determine a speech speed modification variable based on a difference between the original speech speed and the target speech speed; and
generate output audio data from the input audio data using the speech speed modification variable, the output audio data representing a second voice message that corresponds to the voice message and is associated with the target speech speed.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification