Adjusting speed of human speech playback

US 10,276,185 B1
Filed: 08/15/2017
Issued: 04/30/2019
Est. Priority Date: 08/15/2017
Status: Active Grant

Abstract

A system configured to vary a speech speed of speech represented in input audio data without changing a pitch of the speech. The system may vary the speech speed based on a number of different inputs, including non-audio data, data associated with a command, or data associated with the voice message itself. The non-audio data may correspond to information about an account, device or user, such as user preferences, calendar entries, location information, etc. The system may analyze audio data associated with the command to determine command speech speed, identity of person listening, etc. The system may analyze the input audio data to determine a message speech speed, background noise level, identity of the person speaking, etc. Using all of these inputs, the system may dynamically determine a target speech speed and may generate output audio data having the target speech speed.
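
The abstract does not say how the speed is changed while the pitch is kept. One common family of techniques for this is overlap-add (OLA) time stretching, sketched below purely as an illustration and not as the method the patent uses. The function name `time_stretch_ola`, the `frame_size` default, and the 50% overlap are all assumptions made for this sketch; plain OLA produces audible artifacts on real speech, and production systems typically use refinements such as WSOLA or a phase vocoder.

```python
import math
from typing import List, Sequence


def time_stretch_ola(samples: Sequence[float], speed_factor: float,
                     frame_size: int = 512) -> List[float]:
    """Naive overlap-add time stretch (hypothetical helper).

    speed_factor > 1 shortens the audio (faster speech), < 1 lengthens it.
    Frames are copied unchanged, so the pitch inside each frame is kept;
    only the spacing between frames changes.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    synthesis_hop = frame_size // 2               # output frames overlap by 50%
    analysis_hop = synthesis_hop * speed_factor   # step through the input
    # Periodic Hann window: overlapping copies at 50% overlap sum to 1.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frame_count = max(1, int((len(samples) - frame_size) / analysis_hop) + 1)
    output = [0.0] * ((frame_count - 1) * synthesis_hop + frame_size)
    for i in range(frame_count):
        start_in = int(round(i * analysis_hop))
        start_out = i * synthesis_hop
        for n in range(frame_size):
            index = start_in + n
            if index < len(samples):              # guard the final frame
                output[start_out + n] += samples[index] * window[n]
    return output


# Usage: a 1-second 220 Hz tone at 8 kHz, played back at twice the speed.
tone = [math.sin(2.0 * math.pi * 220.0 * t / 8000.0) for t in range(8000)]
faster = time_stretch_ola(tone, 2.0)
slower = time_stretch_ola(tone, 0.5)
```

The output is roughly `len(samples) / speed_factor` samples long, which is the property the claims rely on when they derive a modification factor from two speech speeds.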

23 Citations

20 Claims

1. A computer-implemented method, comprising:
- receiving command audio data representing first speech associated with a first user profile;
  
  determining that the command audio data corresponds to a command to play a voice message;
  
  determining a first number of words per minute associated with the first speech;
  
  determining input data representing at least one of playback speed preferences associated with the first user profile, location data associated with the first user profile, or calendar data associated with the first user profile;
  
  receiving input audio data corresponding to the voice message, the input audio data including a representation of second speech;
  
  determining a second number of words per minute associated with the second speech;
  
  determining speech data associated with the input audio data, the speech data representing a signal to noise ratio associated with the input audio data and an indication that numbers are detected in the first speech;
  
  inputting at least one of the first number of words per minute, the input data or the speech data to a trained model, the trained model outputting a third number of words per minute;
  
  determining a speech speed modification factor by dividing the third number of words per minute by the second number of words per minute; and
  
  generating output audio data from the input audio data using the speech speed modification factor, the output audio data representing third speech having the third number of words per minute and corresponding to the second speech.
  - 2. The computer-implemented method of claim 1, further comprising:
    - determining the second number of words per minute associated with a first portion of the second speech;
      
      determining the third number of words per minute corresponding to the first portion;
      
      determining the speech speed modification factor associated with the first portion by dividing the third number of words per minute by the second number of words per minute;
      
      determining a fourth number of words per minute associated with a second portion of the second speech;
      
      inputting at least one of the first number of words per minute, the input data or the speech data to the trained model, the trained model outputting a fifth number of words per minute corresponding to the second portion;
      
      determining a second speech speed modification factor associated with the second portion by dividing the fifth number of words per minute by the fourth number of words per minute; and
      
      generating the output audio from the input audio data using the speech speed modification factor for the first portion and the second speech speed modification factor for the second portion.
  - 3. The computer-implemented method of claim 2, further comprising:
    - determining a difference between the speech speed modification factor and the second speech speed modification factor;
      
      dividing the difference by a maximum transition value to determine a number of increments;
      
      determining one or more intermediate speech speed modification factors corresponding to a third portion of the second speech, the third portion of the second speech being between the first portion and the second portion, a number of the one or more intermediate speech speed modification factors corresponding to the number of increments; and
      
      generating the output audio from the input audio data using the speech speed modification factor for the first portion, the one or more intermediate speech speed modification factors for the third portion, and the second speech speed modification factor for the second portion.
  - 4. The computer-implemented method of claim 1, further comprising:
    - determining that the input audio data includes a representation of fourth speech, the fourth speech corresponding to a second user profile that is different than the first user profile;
      
      determining a fourth number of words per minute associated with the fourth speech;
      
      inputting at least one of the first number of words per minute, the input data, the speech data or the second user profile to the trained model, the trained model outputting a fifth number of words per minute corresponding to the fourth speech;
      
      determining a second speech speed modification factor associated with the fourth speech by dividing the fifth number of words per minute by the fourth number of words per minute; and
      
      generating the output audio from the input audio data using the speech speed modification factor for a first portion of the input audio data and the second speech speed modification factor for a second portion of the input audio data, the first portion corresponding to the second speech and the second portion corresponding to the fourth speech.
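
The arithmetic in claims 1 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the trained model is left out (its output, the target words per minute, is simply passed in), and two details the claims leave open are filled in by assumption, namely rounding the number of increments up and spacing the intermediate factors evenly between the two endpoint factors.

```python
import math
from typing import List


def words_per_minute(word_count: int, duration_seconds: float) -> float:
    """Speech speed of a segment, in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return word_count * 60.0 / duration_seconds


def speed_modification_factor(target_wpm: float, original_wpm: float) -> float:
    """Claim 1: the target (third) WPM divided by the original (second) WPM."""
    if original_wpm <= 0:
        raise ValueError("original_wpm must be positive")
    return target_wpm / original_wpm


def intermediate_factors(first: float, second: float,
                         max_transition: float) -> List[float]:
    """Claim 3: factors for the portion between two differently paced portions.

    The difference between the two factors is divided by the maximum
    transition value to get a number of increments (rounded up here, an
    assumption), and that many evenly spaced factors are returned.
    """
    if max_transition <= 0:
        raise ValueError("max_transition must be positive")
    difference = second - first
    increments = max(1, math.ceil(abs(difference) / max_transition))
    step = difference / (increments + 1)
    return [first + step * k for k in range(1, increments + 1)]


# Usage: a 50-word message lasting 20 seconds, with a target of 180 WPM.
original_wpm = words_per_minute(50, 20.0)                   # 150.0
factor = speed_modification_factor(180.0, original_wpm)     # about 1.2
ramp = intermediate_factors(1.0, 2.0, 0.25)                 # four steps from 1.0 toward 2.0
```

With evenly spaced steps, no single change between adjacent factors exceeds the maximum transition value, which appears to be the purpose of the limit.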

5. A computer-implemented method, comprising:
- receiving command audio data to play a voice message, the command audio data associated with a first user profile;
  
  determining a command speech speed corresponding to the command audio data;
  
  receiving input audio data representing the voice message;
  
  determining an original speech speed associated with the voice message;
  
  determining speech data associated with the voice message;
  
  determining a target speech speed based on at least one of the command speech speed and the speech data;
  
  determining a speech speed modification variable based on a difference between the original speech speed and the target speech speed; and
  
  generating output audio data from the input audio data using the speech speed modification variable, the output audio data representing a second voice message that corresponds to the voice message and is associated with the target speech speed.
  - 6. The computer-implemented method of claim 5, further comprising:
    - determining urgency data associated with the first user profile, the urgency data representing at least one of location data associated with the first user profile, calendar data associated with the first user profile, or incoming communication data associated with the first user profile, and wherein the target speech speed is determined based on at least one of the command speech speed, the speech data, and the urgency data.
  - 7. The computer-implemented method of claim 5, further comprising:
    - determining a first identity associated with the first user profile;
      
      determining playback speed preferences associated with the first user profile;
      
      determining input data corresponding to information about at least one of the first user profile, the command audio data or a number of voice messages, and wherein:
      
      the speech data corresponds to information about at least one of the voice message or an audio quality of the input audio data, and the target speech speed is determined based on at least one of the command speech speed, the speech data, and the input data.
  - 8. The computer-implemented method of claim 5, wherein determining the speech data further comprises:
    - determining a second user profile that corresponds to first speech represented in the voice message; and
      
      determining a desired speech speed associated with the second user profile.
  - 9. The computer-implemented method of claim 5, further comprising:
    - determining the original speech speed associated with a first portion of the voice message;
      
      determining the target speech speed corresponding to the first portion;
      
      determining the speech speed modification variable associated with the first portion;
      
      determining a second original speech speed associated with a second portion of the voice message;
      
      determining a second target speech speed corresponding to the second portion;
      
      determining a second speech speed modification variable associated with the second portion; and
      
      generating the output audio from the input audio data using the speech speed modification variable for the first portion and the second speech speed modification variable for the second portion.
  - 10. The computer-implemented method of claim 9, further comprising:
    - determining a difference between the speech speed modification variable and the second speech speed modification variable;
      
      dividing the difference by a maximum transition value to determine a number of increments;
      
      determining one or more intermediate speech speed modification variables corresponding to a third portion of the voice message, the third portion being between the first portion and the second portion, a number of the one or more intermediate speech speed modification variables corresponding to the number of increments; and
      
      generating the output audio from the input audio data using the speech speed modification variable for the first portion, the one or more intermediate speech speed modification variables for the third portion, and the second speech speed modification variable for the second portion.
  - 11. The computer-implemented method of claim 5, further comprising:
    - determining that the input audio data includes a representation of first speech associated with a first user profile and a representation of second speech associated with a second user profile;
      
      determining the original speech speed associated with the first speech;
      
      determining the target speech speed associated with the first speech;
      
      determining the speech modification variable associated with the first speech;
      
      determining a second original speech speed associated with the second speech;
      
      determining a second target speech speed corresponding to the second speech;
      
      determining a second speech speed modification variable associated with the second speech; and
      
      generating the output audio from the input audio data using the speech speed modification variable for a first portion of the input audio data and the second speech speed modification variable for a second portion of the input audio data, the first portion corresponding to the first speech and the second portion corresponding to the second speech.
  - 12. The computer-implemented method of claim 5, further comprising:
    - determining a first volume level associated with the input audio data;
      
      determining a second volume level associated with the target speech speed;
      
      determining a volume modification variable by dividing the second volume level by the first volume level; and
      
      generating the output audio data from the input audio data using the speech speed modification variable and the volume modification variable, the output audio data having the second volume level.
  - 13. The computer-implemented method of claim 5, further comprising:
    - determining a plurality of positions in the input audio data in which to insert a duration of silence, the plurality of positions including a first position; and
      
      generating the output audio data from the input audio data using the speech speed modification variable, the output audio data including the duration of silence at the first position.
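
The silence insertion of claim 13 can be illustrated with a short sketch. Everything named here is an assumption for illustration: the helper names are hypothetical, audio is modeled as a plain list of float samples, and the insertion positions are supplied by the caller, since the claim does not say how they are chosen (word or sentence boundaries would be one plausible source).

```python
from typing import List, Sequence


def silence_length(duration_seconds: float, sample_rate: int) -> int:
    """Convert a silence duration to a whole number of samples."""
    if duration_seconds < 0 or sample_rate <= 0:
        raise ValueError("duration must be >= 0 and sample_rate must be > 0")
    return int(round(duration_seconds * sample_rate))


def insert_silence(samples: Sequence[float], positions: Sequence[int],
                   silence_samples: int) -> List[float]:
    """Insert zero-valued samples at each position (indices into the input)."""
    output: List[float] = []
    previous = 0
    for position in sorted(set(positions)):
        if not 0 <= position <= len(samples):
            raise ValueError(f"position {position} is outside the audio")
        output.extend(samples[previous:position])   # audio up to the gap
        output.extend([0.0] * silence_samples)      # the inserted silence
        previous = position
    output.extend(samples[previous:])               # remaining audio
    return output


# Usage: two-sample gaps after the first and third samples.
padded = insert_silence([1.0, 2.0, 3.0, 4.0], [1, 3], 2)
# padded == [1.0, 0.0, 0.0, 2.0, 3.0, 0.0, 0.0, 4.0]
```

Positions refer to the input audio, so they do not need to be adjusted as earlier gaps lengthen the output.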

14. A first device, comprising:
- at least one processor;
  
  a wireless transceiver; and
  
  a memory device including first instructions operable to be executed by the at least one processor to configure the first device to:
  
  receive command audio data to play a voice message, the command audio data associated with a first user profile;
  
  determine a command speech speed corresponding to the command audio data;
  
  receive input audio data representing the voice message;
  
  determine input data representing configuration data;
  
  determine an original speech speed associated with the voice message;
  
  determine speech data associated with the voice message;
  
  determine a target speech speed based on at least one of the command speech speed and the speech data;
  
  determine a speech speed modification variable based on a difference between the original speech speed and the target speech speed; and
  
  generate output audio data from the input audio data using the speech speed modification variable, the output audio data representing a second voice message that corresponds to the voice message and is associated with the target speech speed.
  - 15. The first device of claim 14, wherein the first instructions further configure the first device to:
    - determine urgency data associated with the first user profile, the urgency data representing at least one of location data associated with the first user profile, calendar data associated with the first user profile, or incoming communication data associated with the first user profile; and
      
      determine the target speech speed based on at least one of the command speech speed, the speech data, and the urgency data.
  - 16. The first device of claim 14, wherein the first instructions further configure the first device to:
    - determine a first identity associated with the first user profile;
      
      determine playback speed preferences associated with the first user profile;
      
      determine input data corresponding to information about at least one of the first user profile, the command audio data or a number of voice messages; and
      
      wherein:
      
      the speech data corresponds to information about at least one of the voice message or an audio quality of the input audio data, and the target speech speed is determined based on at least one of the command speech speed, the speech data, and the input data.
  - 17. The first device of claim 14, wherein the first instructions further configure the first device to:
    - determine the original speech speed associated with a first portion of the voice message;
      
      determine the target speech speed corresponding to the first portion;
      
      determine the speech speed modification variable associated with the first portion;
      
      determine a second original speech speed associated with a second portion of the voice message;
      
      determine a second target speech speed corresponding to the second portion;
      
      determine a second speech speed modification variable associated with the second portion; and
      
      generate the output audio from the input audio data using the speech speed modification variable for the first portion and the second speech speed modification variable for the second portion.
  - 18. The first device of claim 17, wherein the first instructions further configure the first device to:
    - determine a difference between the speech speed modification variable and the second speech speed modification variable;
      
      divide the difference by a maximum transition value to determine a number of increments;
      
      determine one or more intermediate speech speed modification variables corresponding to a third portion of the voice message, the third portion being between the first portion and the second portion, a number of the one or more intermediate speech speed modification variables corresponding to the number of increments; and
      
      generate the output audio from the input audio data using the speech speed modification variable for the first portion, the one or more intermediate speech speed modification variables for the third portion, and the second speech speed modification variable for the second portion.
  - 19. The first device of claim 14, wherein the first instructions further configure the first device to:
    - determine that the input audio data includes a representation of first speech associated with a first user profile and a representation of second speech associated with a second user profile;
      
      determine the original speech speed associated with the first speech;
      
      determine the target speech speed associated with the first speech;
      
      determine the speech modification variable associated with the first speech;
      
      determine a second original speech speed associated with the second speech;
      
      determine a second target speech speed corresponding to the second speech;
      
      determine a second speech speed modification variable associated with the second speech; and
      
      generate the output audio from the input audio data using the speech speed modification variable for a first portion of the input audio data and the second speech speed modification variable for a second portion of the input audio data, the first portion corresponding to the first speech and the second portion corresponding to the second speech.
  - 20. The first device of claim 14, wherein the first instructions further configure the first device to:
    - determine a first volume level associated with the input audio data;
      
      determine a second volume level associated with the target speech speed;
      
      determine a volume modification variable by dividing the second volume level by the first volume level; and
      
      generate the output audio data from the input audio data using the speech speed modification variable and the volume modification variable, the output audio data having the second volume level.
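
The volume adjustment in claims 12 and 20 reduces to a ratio and a gain, sketched below. The claims do not define how a volume level is measured or how the second level is chosen for a given target speech speed; this sketch assumes root-mean-square (RMS) amplitude as the measure and takes the second level as an input. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def rms_level(samples: Sequence[float]) -> float:
    """Volume level measured as RMS amplitude (an assumed measure)."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def volume_modification_variable(first_level: float, second_level: float) -> float:
    """Claims 12 and 20: the second volume level divided by the first."""
    if first_level <= 0:
        raise ValueError("first_level must be positive")
    return second_level / first_level


def apply_volume(samples: Sequence[float], variable: float) -> List[float]:
    """Scale every sample by the volume modification variable."""
    return [s * variable for s in samples]


# Usage: halve the level of a signal whose RMS level is 0.5.
audio = [0.5, -0.5, 0.5, -0.5]
variable = volume_modification_variable(rms_level(audio), 0.25)
quieter = apply_volume(audio, variable)
```

Because RMS scales linearly with a constant gain, the scaled audio has the second volume level, matching the claim language that the output audio data has that level.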

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Ma, Zhaoqing; Hardie, Tony Roy; Devaraj, Christo Frank
Primary Examiner(s): Pullias, Jesse S
Application Number: US15/677,659
Time in Patent Office: 623 Days
Field of Search: 704/200-232, 704/500-504
CPC Class Codes:
G10L 21/04   Time compression or expansion
G10L 25/27   characterised by the analys...
G10L 25/78   Detection of presence or ab...
