Real-time voice masking in a computer network

US 9,947,341 B1
Filed: 01/18/2017
Issued: 04/17/2018
Est. Priority Date: 01/19/2016
Status: Active Grant

Abstract

A voice signal may be adjusted to mask traits such as the gender of a speaker by separating source and filter components of a voice signal using cepstral analysis, adjusting the components based on pitch and formant parameters, and synthesizing a modified signal. Features are disclosed to support real-time voice masking in a computer network by limiting computational complexity and reducing delays in processing and transmission while maintaining signal quality.
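The source and filter separation mentioned in the abstract can be illustrated with a minimal sketch of plain cepstral liftering. This is not taken from the patent: the function names, the `cutoff` parameter, the small floor added before the logarithm, and the naive transform are all assumptions chosen to keep the example self-contained.

```python
import cmath
import math

def dft(values):
    """Naive O(n^2) discrete Fourier transform; adequate for a short demo frame."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(values):
    """Naive inverse transform matching dft() above."""
    n = len(values)
    return [sum(values[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def separate_source_filter(frame, cutoff):
    """Split a frame's log-magnitude spectrum into envelope and excitation parts.

    cutoff is a hypothetical parameter: the highest quefrency index kept
    when low-pass liftering the cepstrum.
    """
    # Log-magnitude spectrum; the 1e-12 floor avoids log(0) on empty bins.
    log_mag = [math.log(abs(c) + 1e-12) for c in dft(frame)]
    # Real cepstrum: inverse transform of the log-magnitude spectrum.
    cepstrum = [c.real for c in idft(log_mag)]
    n = len(cepstrum)
    # Keep only low quefrencies (symmetrically) to retain the slowly varying part.
    liftered = [cepstrum[q] if q <= cutoff or q >= n - cutoff else 0.0
                for q in range(n)]
    log_envelope = [c.real for c in dft(liftered)]
    # Whatever the envelope does not explain is treated as excitation.
    log_excitation = [m - e for m, e in zip(log_mag, log_envelope)]
    return log_mag, log_envelope, log_excitation
```

In the log domain the two parts add back to the original spectrum, so adjusting one (for example the envelope, for formants) and recombining leaves the other untouched.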

8 Citations

20 Claims

1. A communication system configured to support real-time voice masking, the system comprising:
- a first client computer configured to receive over a computer network a first set of instructions that control the first client computer to:
  
  receive an audio signal representing a portion of speech;
  
  split the audio signal into a plurality of overlapping segments;
  
  generate a frequency domain representation of a current signal segment in the plurality of overlapping segments, wherein the frequency domain representation comprises components corresponding to a plurality of frequency bins;
  
  generate, from the frequency domain representation of the current signal segment, a polar representation comprising a magnitude component and a phase component for each of the frequency bins;
  
  generate a refined frequency domain representation of the current signal segment based on a comparison, for each of the frequency bins, between a first phase component from the current signal segment and a second phase component from a prior signal segment;
  
  calculate an initial cepstrum from the refined frequency domain representation;
  
  calculate a spectral envelope from the initial cepstrum using iterative smoothing with a resolution lower than a resolution of the frequency domain representation, wherein the iterative smoothing terminates after a predetermined number of iterations or a predetermined degree of convergence is reached;
  
  calculate an excitation spectrum from the refined frequency domain representation and the spectral envelope;
  
  rescale the spectral envelope based on a formant adjustment parameter to obtain a modified spectral envelope, wherein the spectral envelope is distinct from the current signal segment, the frequency domain representation, and the initial cepstrum;
  
  calculate a modified frequency domain representation by combining the modified spectral envelope and the excitation spectrum;
  
  synthesize a modified signal segment from the modified frequency domain representation; and
  
  transmit the modified signal segment over the computer network;
  
- a second client computer configured to receive over the computer network a second set of instructions that control the second client computer to play audio signal segments received over the computer network; and

- a server configured to receive the modified signal segment from the first client computer and transmit the modified signal segment to the second client computer.
- Dependent claims (2 to 10):
  - 2. The system of claim 1, wherein the first set of instructions further controls the first client computer to make a pitch adjustment by rescaling the excitation spectrum before the excitation spectrum is combined with the modified spectral envelope.
  - 3. The system of claim 1, wherein the first client computer executes the first set of instructions in a web browser.
  - 4. The system of claim 3 wherein the web browser includes a Web Audio API implementation that is invoked by the first set of instructions.
  - 5. The system of claim 1 wherein at least one of the first set of instructions and the second set of instructions comprises multiple portions of instructions transmitted from separate locations.
  - 6. The system of claim 1 wherein the computer network is the Internet, or is composed of multiple constituent networks.
  - 7. The system of claim 1 wherein the first set of instructions is capable of further controlling the first client computer to adjust a relative phase between neighboring frequency bins in the modified frequency domain representation.
  - 8. The system of claim 1, wherein each segment in the plurality of overlapping segments has a duration between 10 milliseconds and 100 milliseconds.
  - 9. The system of claim 1, wherein a percentage of overlap between adjacent segments in the plurality of overlapping segments is greater than 0.5 percent but less than 10 percent of the total duration of each segment in the plurality of overlapping segments.
  - 10. The system of claim 1, wherein the spectral envelope is calculated by low-pass filtering that comprises setting a number of Fourier coefficients in each signal segment to zero, and the number of Fourier coefficients is less than 10 percent of a total quantity of Fourier coefficients in each signal segment but greater than zero.
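The iterative smoothing in claim 1, which stops after a fixed number of iterations or once a degree of convergence is reached, can be sketched as follows. The claim does not name a specific algorithm; this example is in the spirit of so-called true-envelope estimation, and the names `cepstral_smooth`, `iterative_envelope`, `cutoff`, `max_iters`, and `tol` are assumptions made for illustration.

```python
import cmath
import math

def dft(values):
    """Naive O(n^2) discrete Fourier transform for short demo inputs."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(values):
    """Naive inverse transform matching dft() above."""
    n = len(values)
    return [sum(values[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def cepstral_smooth(log_spectrum, cutoff):
    """Low-pass lifter: keep quefrencies up to cutoff, so the result has
    coarser detail than the input spectrum."""
    cepstrum = [c.real for c in idft(log_spectrum)]
    n = len(cepstrum)
    kept = [cepstrum[q] if q <= cutoff or q >= n - cutoff else 0.0 for q in range(n)]
    return [c.real for c in dft(kept)]

def iterative_envelope(log_mag, cutoff, max_iters=20, tol=0.1):
    """Raise the smoothed curve toward the spectral peaks, round by round.

    Stops when max_iters rounds have run, or when no bin of log_mag sits
    more than tol above the envelope (hypothetical convergence test).
    Returns (envelope, rounds_used).
    """
    target = list(log_mag)
    envelope = cepstral_smooth(target, cutoff)
    rounds = 1
    while rounds < max_iters and max(m - e for m, e in zip(log_mag, envelope)) > tol:
        # Wherever the envelope already exceeds the target, keep the envelope.
        target = [max(t, e) for t, e in zip(target, envelope)]
        envelope = cepstral_smooth(target, cutoff)
        rounds += 1
    return envelope, rounds
```

Because each round only raises the smoothing target, the envelope tends to settle onto the spectral peaks rather than the average level. How quickly it settles depends on `cutoff` and `tol`.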

11. A method for real-time voice masking in a computer network, the method comprising:
- transmitting a first set of instructions over the computer network to a first computer, the first set of instructions capable of controlling the first computer to:
  
  receive an audio signal representing a portion of speech;
  
  split the audio signal into a plurality of segments;
  
  generate a frequency domain representation of a current signal segment in the plurality of segments, wherein the frequency domain representation comprises components corresponding to a plurality of frequency bins;
  
  generate, from the frequency domain representation of the current signal segment, a polar representation comprising a magnitude component and a phase component for at least one frequency bin in the plurality of frequency bins;
  
  generate a refined frequency domain representation of the current signal segment based on a comparison between a first phase component from the current signal segment and a second phase component from a prior signal segment;
  
  calculate an initial cepstrum from the refined frequency domain representation;
  
  calculate a spectral envelope from the initial cepstrum;
  
  calculate an excitation spectrum from the refined frequency domain representation and the spectral envelope;
  
  adjust the spectral envelope based on a formant adjustment parameter to obtain a modified spectral envelope, wherein the spectral envelope is distinct from the current signal segment, the frequency domain representation, and the initial cepstrum;
  
  calculate a modified frequency domain representation based on the modified spectral envelope;
  
  synthesize a modified signal segment from the modified frequency domain representation; and
  
  transmit the modified signal segment over the computer network; and
  
- transmitting a second set of instructions over the computer network for execution at a second computer, the second set of instructions capable of controlling the second computer to play audio signals received over the computer network.
- Dependent claims (12 to 20):
  - 12. The method of claim 11, wherein the first set of instructions is capable of further controlling the first computer to make a pitch adjustment by rescaling at least one of the excitation spectrum, the spectral envelope, and the modified frequency domain representation.
  - 13. The method of claim 11 wherein the first set of instructions is capable of further controlling the first client computer to adjust a relative phase between neighboring frequency bins in the modified frequency domain representation.
  - 14. The method of claim 11, wherein iterative smoothing is used to calculate the spectral envelope based on the initial cepstrum.
  - 15. The method of claim 14, wherein the iterative smoothing is terminated upon reaching a predetermined number of rounds.
  - 16. The method of claim 14, wherein iterative smoothing is terminated upon reaching a predetermined number of rounds or a predetermined degree of convergence, whichever occurs first.
  - 17. The method of claim 14, wherein the spectral envelope is calculated at a resolution that is lower than a resolution of the frequency domain representation.
  - 18. The method of claim 11, wherein each segment in the plurality of segments has a duration between 10 milliseconds and 100 milliseconds.
  - 19. The method of claim 11, wherein a percentage of overlap between adjacent segments in the plurality of segments is greater than 0.5 percent but less than 10 percent of the total duration of each segment in the plurality of segments.
  - 20. The method of claim 11, wherein the spectral envelope is calculated by low-pass filtering that comprises setting a number of Fourier coefficients in each signal segment to zero, and the number of Fourier coefficients is less than 10 percent of a total quantity of Fourier coefficients in each signal segment.
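The formant adjustment and recombination steps of claims 1 and 11 can be illustrated with a small sketch. The claims do not say how the envelope is rescaled; this example assumes one plausible reading, stretching the envelope along the frequency axis by a ratio using linear interpolation, and then multiplying it bin by bin with the excitation magnitudes. The names `rescale_envelope`, `combine`, and `ratio` are hypothetical.

```python
def rescale_envelope(envelope, ratio):
    """Stretch a magnitude envelope along the frequency axis.

    Output bin k reads the input at position k / ratio, so ratio > 1 moves
    envelope peaks (formants) to higher bins and ratio < 1 to lower bins.
    Positions past the last bin are clamped to the last bin.
    """
    last = len(envelope) - 1
    out = []
    for k in range(len(envelope)):
        pos = min(k / ratio, last)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Linear interpolation between the two neighbouring input bins.
        out.append((1 - frac) * envelope[lo] + frac * envelope[hi])
    return out

def combine(envelope, excitation):
    """Recombine envelope and excitation magnitudes bin by bin."""
    return [e * x for e, x in zip(envelope, excitation)]

# Usage: a single formant-like peak at bin 2 moves to bin 4 when ratio is 2.
envelope = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
shifted = rescale_envelope(envelope, 2.0)
modified = combine(shifted, [1.0] * len(shifted))
```

Leaving the excitation untouched while moving the envelope changes perceived vocal-tract characteristics without changing pitch, which is the reason for separating the two before adjusting either.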

Current Assignee: Interviewing Io, Inc.
Original Assignee: Interviewing Io, Inc.
Inventors: Marsh, Andrew Tatanka; Yi, Steven Young
Primary Examiner(s): Jackson, Jakieda
Application Number: US15/409,400
Time in Patent Office: 454 Days
Field of Search: 704205, 704264, 704500

CPC Class Codes

G10L 2021/0135   Voice conversion or morphing
G10L 21/007   characterised by the proces...
G10L 21/038   using band spreading techni...
G10L 25/18   the extracted parameters be...
G10L 25/24   the extracted parameters be...
