Real-time voice masking in a computer network
First Claim
1. A communication system configured to support real-time voice masking, the system comprising:
a first client computer configured to receive over a computer network a first set of instructions that control the first client computer to:
receive an audio signal representing a portion of speech;
split the audio signal into a plurality of overlapping segments;
generate a frequency domain representation of a current signal segment in the plurality of overlapping segments, wherein the frequency domain representation comprises components corresponding to a plurality of frequency bins;
generate, from the frequency domain representation of the current signal segment, a polar representation comprising a magnitude component and a phase component for each of the frequency bins;
generate a refined frequency domain representation of the current signal segment based on a comparison, for each of the frequency bins, between a first phase component from the current signal segment and a second phase component from a prior signal segment;
calculate an initial cepstrum from the refined frequency domain representation;
calculate a spectral envelope from the initial cepstrum using iterative smoothing with a resolution lower than a resolution of the frequency domain representation, wherein the iterative smoothing terminates after a predetermined number of iterations or a predetermined degree of convergence is reached;
calculate an excitation spectrum from the refined frequency domain representation and the spectral envelope;
rescale the spectral envelope based on a formant adjustment parameter to obtain a modified spectral envelope, wherein the spectral envelope is distinct from the current signal segment, the frequency domain representation, and the initial cepstrum;
calculate a modified frequency domain representation by combining the modified spectral envelope and the excitation spectrum;
synthesize a modified signal segment from the modified frequency domain representation; and
transmit the modified signal segment over the computer network;
a second client computer configured to receive over the computer network a second set of instructions that control the second client computer to play audio signal segments received over the computer network; and
a server configured to receive the modified signal segment from the first client computer and transmit the modified signal segment to the second client computer.
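The segmentation, polar-conversion, and phase-comparison steps of claim 1 resemble a standard phase-vocoder front end. The following is a minimal sketch, assuming NumPy, a Hann analysis window, and the conventional phase-vocoder frequency estimate as one plausible reading of the claimed phase comparison. The function and parameter names (`split_overlapping`, `polar_spectrum`, `refine_bin_frequencies`, `frame_len`, `hop`) are hypothetical and are not taken from the patent:

```python
import numpy as np

def split_overlapping(x, frame_len, hop):
    """Split a 1-D signal into frames of frame_len samples that start hop samples apart."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def polar_spectrum(frame, window):
    """Windowed real FFT of one frame, returned as (magnitude, phase) per frequency bin."""
    spec = np.fft.rfft(frame * window)
    return np.abs(spec), np.angle(spec)

def refine_bin_frequencies(phase_cur, phase_prev, frame_len, hop, sample_rate):
    """Phase-vocoder estimate of the true frequency (Hz) in each bin, obtained by
    comparing the current frame's phase with the previous frame's phase."""
    k = np.arange(len(phase_cur))
    expected = 2.0 * np.pi * hop * k / frame_len              # phase advance of each bin center over one hop
    deviation = phase_cur - phase_prev - expected
    deviation = (deviation + np.pi) % (2.0 * np.pi) - np.pi   # wrap into [-pi, pi)
    true_bin = k + deviation * frame_len / (2.0 * np.pi * hop)
    return true_bin * sample_rate / frame_len
```

For a 440 Hz tone sampled at 16 kHz with 1024-sample frames and a 256-sample hop, the magnitude peak falls in bin 28 (centered at 437.5 Hz), and the phase comparison moves that bin's estimate back to about 440 Hz. The estimate is unambiguous only while a partial lies within `sample_rate / (2 * hop)` of its bin center, so shorter hops widen the usable range.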
2 Assignments
0 Petitions
Abstract
A voice signal may be adjusted to mask traits such as the gender of a speaker by separating source and filter components of a voice signal using cepstral analysis, adjusting the components based on pitch and formant parameters, and synthesizing a modified signal. Features are disclosed to support real-time voice masking in a computer network by limiting computational complexity and reducing delays in processing and transmission while maintaining signal quality.
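The source-filter separation described in the abstract, together with the iterative smoothing recited in claim 1, can be approximated with a cepstral "true envelope" style loop: the log spectrum is repeatedly replaced by its maximum with the current low-order cepstral smoothing until the envelope rests on the spectral peaks. The following is a minimal sketch, assuming NumPy. The liftering `order` stands in for the claim's "resolution lower than a resolution of the frequency domain representation" (the patent may instead mean a decimated spectrum), formant rescaling is assumed to be a stretch of the envelope along the frequency axis by a factor `alpha`, and all names and defaults (`max_iter`, `tol_db`) are hypothetical:

```python
import numpy as np

def cepstral_smooth(log_mag, order):
    """Smooth a log-magnitude half spectrum (rfft layout) by keeping only
    cepstral coefficients with quefrency |q| <= order."""
    n_fft = 2 * (len(log_mag) - 1)
    cep = np.fft.irfft(log_mag, n_fft)      # real cepstrum of the frame
    cep[order + 1 : n_fft - order] = 0.0    # low-pass lifter
    return np.fft.rfft(cep).real            # back to a smooth log spectrum

def true_envelope(mag, order, max_iter=50, tol_db=1.0):
    """Iteratively raise a cepstral envelope until it sits on the spectral peaks.
    Stops after max_iter passes or once no bin exceeds the envelope by more than tol_db."""
    log_mag = np.log(np.maximum(mag, 1e-12))
    tol = tol_db * np.log(10.0) / 20.0      # dB -> natural-log amplitude units
    target = log_mag.copy()
    env = cepstral_smooth(target, order)
    for _ in range(max_iter):
        if np.max(log_mag - env) <= tol:    # convergence criterion reached
            break
        target = np.maximum(target, env)    # fill spectral valleys with the current envelope
        env = cepstral_smooth(target, order)
    return env

def shift_formants(env, alpha):
    """Stretch the envelope along frequency by alpha; edge values are held beyond the range."""
    bins = np.arange(len(env))
    return np.interp(bins / alpha, bins, env)

def mask_frame_spectrum(spec, order, alpha):
    """Split one rfft frame into envelope and excitation, shift formants, recombine."""
    mag, phase = np.abs(spec), np.angle(spec)
    env = true_envelope(mag, order)
    excitation = np.log(np.maximum(mag, 1e-12)) - env   # source part, in the log domain
    new_log_mag = excitation + shift_formants(env, alpha)
    return np.exp(new_log_mag) * np.exp(1j * phase)     # original phase is kept
```

With `alpha = 1.0` the envelope is unchanged and the frame passes through untouched; values above 1 move formant peaks to higher frequencies and values below 1 move them lower. Pitch changes would be applied to the excitation separately, which this sketch omits.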
8 Citations
20 Claims
11. A method for real-time voice masking in a computer network, the method comprising:
transmitting a first set of instructions over the computer network to a first computer, the first set of instructions capable of controlling the first computer to:
receive an audio signal representing a portion of speech;
split the audio signal into a plurality of segments;
generate a frequency domain representation of a current signal segment in the plurality of segments, wherein the frequency domain representation comprises components corresponding to a plurality of frequency bins;
generate, from the frequency domain representation of the current signal segment, a polar representation comprising a magnitude component and a phase component for at least one frequency bin in the plurality of frequency bins;
generate a refined frequency domain representation of the current signal segment based on a comparison between a first phase component from the current signal segment and a second phase component from a prior signal segment;
calculate an initial cepstrum from the refined frequency domain representation;
calculate a spectral envelope from the initial cepstrum;
calculate an excitation spectrum from the refined frequency domain representation and the spectral envelope;
adjust the spectral envelope based on a formant adjustment parameter to obtain a modified spectral envelope, wherein the spectral envelope is distinct from the current signal segment, the frequency domain representation, and the initial cepstrum;
calculate a modified frequency domain representation based on the modified spectral envelope;
synthesize a modified signal segment from the modified frequency domain representation; and
transmit the modified signal segment over the computer network; and
transmitting a second set of instructions over the computer network for execution at a second computer, the second set of instructions capable of controlling the second computer to play audio signals received over the computer network.
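The final synthesis step can be sketched as an inverse FFT followed by weighted overlap-add, which turns each modified frame back into samples that can be sent over the network. The following is a minimal sketch, assuming NumPy, frames produced by a Hann-windowed `rfft` analysis stage, and normalization by the summed squared window. `overlap_add` and its parameters are hypothetical:

```python
import numpy as np

def overlap_add(spectra, window, hop):
    """Inverse-FFT each frame, apply the synthesis window, overlap-add the frames,
    and normalize by the summed squared window so unmodified frames reconstruct the input."""
    frame_len = len(window)
    out = np.zeros(frame_len + hop * (len(spectra) - 1))
    norm = np.zeros_like(out)
    for i, spec in enumerate(spectra):
        frame = np.fft.irfft(spec, frame_len) * window   # synthesis window
        start = i * hop
        out[start : start + frame_len] += frame
        norm[start : start + frame_len] += window ** 2
    covered = norm > 1e-8                                # skip samples no window reaches
    out[covered] /= norm[covered]
    return out
```

Feeding unmodified spectra back in reconstructs the input away from the first and last frame, which is a convenient sanity check before any envelope or pitch changes are applied. In a streaming setting, a sample can be emitted only after every frame that overlaps it has been processed, so latency grows with the frame length.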
Specification