Perceptual speech coder and method
Abstract
Simultaneous and temporal masking of digital speech data is applied to an MBE-based speech coding technique to achieve additional, substantial compression of coded speech over existing coding techniques, while enabling synthesis of coded speech with minimal perceptual degradation relative to the human auditory system. A real-time perceptual coder and decoder is disclosed in which speech may be sampled at 10 kHz, coded at an average rate of less than 2 bits/sample, and reproduced in a manner that is perceptually transparent to a human listener. The coder compresses speech segments that are inaudible due to simultaneous or temporal masking, while audible speech segments are not compressed.
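The figures in the abstract imply an upper bound on the coded bit rate, which the following minimal sketch works out. The 10 kHz sampling rate and the 2 bits/sample bound come from the abstract; the 16-bit linear PCM baseline is an assumption chosen only for comparison and is not stated in the source.

```python
SAMPLE_RATE_HZ = 10_000        # sampling rate given in the abstract
MAX_BITS_PER_SAMPLE = 2.0      # abstract: average rate is below this value
PCM_BITS_PER_SAMPLE = 16       # assumption: uncompressed 16-bit linear PCM baseline


def bit_rate(sample_rate_hz, bits_per_sample):
    """Return the bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample


# The coded stream stays below this rate, since the average is under 2 bits/sample.
upper_bound_bps = bit_rate(SAMPLE_RATE_HZ, MAX_BITS_PER_SAMPLE)

# Rate of the assumed uncompressed baseline at the same sampling rate.
pcm_bps = bit_rate(SAMPLE_RATE_HZ, PCM_BITS_PER_SAMPLE)

# Compression relative to the assumed baseline is at least this factor.
min_ratio = pcm_bps / upper_bound_bps

print(upper_bound_bps, pcm_bps, min_ratio)
```

Under these assumptions the coded speech occupies less than 20 kbit/s, against 160 kbit/s for the assumed PCM baseline.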
5 Claims
1. A method for coding an analog speech signal, said method comprising the steps of:
filtering, sampling, and digitizing said analog speech signal to produce a digital speech signal, said digital speech signal comprising a plurality of frames;
performing frequency analysis on said digital speech signal to produce spectral output data for each of said frames, said spectral output data comprising segments, at least two of said segments being approximately 25 Hz or closer in frequency;
performing auditory analysis on said spectral output data to identify segments of said frames that are inaudible to the human auditory system due to simultaneous or temporal masking effects; and
coding said spectral output data into an output data stream in which said inaudible segments are compressed and audible segments are not compressed.
2. A coder for coding a speech signal comprising a masking segment and a masked segment approximately 25 Hz or closer in frequency to said masking segment, said coder comprising:
storage means for storing first application software, second application software, and masking data;
a first processor connected to said storage means for using said first application software to generate spectral data for said speech signal; and
a second processor connected to said storage means and said first processor for using said second application software, said masking data, and said spectral data to create a coded representation of said speech signal wherein said masked segment is compressed and said masking segment is not compressed.
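The steps recited in the claims above (frequency analysis at roughly 25 Hz resolution, identification of masked segments, and selective coding) can be illustrated with a toy sketch. It is not the patented method: the masking rule, the `MASK_RATIO` and `ABS_FLOOR` thresholds, and the decision to drop masked bins outright are all hypothetical simplifications, temporal masking is not modelled, and a plain DFT stands in for whatever MBE analysis the specification describes.

```python
import cmath
import math

SAMPLE_RATE_HZ = 10_000     # sampling rate mentioned in the abstract
MAX_SPACING_HZ = 25.0       # segment spacing recited in the claims
MASK_RATIO = 0.1            # hypothetical: masked if below 10% of a neighbour
ABS_FLOOR = 1e-6            # hypothetical absolute audibility floor


def min_frame_length(sample_rate_hz, spacing_hz):
    """Smallest DFT length whose bin spacing is at most spacing_hz."""
    return math.ceil(sample_rate_hz / spacing_hz)


def magnitude_spectrum(frame):
    """Direct DFT magnitudes for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc) / n)
    return mags


def inaudible_bins(mags):
    """Toy simultaneous-masking rule: a bin counts as inaudible when it is
    under the absolute floor or under MASK_RATIO times an adjacent bin."""
    flags = []
    for k, m in enumerate(mags):
        neighbours = mags[max(0, k - 1):k] + mags[k + 1:k + 2]
        flags.append(m < ABS_FLOOR or any(m < MASK_RATIO * nb for nb in neighbours))
    return flags


def code_frame(mags, flags):
    """Keep audible bins as (index, magnitude); drop inaudible bins entirely."""
    return [(k, m) for k, (m, hidden) in enumerate(zip(mags, flags)) if not hidden]


# One frame holding a strong 1000 Hz tone and a weak tone 25 Hz above it.
n = min_frame_length(SAMPLE_RATE_HZ, MAX_SPACING_HZ)
frame = [math.cos(2 * math.pi * 1000 * t / SAMPLE_RATE_HZ)
         + 0.05 * math.cos(2 * math.pi * 1025 * t / SAMPLE_RATE_HZ)
         for t in range(n)]
mags = magnitude_spectrum(frame)
coded = code_frame(mags, inaudible_bins(mags))
print(n, coded)
```

With a 400-sample frame the bins are 25 Hz apart, so the two tones land in adjacent bins 40 and 41. Under the toy rule only the strong tone at bin 40 survives coding, and the weak neighbour is treated as masked.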
Specification