Method and apparatus for speech recognition and reproduction
Abstract
Speech signal analysis for data reduction, as stored for synthesis or recognition, is improved by features including: digital spectral analysis; reduction of channel data and bit allocation by selective summation of groups of contiguous data; using the mean average of the log amplitude to find the deviation for each channel; also using the instantaneous shape of the mean value for each channel for pairs of adjacent frames, all combined to find a feature ensemble for each pair of adjacent frames.
28 Claims
1. A method for providing a spectral analysis of an analog signal waveform comprising the steps of:
dividing the total incoming analog signal into time frames of equal duration;
converting the analog signal to a sequence of discrete signal amplitudes at equally spaced time intervals in each frame;
transforming the sequence of discrete signal amplitudes to a sequence of complex spectral amplitudes, each such spectral amplitude representing the magnitude and phase of a function V(n,k) defined as:
##EQU12##
wherein
k = time sequence index
n = frequency sequence index
r, t = integer summation indexes
m = time function parameter defining the number of retained bits
φ = phase adjustment function
and the subscripts (p-r) and (r-t) for n and k refer to bit locations in their binary representation, with bit locations ranging from 0 to the maximum value p and subscript values outside this range representing vanishing values.
Dependent claims: 2, 3, 4

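The framing and transform steps of claim 1 can be illustrated with a minimal sketch. The defining equation for V(n,k) is not reproduced in this text, so the sketch substitutes an ordinary discrete Fourier transform as a stand-in; it is not the claimed transform. The names `split_into_frames`, `dft`, and `FRAME_LEN`, and the test tone, are hypothetical and chosen only for illustration.

```python
import cmath
import math


def split_into_frames(samples, frame_len):
    """Split equally spaced samples into consecutive frames of equal length.

    A trailing partial frame is dropped so every frame has the same duration.
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def dft(frame):
    """Plain discrete Fourier transform: a stand-in, NOT the claimed V(n,k)."""
    size = len(frame)
    return [
        sum(frame[k] * cmath.exp(-2j * math.pi * n * k / size) for k in range(size))
        for n in range(size)
    ]


# Hypothetical input: a tone with 2 cycles per 16-sample frame, 3 frames long.
FRAME_LEN = 16
signal = [math.sin(2 * math.pi * 2 * k / FRAME_LEN) for k in range(3 * FRAME_LEN)]

frames = split_into_frames(signal, FRAME_LEN)
spectra = [dft(frame) for frame in frames]

# Each complex spectral amplitude carries a magnitude and a phase.
magnitudes = [abs(v) for v in spectra[0]]
phases = [cmath.phase(v) for v in spectra[0]]

# Strongest component in the lower half of the spectrum.
peak_bin = max(range(FRAME_LEN // 2), key=lambda n: magnitudes[n])
print(len(frames), peak_bin)
```

The direct summation costs on the order of N² operations per frame. It is used here only because it keeps the magnitude-and-phase idea visible in a few lines.
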
5. A method for producing an analog signal waveform comprising the steps of:
providing a predetermined series of digital signals representing a sequence of complex spectral amplitudes;
transforming the sequence of complex spectral amplitudes to a sequence of discrete time waveform amplitudes, each such spectral amplitude representing the magnitude and phase of a function V(n,k) defined as:
##EQU15##
wherein
k = time sequence index
n = frequency sequence index
r, t = integer summation indexes
m = time function parameter defining the number of retained bits
φ = phase adjustment function;
converting the transformed digital data into an analog output signal.
Dependent claims: 6, 7, 8

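The synthesis direction in claim 5 runs the other way: stored complex spectral amplitudes are turned back into discrete time waveform amplitudes. The sketch below assumes an ordinary inverse discrete Fourier transform in place of the claimed V(n,k), whose equation is not reproduced in this text. The function name `inverse_dft` and the example spectrum are hypothetical.

```python
import cmath
import math


def inverse_dft(spectrum):
    """Inverse DFT returning real time-domain amplitudes (stand-in transform)."""
    size = len(spectrum)
    return [
        sum(
            spectrum[n] * cmath.exp(2j * math.pi * n * k / size)
            for n in range(size)
        ).real / size
        for k in range(size)
    ]


# Hypothetical stored spectrum for one 8-sample frame: a single cosine at bin 1.
# Bins 1 and 7 are complex conjugates of each other so the waveform is real.
SIZE = 8
spectrum = [0j] * SIZE
spectrum[1] = 4 + 0j
spectrum[SIZE - 1] = 4 + 0j

waveform = inverse_dft(spectrum)

# A digital-to-analog converter would take these amplitudes as its input.
print([round(x, 3) for x in waveform])
```

With this input the result is one cycle of a unit-amplitude cosine sampled at eight points.
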
9. A method for producing audio analog output comprising the steps of:
providing a predetermined series of encoded digital signals representing the analog output to be produced;
decoding the encoded signals to provide a sequence of complex spectral amplitudes;
transforming the sequence of complex spectral amplitudes to a sequence of discrete time waveform amplitudes, each such spectral amplitude representing the magnitude and phase of a function V(n,k) defined as:
##EQU18##
wherein
k = time sequence index
n = frequency sequence index
r, t = integer summation indexes
m = time function parameter defining the number of retained bits
φ = phase adjustment function;
converting the transformed digital data into an analog output signal.
Dependent claims: 10, 11, 12

13. A method for producing a voiceprint template for recognition of an analog waveform signal comprising the steps of:
dividing the total signal into time frames of equal duration;
converting the analog signal to a sequence of discrete signal amplitudes at equally spaced time intervals in each said frame;
transforming the discrete signal amplitudes of each frame to a preselected number of spectral amplitudes representing values of various frequency components of the said series of signal amplitudes;
compacting and converting the spectral amplitudes of each frame to a lesser number of channels, each channel being comprised of an energy summation of amplitudes within a designated frequency range expressed in logarithmic amplitudes, and allocated on the basis of predetermined acoustic significance;
deriving a mean amplitude value for all of said channels of each frame;
measuring a deviation from said mean value for each separate channel amplitude in each frame;
determining a feature ensemble for a plurality of successive frames of said total waveform signal; and
storing a digital representation of said feature ensembles for said total waveform signal to form a digital coded template thereof.
Dependent claims: 14, 15

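The compaction, mean, and deviation steps of claim 13 can be sketched as follows. The claim does not give the channel boundaries, so `BAND_EDGES` is a hypothetical allocation (narrow bands at low frequencies, one wide band above). The base-10 logarithm and the small floor added before taking it are likewise assumptions of this sketch.

```python
import math


def compact_to_channels(spectral_amplitudes, band_edges):
    """Sum energy (|amplitude|^2) over each band and return log amplitudes.

    band_edges is a hypothetical list of (start, stop) bin ranges, with stop
    exclusive.
    """
    channels = []
    for start, stop in band_edges:
        energy = sum(abs(a) ** 2 for a in spectral_amplitudes[start:stop])
        channels.append(math.log10(energy + 1e-12))  # small floor avoids log(0)
    return channels


def mean_and_deviations(channels):
    """Return the frame mean and each channel's deviation from that mean."""
    mean = sum(channels) / len(channels)
    return mean, [c - mean for c in channels]


# Hypothetical frame of 8 spectral amplitudes, compacted to 3 channels.
BAND_EDGES = [(0, 2), (2, 4), (4, 8)]
frame_spectrum = [10, 0, 1, 0, 0.1, 0, 0, 0]

channels = compact_to_channels(frame_spectrum, BAND_EDGES)
mean, deviations = mean_and_deviations(channels)
print(channels, mean, deviations)
```

Separating the frame mean from the per-channel deviations means overall loudness is carried by one number, while the deviations describe the spectral shape independently of it.
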
16. A word recognition method comprising the steps of:
providing a digital data template representing preselected acoustic features of a spoken word which include time-rates-of-change of spectral amplitudes;
receiving a spoken word to be compared and performing a spectral analysis thereof to determine data representing its acoustic features including time-rates-of-change of spectral amplitudes;
comparing the template with the received spoken word spectral analysis data to determine a degree of similarity between features given by the metric function:
##EQU19##
where:
d = degree of similarity
j = channel index
a = a scaling factor to account for normal rates of speech
b = a parameter for improving recognition performance
x = mean amplitude value of spoken word template
y = mean amplitude value of stored word template
x = time-rate-of-change of spoken word template
y = time-rate-of-change of stored word template
Δxj = deviation of channel amplitude from mean value in spoken word template
Δyj = deviation of channel amplitude from mean value in stored word template; and
producing an output in response to a predetermined degree of similarity between said template and said spoken word data.
Dependent claims: 17, 18, 19

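The comparison step of claim 16 can be illustrated with a sketch, with one important caveat: the claimed metric function is not reproduced in this text, so the distance below is a hypothetical squared-difference measure and not the patent's formula. The placement of the weights `a` and `b`, the dictionary layout, the example templates, and the threshold are all assumptions made for illustration.

```python
def similarity_distance(spoken, stored, a=1.0, b=1.0):
    """Hypothetical squared-difference distance; smaller means more similar.

    Each feature set is a dict with 'mean', 'rate' (time-rate-of-change of the
    mean) and 'dev' (per-channel deviations from the mean). Where the weights
    a and b enter is an assumption of this sketch only.
    """
    d = (spoken["mean"] - stored["mean"]) ** 2
    d += a * (spoken["rate"] - stored["rate"]) ** 2
    d += b * sum((x - y) ** 2 for x, y in zip(spoken["dev"], stored["dev"]))
    return d


def recognize(spoken, templates, threshold):
    """Return the best-matching word, or None if nothing is within threshold."""
    best_word = min(
        templates, key=lambda w: similarity_distance(spoken, templates[w])
    )
    best = similarity_distance(spoken, templates[best_word])
    return best_word if best <= threshold else None


# Hypothetical stored templates and one incoming utterance.
templates = {
    "yes": {"mean": 1.0, "rate": 0.2, "dev": [0.5, -0.5]},
    "no": {"mean": 0.2, "rate": -0.1, "dev": [-0.3, 0.3]},
}
spoken = {"mean": 0.9, "rate": 0.25, "dev": [0.4, -0.4]}

print(recognize(spoken, templates, threshold=0.1))
```

Returning `None` when no template is close enough corresponds to producing an output only at a predetermined degree of similarity.
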
20. A voice recognition system for producing a voiceprint template of an analog waveform signal comprising:
means for converting an incoming analog signal to a sequence of discrete digital signals;
voice processor means including a timing generator for producing repetitive series of timing cycles, counter means for dividing the total incoming signal into time frames of equal length, sequence control means connected to said timing generator including ROM means for providing operating instructions for the processor during said timing cycles, an arithmetic logic unit for performing a spectral analysis of the received digital signals in response to instructions from said ROM means, said ROM means including instructions for:
transforming the discrete signal amplitudes to a preselected number of spectral amplitudes representing values of various frequency components of the said series of signal amplitudes,
compacting and converting the spectral amplitudes of each frame to a lesser number of channels, each channel being comprised of a summation of amplitudes within a designated frequency range allocated on the basis of predetermined acoustic significance,
deriving a mean amplitude value for all of said channels of each frame,
measuring a deviation from said mean value for each separate channel amplitude in each frame, and
determining a feature ensemble for each pair of successive frames of said total waveform signal; and
external memory means for storing a digital representation of said feature ensembles for said total waveform signal comprising a digital coded template thereof.
Dependent claims: 22, 23, 24

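Claim 20 ends by forming a feature ensemble for each pair of successive frames. A minimal sketch of that pairing follows. The contents of each ensemble here (pair-averaged mean, change in mean across the pair, pair-averaged deviations) and the use of overlapping pairs are assumptions for illustration; the claim itself does not spell them out. The name `pair_feature_ensembles` is hypothetical.

```python
def pair_feature_ensembles(frame_features):
    """Build one ensemble per pair of successive frames.

    frame_features is a list of (mean, deviations) tuples, one per frame.
    Overlapping pairs (0,1), (1,2), ... are used here as an assumption.
    """
    ensembles = []
    for (m0, d0), (m1, d1) in zip(frame_features, frame_features[1:]):
        ensembles.append({
            "mean": (m0 + m1) / 2,                         # pair-averaged mean
            "rate": m1 - m0,                               # change in mean per frame
            "dev": [(x + y) / 2 for x, y in zip(d0, d1)],  # averaged deviations
        })
    return ensembles


# Hypothetical per-frame features: (mean amplitude, per-channel deviations).
features = [
    (1.0, [0.5, -0.5]),
    (2.0, [0.25, -0.25]),
    (1.5, [0.0, 0.0]),
]

ensembles = pair_feature_ensembles(features)
for e in ensembles:
    print(e)
```

The list of ensembles is what a template store would then hold in digital form for the whole utterance.
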
21. A voice recognition system for producing a voiceprint template of an analog waveform signal comprising:
means for converting an incoming analog signal to a sequence of discrete digital signals;
voice processor means including a timing generator for producing repetitive series of timing cycles, counter means for dividing the total incoming analog signal into time frames of equal length, sequence control means connected to said timing generator including ROM means for providing operating instructions for the processor during said timing cycles, means including an arithmetic logic unit for performing a spectral analysis of the received analog signal in response to instructions from said ROM means, said ROM means including instructions for transforming the discrete signal amplitudes of each frame to a sequence of complex spectral amplitudes each representing the magnitude and phase of a function V(n,k) defined as:
##EQU20##
wherein:
k = time sequence index
n = frequency sequence index
r, t = integer summation indexes
m = time function parameter defining the number of retained bits
φ = phase adjustment function
said ROM means also including instructions for:
compacting and converting the spectral amplitudes of each frame to a lesser number of channels, each channel being comprised of a summation of signal amplitudes within a designated frequency range allocated on the basis of predetermined acoustic significance;
deriving a mean amplitude value for all of said channels of each frame;
measuring a deviation from said mean value for each separate channel amplitude in each frame, and
determining a feature ensemble for each pair of successive frames of said total waveform signal; and
external memory means for storing a digital representation of said feature ensembles for said total waveform signal comprising a digital coded template thereof.
Dependent claims: 25, 26

27. A voice synthesis device comprising:
means providing a predetermined series of digital signals representing a sequence of preselected complex spectral amplitudes;
means for transforming said sequence of complex spectral amplitudes to a sequence of discrete time waveform amplitudes, each such spectral amplitude representing the magnitude and phase of a function V(n,k) defined as:
##EQU23##
wherein:
k = time sequence index
n = frequency sequence index
r, t = integer summation indexes
m = time function parameter defining the number of retained bits
φ = phase adjustment function
and means for converting the transformed digital data into an analog output signal.
Dependent claims: 28

Specification