Method and apparatus for real time speech recognition with and without speaker dependency
First Claim
1. A speaker dependent and independent speech recognition method, based on a comparison between speech characteristic parameter frames and a plurality of reference samples which are generated from a speaker in dependent recognition or from a plurality of persons having representative sounds in independent recognition, comprising the steps of:
a) converting speech signals into a series of primitive sound spectrum parameter frames;
b) determining beginning and ending points of sounds of speech according to said primitive sound parameter frames, for determination of a sound spectrum parameter frame series;
c) performing non-linear time domain normalization on said sound spectrum parameter frame series into a speech characteristic parameter frame series of predefined length on the time domain, including:
forming a sound stimulus series corresponding to said sound spectrum parameter frame series, selecting or deleting each sound spectrum parameter frame, respectively, according to whether a sound stimulus value of said sound spectrum parameter frame is greater or less than an average sound stimulus value, wherein said sound stimulus value represents the difference between two adjacent sound spectrum parameter frames, and wherein said average sound stimulus value is the average of all of said sound stimulus values;
d) performing amplitude quantizing normalization on said speech characteristic parameter frames obtained from step c);
e) comparing said speech characteristic parameter frame series of step d) with each of a plurality of reference samples, said plurality of reference samples having previously been subjected to amplitude quantizing normalization, for determining a reference sample having a closest match with said speech characteristic parameter frame series; and
f) determining a result of recognition according to said reference sample having said closest match.
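The time-domain normalization of step c) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the stimulus value of a frame is the Euclidean distance to the preceding frame (one plausible reading of "the difference between two adjacent sound spectrum parameter frames"), and it realizes the select/delete rule by keeping the highest-stimulus frames until the series reaches the predefined length. The function and parameter names (`time_normalize`, `target_len`) are hypothetical.

```python
import numpy as np

def time_normalize(frames, target_len):
    """Sketch of non-linear time-domain normalization (claim 1, step c).

    frames: (N, D) array of sound spectrum parameter frames.
    Returns a (target_len, D) speech characteristic parameter frame
    series, with the selected frames kept in original time order.
    """
    frames = np.asarray(frames, dtype=float)
    if len(frames) <= target_len:
        # Short utterance: pad by repeating the last frame
        # (one possible choice; the claim does not specify padding).
        pad = np.repeat(frames[-1:], target_len - len(frames), axis=0)
        return np.vstack([frames, pad])
    # Stimulus of frame i: difference between adjacent frames,
    # taken here as the Euclidean distance to frame i-1.
    diffs = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    stimulus = np.concatenate(([diffs[0]], diffs))
    # Frames with above-average stimulus are selected, the rest
    # deleted; equivalently, keep the target_len highest-stimulus
    # frames so the output has the predefined length.
    keep = np.sort(np.argsort(stimulus)[-target_len:])
    return frames[keep]
```

Selecting by rank rather than by a fixed average threshold guarantees the output series has exactly the predefined length, which step e) requires for frame-by-frame comparison with the reference samples.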
Abstract
A method and apparatus for real time speech recognition with and without speaker dependency, comprising the following steps: converting the speech signals into a series of primitive sound spectrum parameter frames; detecting the beginning and ending of speech according to the primitive sound spectrum parameter frames, to determine the sound spectrum parameter frame series; performing non-linear time domain normalization on the sound spectrum parameter frame series using sound stimuli, to obtain a speech characteristic parameter frame series of predefined length on the time domain; performing amplitude quantizing normalization on the speech characteristic parameter frames; comparing the speech characteristic parameter frame series with the reference samples, to determine the reference sample which most closely matches the speech characteristic parameter frame series; and determining the recognition result according to the most closely matched reference sample.
18 Claims
1. A speaker dependent and independent speech recognition method, based on a comparison between speech characteristic parameter frames and a plurality of reference samples which are generated from a speaker in dependent recognition or from a plurality of persons having representative sounds in independent recognition, comprising the steps of:
a) converting speech signals into a series of primitive sound spectrum parameter frames;
b) determining beginning and ending points of sounds of speech according to said primitive sound parameter frames, for determination of a sound spectrum parameter frame series;
c) performing non-linear time domain normalization on said sound spectrum parameter frame series into a speech characteristic parameter frame series of predefined length on the time domain, including:
forming a sound stimulus series corresponding to said sound spectrum parameter frame series, selecting or deleting each sound spectrum parameter frame, respectively, according to whether a sound stimulus value of said sound spectrum parameter frame is greater or less than an average sound stimulus value, wherein said sound stimulus value represents the difference between two adjacent sound spectrum parameter frames, and wherein said average sound stimulus value is the average of all of said sound stimulus values;
d) performing amplitude quantizing normalization on said speech characteristic parameter frames obtained from step c);
e) comparing said speech characteristic parameter frame series of step d) with each of a plurality of reference samples, said plurality of reference samples having previously been subjected to amplitude quantizing normalization, for determining a reference sample having a closest match with said speech characteristic parameter frame series; and
f) determining a result of recognition according to said reference sample having said closest match.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A speaker dependent and independent speech recognition apparatus based on a comparison between speech characteristic parameter frames and reference samples which are generated from one of a speaker in dependent recognition and from persons producing representative sounds in independent recognition, comprising:
a) a speech parameter extracting means for converting speech signals into a series of primitive sound spectrum parameter frames;
b) determining means for determining beginning and ending points of sounds of speech based on said series of primitive sound spectrum parameter frames, for obtaining a sound spectrum parameter frame series;
c) time domain normalization means for normalizing said sound spectrum parameter frame series into a speech characteristic parameter frame series of predefined length on the time domain, including:
forming a sound stimulus series corresponding to said sound spectrum parameter frame series, selecting or deleting each sound spectrum parameter frame, respectively, according to whether a sound stimulus value of said sound spectrum parameter frame is greater or less than an average sound stimulus value, wherein said sound stimulus value represents the difference between two adjacent sound spectrum parameter frames, and wherein said average sound stimulus value is the average of all of said sound stimulus values;
d) quantizing normalization means for performing amplitude quantizing normalization on each frame of said speech characteristic parameter frame series;
e) difference evaluation means for comparing said speech characteristic parameter frame series with said plurality of reference samples, for determining a reference sample having a closest match with said speech characteristic parameter frame series of equal length on the time domain; and
f) judgement means for determining a result of recognition according to said reference sample having said closest match.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
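The quantizing normalization, difference evaluation, and judgement means of claim 10 (steps d through f) can be sketched together. This is a hedged illustration under stated assumptions: the amplitude quantizing normalization is read as scaling the whole frame series into a fixed integer range (`levels`), and the difference measure is a city-block distance between equal-length series; the claims fix neither choice, and all names here are hypothetical.

```python
import numpy as np

def amplitude_normalize(frames, levels=16):
    """Sketch of amplitude quantizing normalization (claim 10, step d):
    scale the frame series into integer levels 0..levels-1."""
    f = np.asarray(frames, dtype=float)
    lo, hi = f.min(), f.max()
    if hi == lo:  # silent/constant input maps to all zeros
        return np.zeros(f.shape, dtype=int)
    return np.round((f - lo) / (hi - lo) * (levels - 1)).astype(int)

def recognize(sample, references):
    """Sketch of difference evaluation and judgement (claim 10, e and f):
    return the label of the reference sample with the smallest total
    frame difference. `references` maps labels to normalized series of
    the same shape as `sample` (equal length on the time domain)."""
    best_label, best_dist = None, float("inf")
    for label, ref in references.items():
        dist = np.abs(sample - ref).sum()  # city-block distance (assumption)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

Because both the input series and every reference sample have been normalized to the same length and amplitude range, the comparison reduces to a direct frame-by-frame distance, which is what makes the claimed real-time matching inexpensive.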
Specification