Speaker's voice recognition system, method and recording medium using two dimensional frequency expansion coefficients
Abstract
A voice recognition system comprises an analyzer for converting an input voice signal to an input pattern including cepstrum, a reference pattern memory for storing reference patterns, an elongation/contraction estimating unit for outputting an elongation/contraction parameter in the frequency axis direction by using the input pattern and the reference patterns, a converter for converting the input pattern by using the elongation/contraction parameter, and a recognizing unit for calculating the distances between the converted input pattern from the converter and the reference patterns and outputting the reference pattern corresponding to the shortest distance as the result of recognition. The elongation/contraction estimating unit estimates the elongation/contraction parameter by using the cepstrum included in the input pattern. It does not need to hold various candidate values in advance for determining the elongation/contraction parameter, nor does it need to execute distance calculations for each of those values.
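The recognizing step described in the abstract, which outputs the reference pattern at the shortest distance from the converted input pattern, can be sketched as follows. This is an illustrative sketch only: the names `pattern_distance` and `recognize`, the frame-wise Euclidean distance, the requirement that patterns have equal length, and the toy data are assumptions made for the example and are not taken from the patent, which does not specify the distance measure in this excerpt.

```python
import math


def pattern_distance(a, b):
    """Sum of frame-wise Euclidean distances between two equal-length patterns.

    Each pattern is a list of frames; each frame is a list of cepstral values.
    (Assumption: a practical system would also align patterns in time.)
    """
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of frames")
    return sum(math.dist(fa, fb) for fa, fb in zip(a, b))


def recognize(converted_pattern, reference_patterns):
    """Return (label, distance) for the reference pattern closest to the input."""
    return min(
        ((label, pattern_distance(converted_pattern, ref))
         for label, ref in reference_patterns.items()),
        key=lambda item: item[1],
    )


# Hypothetical toy data: two reference patterns of two frames each.
references = {
    "yes": [[1.0, 0.2], [0.8, 0.1]],
    "no": [[-0.5, 0.9], [-0.4, 1.0]],
}
label, distance = recognize([[0.9, 0.25], [0.7, 0.1]], references)
print(label, round(distance, 4))
```

Here the input pattern is assumed to have already passed through the converter, so the matching step itself is independent of how the elongation/contraction parameter was estimated.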
33 Citations
31 Claims
-
1. A voice recognition system comprising a spectrum converter for elongating or contracting a spectrum of a voice signal on a frequency axis, the spectrum converter including:
-
an analyzer for converting an input voice signal to an input pattern including cepstrum;
a reference pattern memory with reference patterns stored therein;
an elongation/contracting estimating unit for outputting an elongation/contraction parameter in the frequency axis direction by using the input pattern and the reference patterns; and
a converter for converting the input pattern by using the elongation/contraction parameter;
wherein said elongating or contracting of the spectrum of the voice signal is carried out using an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance. - View Dependent Claims (3, 4, 27)
-
-
2. A voice recognition system comprising:
-
an analyzer for converting an input voice signal to an input pattern including a cepstrum;
a reference pattern memory for storing reference patterns;
an elongation/contraction estimating unit for outputting an elongation/contraction parameter in the frequency axis direction by using the input pattern and reference patterns;
a converter for converting the input pattern by using the elongation/contraction parameter; and
a matching unit for computing the distances between the elongated or contracted input pattern fed out from the converter and the reference patterns and outputting the reference pattern corresponding to the shortest distance as result of recognition;
wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance. - View Dependent Claims (25, 26)
-
-
5. A reference pattern learning system comprising:
-
a learning voice memory with learning voice data stored therein;
an analyzer for receiving a learning voice signal from the learning voice memory and converting the learning voice signal to an input pattern including cepstrum;
a reference pattern memory with reference patterns stored therein;
an elongation/contraction estimating unit for outputting an elongation/contraction parameter in a frequency axis by using the input pattern and the reference patterns;
a converter for converting the input pattern by using the elongation/contraction parameter;
a reference pattern estimating unit for updating the reference patterns stored in the reference pattern memory for the learning voice data by using the elongated or contracted input pattern fed out from the converter and the reference patterns; and
a likelihood judging unit for monitoring distance changes by computing distances by using the elongated or contracted input pattern fed out from the converter and the reference patterns;
wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance. - View Dependent Claims (6, 7, 28)
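The learning flow recited in claim 5 (estimate a parameter per utterance, convert, update the reference pattern, and monitor distance changes) might be organized as the loop below. This is a sketch under stated assumptions, not the patented procedure: `learn_reference`, `mean_pattern`, the stopping tolerance, the averaging update, and the toy `estimate_scale`/`apply_scale` stand-ins (a plain amplitude scale, not a frequency-axis warp) are all hypothetical and chosen only so the example runs.

```python
import math


def pattern_distance(a, b):
    """Sum of frame-wise Euclidean distances between equal-length patterns."""
    return sum(math.dist(fa, fb) for fa, fb in zip(a, b))


def mean_pattern(patterns):
    """Element-wise average of equal-shaped patterns (assumed update rule)."""
    n = len(patterns)
    frames, dims = len(patterns[0]), len(patterns[0][0])
    return [[sum(p[t][d] for p in patterns) / n for d in range(dims)]
            for t in range(frames)]


def learn_reference(utterances, reference, estimate_param, convert,
                    tol=1e-6, max_iter=20):
    """Iterate estimate -> convert -> update, stopping when distance settles."""
    history = []
    for _ in range(max_iter):
        # One parameter value is estimated per utterance, then applied.
        converted = [convert(u, estimate_param(u, reference)) for u in utterances]
        reference = mean_pattern(converted)          # reference pattern update
        total = sum(pattern_distance(c, reference) for c in converted)
        history.append(total)                        # monitored distance
        if len(history) > 1 and abs(history[-2] - total) < tol:
            break
    return reference, history


# Hypothetical stand-ins for the estimating unit and the converter.
def estimate_scale(utterance, reference):
    num = sum(x * r for fu, fr in zip(utterance, reference) for x, r in zip(fu, fr))
    den = sum(x * x for fu in utterance for x in fu)
    return num / den


def apply_scale(utterance, scale):
    return [[scale * x for x in frame] for frame in utterance]


learned, history = learn_reference(
    [[[1.0, 2.0]], [[2.0, 4.0]]], [[1.5, 3.0]], estimate_scale, apply_scale)
print(learned, history)
```

Passing the estimating and converting steps in as callables keeps the loop itself agnostic about how the elongation/contraction parameter is actually obtained.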
-
-
8. A voice quality converting system comprising:
-
an analyzer for converting an input voice signal to an input pattern including a cepstrum;
a reference pattern memory for storing reference patterns;
an elongation/contraction estimating unit for outputting an elongation/contraction parameter in the frequency axis direction by using the input pattern and reference patterns;
a converter for converting the input pattern by using the elongation/contraction parameter; and
an inverse converter for outputting a signal waveform in time domain by inversely converting the time serial input pattern obtained after the elongation/contraction supplied from the converter;
wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.
-
-
9. A recording medium for a computer constituting a spectrum converter by executing elongation or contraction of the spectrum of a voice signal on frequency axis, in which is stored a program for executing the following processes:
-
(a) an analyzing process for converting an input voice signal to an input pattern including cepstrum;
(b) an elongation/contraction estimating process for outputting an elongation/contraction parameter in frequency axis direction by using the input pattern and reference patterns stored in a reference pattern memory; and
(c) a converting process for converting the input pattern by using the elongation/contraction parameter;
wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.
-
-
10. A recording medium for a computer constituting a system for voice recognition by executing elongation or contraction of a spectrum of a voice signal on a frequency axis, in which is stored a program for executing the following processes:
-
(a) an analyzing process for converting an input voice signal to an input pattern including cepstrum;
(b) an elongation/contraction estimating process for outputting an elongation/contraction parameter along the frequency axis by using the input pattern and reference patterns stored in a reference pattern memory;
(c) a converting process for converting the input pattern by using the elongation/contraction parameter; and
(d) a matching process for computing the distances between the elongated or contracted input pattern and the reference patterns and outputting the reference pattern corresponding to the shortest distance as result of recognition;
wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance. - View Dependent Claims (11, 12)
-
-
13. In a computer constituting a system for learning reference patterns from learning voice data, a recording medium, in which is stored a program, for executing the following processes:
-
(a) an analyzing process for receiving learning voice data from learning voice memory with learning voice data stored therein and converting the received learning voice data to an input pattern including cepstrum;
(b) an elongation/contraction estimating process for outputting an elongation/contraction parameter along a frequency axis by using the input pattern and the reference patterns stored in the reference pattern memory;
(c) a converting process for converting the input pattern by using the elongation/contraction parameter;
(d) a reference pattern estimating process for updating the reference patterns for the learning voice data by using the elongated or contracted pattern fed out in the converting process and the reference patterns; and
(e) a likelihood judging process for calculating the distances between the elongated or contracted input pattern after conversion in the converting process and the reference patterns and monitoring changes in distance;
wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance. - View Dependent Claims (14, 15)
-
-
16. A recording medium for a computer constituting a spectrum conversion by executing elongation or contraction of the spectrum of a voice signal on a frequency axis, in which is stored a program for executing the following processes:
-
(a) an analyzing process for converting an input voice signal to an input pattern including cepstrum;
(b) an elongation/contraction estimating process for outputting an elongation/contraction parameter along the frequency axis by using the input pattern and reference patterns stored in a reference pattern memory;
(c) a converting process for converting the input pattern by using the elongation/contraction parameter; and
(d) an inverse converting process for outputting a signal waveform in time domain by inversely converting the time serial input pattern obtained after the elongation/contraction supplied from the converter;
wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.
-
-
17. A spectrum converting method for elongating or contracting a spectrum of a voice signal on a frequency axis, comprising:
-
a first step for converting an input voice signal to an input pattern including cepstrum;
a second step for outputting an elongation/contraction parameter in the frequency axis direction by using the input pattern and the reference patterns stored in a reference pattern memory; and
a third step for converting the input pattern by using the elongation/contraction parameter;
wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance. - View Dependent Claims (19, 20, 31)
-
-
18. A voice recognition method comprising:
-
a first step for converting an input voice signal to an input pattern including a cepstrum;
a second step for outputting an elongation/contraction parameter along a frequency axis by using the input pattern and reference patterns stored in a reference pattern memory;
a third step for converting the input pattern by using the elongation/contraction parameter; and
a fourth step for computing the distances between the elongated or contracted input pattern and the reference patterns and outputting the reference pattern corresponding to the shortest distance as result of recognition;
wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance. - View Dependent Claims (29, 30)
-
-
21. A reference pattern learning method comprising:
-
a first step for receiving a learning voice signal from the learning voice memory and converting the learning voice signal to an input pattern including cepstrum;
a second step for outputting an elongation/contraction parameter along a frequency axis by using the input pattern and the reference patterns stored in a reference pattern memory;
a third step for converting the input pattern by using the elongation/contraction parameter;
a fourth step for updating the reference patterns for the learning voice data by using the elongated or contracted input pattern and the reference patterns; and
a fifth step for monitoring distance changes by computing distances by using the elongated or contracted input pattern and the reference patterns;
wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance. - View Dependent Claims (22, 23)
-
-
24. A voice recognition method of spectrum conversion to convert a spectrum of a voice signal by executing elongation or contraction of the spectrum on a frequency axis, wherein:
-
the elongation or contraction of the spectrum of the voice signal is defined by a warping function and is executed on cepstrum, and the extent of elongation or contraction of the spectrum on the frequency axis is determined with an elongation/contraction parameter included in the warping function, and an optimum value is determined as elongation/contraction parameter value for each speaker;
wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.
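To make the idea of a warping function "executed on cepstrum" concrete, the sketch below warps a short cepstrum vector with a single parameter `alpha`. The claim does not name a specific warping function in this excerpt, so the first-order all-pass (bilinear) warp φ(ω) = ω + 2·atan(α·sin ω / (1 − α·cos ω)) used here is an assumption, as are the function name `warp_cepstrum`, the sign convention of `alpha`, and the numerical route (rebuild the log spectrum from the cepstrum, resample it on the warped axis, and project back onto cosines). A closed-form recursion on the cepstral coefficients may be used in practice; this version is chosen only because it is easy to check.

```python
import math


def warp_cepstrum(cepstrum, alpha, n_points=512):
    """Warp a real cepstrum [c0, c1, ...] along the frequency axis.

    The log spectrum is modeled as L(w) = c0 + 2 * sum_k c_k * cos(k * w).
    The warped log spectrum is L(phi(w)), and the returned coefficients are
    its cosine-series coefficients, truncated to the input length.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    order = len(cepstrum)
    warped = [0.0] * order
    for j in range(n_points):
        omega = (j + 0.5) * math.pi / n_points       # midpoint grid on [0, pi]
        # Assumed warping function: phase of a first-order all-pass filter.
        phi = omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                       1.0 - alpha * math.cos(omega))
        log_spec = cepstrum[0] + 2.0 * sum(
            c * math.cos(k * phi) for k, c in enumerate(cepstrum) if k > 0)
        for m in range(order):
            warped[m] += log_spec * math.cos(m * omega) / n_points
    return warped


original = [1.0, 0.5, -0.3, 0.1]
unchanged = warp_cepstrum(original, 0.0)   # alpha = 0 leaves the cepstrum as is
shifted = warp_cepstrum(original, 0.1)     # nonzero alpha warps the spectrum
print(unchanged)
print(shifted)
```

Because a single scalar controls the whole warp, choosing one value per speaker or per utterance, as the claim describes, amounts to choosing one `alpha` and applying it to every frame of the input pattern.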
-
Specification