Speaker's voice recognition system, method and recording medium using two dimensional frequency expansion coefficients

US 6,934,681 B1
Filed: 10/25/2000
Issued: 08/23/2005
Est. Priority Date: 10/26/1999
Status: Active Grant


Abstract

A voice recognition system comprises an analyzer for converting an input voice signal to an input pattern including cepstrum, a reference pattern memory for storing reference patterns, an elongation/contraction estimating unit for outputting an elongation/contraction parameter in the frequency axis direction by using the input pattern and the reference patterns, a converter for converting the input pattern by using the elongation/contraction parameter, and a recognizing unit for calculating the distances between the converted input pattern from the converter and the reference patterns and outputting the reference pattern corresponding to the shortest distance as the result of recognition. The elongation/contraction estimating unit estimates the elongation/contraction parameter by using the cepstrum included in the input pattern. It does not need to hold various candidate values in advance for determining the parameter, nor does it need to execute a distance calculation for each of those values.
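
The abstract says the estimating unit gets its parameter directly from the cepstrum, without trying a list of candidate values. The sketch below shows one way such a search-free estimate could look; it is an assumption for illustration, not the patent's formula. It assumes that a small warp changes each cepstral frame approximately linearly, `c_t + alpha * d_t`, that the frame-to-reference alignment is already known, and that the distance is plain squared Euclidean. The names `estimate_warp`, `frames`, `references`, and `derivatives` are hypothetical.

```python
def estimate_warp(frames, references, derivatives):
    """Closed-form least-squares warp parameter for one utterance.

    Minimises sum_t |c_t + alpha * d_t - mu_t|^2 over alpha, where
    c_t is an input cepstral frame, mu_t the aligned reference frame,
    and d_t the assumed sensitivity of c_t to the warp parameter.
    """
    num = 0.0
    den = 0.0
    for c, mu, d in zip(frames, references, derivatives):
        for ci, mi, di in zip(c, mu, d):
            num += di * (mi - ci)   # correlation of sensitivity with residual
            den += di * di          # energy of the sensitivity
    # A zero denominator means the warp has no effect; return "no warp".
    return num / den if den > 0.0 else 0.0


# Toy data: references are the frames shifted by exactly 0.1 * d_t.
frames = [[1.0, 0.5], [0.8, 0.2]]
derivatives = [[0.5, -1.0], [0.2, 0.4]]
references = [[c + 0.1 * d for c, d in zip(cf, df)]
              for cf, df in zip(frames, derivatives)]
alpha = estimate_warp(frames, references, derivatives)
```

Because all frames of the utterance are pooled into one numerator and one denominator, the sketch yields a single parameter value per utterance, with no loop over candidate values.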

31 Claims

1. A voice recognition system comprising a spectrum converter for elongating or contracting a spectrum of a voice signal on a frequency axis, the spectrum converter including:
- an analyzer for converting an input voice signal to an input pattern including cepstrum;
  
  a reference pattern memory with reference patterns stored therein;
  
  an elongation/contracting estimating unit for outputting an elongation/contraction parameter in the frequency axis direction by using the input pattern and the reference patterns; and
  
  a converter for converting the input pattern by using the elongation/contraction parameter;
  
  wherein said elongating or contracting of the spectrum of the voice signal is carried out using an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.
- View Dependent Claims (3, 4, 27)
- - 3. The voice recognition system according to claim 1, wherein the converter executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by carrying out the elongation or contraction in cepstrum space.
  - 4. The voice recognition system according to claim 1, wherein the elongation/contraction estimating unit executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by using estimation derived from the best likelihood estimation of HMM (hidden Markov model) in a cepstrum space.
  - 27. The voice recognition system according to claim 3, wherein the elongation/contraction estimating unit executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by using estimation derived from the best likelihood estimation of HMM (hidden Markov model) in cepstrum space.
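
Claim 3 has the converter carry out the frequency-axis warping directly on cepstral coefficients. A minimal sketch of that idea follows. The claims do not name a specific warping function, so the first-order all-pass (bilinear) warp used here is an assumption, as are the function names `allpass_warp` and `warp_cepstrum`. The code rebuilds the log spectrum from the cepstrum as a cosine series, reads it at the warped frequency, and projects it back onto the same number of cepstral coefficients by numerical integration.

```python
import math


def allpass_warp(omega, alpha):
    """Warped frequency for a first-order all-pass warp, |alpha| < 1 (assumed form)."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warp_cepstrum(cep, alpha, n_points=256):
    """Return cepstral coefficients of the frequency-warped log spectrum.

    Uses log S(w) = c0 + 2 * sum_k c_k cos(k w) and the midpoint rule
    on [0, pi]; the output is truncated to len(cep) coefficients.
    """
    order = len(cep)
    out = [0.0] * order
    for n in range(n_points):
        omega = (n + 0.5) * math.pi / n_points
        w = allpass_warp(omega, alpha)
        log_spec = cep[0] + 2.0 * sum(cep[k] * math.cos(k * w)
                                      for k in range(1, order))
        for m in range(order):
            out[m] += log_spec * math.cos(m * omega)
    return [v / n_points for v in out]


cep = [1.0, 0.5, -0.2, 0.1]
unchanged = warp_cepstrum(cep, 0.0)   # alpha = 0 is the identity warp
warped = warp_cepstrum(cep, 0.1)
```

Since the mapping is linear in the cepstral coefficients for a fixed `alpha`, it could equally be precomputed as a matrix; the loop form is kept here for readability.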

2. A voice recognition system comprising:
- an analyzer for converting an input voice signal to an input pattern including a cepstrum;
  
  a reference pattern memory for storing reference patterns;
  
  an elongation/contraction estimating unit for outputting an elongation/contraction parameter in the frequency axis direction by using the input pattern and reference patterns;
  
  a converter for converting the input pattern by using the elongation/contraction parameter; and
  
  a matching unit for computing the distances between the elongated or contracted input pattern fed out from the converter and the reference patterns and outputting the reference pattern corresponding to the shortest distance as result of recognition;
  
  wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.
- View Dependent Claims (25, 26)
- - 25. The voice recognition system according to claim 2, wherein the converter executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by carrying out the elongation or contraction in cepstrum space.
  - 26. The voice recognition system according to claim 2, wherein the elongation/contraction estimating unit executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by using estimation derived from the best likelihood estimation of HMM (hidden Markov model) in cepstrum space.
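
The matching unit of claim 2 can be sketched as a nearest-reference search. This is a simplified illustration under stated assumptions: the converted input pattern and every reference pattern are frame sequences of equal length, and the distance is the summed squared Euclidean distance per frame. A practical recognizer would align frames with dynamic programming or an HMM; the names `pattern_distance` and `recognize` are hypothetical.

```python
def pattern_distance(pattern, reference):
    """Summed squared Euclidean distance between two equal-length frame sequences."""
    return sum((x - y) ** 2
               for frame, ref_frame in zip(pattern, reference)
               for x, y in zip(frame, ref_frame))


def recognize(pattern, references):
    """Return the label of the reference pattern with the shortest distance."""
    return min(references,
               key=lambda label: pattern_distance(pattern, references[label]))


references = {
    "yes": [[0.0, 0.0], [1.0, 1.0]],
    "no": [[5.0, 5.0], [6.0, 6.0]],
}
converted_input = [[0.1, 0.0], [0.9, 1.2]]
result = recognize(converted_input, references)
```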

5. A reference pattern learning system comprising:
- a learning voice memory with learning voice data stored therein;
  
  an analyzer for receiving a learning voice signal from the learning voice memory and converting the learning voice signal to an input pattern including cepstrum;
  
  a reference pattern memory with reference patterns stored therein;
  
  an elongation/contraction estimating unit for outputting an elongation/contraction parameter in a frequency axis by using the input pattern and the reference patterns;
  
  a converter for converting the input pattern by using the elongation/contraction pattern;
  
  a reference pattern estimating unit for updating the reference patterns stored in the reference pattern memory for the learning voice data by using the elongated or contracted input pattern fed out from the converter and the reference patterns; and
  
  a likelihood judging unit for monitoring distance changes by computing distances by using the elongated or contracted input pattern fed out from the converter and the reference patterns;
  
  wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.
- View Dependent Claims (6, 7, 28)
- - 6. The reference pattern learning system according to claim 5, wherein the converter executes the elongation or contraction of spectrum on the frequency axis with a warping function defining the form of elongation or contraction by carrying out the elongation or contraction in cepstrum space.
  - 7. The reference pattern learning system according to claim 5, wherein the elongation/contraction estimating unit executes the elongation or contraction of spectrum on the frequency axis with a warping function defining the form of elongation or contraction by using estimation derived from the best likelihood estimation of HMM (hidden Markov model) in cepstrum space.
  - 28. The reference pattern learning system according to claim 6, wherein the elongation/contraction estimating unit executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by using estimation derived from the best likelihood estimation of HMM (hidden Markov model) in cepstrum space.
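
The learning system of claim 5 alternates between estimating a parameter per utterance, converting the input, updating the reference pattern, and monitoring how the distance changes. The loop below is a rough sketch of that cycle under strong simplifications that are not in the patent: each utterance is a single feature vector, there is one reference pattern, the update is a plain mean, and training stops when the total distance stops decreasing. `learn_reference`, `toy_estimate`, `toy_convert`, and `DIRECTION` are hypothetical names, and the toy conversion (a shift along a fixed direction) only stands in for a real frequency warp.

```python
def sq_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def learn_reference(utterances, reference, estimate, convert,
                    tol=1e-9, max_iter=50):
    """Iteratively re-estimate one reference pattern from normalised inputs."""
    history = []  # total distance per iteration, for convergence monitoring
    for _ in range(max_iter):
        # Estimate one parameter per utterance, then convert the input.
        converted = [convert(u, estimate(u, reference)) for u in utterances]
        history.append(sum(sq_distance(v, reference) for v in converted))
        # Update the reference pattern as the mean of the converted inputs.
        reference = [sum(v[i] for v in converted) / len(converted)
                     for i in range(len(reference))]
        # Stop once the monitored distance no longer decreases.
        if len(history) > 1 and history[-2] - history[-1] < tol:
            break
    return reference, history


DIRECTION = [1.0, 0.0]  # toy stand-in for the effect of the warp


def toy_convert(u, alpha):
    return [x + alpha * d for x, d in zip(u, DIRECTION)]


def toy_estimate(u, reference):
    num = sum(d * (r - x) for x, r, d in zip(u, reference, DIRECTION))
    return num / sum(d * d for d in DIRECTION)


reference, history = learn_reference(
    [[0.0, 1.0], [2.0, 3.0]], [0.0, 0.0], toy_estimate, toy_convert)
```

In this toy setting both the parameter estimate and the mean update minimise the same squared distance, so the monitored total cannot increase from one iteration to the next.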

8. A voice quality converting system comprising:
- an analyzer for converting an input voice signal to an input pattern including a cepstrum;
  
  a reference pattern memory for storing reference patterns;
  
  an elongation/contraction estimating unit for outputting an elongation/contraction parameter in the frequency axis direction by using the input pattern and reference patterns;
  
  a converter for converting the input pattern by using the elongation/contraction parameter; and
  
  an inverse converter for outputting a signal waveform in time domain by inversely converting the time serial input pattern obtained after the elongation/contraction supplied from the converter wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.

9. A recording medium for a computer constituting a spectrum converter by executing elongation or contraction of the spectrum of a voice signal on frequency axis, in which is stored a program for executing the following processes:
- (a) an analyzing process for converting an input voice signal to an input pattern including cepstrum;
  
  (b) an elongation/contraction estimating process for outputting an elongation/contraction parameter in frequency axis direction by using the input pattern and reference patterns stored in a reference pattern memory; and
  
  (c) a converting process for converting the input pattern by using the elongation/contraction parameter wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.

10. A recording medium for a computer constituting a system for voice recognition by executing elongation or contraction of a spectrum of a voice signal on a frequency axis, in which is stored a program for executing the following processes:
- (a) an analyzing process for converting an input voice signal to an input pattern including cepstrum;
  
  (b) an elongation/contraction estimating process for outputting an elongation/contraction parameter along the frequency axis by using the input pattern and reference patterns stored in a reference pattern memory;
  
  (c) a converting process for converting the input pattern by using the elongation/contraction parameter; and
  
  (d) a matching process for computing the distances between the elongated or contracted input pattern and the reference patterns and outputting the reference pattern corresponding to the shortest distance as result of recognition wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.
- View Dependent Claims (11, 12)
- - 11. The recording medium according to claim 10, wherein the converting process executes the elongation or contraction of spectrum on the frequency axis with a warping function defining the form of elongation or contraction by carrying out the elongation or contraction in cepstrum space.
  - 12. The recording medium according to claim 10, wherein the elongation/contraction estimating process executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by using estimation derived from the best likelihood estimation of HMM (hidden Markov model) in cepstrum space.

13. In a computer constituting a system for learning reference patterns from learning voice data, a recording medium, in which is stored a program, for executing the following processes:
- (a) an analyzing process for receiving learning voice data from learning voice memory with learning voice data stored therein and converting the received learning voice data to an input pattern including cepstrum;
  
  (b) an elongation/contraction estimating process for outputting an elongation/contraction parameter along a frequency axis by using the input pattern and the reference patterns stored in the reference pattern memory;
  
  (c) a converting process for converting the input pattern by using the elongation/contraction parameter;
  
  (d) a reference pattern estimating process for updating the reference patterns for the learning voice data by using the elongated or contracted pattern fed out in the converting process and the reference patterns; and
  
  (e) a likelihood judging process for calculating the distances between the elongated or contracted input pattern after conversion in the converting process and the reference patterns and monitoring changes in distance wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.
- View Dependent Claims (14, 15)
- - 14. The recording medium according to claim 13, wherein the converting process executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by carrying out the elongation or contraction in cepstrum space.
  - 15. The recording medium according to claim 13, wherein the elongation/contraction estimating process executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by using estimation derived from the best likelihood estimation of HMM (hidden Markov model) in cepstrum space.

16. A recording medium for a computer constituting a spectrum conversion by executing elongation or contraction of the spectrum of a voice signal on a frequency axis, in which is stored a program for executing the following processes:
- (a) an analyzing process for converting an input voice signal to an input pattern including cepstrum;
  
  (b) an elongation/contraction estimating process for outputting an elongation/contraction parameter along the frequency axis by using the input pattern and reference patterns stored in a reference pattern memory;
  
  (c) a converting process for converting the input pattern by using the elongation/contraction parameter; and
  
  (d) an inverse converting process for outputting a signal waveform in time domain by inversely converting the time serial input pattern obtained after the elongation/contraction supplied from the converter wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.

17. A spectrum converting method for elongating or contracting a spectrum of a voice signal on a frequency axis, comprising:
- a first step for converting an input voice signal to an input pattern including cepstrum;
  
  a second step for outputting an elongation/contraction parameter in the frequency axis direction by using the input pattern and the reference patterns stored in a reference pattern memory; and
  
  a third step for converting the input pattern by using the elongation/contraction parameter wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.
- View Dependent Claims (19, 20, 31)
- - 19. The voice recognition method according to claim 17, wherein the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction is executed by carrying out the elongation or contraction in cepstrum space.
  - 20. The voice recognition method according to claim 17, wherein the elongation/contraction estimating process executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by using estimation derived from the best likelihood estimation of HMM (hidden Markov model) in cepstrum space.
  - 31. The voice recognition method according to claim 19, wherein the elongation/contraction estimating process executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by using estimation derived from the best likelihood estimation of HMM (hidden Markov model) in cepstrum space.
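
The first step of claim 17 converts the voice signal into a pattern that includes cepstrum. As a minimal sketch of what that analysis could compute for one frame, the code below takes the real cepstrum: the inverse DFT of the log magnitude spectrum. The claims do not fix the analysis method, so this choice is an assumption, as are the name `real_cepstrum` and the `floor` guard against taking the log of zero. A direct O(n²) DFT keeps the example dependency-free; windowing and an FFT are omitted.

```python
import cmath
import math


def real_cepstrum(frame, n_coeffs, floor=1e-12):
    """First n_coeffs real-cepstrum coefficients of one signal frame."""
    n = len(frame)
    # Log magnitude of the DFT at each frequency bin.
    log_mag = []
    for k in range(n):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        log_mag.append(math.log(max(abs(s), floor)))
    # The log magnitude of a real signal is real and even, so its
    # inverse DFT reduces to a cosine sum.
    return [sum(lm * math.cos(2.0 * math.pi * k * m / n)
                for k, lm in enumerate(log_mag)) / n
            for m in range(n_coeffs)]


# A scaled unit impulse has a flat spectrum of magnitude e, so its
# log magnitude is 1 at every bin and only c0 is non-zero.
frame = [math.e] + [0.0] * 7
cep = real_cepstrum(frame, 4)
```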

18. A voice recognition method comprising:
- a first step for converting an input voice signal to an input pattern including a cepstrum;
  
  a second step for outputting an elongation/contraction parameter along a frequency axis by using the input pattern and reference patterns stored in a reference pattern memory;
  
  a third step for converting the input pattern by using the elongation/contraction parameter; and
  
  a fourth step for computing the distances between the elongated or contracted input pattern and the reference patterns and outputting the reference pattern corresponding to the shortest distance as result of recognition wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.
- View Dependent Claims (29, 30)
- - 29. The voice recognition method according to claim 18, wherein the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction is executed by carrying out the elongation or contraction in cepstrum space.
  - 30. The voice recognition method according to claim 18, wherein the elongation/contraction estimating process executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by using estimation derived from the best likelihood estimation of HMM (hidden Markov model) in cepstrum space.

21. A reference pattern learning method comprising:
- a first step for receiving a learning voice signal from the learning voice memory and converting the learning voice signal to an input pattern including cepstrum;
  
  a second step for outputting an elongation/contraction parameter along a frequency axis by using the input pattern and the reference patterns stored in a reference pattern memory;
  
  a third step for converting the input pattern by using the elongation/contraction pattern;
  
  a fourth step for updating the reference patterns for the learning voice data by using the elongated or contracted input pattern and the reference patterns; and
  
  a fifth step for monitoring distance changes by computing distances by using the elongated or contracted input pattern and the reference patterns wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.
- View Dependent Claims (22, 23)
- - 22. The reference pattern learning method according to claim 21, wherein the third step executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by carrying out the elongation or contraction in cepstrum space.
  - 23. The reference pattern learning method according to claim 21, wherein the second step executes the elongation or contraction of the spectrum on the frequency axis with a warping function defining the form of elongation or contraction by using estimation derived from the best likelihood estimation of HMM (hidden Markov model) in cepstrum space.

24. A voice recognition method of spectrum conversion to convert a spectrum of a voice signal by executing elongation or contraction of the spectrum on a frequency axis, wherein:
- the elongation or contraction of the spectrum of the voice signal is defined by a warping function and is executed on cepstrum, and the extent of elongation or contraction of the spectrum on the frequency axis is determined with an elongation/contraction parameter included in the warping function, and an optimum value is determined as elongation/contraction parameter value for each speaker wherein said elongation/contraction parameter is based on an expansion-compression coefficient obtained by retrieval in two dimensional space such that one value of the coefficient is obtained for each utterance.

Current Assignee: NEC Corporation
Original Assignee: NEC Corporation
Inventors: Emori, Tadashi; Shinoda, Koichi
Primary Examiner(s): Ometz, David L.
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US09/695,067
Time in Patent Office: 1,763 Days
Field of Search: 704/236, 704/238, 704/240, 704/241, 704/244, 704/250, 704/255
US Class Current: 704/250
CPC Class Codes: G10L 15/12 using dynamic programming t...
