Voice recognition device and method using a (GGM) Guaranteed Global minimum Mapping

US 5,764,853 A
Filed: 10/25/1995
Issued: 06/09/1998
Est. Priority Date: 10/27/1994
Status: Expired due to Fees

Abstract

A voice recognition device according to the present invention includes a voice analyzer for acoustically analyzing voice every predetermined frame unit to extract a feature vector X, a converter for subjecting the feature vector X output from the analyzer to a predetermined conversion process, and a voice recognizer for recognizing the voice on the basis of a new feature vector output from the converter. The converter conducts the predetermined conversion processing according to a mapping F from an N-dimensional vector space Ω_N to an M-dimensional vector space Ω_M, where the feature vector X is a vector on the N-dimensional vector space Ω_N and the function f_m (X) of the m-th component of the mapping F is represented by the following linear summation of the products of L_m functions g_m^k (X) and L_m coefficients c_m^k:

$$f_m(X) = \sum_{k=0}^{L_m-1} c_m^k \, g_m^k(X)$$

Each function g_m^k (X) may be set to a monomial.
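The linear summation described in the abstract can be illustrated with a small sketch. It assumes that every output component m shares the same set of monomials up to a chosen total degree; the patent allows a separate set of L_m functions per component. The names `monomial_exponents` and `apply_mapping` and the example coefficients are hypothetical and chosen only for illustration.

```python
from itertools import combinations_with_replacement
from math import prod


def monomial_exponents(n_dims, max_degree):
    """List index tuples, one per monomial of total degree 0..max_degree.

    The tuple (0, 1) stands for x[0] * x[1]; the empty tuple is the constant 1.
    """
    terms = []
    for degree in range(max_degree + 1):
        terms.extend(combinations_with_replacement(range(n_dims), degree))
    return terms


def apply_mapping(x, terms, coeffs):
    """Evaluate f_m(X) = sum_k c_m^k * g_m^k(X) for every output component m.

    coeffs[m][k] plays the role of c_m^k. Assumption: all components m use
    the same monomials g^k, which multiply the entries of x named in terms[k].
    """
    g = [prod(x[i] for i in term) for term in terms]
    return [sum(c * gk for c, gk in zip(row, g)) for row in coeffs]


# Map a 2-dimensional feature vector to a 2-dimensional output.
terms = monomial_exponents(n_dims=2, max_degree=2)
coeffs = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # f_0(X) = 1
    [0.0, 2.0, 0.0, 0.0, 1.0, 0.0],  # f_1(X) = 2*x0 + x0*x1
]
y = apply_mapping([3.0, 4.0], terms, coeffs)
```

Because each f_m is linear in its coefficients, the nonlinearity lives entirely in the fixed functions g_m^k, which is what makes the later coefficient fit a linear problem.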

28 Citations

26 Claims

1. A voice recognition device comprising:
- analyzing means for acoustically analyzing voice every predetermined frame unit to extract a feature vector X;
  
  converting means for subjecting the feature vector X output from said analyzing means to a predetermined conversion process; and
  
  recognition means for recognizing the voice on the basis of a new feature vector output from said conversion means, wherein said conversion means conducts the predetermined conversion processing according to a mapping F from an N-dimensional vector space Ω_N to an M-dimensional vector space Ω_M, the feature vector X is a vector on the N-dimensional vector space Ω_N, and the function f_m (X) of an mth component of the mapping F is represented by the following linear summation of the products of complete component functions g_m^k (X) of L_m determined on the basis of the distribution of the learning sample S_q (=(S₀^q, S₁^q, S₂^q, . . . , S_{N-1}^q)) on the N-dimensional measurable vector space which is classified into categories C_q (q=0, 1, 2, . . . , Q-1) of Q, and coefficients c_m^k of L_m:
  
  $$f_m(X) = \sum_{k=0}^{L_m-1} c_m^k \, g_m^k(X)$$
  
  wherein when teacher vectors T_q (=(t₀^q, t₁^q, t₂^q, . . . , t_{M-1}^q)) on an M-dimensional measurable vector space Ω_M for the categories C_q of Q are provided and a predetermined estimation function J is calculated, the coefficient c_m^k is determined so as to minimize the estimation function J.
  - 2. The voice recognition device as claimed in claim 1, wherein when a calculation of an expected value of the function f_m (X) over all the elements of the learning sample S_q is represented by E{X∈S_q}{f(X)}, the estimation function J is represented as follows:
      
      ##EQU13##
      
      the categories C_q of Q correspond to Q types of phonemes, and the learning sample S_q corresponds to labelled voice data.
  - 3. The voice recognition device as claimed in claim 2, wherein each function g_m^k (X) is set to a monomial.
  - 4. The voice recognition device as claimed in claim 2, wherein the dimension M of the vector space after the conversion by the mapping F is equal to the total number Q of the categories C_q corresponding to the phonemes.
  - 5. The voice recognition device as claimed in claim 2, wherein each of the teacher vectors T_q is a unit vector in the M-dimensional vector space, and the teacher vectors T_q of Q are orthogonal to each other.
  - 6. The voice recognition device as claimed in claim 2, further including bundling means for bundling feature vectors X of plural frames, each of which is output every predetermined frame unit from said analyzing means, and then supplying bundled feature vectors as a feature vector X to said conversion means.
  - 7. The voice recognition device as claimed in claim 2, wherein the feature vector X comprises LPC spectrum.
  - 8. The voice recognition device as claimed in claim 2, wherein the feature vector X comprises power every predetermined band width of voice.
  - 9. The voice recognition device as claimed in claim 2, wherein said recognition means recognizes voice on the basis of both the new feature vector output from said converting means and said feature vector X output from said analyzing means.
  - 10. The voice recognition device as claimed in claim 2, wherein said analyzing means acoustically analyzes the voice and extracts the feature vector X and another feature vector which is different from the feature vector X, and said recognizing means recognizes the voice on the basis of both the new feature vector output from said conversion means and the other feature vector output from said analyzing means.
  - 11. The voice recognition device as claimed in claim 10, wherein the other feature vector is set to the difference between respective feature vectors X extracted from two frames which are spaced away from each other by a predetermined frame number.
  - 12. The voice recognition device as claimed in claim 2, wherein said recognition means recognizes the voice according to an HMM (Hidden Markov Models) method.
  - 13. The voice recognition device as claimed in claim 12, further including vector quantization means for vector-quantizing a vector supplied to said recognition means to output a predetermined code, wherein said recognition means recognizes the voice on the basis of the predetermined code output from said vector quantization means according to the discrete HMM method.
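Claims 1, 2 and 5 describe choosing the coefficients c_m^k so that an estimation function J is minimized against orthogonal unit teacher vectors. The exact form of J is not reproduced in this record, so the sketch below assumes a plain sum of squared errors over all labelled samples, with one-hot teacher vectors and M equal to Q. Under that assumption the problem is ordinary linear least squares, whose minimum is global because f_m is linear in the coefficients. The helper names, the shared basis, and the toy data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes the matrix a is square and non-singular.
    """
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_coefficients(samples, labels, n_categories, basis):
    """Least-squares coefficients for one-hot teacher vectors.

    samples: feature vectors; labels: category index q of each sample;
    basis: list of functions g^k(x), shared by all components (assumption).
    Returns coeffs[m][k], the stand-in for c_m^k.
    """
    g_rows = [[g(x) for g in basis] for x in samples]
    n_terms = len(basis)
    # Normal equations (G^T G) c_m = G^T t_m; G^T G is the same for every m.
    gram = [[sum(row[i] * row[j] for row in g_rows) for j in range(n_terms)]
            for i in range(n_terms)]
    coeffs = []
    for m in range(n_categories):
        # Component m of the one-hot teacher vector for each sample.
        target = [1.0 if q == m else 0.0 for q in labels]
        rhs = [sum(row[i] * t for row, t in zip(g_rows, target))
               for i in range(n_terms)]
        coeffs.append(solve_linear(gram, rhs))
    return coeffs


# Toy data: one-dimensional features, two categories, basis {1, x}.
basis = [lambda x: 1.0, lambda x: x[0]]
samples = [[0.0], [1.0]]
labels = [0, 1]
coeffs = fit_coefficients(samples, labels, n_categories=2, basis=basis)
```

With this toy data the fit is exact: the first component becomes 1 - x and the second becomes x, so each sample is mapped onto its own teacher vector.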

14. A voice recognition method comprising:
- a voice analyzing step for acoustically analyzing voice every predetermined frame unit to extract a feature vector X;
  
  a vector conversion step for subjecting the feature vector X extracted in said analyzing step to a predetermined conversion process; and
  
  a voice recognition step for recognizing the voice on the basis of the new feature vector output in said vector conversion step, wherein the predetermined conversion processing is conducted according to a mapping F from an N-dimensional vector space Ω_N to an M-dimensional vector space Ω_M in said vector conversion step, the feature vector X is a vector on the N-dimensional vector space Ω_N, and the function f_m (X) of an m-th component of the mapping F is represented by the following linear summation of the products of complete component functions g_m^k (X) of L_m determined on the basis of the distribution of the learning sample S_q (=(S₀^q, S₁^q, S₂^q, . . . , S_{N-1}^q)) on the N-dimensional measurable vector space which is classified into categories C_q (q=0, 1, 2, . . . , Q-1) of Q, and coefficients c_m^k of L_m:
  
  $$f_m(X) = \sum_{k=0}^{L_m-1} c_m^k \, g_m^k(X)$$
  
  wherein when teacher vectors T_q (=(t₀^q, t₁^q, t₂^q, . . . , t_{M-1}^q)) on an M-dimensional measurable vector space Ω_M for the categories C_q of Q are provided and a predetermined estimation function J is calculated, the coefficient c_m^k is determined so as to minimize the estimation function J.
  - 15. The voice recognition method as claimed in claim 14, wherein when a calculation of an expected value of the function f_m (X) over all the elements of the learning sample S_q is represented by E{X∈S_q}{f(X)}, the estimation function J is represented as follows:
      
      ##EQU15##
      
      the categories C_q of Q correspond to Q types of phonemes, and the learning sample S_q corresponds to labelled voice data.
  - 16. The voice recognition method as claimed in claim 15, wherein each function g_m^k (X) is set to a monomial.
  - 17. The voice recognition method as claimed in claim 15, wherein the dimension M of the vector space after the conversion by the mapping F is equal to the total number Q of the categories C_q corresponding to the phonemes.
  - 18. The voice recognition method as claimed in claim 15, wherein each of the teacher vectors T_q is a unit vector in the M-dimensional vector space, and the teacher vectors T_q of Q are orthogonal to each other.
  - 19. The voice recognition method as claimed in claim 15, further comprising a bundling step for bundling feature vectors X of plural frames, each of which is output every predetermined frame unit in said voice analyzing step, and then supplying bundled feature vectors as a feature vector X to said conversion means.
  - 20. The voice recognition method as claimed in claim 15, wherein the feature vector X comprises LPC spectrum.
  - 21. The voice recognition method as claimed in claim 15, wherein the feature vector X comprises power every predetermined band width of voice.
  - 22. The voice recognition method as claimed in claim 15, wherein in said voice recognition step, voice is recognized on the basis of both the new feature vector output from said conversion step and said feature vector X output from said voice analyzing step.
  - 23. The voice recognition method as claimed in claim 15, wherein in said voice analyzing step, the voice is acoustically analyzed to extract the feature vector X and another feature vector which is different from the feature vector X, and in said voice recognizing step the voice is recognized on the basis of both the new feature vector output from said conversion step and the other feature vector output from said voice analyzing step.
  - 24. The voice recognition method as claimed in claim 23, wherein the other feature vector is set to the difference between respective feature vectors X extracted from two frames which are spaced away from each other by a predetermined frame number.
  - 25. The voice recognition method as claimed in claim 15, wherein in said recognition the voice is recognized according to an HMM (Hidden Markov Models) method.
  - 26. The voice recognition method as claimed in claim 25, wherein said voice recognizing step includes a vector quantization step for vector-quantizing the supplied vector to output a predetermined code, the voice being recognized on the basis of the predetermined code output from said vector quantization step according to the discrete HMM method.
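The bundling step of claim 19 and the difference feature of claim 24 can be sketched as follows. The claims do not say how many frames are bundled, how the edges of an utterance are handled, or the sign convention of the difference, so the symmetric window, the edge padding by repetition, and the "later minus earlier" order are assumptions. The function names are hypothetical.

```python
def bundle_frames(frames, context):
    """Concatenate each frame with `context` neighbours on both sides.

    Assumption: edge frames are padded by repeating the first or last frame.
    """
    last = len(frames) - 1
    bundled = []
    for t in range(len(frames)):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), last)  # clamp to valid frame indices
            window.extend(frames[idx])
        bundled.append(window)
    return bundled


def delta_features(frames, spacing):
    """Difference between two frames `spacing` apart: frames[t+spacing] - frames[t]."""
    return [[b - a for a, b in zip(frames[t], frames[t + spacing])]
            for t in range(len(frames) - spacing)]


# Three one-dimensional frames.
frames = [[1.0], [2.0], [4.0]]
bundled = bundle_frames(frames, context=1)
deltas = delta_features(frames, spacing=2)
```

Each bundled vector here has three times the dimension of a single frame and would be supplied to the conversion step as the feature vector X, while the difference vectors correspond to the "other feature vector" passed alongside the converted one.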

Current Assignee: Sony Corporation (Sony Group Corp.)
Original Assignee: Sony Corporation (Sony Group Corp.)
Inventors: Minamino, Katsuki; Watanabe, Kazuo; Omote, Masanori; Ogawa, Hiroaki; Ishii, Kazuo; Kato, Yasuhiko; Watari, Masao
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Edouard, Patrick Nestor
Application Number: US08/548,278
Time in Patent Office: 958 Days
Field of Search: 395/2.52, 395/2.53, 395/2.54, 395/2.55, 395/2.65, 395/21-24, 395/2.41-2.42
US Class Current: 704/243
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 15/20   Speech recognition techniqu...
G10L 2015/025   Phonemes, fenemes or fenone...
