Speech recognition system using neural networks

US 5,751,904 A
Filed: 04/30/1996
Issued: 05/12/1998
Est. Priority Date: 06/18/1992
Status: Expired due to Term

Abstract

A speech recognition system can recognize a plurality of voice data having different patterns. The speech recognition system has a voice recognizing and processing device including a plurality of speech recognition neural networks that have previously learned different voice patterns to recognize given voice data. Each of the speech recognition neural networks is adapted to judge whether or not input voice data coincides with one of the voice data to be recognized. Each neural network then outputs adaptation judgment data representing the adaptation in speech recognition. A selector responsive to the adaptation judgment data from each of the speech recognition neural networks selects the neural network that has the highest adaptation in speech recognition. An output control device outputs the result of speech recognition from the speech recognition neural network selected by the selector.
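
The selection step described in the abstract can be pictured with a minimal sketch. It assumes that each network's adaptation judgment data has already been reduced to a single score where higher means a better fit; the `NetworkOutput` class, its field names, and `select_and_output` are hypothetical names introduced for illustration and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NetworkOutput:
    """Hypothetical container for the two outputs of one recognition network."""
    name: str                 # label for the network (illustrative only)
    recognition_result: str   # what the network recognized
    adaptation: float         # ASSUMPTION: scalar score, higher = better fit


def select_and_output(outputs: Sequence[NetworkOutput]) -> NetworkOutput:
    """Pick the network with the highest adaptation and return its output."""
    if not outputs:
        raise ValueError("at least one network output is required")
    # The selector looks only at the adaptation score, not at the result itself.
    return max(outputs, key=lambda o: o.adaptation)


if __name__ == "__main__":
    candidates = [
        NetworkOutput("net_a", "hello", 0.41),
        NetworkOutput("net_b", "hallo", 0.87),
    ]
    chosen = select_and_output(candidates)
    print(chosen.name, chosen.recognition_result)
```

Keeping the adaptation score separate from the recognition result mirrors the claim language that the adaptation judgment data is independent of the recognition result.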

26 Claims

1. A speech recognition system comprising:
- voice recognizing and processing means including a plurality of speech recognition neural networks that have previously learned different voice patterns to recognize given voice data, each of said speech recognition neural networks including means for judging whether or not a piece of input voice data coincides with one of the voice data to be recognized and outputting a recognition result and having means for outputting adaptation judgment data independent of the recognition result, the adaptation judgement data representing the adaptation in speech recognition;
  
  selector means receiving input voice data and data from said neural networks and responsive to the adaptation judgment data from each of said speech recognition neural networks for selecting one of said neural networks that has the highest adaptation in speech recognition; and
  
  output control means for outputting the result of speech recognition from the speech recognition neural network selected by said selector means.
  - 2. A speech recognition system as defined in claim 1, further comprising feature extracting means for cutting the inputted voice data into each frame and transforming the inputted voice data into a feature vector, the transformed feature vectors being sequentially outputted from said feature extracting means, and wherein each of said speech recognition neural networks includes means for receiving the feature vectors from said feature extracting means as voice data.
  - 3. A speech recognition system as defined in claim 2, wherein each of said speech recognition neural networks comprises a plurality of neurons connected to one another in a predetermined manner and set at an internal state value X, each of said neurons being formed as a dynamic neuron, the internal value X varying according to time for satisfying a function X=G (X, Z_j) represented by the use of the internal state value X and input data Z_j (j=0, 1, 2, . . . , n where n is a natural number) provided to that neuron, each of said dynamic neurons having means for converting the internal state value X into a value which satisfies the function F(X) and means for outputting said converted value as an output signal.
  - 4. A speech recognition system as defined in claim 3, wherein each of the speech recognition neural networks comprises an input neuron for receiving the voice data, a recognition result output neuron for outputting the result of voice data recognition and an adaptation output neuron for outputting adaptation judgment data, said adaptation output neuron having means for inferring voice data to be inputted to said input neuron and means for outputting the inferred data as adaptation judgment data and wherein said selector means includes computing means for computing the adaptation of the inferred data relative to the actual voice data as adaptation in speech recognition.
  - 5. A speech recognition system as defined in claim 4 wherein said function X=G (X, Z_j) is represented by ##EQU10##
  - 6. A speech recognition system as defined in claim 4 wherein said function X=G (X, Z_j) is represented by ##EQU11## where W_ij is strength in joining the output of the j-th neuron to the input of the i-th neuron;
    - D_i is an external input value; and
    - θ_i is a biasing value.
  - 7. A speech recognition system as defined in claim 4 wherein said function X=G (X, Z_j) is represented by ##EQU12## using the sigmoid function S.
  - 8. A speech recognition system as defined in claim 4 wherein said function X=G (X, Z_j) is represented by ##EQU13## where the sigmoid function S is used and where W_ij is strength in joining the output of the j-th neuron to the input of the i-th neuron;
    - D_i is an external input value; and
    - θ_i is a biasing value.
  - 9. A speech recognition system as defined in claim 4 wherein said function F (X) used in each of said dynamic neurons is the sigmoid function.
  - 10. A speech recognition system as defined in claim 4 wherein said function F (X) used in each of said dynamic neurons is the threshold function.
  - 11. A speech recognition system as defined in claim 4 wherein said input data Z_j provided to each of said dynamic neurons includes feedback data obtained from the output of that neuron multiplied by a weight.
  - 12. A speech recognition system as defined in claim 4 wherein said input data Z_j provided to each of said dynamic neurons includes data obtained from the output of any other neuron multiplied by a weight.
  - 13. A speech recognition system as defined in claim 4 wherein said input data Z_j provided to each of said dynamic neurons includes externally provided data.
  - 14. A speech recognition system as defined in claim 3 wherein said function X=G (X, Z_j) is represented by ##EQU14##
  - 15. A speech recognition system as defined in claim 3 wherein said function X=G (X, Z_j) is represented by ##EQU15## where W_ij is strength in joining the output of the j-th neuron to the input of the i-th neuron;
    - D_i is an external input value; and
    - θ_i is a biasing value.
  - 16. A speech recognition system as defined in claim 3 wherein said function X=G (X, Z_j) is represented by ##EQU16## using the sigmoid function S.
  - 17. A speech recognition system as defined in claim 3 wherein said function X=G (X, Z_j) is represented by ##EQU17## where the sigmoid function S is used and where W_ij is strength in joining the output of the j-th neuron to the input of the i-th neuron;
    - D_i is an external input value; and
    - θ_i is a biasing value.
  - 18. A speech recognition system as defined in claim 1, wherein each of said speech recognition neural networks comprises a plurality of neurons connected to one another in a predetermined manner and set at an internal state value X, each of said neurons being formed as a dynamic neuron, the internal value X varying according to time for satisfying a function X=G (X, Z_j) represented by the use of the internal state value X and input data Z_j (j=0, 1, 2, . . . , n where n is a natural number) provided to that neuron, each of said dynamic neurons including means for converting the internal state value X into a value which satisfies the function F(X) and means for outputting said converted value as an output signal.
  - 19. A speech recognition system as defined in claim 18 wherein said function X=G (X, Z_j) is represented by ##EQU18##
  - 20. A speech recognition system as defined in claim 18 wherein said function X=G (X, Z_j) is represented by ##EQU19## where W_ij is strength in joining the output of the j-th neuron to the input of the i-th neuron;
    - D_i is an external input value; and
    - θ_i is a biasing value.
  - 21. A speech recognition system as defined in claim 18 wherein said function X=G (X, Z_j) is represented by ##EQU20## using the sigmoid function S.
  - 22. A speech recognition system as defined in claim 18 wherein said function X=G (X, Z_j) is represented by ##EQU21## where the sigmoid function S is used and where W_ij is strength in joining the output of the j-th neuron to the input of the i-th neuron;
    - D_i is an external input value; and
    - θ_i is a biasing value.
  - 23. A speech recognition system as defined in claim 1, further comprising an internal state value setting section including means for receiving data from said output control means.
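
The dynamic neuron of claims 3 and 18 keeps an internal state X that evolves over time as X = G(X, Z_j) and emits F(X). The concrete forms of G are given only as equation placeholders in this record, so the sketch below substitutes a generic leaky-integrator update as a stand-in; it is an assumption, not the patent's formula. The class name `DynamicNeuron` and the parameters `tau` and `dt` are likewise hypothetical. The weights, external input, and bias loosely correspond to W_ij, D_i, and θ_i, and the two output functions correspond to the sigmoid and threshold choices of claims 9 and 10.

```python
import math
from typing import Callable, Sequence


def sigmoid(x: float) -> float:
    """Sigmoid output function (one choice for F, as in claim 9)."""
    return 1.0 / (1.0 + math.exp(-x))


def threshold(x: float) -> float:
    """Threshold output function (the other choice for F, as in claim 10)."""
    return 1.0 if x >= 0.0 else 0.0


class DynamicNeuron:
    """Neuron with a time-varying internal state X and output F(X)."""

    def __init__(self, weights: Sequence[float], bias: float = 0.0,
                 tau: float = 5.0,
                 output_fn: Callable[[float], float] = sigmoid) -> None:
        self.weights = list(weights)  # plays the role of W_ij
        self.bias = bias              # plays the role of theta_i
        self.tau = tau                # ASSUMED time constant, not in the claims
        self.output_fn = output_fn    # F
        self.x = 0.0                  # internal state X

    def step(self, inputs: Sequence[float], external: float = 0.0,
             dt: float = 1.0) -> float:
        """Advance X by one time step and return F(X)."""
        if len(inputs) != len(self.weights):
            raise ValueError("inputs and weights must have the same length")
        drive = sum(w * z for w, z in zip(self.weights, inputs))
        drive += external + self.bias
        # ASSUMED G: explicit Euler step of tau * dX/dt = -X + drive.
        self.x += (dt / self.tau) * (-self.x + drive)
        return self.output_fn(self.x)


if __name__ == "__main__":
    neuron = DynamicNeuron(weights=[1.0], tau=2.0)
    for _ in range(2):
        print(round(neuron.step([1.0]), 4), neuron.x)
```

With `weights=[1.0]`, `tau=2.0`, and a constant input of 1.0, the state moves from 0 to 0.5 and then to 0.75, approaching the drive value of 1.0. Feeding a neuron's previous output back in as one of its inputs would correspond to the feedback case of claim 11.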

24. A speech recognition system comprising:
- feature extracting means for cutting and converting input voice data into a feature vector for each frame, said feature vectors being sequentially outputted from said feature extracting means;
  
  voice recognizing and processing means including a plurality of speech recognition neural networks each having learned to infer a feature vector of a speaker based on a feature vector of a speaker inputted from said feature extracting means into that speech recognition neural network for outputting that inferred vector as adaptation judgment data representing the adaptation in the speech recognition, each said speech recognition neural network being formed to output said adaptation judgment data based on a feature vector actually inputted from said feature extracting means; and
  
  speaker recognizing means for computing the rate of coincidence between the adaptation judgment data from each of said speech recognition neural network means and the feature vector of the speaker actually inputted from said feature extracting means into each said speech recognition neural network to recognize the speaker of the inputted voice for each of said speech recognition neural networks.
  - 25. A speech recognition system as defined in claim 24, wherein each of said speech recognition neural networks comprises a plurality of neurons connected to one another and set at an internal state value X, each of said neurons being formed as a dynamic neuron, the internal value X varying according to time for satisfying a function X=G (X, Z_j) represented by the use of the internal state value X and input data Z_j (j=0, 1, 2, . . . , n where n is a natural number) provided to that neuron, each of said dynamic neurons including means for converting the internal state value X into a value which satisfies the function F(X) and means for outputting said converted value as an output signal.
  - 26. A speech recognition system as defined in claim 25, wherein each of said speech recognition neural networks comprises an input neuron for receiving said feature vector and an adaptation output neuron for outputting said adaptation judgment data, said adaptation output neuron having means for inferring said feature vector inputted thereinto and means for outputting the inferred data as adaptation judgment data.
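
The pipeline of claims 24 to 26 (frame the input, let each network infer a feature vector, then score the coincidence between the inferred and the actual vectors) can be sketched as follows. Several parts are assumptions made for illustration: the raw frame stands in for the feature vector, each network is replaced by a plain callable that predicts the next frame from the current one, and the rate of coincidence is taken to be 1 / (1 + mean squared error). None of the function names below appear in the patent.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def cut_into_frames(samples: Sequence[float], frame_len: int) -> List[Vector]:
    """Cut samples into non-overlapping frames; a trailing partial frame is dropped.

    ASSUMPTION: the raw frame is used directly as the feature vector.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    last_start = len(samples) - frame_len
    return [list(samples[i:i + frame_len])
            for i in range(0, last_start + 1, frame_len)]


def coincidence_rate(inferred: Sequence[Vector],
                     actual: Sequence[Vector]) -> float:
    """ASSUMED measure: 1 / (1 + mean squared error); 1.0 is a perfect match."""
    total, count = 0.0, 0
    for inferred_vec, actual_vec in zip(inferred, actual):
        for p, a in zip(inferred_vec, actual_vec):
            total += (p - a) ** 2
            count += 1
    if count == 0:
        raise ValueError("nothing to compare")
    return 1.0 / (1.0 + total / count)


def recognize_speaker(features: Sequence[Vector],
                      predictors: Dict[str, Callable[[Vector], Vector]]) -> str:
    """Return the name of the predictor whose inferences best match the input."""
    scores = {}
    for name, predict in predictors.items():
        # Each stand-in network infers frame t+1 from frame t.
        inferred = [predict(frame) for frame in features[:-1]]
        scores[name] = coincidence_rate(inferred, features[1:])
    return max(scores, key=lambda name: scores[name])


if __name__ == "__main__":
    frames = cut_into_frames([0, 1, 2, 3, 4, 5, 6, 7], frame_len=2)
    predictors = {
        "speaker_a": lambda f: [v + 2 for v in f],  # matches this input exactly
        "speaker_b": lambda f: list(f),             # always off by 2
    }
    print(recognize_speaker(frames, predictors))
```

Scoring each predictor against the frames that actually arrive means the comparison needs no labels at recognition time, which is one way to read the claim's requirement that adaptation judgment data be based on the feature vector actually inputted.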

Current Assignee: Seiko Epson Corporation (Seiko Group)
Original Assignee: Seiko Epson Corporation (Seiko Group)
Inventors: Inazumi, Mitsuhiro
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Dorvil, Richemond
Application Number: US08/641,268
Time in Patent Office: 742 Days
Field of Search: 395/2, 395/2.11, 395/2.41, 395/2.68, 395/2.65, 395/2.69, 395/22, 395/23, 395/24, 395/2.09
US Class Current: 704/232
CPC Class Codes: G06N 3/045 Combinations of networks; G10L 15/16 using artificial neural net...
