Speaker recognition system using neural network

US 5,461,697 A
Filed: 11/12/1993
Issued: 10/24/1995
Est. Priority Date: 11/17/1988
Status: Expired due to Fees

Abstract

A speaker recognition system for recognizing a speaker from an input voice using a neural network, in which a feature quantity extracted from the input voice is timewise averaged to create an input pattern to the neural network. The averaging technique is such that the input voice is equally divided timewise into a plurality of blocks in a simple manner and that such feature quantity is averaged every block. The feature quantity includes a frequency characteristic, pitch frequency, linear prediction coefficient, and partial self-correlation (PARCOR) coefficient of the voice.
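The timewise averaging described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `block_average`, the list-of-vectors representation, and the integer block boundaries are all hypothetical choices made for the example.

```python
def block_average(frame_features, num_blocks):
    """Average per-frame feature vectors over num_blocks equal time blocks.

    frame_features: list of equal-length feature vectors, one per frame.
    Returns a flat list of num_blocks * feature_dim averaged values.
    """
    num_frames = len(frame_features)
    if not 1 <= num_blocks <= num_frames:
        raise ValueError("need 1 <= num_blocks <= number of frames")
    dim = len(frame_features[0])
    pattern = []
    for b in range(num_blocks):
        # Integer boundaries keep block sizes within one frame of each other.
        start = b * num_frames // num_blocks
        end = (b + 1) * num_frames // num_blocks
        block = frame_features[start:end]
        for k in range(dim):
            pattern.append(sum(vec[k] for vec in block) / len(block))
    return pattern


# Six frames with two features each, averaged into three blocks.
frames = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0],
          [7.0, 70.0], [2.0, 20.0], [4.0, 40.0]]
pattern = block_average(frames, 3)
print(pattern)  # [2.0, 20.0, 6.0, 60.0, 3.0, 30.0]
```

The pattern length depends only on the number of blocks and the feature dimension, not on the utterance length, which is what lets a network with a fixed number of input units accept voices of different durations.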

8 Claims

1. A speaker recognition system for identifying a speaker and verifying registration of a speaker based on an input voice, comprising:
- a voice input section for inputting a voice;
  
  a preprocessing section for extracting a feature quantity from said inputted voice and averaging said extracted feature quantity timewise;
  
  a layered neural network for performing a predetermined operation based on an input pattern from said preprocessing section; and
  
  a speaker judgment section for judging identification or registration of said speaker based on an output from said neural network;
  
  wherein said preprocessing section divides said inputted voice into a plurality of frames that are timewise equal to one another, extracts a feature quantity for each frame, and averages together said feature quantity of each frame over a group of frames, thereby producing an input pattern for said layered neural network consisting of an extracted feature quantity which is timewise averaged.

2. A speaker recognition system according to claim 1, wherein said feature quantity of said inputted voice includes a frequency characteristic of said voice, a pitch frequency of said voice, a frequency characteristic of said voice whose high frequencies are subjected to emphasis, a linear prediction coefficient of said voice, and a PARCOR (partial self-correlation) coefficient of said voice.

3. A speaker recognition system according to claim 1, said system further comprising:
- a function selection section for selecting a speaker identification function or a speaker verification function or both;
  
  a registered speaker count setting section for setting the number of speakers to be preregistered in said system; and
  
  a mode selection section for selecting a mode from a mode in which learning is executed by said neural network or a mode in which speaker recognition is activated using said neural network that has completed said learning.
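
Claim 2 lists a frequency characteristic "whose high frequencies are subjected to emphasis" but does not say how the emphasis is applied. One common way to do this is a first-order difference filter, sketched below. The filter form, the function name `pre_emphasize`, and the default coefficient 0.97 are assumptions for illustration and are not taken from the patent.

```python
def pre_emphasize(samples, coeff=0.97):
    """First-order high-frequency emphasis: y[t] = x[t] - coeff * x[t-1]."""
    if not samples:
        return []
    out = [samples[0]]  # the first sample has no predecessor
    for t in range(1, len(samples)):
        out.append(samples[t] - coeff * samples[t - 1])
    return out


slow = [1.0, 1.0, 1.0, 1.0]      # constant (low-frequency) signal
fast = [1.0, -1.0, 1.0, -1.0]    # alternating (high-frequency) signal
print(pre_emphasize(slow, 0.5))  # [1.0, 0.5, 0.5, 0.5]
print(pre_emphasize(fast, 0.5))  # [1.0, -1.5, 1.5, -1.5]
```

The slowly varying signal is attenuated while the rapidly alternating one is amplified, which is the effect the claim language describes.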

4. A speaker recognition system using a neural network comprising:
- a function selection section for selecting a speaker identification function, a speaker verification function, or both;
  
  a registered speaker count setting section for setting the number of speakers to be preregistered in said system;
  
  a mode selection section for selecting a mode in which learning is executed by the neural network, or a mode in which speaker recognition is activated using said neural network that has completed said learning;
  
  a voice input section for inputting a voice and detecting a voice block from said voice;
  
  a preprocessing section for extracting a feature quantity from said inputted voice by dividing said inputted voice into a plurality of frames that are timewise equal to one another, extracting a feature quantity from each frame, and averaging said extracted feature quantity for each frame over a group of frames, thereby producing an input pattern;
  
  a layered neural network for performing a predetermined operation based on an input pattern from said preprocessing section;
  
  a speaker judgment section for judging identity or registration of said speaker based on an output from said neural network;
  
  wherein in said learning mode, in response to a voice being input, together with speaker information identifying a speaker or indicating whether said speaker is a registered speaker, and a voice block being detected, said preprocessing section equally divides said voice block into m subblocks consisting of at least one frame, calculates spectral power of n frequency bands, said n frequency bands being set on the frequency domain for each frame, averages together the spectral power of each frame in a subblock for every equally divided subblock, generates an input pattern obtained from the result of averaging for input to said neural network, calculates an error between said obtained output pattern and a target value corresponding to said speaker information, determines a degree of strength of connection between units of the neural network for correction so that said error is decreased, and repeats said calculation of said error and said correction of said degree of strength of connection between said units until said error becomes below a predetermined value; and
  
  in said activation mode, in response to a voice being inputted and a voice block being detected, said preprocessing section equally divides said voice block into m subblocks, calculates spectral power for each of n frequency bands, said n frequency bands being set on the frequency domain for each frame, averages together said spectral power of each frame in a subblock for every equally divided subblock, and transmits an input pattern obtained from the result of averaging to said neural network that has completed said learning; and
  
  said judgment section judges the identity of the speaker from said obtained output pattern in an identification function and the registration of the speaker in a verification function wherein m and n are integers.
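
The preprocessing recited in claim 4 (n band powers per frame, averaged within each of m subblocks) can be sketched as below. This is an illustrative reading, not the patent's implementation: the equal-width band layout, the naive DFT, the non-overlapping frames, and the names `band_powers` and `input_pattern` are all assumptions introduced for the example.

```python
import cmath
import math


def band_powers(frame, n_bands):
    """Spectral power of one frame, summed into n_bands equal-width bands."""
    size = len(frame)
    half = size // 2  # keep only the non-redundant bins of a real signal
    if not 1 <= n_bands <= half:
        raise ValueError("need 1 <= n_bands <= len(frame) // 2")
    power = []
    for k in range(half):
        # Naive DFT of bin k; fine for short frames, an FFT would be faster.
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / size)
                  for t in range(size))
        power.append(abs(acc) ** 2)
    return [sum(power[b * half // n_bands:(b + 1) * half // n_bands])
            for b in range(n_bands)]


def input_pattern(voice_block, frame_len, m, n):
    """Build the m * n pattern: band powers averaged within each subblock."""
    frames = [voice_block[i:i + frame_len]
              for i in range(0, len(voice_block) - frame_len + 1, frame_len)]
    if not 1 <= m <= len(frames):
        raise ValueError("need 1 <= m <= number of frames")
    per_frame = [band_powers(f, n) for f in frames]
    pattern = []
    for s in range(m):
        sub = per_frame[s * len(frames) // m:(s + 1) * len(frames) // m]
        pattern.extend(sum(p[b] for p in sub) / len(sub) for b in range(n))
    return pattern


# A 128-sample tone with a period of 8 samples, cut into 8 frames of 16.
tone = [math.cos(2 * math.pi * t / 8) for t in range(128)]
pattern = input_pattern(tone, frame_len=16, m=4, n=4)
print(len(pattern))  # 16
```

With m = 4 and n = 4 the network would receive 16 input values regardless of how many frames the detected voice block contains. In this toy signal all of the energy falls in the second band of every subblock.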

5. A speaker recognition system for identifying a speaker and verifying registration of a speaker based on an input voice, comprising:
- a voice input section for inputting a voice;
  
  a preprocessing section for extracting a feature quantity from each of a plurality of frames which said input voice is divided into with a predetermined period;
  
  a layered neural network for performing a predetermined operation based on an input pattern from said preprocessing section; and
  
  a speaker judgement section for judging identification or registration of said speaker based on an output from said neural network;
  
  wherein said preprocessing section includes means for producing said input pattern in the manner that said plurality of frames are grouped into a plurality of groups, a total number of the groups is less than a total number of the frames, each of the groups includes a substantially equal number of frames, and said feature quantities in each of the groups are averaged to produce said input pattern of the neural network.

6. A speaker recognition system according to claim 5, wherein said feature quantity of said input voice is one of a frequency characteristic of said voice, a pitch frequency of said voice, a frequency characteristic of said voice whose high frequencies are subjected to emphasis, a linear prediction coefficient of said voice, and a PARCOR (partial self-correlation) coefficient of said voice.

7. A speaker recognition system according to claim 5, wherein said system further comprises:
- a function selection section for selecting at least one of a speaker identification function and a speaker verification function;
  
  a registered speaker count setting section for setting the number of speakers to be preregistered in said system; and
  
  a mode selection section for selecting a mode in which learning is executed by said neural network or a mode in which speaker recognition is activated using said neural network that has completed said learning.
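
The claims do not specify how the speaker judgment section turns network outputs into a decision. The sketch below shows one plausible arrangement, assuming one output unit per registered speaker, identification by the largest output, and verification by a fixed threshold. The function name `judge`, the string function labels, and the 0.5 threshold are hypothetical.

```python
def judge(outputs, function, threshold=0.5):
    """Judge a speaker from network outputs.

    Assumes one output unit per registered speaker (an assumption, not
    something the claims state).
    """
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    if function == "identification":
        # Report which registered speaker the voice most resembles.
        return best
    if function == "verification":
        # Report whether any registered speaker matches strongly enough.
        return outputs[best] >= threshold
    raise ValueError("function must be 'identification' or 'verification'")


print(judge([0.1, 0.8, 0.2], "identification"))  # 1
print(judge([0.1, 0.8, 0.2], "verification"))    # True
print(judge([0.2, 0.3, 0.1], "verification"))    # False
```

Under this reading, the registered speaker count set in claim 7 would fix the number of output units, and the function selection section would choose which branch of `judge` is applied.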

8. A speaker recognition system using a neural network comprising:
- a function selection section for selecting at least one of a speaker identification function and a speaker verification function;
  
  a registered speaker count setting section for setting the number of speakers to be preregistered in said system;
  
  a mode selection section for selecting a mode in which learning is executed by the neural network, or a mode in which speaker recognition is activated using said neural network that has completed said learning;
  
  a voice input section for inputting a voice and detecting a voice block from said voice;
  
  a preprocessing section for extracting a feature quantity from each of a plurality of frames which said input voice is divided into with a predetermined period;
  
  a layered neural network for performing a predetermined operation based on an input pattern from said preprocessing section; and
  
  a speaker judgement section for judging an identity or a registration of said speaker based on an output from said neural network;
  
  wherein said preprocessing section includes means for producing said input pattern in the manner that said plurality of frames are grouped into a plurality of groups, a total number of the groups is less than a total number of the frames, each of the groups includes a substantially equal number of frames, and said feature quantities in each of the groups are averaged to produce said input pattern of the neural network; and
  
  wherein, in said learning mode, in response to a voice being input, together with speaker information identifying a speaker or indicating whether said speaker is a registered speaker, and a voice block being detected, said preprocessing section equally divides said voice block into m subblocks including at least one frame, calculates spectral power of n frequency bands, said frequency bands being set on the frequency domain for each frame, averages together the spectral power of each frame in a subblock for every equally divided subblock, generates an input pattern obtained from the result of averaging for input to said neural network, calculates an error between said obtained output pattern and a target value corresponding to said speaker information, determines a degree of strength of connection between units of the neural network for correction so that said error is decreased, and repeats said calculation of said error and said correction of said degree of strength of connection between said units until said error becomes below a predetermined value; and
  
  in said activation mode, in response to a voice being inputted and a voice block being detected, said preprocessing section equally divides said voice block into m subblocks, calculates spectral power for each of n frequency bands, said n frequency bands being set on the frequency domain for each frame, averages together said spectral power of each frame in a subblock for every equally divided subblock, and transmits an input pattern obtained from the result of averaging to said neural network that has completed said learning; and
  
  said judgement section judges the identity of the speaker from said obtained output pattern in an identification function mode and the registration of the speaker in a verification function mode wherein m and n are integers.
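
The learning mode in claims 4 and 8 repeats two steps, computing the error against a target and correcting the connection strengths, until the error falls below a predetermined value. The claims do not name the correction rule. The sketch below uses standard backpropagation on a network with one hidden layer as an assumed stand-in; the layer sizes, learning rate, tolerance, epoch cap, and all function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    # Each weight row ends with a bias term, hence the appended 1.0.
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w_out]
    return h, y


def train(patterns, targets, n_hidden=3, rate=0.5, tolerance=0.01,
          max_epochs=20000, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0]), len(targets[0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
             for _ in range(n_out)]
    history = []  # summed squared error per epoch
    for _ in range(max_epochs):
        error = 0.0
        for x, t in zip(patterns, targets):
            h, y = forward(x, w_hidden, w_out)
            error += sum((yj - tj) ** 2 for yj, tj in zip(y, t))
            # Deltas for squared error with sigmoid units (constant factor
            # absorbed into the learning rate).
            d_out = [(yj - tj) * yj * (1 - yj) for yj, tj in zip(y, t)]
            d_hid = [hi * (1 - hi) * sum(d_out[j] * w_out[j][i]
                                         for j in range(n_out))
                     for i, hi in enumerate(h)]
            # Correct the connection strengths so the error decreases.
            for j in range(n_out):
                for i, v in enumerate(h + [1.0]):
                    w_out[j][i] -= rate * d_out[j] * v
            for i in range(n_hidden):
                for k, v in enumerate(x + [1.0]):
                    w_hidden[i][k] -= rate * d_hid[i] * v
        history.append(error)
        if error < tolerance:  # the "predetermined value" of the claims
            break
    return w_hidden, w_out, history


# Two toy input patterns, one per registered speaker, with one-hot targets.
patterns = [[1.0, 0.0], [0.0, 1.0]]
targets = [[1.0, 0.0], [0.0, 1.0]]
w_hidden, w_out, history = train(patterns, targets)
_, y = forward([1.0, 0.0], w_hidden, w_out)
print(len(history), history[-1] < history[0], y[0] > y[1])
```

The epoch cap is a safeguard added for the example; the claims only describe stopping when the error drops below the predetermined value.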

Current Assignee: Sekisui Kagaku Kogyo Kabushiki Kaisha (Sekisui)
Original Assignee: Sekisui Kagaku Kogyo Kabushiki Kaisha (Sekisui)
Inventors: Miyakawa, Masashi; Nishimura, Shingo; Umino, Masayuki; Nonaka, Shigenobu
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Doerrler, Michelle
Application Number: US08/150,785
Time in Patent Office: 711 Days
Field of Search: 395/2.11, 395/2.41, 395/2.55, 395/2.6
US Class Current: 704/232
CPC Class Codes:
G10L 15/16 using artificial neural net...
G10L 17/18 Artificial neural networks;...
