Speaker verification system

US 5,121,428 A
Filed: 11/08/1990
Issued: 06/09/1992
Est. Priority Date: 01/20/1988
Status: Expired due to Fees


Assignments: 0
Petitions: 0

Abstract

In a speaker verification system, a detecting part detects a speech section of an input speech signal by using the time series of acoustic parameters of that signal. A segmentation part calculates individuality information for segmentation from the time-series acoustic parameters within the speech section and segments the speech section into a plurality of blocks based on that information. A feature extracting part extracts features of an unknown speaker for every segmented block from the time-series acoustic parameters. A distance calculating part calculates a distance between the features extracted by the feature extracting part and reference features stored in a memory. A decision part decides whether or not the unknown speaker is a real speaker by comparing the calculated distance with a predetermined threshold value. Segmentation is performed by calculating a primary moment of the spectrum over a block and finding successive values that satisfy a predetermined criterion.
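
Claim 1(c) defines the primary moment as the channel position corresponding to the centre of a frame's spectrum. A natural reading, assumed here, is the energy-weighted mean channel index, M = sum(k * S_k) / sum(S_k) over channel energies S_1, ..., S_K. The minimal Python sketch below computes that value for each frame and then splits an already-detected speech section into blocks of consecutive frames. The patent says only that successive values must satisfy "a predetermined criterion"; the drift rule used here (start a new block when the moment moves more than `max_drift` channels away from the block's first frame) is a hypothetical stand-in, and the function names and toy spectra are illustrative rather than taken from the patent.

```python
from typing import List, Sequence


def primary_moment(spectrum: Sequence[float]) -> float:
    """Energy-weighted mean channel index, channels numbered from 1.

    Returns 0.0 for a frame with no energy, where no centre is defined.
    """
    total = sum(spectrum)
    if total <= 0:
        return 0.0
    return sum(k * s for k, s in enumerate(spectrum, start=1)) / total


def segment_by_moment(moments: Sequence[float], max_drift: float) -> List[range]:
    """Group consecutive frame indices into blocks.

    Hypothetical criterion (not from the patent): a new block starts at the
    first frame whose primary moment differs from that of the current
    block's first frame by more than max_drift channels.
    """
    blocks: List[range] = []
    start = 0
    for i in range(1, len(moments)):
        if abs(moments[i] - moments[start]) > max_drift:
            blocks.append(range(start, i))
            start = i
    if moments:
        blocks.append(range(start, len(moments)))
    return blocks


if __name__ == "__main__":
    # Toy 4-channel frames whose energy shifts from low to high channels.
    frames = [[4, 2, 1, 1], [4, 3, 1, 0], [1, 1, 3, 4], [0, 1, 3, 4]]
    moments = [primary_moment(f) for f in frames]
    print(segment_by_moment(moments, max_drift=0.5))  # [range(0, 2), range(2, 4)]
```

With these toy frames the moments are about 1.88, 1.63, 3.11 and 3.38, so the two low-channel frames form one block and the two high-channel frames another.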

Citations: 90

12 Claims

1. A speaker verification system, comprising:
- a) conversion means for:
  
  1) dividing an input speech signal into frames at predetermined time intervals; and
  
  2) converting the input speech signal into an acoustic parameter with a frequency spectrum having a plurality of frequency channels for every frame, thus generating a time-series of spectral patterns;
  
  b) detecting means for detecting, from the time-series of spectral patterns, a speech portion of the input speech signal;
  
  c) primary moment generating means for generating a primary moment of the frequency spectrum for every frame, the primary moment showing a channel position corresponding to a center of the frequency spectrum;
  
  d) segmentation means for segmenting the speech portion into a plurality of blocks, based on the primary moment generated for every frame;
  
  e) feature extracting means for extracting features of the input speech signal for every segmented block;
  
  f) memory means for storing reference features of registered speakers, the reference features including features of input speech signals of the registered speakers extracted by the feature extracting means;
  
  g) distance calculating means for calculating a distance between (1) the extracted features of an unknown speaker, and (2) the reference features stored in the memory means; and
  
  h) decision means for making a decision as to whether or not the unknown speaker is a real speaker by comparing the distance calculated by the distance calculating means with a predetermined threshold value.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
- - 2. The speaker verification system of claim 1, wherein:
    - the primary moment generated by the primary moment generating means includes a weighted component which emphasizes the spectrum in a predetermined frequency range of the frequency spectrum.
  - 3. The speaker verification system of claim 1, wherein:
    - the speaker verification system further comprises reference block setting means for generating reference blocks based on the primary moment of the input speech signal obtained when the reference features of one of the registered speakers are registered in the memory means; and
      
      the segmentation means includes means to segment the speech portion of the input speech signal of the unknown speaker into the plurality of blocks in accordance with positions of the reference blocks.
  - 4. The speaker verification system of claim 1, wherein:
    - the speaker verification system further comprises arithmetic average calculating means for calculating a time-base arithmetic average of primary moments of frames in each of a number of reference blocks, the reference blocks being defined as the plurality of blocks of the input speech signal obtained when the reference features of one of the registered speakers are registered in the memory means; and
      
      the segmentation means includes means for segmenting the speech portion of the input speech signal of the unknown speaker into the plurality of blocks in accordance with a duration time length control type transition model defining a relationship between (1) the time-base arithmetic average of the primary moments obtained for every reference block, and (2) the primary moments in the speech portion of the input speech signal of the unknown speaker.
  - 5. The speaker verification system of claim 1, wherein:
    - the segmentation means includes means for segmenting into blocks the speech portions of input speech signals of one of the registered speakers which were uttered a predetermined number of times, the segmenting being carried out when the reference features of the one of the registered speakers are registered in the memory means, thus obtaining a plurality of blocks for each of the input speech signals;
      
      the speaker verification system further comprises reference block setting means for specifying reference blocks which are a typical number of blocks out of the plurality of blocks obtained for each of the input speech signals; and
      
      the segmentation means includes means for segmenting the speech section of the unknown speaker into a number of blocks equal to the typical number of blocks specified by the reference block setting means.
  - 6. The speaker verification system of claim 1, wherein:
    - the segmentation means includes means to segment speech portions of each of a predetermined number of input speech signals of one of the registered speakers when the reference features of the one of the registered speakers are registered in the memory means, thus obtaining a plurality of blocks for each of the predetermined number of input speech signals;
      
      the speaker verification system further comprises:
      
      a) reference block setting means for specifying reference blocks which are a typical number of blocks out of the plurality of blocks obtained for each of the input speech signals; and
      
      b) arithmetic average calculating means for calculating a time-base arithmetic average of primary moments of frames in each of the reference blocks; and
      
      the segmentation means includes means for segmenting the speech portion of the input speech signal of the unknown speaker into the plurality of blocks in accordance with a duration time length control type transition model defining a relationship between (1) the time-base arithmetic average of the primary moments obtained for every reference block, and (2) the primary moments in the speech portion of the input speech signal of the unknown speaker.
  - 7. The speaker verification system of claim 1, wherein:
    - the segmentation means includes means to segment speech portions of each of a predetermined number of input speech signals of one of the registered speakers when the reference features of the one of the registered speakers are registered in the memory means, thus obtaining a plurality of blocks for each of the predetermined number of input speech signals;
      
      the speaker verification system further comprises:
      
      a) reference block setting means for specifying reference blocks from among the blocks of the predetermined number of input speech signals in accordance with a predetermined criterion; and
      
      b) re-segmentation means for re-segmenting the speech portion of each of the predetermined number of input speech signals into a plurality of re-segmented blocks in accordance with the reference blocks, thus obtaining reference features of the one of the registered speakers for the re-segmented blocks of each of the predetermined number of input speech signals;
      
      the segmentation means includes means to segment the speech portion of the unknown speaker into the plurality of blocks on the basis of the reference blocks; and
      
      the distance calculating means includes means for calculating the distance between (1) the extracted features of the unknown speaker obtained for every reference block, and (2) the reference features of the one of the registered speakers which are obtained for the re-segmented blocks of each of the predetermined number of input speech signals.
  - 8. The speaker verification system of claim 7, wherein:
    - the reference block setting means includes means to specify, of the reference blocks, a minimum number of blocks included in the speech portion of one of the predetermined number of input speech signals.
  - 9. The speaker verification system of claim 7, wherein:
    - the speaker verification system further comprises arithmetic average calculating means for calculating a time-base arithmetic average of primary moments of frames in each of the reference blocks; and
      
      the segmentation means includes means for segmenting the speech portion of the input speech signal of the unknown speaker into the plurality of blocks in accordance with a duration time length control type transition model defining a relationship between (1) the time-base arithmetic average of the primary moments obtained for every reference block, and (2) the primary moments in the speech portion of the input speech signal of the unknown speaker.
  - 10. The speaker verification system of claim 9, further comprising:
    - means for describing the features, which the feature extracting means extracts for every block, by using an average spectrum obtained by calculating an arithmetic average of the spectrum of each frame within each block, and a gradient of a least square fit line with respect to the average spectrum within each frame.
  - 11. The speaker verification system of claim 9, further comprising:
    - means for describing the features which the feature extracting means extracts for every block, as spectral data obtained by:
      
      1) calculating an arithmetic average of the spectrum of each frame over each block;
      
      2) calculating a gradient of a least square fit line with respect to the average spectrum within each frame; and
      
      3) subtracting the arithmetic average obtained for every block from the gradient obtained for every block.
  - 12. The speaker verification system of claim 9, further comprising:
    - means for describing the features which the feature extracting means extracts for every block, as spectral data obtained by calculating a least square fit line with respect to a general shape of the spectrum of each frame, and subtracting the least square fit line from the general shape.
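
Claims 10 and 12 describe block features built from an average spectrum, the gradient of a least-squares fit line, and the spectrum with that line subtracted. The Python sketch below is one possible reading, not the patent's implementation: it averages the frames of a block channel by channel, fits a least-squares line over channel index, and uses the slope followed by the detrended average spectrum as the block's feature vector (claim 12 subtracts the line per frame; applying it to the block average is an assumption). The claims do not fix a distance measure, so the mean Euclidean distance over corresponding blocks and the accept-if-not-above-threshold rule in `verify` are assumptions, as are all function names.

```python
import math
from typing import List, Sequence, Tuple

Spectrum = Sequence[float]


def block_average(frames: Sequence[Spectrum]) -> List[float]:
    """Channel-by-channel arithmetic average of the frame spectra in one block."""
    return [sum(channel) / len(frames) for channel in zip(*frames)]


def least_squares_line(spectrum: Spectrum) -> Tuple[float, float]:
    """Slope and intercept of the least-squares line through (k, spectrum[k]).

    Channels are indexed from 0; at least two channels are required.
    """
    n = len(spectrum)
    mean_x = (n - 1) / 2
    mean_y = sum(spectrum) / n
    sxx = sum((k - mean_x) ** 2 for k in range(n))
    sxy = sum((k - mean_x) * (y - mean_y) for k, y in enumerate(spectrum))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def block_features(frames: Sequence[Spectrum]) -> List[float]:
    """Hypothetical feature vector: [slope, detrended average spectrum ...]."""
    avg = block_average(frames)
    slope, intercept = least_squares_line(avg)
    detrended = [y - (slope * k + intercept) for k, y in enumerate(avg)]
    return [slope] + detrended


def distance(unknown: Sequence[Sequence[float]],
             reference: Sequence[Sequence[float]]) -> float:
    """Mean Euclidean distance between corresponding block feature vectors."""
    if len(unknown) != len(reference) or not unknown:
        raise ValueError("need the same, non-zero number of blocks")
    return sum(math.dist(u, r) for u, r in zip(unknown, reference)) / len(unknown)


def verify(unknown: Sequence[Sequence[float]],
           reference: Sequence[Sequence[float]],
           threshold: float) -> bool:
    """Accept the claimed identity when the distance does not exceed the threshold."""
    return distance(unknown, reference) <= threshold


if __name__ == "__main__":
    # Toy two-block utterances with 3 channels per frame.
    ref_blocks = [[[1, 2, 3], [3, 4, 5]], [[5, 3, 1], [5, 3, 1]]]
    test_blocks = [[[1, 2, 3], [3, 4, 6]], [[5, 3, 1], [6, 3, 1]]]
    ref = [block_features(b) for b in ref_blocks]
    unk = [block_features(b) for b in test_blocks]
    print(round(distance(unk, ref), 3), verify(unk, ref, threshold=0.5))
```

Removing the fitted line separates the overall spectral tilt (kept as the slope) from the finer spectral shape, so the two can be compared, or weighted, independently.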

Current Assignee: Ricoh Company Limited
Original Assignee: Ricoh Company Limited
Inventors: Yamazaki, Nobuhide; Uchiyama, Hiroki; Kitagawa, Hiroo
Primary Examiner(s): KEMENY, EMANUEL
Application Number: US07/610,317
Time in Patent Office: 579 Days
Field of Search: 381/42, 381/43, 395/2
US Class Current: 704/243
CPC Class Codes: G10L 17/02 Preprocessing operations, e...
