Voice registration method and system, and voice recognition method and system based on voice registration method and system

US 7,502,736 B2
Filed: 12/06/2001
Issued: 03/10/2009
Est. Priority Date: 08/09/2001
Status: Expired due to Term




Abstract

Disclosed is a voice registration method for voice recognition, comprising the steps of analyzing a spectrum of a sound signal input from the outside; extracting predetermined language units for speaker recognition from a voice signal in the sound signal; measuring the loudness of each language unit; collecting, as a reference, voice data on registered (background) speakers, including loudness data of the plurality of background speakers, into a voice database; determining whether the loudness of each language unit is within a predetermined loudness range based on the voice database; learning each language unit by using a multi-layer perceptron when at least a predetermined number of language units are within the predetermined loudness range; and storing data on the learned language units as data for recognizing the speaker. With this configuration, the speaker's loudness is taken into account both during learning, when the speaker's voice is registered, and during speaker verification.
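
The loudness check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a frame's loudness is its mean-square energy expressed in dB, that the acceptable range is the minimum and maximum frame loudness observed over the background speakers (as in claims 10 and 26), and that the "predetermined number" of in-range language units is expressed as a fraction of frames through a hypothetical `min_rate` parameter.

```python
import math
from typing import Sequence, Tuple


def frame_energy_db(samples: Sequence[float]) -> float:
    """Loudness of one frame as mean-square energy in dB (an assumed measure)."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_sq + 1e-12)  # tiny floor avoids log10(0) on silence


def loudness_range(background_energies: Sequence[float]) -> Tuple[float, float]:
    """Minimum and maximum frame loudness over the background-speaker database."""
    return min(background_energies), max(background_energies)


def loudness_ok(unit_energies: Sequence[float], lo: float, hi: float,
                min_rate: float = 0.8) -> bool:
    """True if at least `min_rate` of the language-unit frames lie within [lo, hi]."""
    if not unit_energies:
        return False
    inside = sum(1 for e in unit_energies if lo <= e <= hi)
    return inside / len(unit_energies) >= min_rate


# Hypothetical frame loudness values in dB
lo, hi = loudness_range([-32.0, -25.5, -18.0, -12.5])
print(loudness_ok([-30.0, -20.0, -15.0, -5.0], lo, hi, min_rate=0.75))  # prints: True (3 of 4 in range)
```

If the check fails, claim 17 has the system ask the speaker to re-utter instead of training on the input.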


33 Claims

1. A voice registration method for voice recognition, comprising the steps of:
- analyzing a spectrum of a sound signal inputted from the outside;
  
  extracting predetermined language units for a speaker recognition from a voice signal in the sound signal;
  
  measuring the loudness of each language unit;
  
  collecting voice data on registered speakers, including loudness data of the plurality of background speakers, as a reference into a voice database;
  
  determining whether the loudness of each language unit is within a predetermined loudness range based on the voice database;
  
  learning each language unit by using a multi-layer perceptron in the case that at least a predetermined number of language units are within the predetermined loudness range; and
  
  storing data on the learned language unit as data for recognizing the speaker.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
- - 2. The method according to claim 1, wherein the voice analyzing step includes the steps of:
    - representing the voice signal of the speaker as a spectrum; and
      
      compressing the spectrum by allocating filter banks to a speaker recognition region in which the voice characteristics of the speaker are to be recognized.
  - 3. The method according to claim 2, wherein the storing step is comprised of storing the plurality of speaker patterns of every language unit and the loudness of every language unit as a speaker recognition data.
  - 4. The method according to claim 2, wherein the speaker recognition region is 0~3 kHz, in which the filter banks are uniformly allocated, whereas above 3 kHz the intervals of the filter banks increase logarithmically.
  - 5. The method according to claim 4, wherein the storing step is comprised of storing the plurality of speaker patterns of every language unit and the loudness of every language unit as a speaker recognition data.
  - 6. The method according to claim 4, further comprising the step of employing, as the language units, a plurality of phonemes selected from nasals, vowels, and approximants, which contain a relatively large amount of continuous sound, wherein the language unit extracting step includes the steps of making a plurality of frames by dividing the spectrum into several parts, and extracting a frame having the language unit from among the frames.
  - 7. The method according to claim 6, wherein the storing step is comprised of storing the plurality of speaker patterns of every language unit and the loudness of every language unit as a speaker recognition data.
  - 8. The method according to claim 6, wherein the loudness measuring step is comprised of calculating an energy value of the frame having the language unit of the spectrum.
  - 9. The method according to claim 8, wherein the storing step is comprised of storing the plurality of speaker patterns of every language unit and the loudness of every language unit as a speaker recognition data.
  - 10. The method according to claim 8, further comprising the step of extracting maximum and minimum loudness by analyzing the voice spectrum of the background speakers stored in the voice database and by calculating the energy value of the frame having the language unit, wherein the loudness determining step is comprised of determining whether the number of frames whose loudness lies within the maximum and minimum loudness amounts to a predetermined rate or more.
  - 11. The method according to claim 10, wherein the storing step is comprised of storing the plurality of speaker patterns of every language unit and the loudness of every language unit as a speaker recognition data.
  - 12. The method according to claim 10, further comprising the steps of forming a plurality of reference patterns for every language unit of the plurality of background speakers, and forming a plurality of speaker patterns for every language unit of the plurality of speakers, wherein the learning step includes the step of learning the pattern characteristics of the speaker by comparing the reference patterns with the speaker patterns according to a back-propagation algorithm.
  - 13. The method according to claim 12, wherein the storing step is comprised of storing the plurality of speaker patterns of every language unit and the loudness of every language unit as a speaker recognition data.
  - 14. The method according to claim 12, further comprising the step of making as many learning groups as there are language units of the background speakers by employing the plurality of reference patterns of every language unit of one background speaker as a learning group, wherein the learning step is comprised of learning the pattern characteristics of the speaker by comparing the reference patterns of every learning group with the plurality of speaker patterns.
  - 15. The method according to claim 14, wherein the storing step is comprised of storing the plurality of speaker patterns of every language unit and the loudness of every language unit as a speaker recognition data.
  - 16. The method according to claim 1, wherein the storing step is comprised of storing the plurality of speaker patterns of every language unit and the loudness of every language unit as a speaker recognition data.
  - 17. The method according to claim 1, further comprising the step of requesting the speaker to re-utter in the case that at least the predetermined number of language units are not within the predetermined loudness range.
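
Claims 2 and 4 compress the spectrum with filter banks spaced uniformly up to 3 kHz and with logarithmically growing intervals above it. The sketch below builds such band edges and sums power-spectrum bins into each band. The band counts (15 uniform, 5 logarithmic), the 8 kHz upper limit, and the rectangular band shape are assumptions made for illustration; the claims do not specify them.

```python
from typing import List, Sequence


def filter_bank_edges(n_linear: int = 15, n_log: int = 5,
                      split_hz: float = 3000.0, max_hz: float = 8000.0) -> List[float]:
    """Band edges: uniform spacing up to split_hz, geometric spacing above it."""
    step = split_hz / n_linear
    edges = [i * step for i in range(n_linear + 1)]  # 0, step, ..., split_hz
    ratio = (max_hz / split_hz) ** (1.0 / n_log)     # constant ratio = equal steps on a log axis
    edges += [split_hz * ratio ** k for k in range(1, n_log + 1)]
    return edges


def compress_spectrum(power: Sequence[float], bin_hz: float,
                      edges: Sequence[float]) -> List[float]:
    """Sum power-spectrum bins into bands (rectangular bands, an assumption)."""
    bands = [0.0] * (len(edges) - 1)
    for k, p in enumerate(power):
        f = k * bin_hz  # centre frequency of FFT bin k
        for b in range(len(bands)):
            if edges[b] <= f < edges[b + 1]:
                bands[b] += p
                break
    return bands


edges = filter_bank_edges()
widths = [b - a for a, b in zip(edges, edges[1:])]
print([round(w) for w in widths])  # 15 bands of 200 Hz, then 5 progressively wider bands
```

Geometric spacing is one straightforward reading of "logarithmically increased" intervals: each band above 3 kHz is a fixed ratio wider than the one before it.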

18. A speaker recognition method for recognizing whether a speaker is a registered speaker, comprising the steps of:
- analyzing a spectrum of a sound signal inputted from the outside;
  
  extracting predetermined language units for a speaker recognition from a voice signal in the sound signal;
  
  measuring the loudness of each language unit;
  
  determining whether the loudness of each language unit is within a predetermined loudness range;
  
  calculating a speaker score by calculating the probability that the language unit will belong to the speaker through a multi-layer perceptron, and by averaging the probability, in the case that at least a predetermined number of language units are within the predetermined loudness range; and
  
  verifying that the speaker is registered when the speaker score is beyond a threshold value by comparing the calculated speaker score with the predetermined threshold value which is a predetermined minimum speaker score for verifying the registered speaker.
- Dependent Claims (19, 20)
- - 19. The method according to claim 18, wherein the speaker score can be calculated from the following equation: $\mathrm{Score}_{\mathrm{speaker}} = \frac{1}{M}\sum_{i=0}^{M-1} P(LU_i)$, where $P(LU_i)$ is a score of the probability that the enquiring speaker is the background speaker of the $i$-th language unit frame, and $M$ is the number of language unit frames extracted from an isolated word.
  - 20. The method according to claim 19, wherein the speaker score can be calculated on the basis of weight of the language units given according to verifiability.
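
The verification rule of claims 18 and 19 reduces to averaging the per-frame probabilities and comparing the mean with a minimum score. In this sketch the probabilities P(LU_i) are assumed to come from an already trained multi-layer perceptron, and the function names are hypothetical.

```python
from typing import Sequence


def speaker_score(unit_probs: Sequence[float]) -> float:
    """Score_speaker = (1/M) * sum of P(LU_i) over the M language-unit frames."""
    if not unit_probs:
        raise ValueError("no language-unit frames were extracted")
    return sum(unit_probs) / len(unit_probs)


def is_registered(unit_probs: Sequence[float], threshold: float) -> bool:
    """Accept the speaker when the score is above the minimum (threshold) score."""
    return speaker_score(unit_probs) > threshold


# Hypothetical MLP outputs for the language-unit frames of one isolated word
probs = [0.92, 0.71, 0.85, 0.64]
print(round(speaker_score(probs), 3), is_registered(probs, threshold=0.7))  # prints: 0.78 True
```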

21. A voice recognition system for voice recognition, comprising:
- a voice analyzer analyzing a spectrum of a sound signal inputted from the outside;
  
  a voice extractor extracting a voice signal from the sound signal and extracting predetermined language units for recognizing a speaker from the voice signal;
  
  a voice database storing therein background speaker voice data including the loudness of a plurality of reference background speakers;
  
  a loudness determiner determining the loudness of each language unit, and determining whether the loudness of each language unit is within a predetermined loudness range on the basis of the voice database;
  
  a learner learning the language unit in the case that at least a predetermined number of additional ones of the language units are within the predetermined loudness range;
  
  a memory storing data on the learned language units as recognition data for the speaker; and
  
  a controller controlling operations of the voice analyzer, the voice extractor, the loudness determiner and the learner when a voice is inputted, and storing the recognition data for the speaker in the memory.
- Dependent Claims (22, 23, 24, 25, 26, 27, 28, 29, 30)
- - 22. The system according to claim 21, wherein the voice analyzer represents the voice signal of the speaker as a spectrum, and compresses the spectrum by allocating filter banks to a speaker recognition region in which the speaker is to be recognized, at a predetermined interval rate.
  - 23. The system according to claim 22, wherein the speaker recognition region is 0~3 kHz, in which the filter banks are uniformly allocated, whereas above 3 kHz the intervals of the filter banks increase logarithmically.
  - 24. The system according to claim 23, wherein the voice extractor makes a plurality of frames by dividing the spectrum into several parts, and extracts, from among the plurality of frames, a frame having the language unit, the language units being phonemes selected from nasals, vowels, and approximants, which contain a relatively large amount of continuous sound.
  - 25. The system according to claim 24, wherein the loudness determiner calculates an energy value of the frame having the language unit of the spectrum.
  - 26. The system according to claim 25, wherein the loudness determiner determines in advance the maximum and minimum loudness by analyzing the voice spectrum of the background speakers stored in the voice database and by calculating the energy value of the frame having the language unit, and determines whether the number of frames having loudness within the maximum and minimum loudness is beyond a predetermined rate.
  - 27. The system according to claim 26, wherein the voice extractor forms a plurality of reference patterns corresponding to every language unit of the plurality of background speakers, forms a plurality of speaker patterns corresponding to every language unit of the plurality of speakers, and makes a plurality of learning groups by employing the plurality of reference patterns of every language unit of one background speaker as one learning group.
  - 28. The system according to claim 27, wherein the learner learns a pattern property of the speaker by comparing the reference patterns with the speaker patterns according to a back-propagation algorithm.
  - 29. The system according to claim 28, wherein in the memory are stored the plurality of speaker patterns of every language unit and the loudness of every language unit as a speaker recognition data.
  - 30. The system according to claim 29, wherein the controller requests the speaker to re-utter in the case that at least the predetermined number more among all language units of the isolated word is within the predetermined loudness range.
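
Claims 12 and 28 describe learning the speaker's pattern characteristics with a multi-layer perceptron trained by back-propagation, using background-speaker reference patterns as the contrast class. The sketch below is a deliberately tiny stand-in under stated assumptions: one hidden layer of sigmoid units, a sigmoid output read as the probability that a pattern belongs to the registering speaker, labels of 1 for speaker patterns and 0 for reference patterns, and a cross-entropy loss. It does not model the per-background-speaker learning groups of claims 14 and 27, and the class name `TinyMLP` is hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer of sigmoid units and one sigmoid output, trained by back-propagation."""

    def __init__(self, n_in: int, n_hidden: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded so training is reproducible
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def predict(self, x: Sequence[float]) -> float:
        """Estimated probability that pattern x belongs to the registering speaker."""
        return self._forward(x)[1]

    def train(self, speaker_patterns, reference_patterns,
              epochs: int = 2000, lr: float = 0.5) -> None:
        # Speaker patterns are the positive class, background reference patterns the negative class
        data = [(x, 1.0) for x in speaker_patterns] + [(x, 0.0) for x in reference_patterns]
        for _ in range(epochs):
            for x, target in data:
                h, y = self._forward(x)
                d_out = y - target  # cross-entropy gradient w.r.t. the output pre-activation
                # Back-propagate to the hidden layer before the output weights change
                d_hid = [d_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
                for j, hj in enumerate(h):
                    self.w2[j] -= lr * d_out * hj
                self.b2 -= lr * d_out
                for j, row in enumerate(self.w1):
                    for i, xi in enumerate(x):
                        row[i] -= lr * d_hid[j] * xi
                    self.b1[j] -= lr * d_hid[j]


# Hypothetical 2-D patterns: the registering speaker versus background reference patterns
mlp = TinyMLP(n_in=2)
mlp.train([[0.9, 1.0], [1.0, 0.8], [0.8, 0.9]], [[0.1, 0.0], [0.0, 0.2], [0.2, 0.1]])
print(round(mlp.predict([0.95, 0.9]), 2), round(mlp.predict([0.05, 0.1]), 2))
```

After training, `predict` plays the role of P(LU_i) for each language-unit frame during verification.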

31. A speaker recognition system for recognizing whether a speaker is a registered speaker, comprising:
- a voice analyzer analyzing a spectrum of a voice signal inputted from external sound signals;
  
  a voice extractor picking out voice signals among inputted sound and abstracting predetermined language units for recognizing the speaker from the voice signals;
  
  a loudness determiner determining the loudness of each language unit, and determining whether the loudness of each language unit is within a predetermined loudness range;
  
  a speaker score calculator calculating a speaker score by calculating the probability that the language unit will belong to the speaker, and by averaging the probability; and
  
  a controller controlling the speaker score calculator to calculate the speaker score in the case that at least the predetermined number more among all language units is within the predetermined loudness range, and ascertaining that the speaker has been registered when the speaker score is beyond a threshold value by comparing the calculated speaker score with the predetermined threshold value which is a predetermined minimum speaker score for ascertaining the registered speaker.
- Dependent Claims (32, 33)
- - 32. The system according to claim 31, wherein the speaker score can be derived from $\mathrm{Score}_{\mathrm{speaker}} = \frac{1}{M}\sum_{i=0}^{M-1} P(LU_i)$, where $P(LU_i)$ is the probability score that the enquiring speaker is the background speaker of the $i$-th language unit frame, and $M$ is the number of language unit frames abstracted from the isolated word.
  - 33. The system according to claim 32, wherein the speaker score calculator calculates the speaker score on the basis of the language units according to discrimination.
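
Claims 20 and 33 allow the speaker score to take account of how well each language unit discriminates between speakers, and claim 31 has the controller compute the score only after the loudness check passes. The sketch below combines both steps. It assumes a normalized weighted mean (which reduces to the plain average of claim 32 when all weights are equal); how the weights are obtained is not stated in the claims, so they are simply passed in. Because claim 31 does not say what happens when the loudness check fails, the function only reports that outcome. All names are hypothetical.

```python
from typing import Optional, Sequence


def weighted_speaker_score(unit_probs: Sequence[float],
                           weights: Optional[Sequence[float]] = None) -> float:
    """Normalized weighted mean of P(LU_i); equal weights give (1/M) * sum of P(LU_i)."""
    if not unit_probs:
        raise ValueError("no language-unit frames were extracted")
    if weights is None:
        weights = [1.0] * len(unit_probs)
    if len(weights) != len(unit_probs) or sum(weights) <= 0:
        raise ValueError("need one weight per frame, with a positive total")
    return sum(w * p for w, p in zip(weights, unit_probs)) / sum(weights)


def controller_decision(in_range_rate: float, min_rate: float,
                        unit_probs: Sequence[float], threshold: float,
                        weights: Optional[Sequence[float]] = None) -> str:
    """Score only if enough frames passed the loudness check, then apply the threshold."""
    if in_range_rate < min_rate:
        return "loudness-check-failed"  # claim 31 leaves this case open
    score = weighted_speaker_score(unit_probs, weights)
    return "registered" if score > threshold else "not-registered"


print(controller_decision(0.9, 0.8, [0.9, 0.3], 0.7, weights=[3.0, 1.0]))  # prints: registered
```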


Current Assignee
Samsung Electronics Co. Ltd.
Original Assignee
Samsung Electronics Co. Ltd.
Inventors
Kim, Tae-soo; Lee, Tae-sung; Choi, Ho-jin; Lee, Sung-zoo; Hwang, Byoung-won; Hong, Sang-jin
Primary Examiner(s)
Lerner, Martin

Application Number
US10/486,258
Publication Number
US 20050033573A1
Time in Patent Office
2,651 days
Field of Search
704/231, 704/232, 704/234, 704/243, 704/244, 704/246
US Class Current
704/232
CPC Class Codes
G10L 15/063 Training
