Orthogonalized dictionary speech recognition apparatus and method thereof

US 4,979,213 A
Filed: 07/12/1989
Issued: 12/18/1990
Est. Priority Date: 07/15/1988
Status: Expired due to Term

Abstract

Speech pattern data representing speech of a plurality of speakers are stored in a pattern storage section in advance. Averaged pattern data obtained by averaging a plurality of speech pattern data of the first of the plurality of speakers are obtained. Data obtained by blurring and differentiating the averaged pattern data are stored in an orthogonalized dictionary as basic orthogonalized dictionary data of first and second axes, respectively. Blurred data and differentiated data obtained with respect to the second and subsequent of the plurality of speakers are selectively stored in the orthogonalized dictionary as additional dictionary data having new axes. Speech of the plurality of speakers is recognized by computing a similarity between the orthogonalized dictionary formed in this manner and input speech.

20 Citations

24 Claims

1. A speech recognition system, comprising:
- acoustic analyzing means for converting input speech into an electrical signal and obtaining speech pattern data upon acoustic analysis of said electrical signal;
  
  means for detecting a speech interval of the electrical signal;
  
  means for generating sampling pattern data by extracting a predetermined number of samples from speech pattern data included in the detected speech interval;
  
  means for prestoring sampling pattern data of a plurality of speakers for categories of speech to be recognized, said sampling pattern data including learning pattern data;
  
  means for forming orthogonalized dictionary data for each speaker on the basis of the sampling pattern data, said forming means forming averaged pattern data of a plurality of sampling pattern data obtained from each speaker;
  
  means for forming dictionary data of a first axis by smoothing the averaged pattern data in a time base direction;
  
  means for forming dictionary data of a second axis orthogonal to the first axis by differentiating the averaged pattern data in the time base direction;
  
  an orthogonalized dictionary for storing the dictionary data of the first and second axes as orthogonal dictionary data;
  
  means for forming additional orthogonal dictionary data representing feature variations in speech of each speaker and orthogonal to the orthogonal dictionary data stored in said orthogonalized dictionary in accordance with sampling pattern data of each of a second and subsequent of said plurality of speakers on the basis of the orthogonal dictionary data obtained with respect to a first of said plurality of speakers;
  
  means for selectively storing the additional orthogonal dictionary data in said orthogonalized dictionary;
  
  means for computing a similarity value between the orthogonal dictionary data stored in said orthogonalized dictionary and the sampling pattern data formed by said sampling pattern data generating means; and
  
  means for recognizing input speech on the basis of the similarity value.
- - 2. A system according to claim 1, wherein said averaged pattern data forming means includes means for computing averaged pattern data A(j,k) from learning pattern data a_m(j,k) for one of said categories i in accordance with an equation:
    - ##EQU5## where j and k are positive integers.
  - 3. A system according to claim 2, wherein said first axis dictionary data forming means includes means for computing blurred pattern data by performing a blurring (smoothing) operation using the averaged pattern data A(j,k) in accordance with an equation:
    - B₁(j,k) = A(j,k-1) + 2*A(j,k) + A(j,k+1)
      where j and k are positive integers.
  - 4. A system according to claim 2, wherein said second axis dictionary data forming means includes means for computing differentiated pattern data D₂(j,k) by performing a differentiating operation using the averaged pattern data A(j,k) in accordance with an equation:
    - D₂(j,k) = -A(j,k-1) + A(j,k+1)
      where j and k are positive integers.
  - 5. A system according to claim 1, further comprising means for reorthogonalizing the dictionary data of the second axis so as to set the dictionary data of the second axis to be orthogonal to the dictionary data of the first axis by using an inner product calculation of the dictionary data of the first and second axes.
  - 6. A system according to claim 5, wherein said average pattern data forming means includes means for computing averaged pattern data A(j,k) from learning pattern data a_m(j,k) for one of said categories i in accordance with an equation:
    - ##EQU6## where j and k are positive integers,
      said first axis dictionary data forming means includes means for computing blurred pattern data by performing a blurring (smoothing) operation using the averaged pattern data A(j,k) in accordance with an equation:
      
      B₁(j,k) = A(j,k-1) + 2*A(j,k) + A(j,k+1)
      where j and k are positive integers,
      said second axis dictionary data forming means comprises means for computing differentiated pattern data D₂(j,k) by performing a differentiating operation using the averaged pattern data A(j,k) in accordance with an equation:
      
      D₂(j,k) = -A(j,k-1) + A(j,k+1)
      where j and k are positive integers, and
      said reorthogonalizing means comprises means for executing a calculation based on the following equation:
      
      D₂(j,k) = D₂(j,k) - (D₂(j,k)·B₁(j,k)) B₁(j,k),
      normalizing the reorthogonalized dictionary data D₂(j,k), and storing the normalized data D₂(j,k) in said orthogonalized dictionary as new dictionary data D₂(j,k) of the second axis.
  - 7. A system according to claim 6, wherein the learning data a_m(j,k) for the category i comprises learning pattern data of a first of a plurality of speakers belonging to a predetermined group, and dictionary data of the first and second axes are stored in said orthogonalized dictionary.
  - 8. A system according to claim 7, further comprising:
    - means for forming additional orthogonal dictionary data based on the orthogonal dictionary data obtained from the first of said plurality of speakers in accordance with sampling pattern data of a second of said plurality of speakers so as to be orthogonal to the orthogonal dictionary data of the first speaker.
  - 9. A system according to claim 8, wherein said additional orthogonal dictionary data forming means comprises:
    - means for computing the averaged pattern data A(j,k) by using the sampling data of the second speaker;
      
      means for computing blurred pattern data c₁ and differentiated pattern data c₂ by using the averaged pattern data A(j,k);
      
      means for computing additional dictionary data b_{p+m} of the second speaker in accordance with an equation:
      
      ##EQU7## where p is the number of axes of the orthogonal dictionary data already obtained, and normalizing and outputting orthogonal vector data b_{p+m} representing feature variations of the second speaker as an additional dictionary having a new axis; and
      
      means for determining whether the additional orthogonal dictionary data is to be added to the orthogonalized dictionary.
  - 10. A system according to claim 9, wherein said additional orthogonal dictionary data determining means comprises:
    - means for causing the orthogonal vector data b_{p+m} to be added in the orthogonalized dictionary when a norm ∥b_{p+m}∥ of the orthogonal vector data is larger than a predetermined value.
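
The first-speaker steps recited in claims 2 through 6 (averaging, blurring, differentiating, reorthogonalizing, normalizing) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: all function names are hypothetical, the averaging equation (##EQU5##/##EQU6##) is not reproduced in this text and is assumed here to be an arithmetic mean, the filters are evaluated only at interior k because the claims do not say how the k-1 and k+1 boundaries are handled, and B₁ is assumed to be normalized before the inner-product step.

```python
import math

def average_patterns(patterns):
    """Element-wise mean of M learning patterns, each a J x K list of lists.
    Assumption: the claimed averaging equation is an arithmetic mean."""
    m = len(patterns)
    rows, cols = len(patterns[0]), len(patterns[0][0])
    return [[sum(p[j][k] for p in patterns) / m for k in range(cols)]
            for j in range(rows)]

def blur(a):
    """B1(j,k) = A(j,k-1) + 2*A(j,k) + A(j,k+1), interior k only (assumption)."""
    return [[row[k - 1] + 2 * row[k] + row[k + 1]
             for k in range(1, len(row) - 1)] for row in a]

def differentiate(a):
    """D2(j,k) = -A(j,k-1) + A(j,k+1), interior k only (assumption)."""
    return [[-row[k - 1] + row[k + 1]
             for k in range(1, len(row) - 1)] for row in a]

def flatten(pattern):
    """Treat a J x K pattern as one long vector for inner products."""
    return [v for row in pattern for v in row]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def first_speaker_axes(patterns):
    """Return (b1, d2): unit-length first and second axes, with d2 made
    orthogonal to b1 via D2 <- D2 - (D2 . B1) B1 and then normalized."""
    a = average_patterns(patterns)
    b1 = normalize(flatten(blur(a)))        # assumption: B1 normalized first
    d2 = flatten(differentiate(a))
    proj = dot(d2, b1)
    d2 = normalize([d - proj * b for d, b in zip(d2, b1)])
    return b1, d2
```

Flattening each J x K pattern into a single vector is one way to realize the inner product between D₂ and B₁; the claims do not specify the data layout.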

11. A speech recognition apparatus for computing a similarity between an input speech pattern obtained by analyzing input speech and an orthogonalized dictionary formed on the basis of learning patterns acquired from a plurality of speakers in advance, and for recognizing the input speech based on the computed similarity comprising:
- means for obtaining an averaged pattern of a plurality of learning patterns obtained from each speaker, and obtaining a blurred pattern and a differentiated pattern from the averaged pattern; and
  
  means for determining an orthogonal axis on which an orthogonalized dictionary is based from the blurred and differentiated patterns obtained from a learning pattern of a first of said plurality of speakers, determining a new axis orthogonal to an axis of the dictionary, which has already been stored, from the blurred and differentiated patterns obtained from learning patterns of second and subsequent of said plurality of speakers, and determining whether the dictionary of the new axis is stored, thereby forming the orthogonalized dictionary.
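
The step of deriving new axes from the second and subsequent speakers, and deciding whether to store them, might look like the sketch below. The equation the claims cite for this step (##EQU7##) is not reproduced in this text, so the sketch substitutes a modified Gram-Schmidt projection removal, which is an assumption. `add_axes` and `min_norm` are hypothetical names; the threshold is applied to the residual norm before normalization, following the norm test described in claim 10.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def add_axes(dictionary, candidates, min_norm):
    """Extend an orthonormal dictionary with candidate vectors.

    dictionary: list of orthonormal vectors (axes already stored)
    candidates: vectors from a later speaker, e.g. flattened blurred and
                differentiated patterns (c1, c2 in the claims)
    min_norm:   hypothetical threshold; a candidate is stored only if the
                part of it not explained by existing axes exceeds this norm
    """
    axes = [list(a) for a in dictionary]
    for c in candidates:
        residual = list(c)
        # Modified Gram-Schmidt: subtract the component along each stored axis.
        for axis in axes:
            coeff = dot(residual, axis)
            residual = [r - coeff * a for r, a in zip(residual, axis)]
        norm = math.sqrt(dot(residual, residual))
        if norm > min_norm:
            # Store the normalized residual as a new axis.
            axes.append([r / norm for r in residual])
    return axes
```

A candidate that lies almost entirely in the span of the stored axes leaves a small residual and is skipped, which is one reading of "selectively storing" the additional dictionary data. The threshold depends on the scale of the candidate vectors, so in practice it would have to be chosen with that scale in mind.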

12. A speech recognition system for a plurality of speakers, comprising:
- means for converting input speech from a plurality of speakers into an electrical signal;
  
  means for performing acoustic analysis of the electrical signal;
  
  means for obtaining sampling pattern data from said electrical signal upon which the acoustic analysis has been performed;
  
  means for obtaining first averaged pattern data from a plurality of sampling pattern data of a first of the plurality of speakers, and forming dictionary data of first and second axes from the first averaged pattern data;
  
  orthogonalized dictionary means for storing the dictionary data of the first and second axes;
  
  means for obtaining second averaged pattern data from a plurality of sampling pattern data of at least one of a second and subsequent of said plurality of speakers;
  
  means for obtaining additional dictionary data having an axis different from the first and second axes on the basis of the second averaged pattern data;
  
  means for storing the additional data in said orthogonalized dictionary means; and
  
  means for recognizing the input speech by using the dictionary data stored in said orthogonalized dictionary means.
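
The claims do not state how the similarity value is computed. As a hedged illustration only, the sketch below assumes an unweighted subspace-style similarity: the fraction of the input pattern's energy that falls inside the span of a category's orthonormal axes. The names `subspace_similarity` and `recognize` and the per-category dictionary layout are hypothetical.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def subspace_similarity(x, axes):
    """Share of the energy of x captured by orthonormal axes (0.0 to 1.0).
    Assumption: this unweighted form stands in for the patent's similarity."""
    energy = dot(x, x)
    if energy == 0.0:
        return 0.0
    return sum(dot(x, a) ** 2 for a in axes) / energy

def recognize(x, dictionaries):
    """Return the category whose axes give the highest similarity.
    dictionaries maps a category label to its list of orthonormal axes."""
    return max(dictionaries,
               key=lambda cat: subspace_similarity(x, dictionaries[cat]))
```

For example, with `{"yes": [[1, 0, 0]], "no": [[0, 1, 0], [0, 0, 1]]}` and input `[3, 1, 0]`, the similarity is 0.9 for "yes" and 0.1 for "no", so `recognize` returns "yes".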

13. A speech recognition method, comprising the steps of:
- converting input speech into an electrical signal and obtaining speech pattern data upon acoustic analysis of said electrical signal;
  
  detecting a speech interval of the electrical signal;
  
  generating sampling pattern data by extracting a predetermined number of samples from speech pattern data included in the detected speech interval, said sampling pattern data including learning pattern data;
  
  prestoring sampling pattern data of a plurality of speakers for categories of speech to be recognized;
  
  forming orthogonalized dictionary data for each speaker to be stored in said orthogonalized dictionary on the basis of the sampling pattern data, by forming averaged pattern data of a plurality of sampling pattern data obtained from each speaker;
  
  forming dictionary data of a first axis by smoothing the averaged pattern data in a time base direction;
  
  forming dictionary data of a second axis orthogonal to the first axis by differentiating the averaged pattern data in the time base direction;
  
  storing the dictionary data of the first and second axes in an orthogonalized dictionary;
  
  forming additional dictionary data representing feature variations in speech of each speaker and being orthogonal to the dictionary data stored in said orthogonalized dictionary in accordance with sampling pattern data of each of a second and subsequent of said plurality of speakers on the basis of the dictionary data obtained with respect to a first of said plurality of speakers;
  
  selectively storing the additional dictionary data in said orthogonalized dictionary;
  
  computing a similarity value between the dictionary data stored in said orthogonalized dictionary and the sampling pattern data formed by said sampling pattern data generating step; and
  
  recognizing input speech on the basis of the similarity value.
- - 14. A method according to claim 13, wherein said average pattern data forming step comprises:
    - computing averaged pattern data A(j,k) from learning pattern data a_m(j,k) for one of said categories i in accordance with an equation:
      
      ##EQU8## where j and k are positive integers.
  - 15. A method according to claim 14, wherein said first axis dictionary data forming step comprises:
    - computing blurred pattern data B₁(j,k) by performing a blurring (smoothing) operation using the averaged pattern data A(j,k) in accordance with an equation:
      
      B₁(j,k) = A(j,k-1) + 2*A(j,k) + A(j,k+1)
      where j and k are positive integers.
  - 16. A method according to claim 14, wherein said second axis dictionary data forming step comprises:
    - computing differentiated pattern data D₂(j,k) by performing a differentiating operation using the averaged pattern data A(j,k) in accordance with an equation:
      
      D₂(j,k) = -A(j,k-1) + A(j,k+1)
      where j and k are positive integers.
  - 17. A method according to claim 13, further comprising:
    - reorthogonalizing the dictionary data of the second axis so as to set the dictionary data of the second axis to be orthogonal to the dictionary data of the first axis by using an inner product calculation of the dictionary data of the first and second axes.
  - 18. A method according to claim 17, wherein:
    - said average pattern data forming step comprises computing averaged pattern data A(j,k) from learning pattern data a_m(j,k) for one of said categories i in accordance with an equation:
      
      ##EQU9## where j and k are positive integers,
      said first axis dictionary data forming step comprises computing blurred pattern data by performing a blurring (smoothing) operation using the averaged pattern data A(j,k) in accordance with an equation:
      
      B₁(j,k) = A(j,k-1) + 2*A(j,k) + A(j,k+1)
      where j and k are positive integers,
      said second axis dictionary data forming step comprises computing differentiated pattern data D₂(j,k) by performing a differentiating operation using the averaged pattern data A(j,k) in accordance with an equation:
      
      D₂(j,k) = -A(j,k-1) + A(j,k+1)
      where j and k are positive integers, and
      said reorthogonalizing step comprises executing a calculation based on the following equation:
      
      D₂(j,k) = D₂(j,k) - (D₂(j,k)·B₁(j,k)) B₁(j,k),
      normalizing the reorthogonalized dictionary data D₂(j,k), and storing the normalized data D₂(j,k) in said orthogonalized dictionary as new dictionary data D₂(j,k) of the second axis.
  - 19. A method according to claim 18, wherein the learning data a_m(j,k) for the category i comprises:
    - learning pattern data of a first of said plurality of speakers, said plurality of speakers belonging to a predetermined group, and dictionary data of the first and second axes are stored in said orthogonalized dictionary.
  - 20. A method according to claim 19, further comprising:
    - forming an additional dictionary based on the orthogonalized dictionary obtained from the first of said plurality of speakers in accordance with sampling pattern data of the second of said plurality of speakers so as to be orthogonal to the orthogonal dictionary.
  - 21. A method according to claim 20, wherein said additional dictionary forming step comprises:
    - computing the averaged pattern data A(j,k) by using the sampling data of the second of said plurality of speakers;
      
      computing blurred pattern data c₁ and differentiated pattern data c₂ by using the averaged pattern data A(j,k);
      
      computing additional dictionary data b_{p+m} of the second speaker in accordance with an equation:
      
      ##EQU10## where p is the number of axes of an orthogonalized dictionary already obtained, and normalizing and outputting orthogonal vector data b_{p+m} representing feature variations of the second of said plurality of speakers as an additional dictionary having a new axis; and
      
      determining whether the additional dictionary is added in the orthogonalized dictionary.
  - 22. A method according to claim 21, wherein said addition determining step comprises:
    - causing the orthogonal vector data b_{p+m} to be added in the orthogonalized dictionary when a norm ∥b_{p+m}∥ of the orthogonal vector data is larger than a predetermined value.
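
The step of generating sampling pattern data by extracting a predetermined number of samples from the detected speech interval could be sketched as below. The claims do not say how the samples are chosen, so equally spaced frame selection is an assumption here, and `resample_frames` and its parameters are hypothetical names. Each frame would normally be a feature vector produced by the acoustic analysis.

```python
def resample_frames(frames, start, end, count):
    """Pick `count` equally spaced frames from the interval frames[start:end].

    start, end: bounds of the detected speech interval (end is exclusive)
    count:      the predetermined number of samples to extract
    Assumption: equal spacing; the claims do not fix the selection rule.
    """
    length = end - start
    if count < 1 or length < 1:
        raise ValueError("need at least one frame and one sample")
    if count == 1:
        return [frames[start]]
    # Spread the sample positions evenly from the first to the last frame.
    step = (length - 1) / (count - 1)
    return [frames[start + round(i * step)] for i in range(count)]
```

Fixing the number of samples gives every utterance the same pattern dimensions, which the averaging and inner-product steps in the claims require.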

23. A speech recognition method wherein a similarity is computed between an input speech pattern obtained by analyzing input speech and an orthogonalized dictionary formed on the basis of learning patterns acquired from a plurality of speakers in advance, and the input speech is recognized based on the computed similarity, comprising the steps of:
- obtaining an averaged pattern of a plurality of learning patterns obtained from each speaker, and obtaining a blurred pattern and a differentiated pattern from the averaged pattern; and
  
  determining an orthogonal axis on which an orthogonalized dictionary is based from the blurred and differentiated patterns obtained from a learning pattern of a first of said plurality of speakers, determining a new axis orthogonal to an axis of the dictionary, which has already been stored, from the blurred and differentiated patterns obtained from learning patterns of a second and subsequent of said plurality of speakers, and determining whether the dictionary of the new axis is stored, thereby forming the orthogonalized dictionary.

24. A speech recognition method for a plurality of speakers, comprising the steps of:
- converting input speech from a plurality of speakers into an electrical signal;
  
  performing acoustic analysis of the electrical signal;
  
  obtaining sampling pattern data from an electrical signal upon which the acoustic analysis has been performed;
  
  obtaining averaged pattern data from a plurality of sampling pattern data of a first of the plurality of speakers, and forming dictionary data of first and second axes from the averaged pattern data;
  
  storing the dictionary data of the first and second axes in an orthogonalized dictionary;
  
  obtaining second averaged pattern data from a plurality of sampling pattern data of at least a second of said plurality of speakers;
  
  obtaining additional dictionary data having an axis different from the first and second axes on the basis of the second averaged pattern data;
  
  storing the additional data in said orthogonalized dictionary means; and
  
  determining the input speech by using the dictionary data stored in said orthogonalized dictionary means.

Current Assignee: 3894576 Canada, Ltd.
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Nitta, Tsuneo
Primary Examiner(s): Merecki, John A.
Application Number: US07/378,780
Time in Patent Office: 524 Days
Field of Search: 381/41-46, 364/513.5
US Class Current: 704/245
CPC Class Codes: G10L 15/063 Training
