Method for producing reference segments describing voice modules and method for modelling voice units of a spoken test model

US 20040249639A1
Filed: 04/12/2004
Published: 12/09/2004
Est. Priority Date: 10/11/2001
Status: Active Grant

Abstract

A method models voice units and produces reference segments for modeling voice units. The reference segments describe voice modules by characteristic vectors, the characteristic vectors being stored in the order in which they are found in a training voice signal. Alternative characteristic vectors are associated with each characteristic vector. The reference segments for describing the voice modules are combined during the modeling of larger voice units. In the event of identification, the respectively best adapted characteristic vector alternatives are used to determine the distance between a test utterance and the larger voice units.
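
The last sentence of the abstract can be illustrated with a minimal sketch. It assumes a Euclidean distance and represents one reference time window as a plain list of alternative characteristic vectors; the function names `euclidean` and `local_distance` are hypothetical and do not come from the patent.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_distance(test_vector, alternatives):
    """Distance from one test vector to the best-fitting alternative vector
    stored for a single time window of a reference segment."""
    return min(euclidean(test_vector, alt) for alt in alternatives)


# One time window holding three alternative characteristic vectors.
window = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
nearest = local_distance([1.0, 2.0], window)
print(nearest)
```

Only the closest alternative contributes to the distance, so a window can hold several pronunciation variants without penalising a test utterance that matches just one of them.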

38 Claims

1-14. (cancelled)

15. A method for producing reference segments describing speech modules, for a voice recognition system, comprising:
- segmenting a spoken training voice signal into speech modules in accordance with a predefined transcription;
  
  subdividing each speech module into a sequence of time windows;
  
  analyzing the spoken training voice signal in each time window to obtain a characteristic vector for each time window and obtain a training model from a sequence of characteristic vectors corresponding to the sequence of time windows, each speech module having a plurality of training models corresponding to a plurality of different pronunciations for the speech module;
  
  forming an average time structure for each speech module, the average time structure being formed by comparing the plurality of training models for the speech module, the average time structure containing information regarding an average pronunciation speed and style, the average time structure having a plurality of time windows, the average time structure being formed by mapping the characteristic vectors of the different training models onto the time windows of the average time structure such that each time window of the average time structure contains a plurality of characteristic vectors, the characteristic vectors being mapped using a non-linear mapping; and
  
  saving the plurality of time windows for the average time structure as a reference segment.
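
The steps of claim 15 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the average time structure is approximated by the rounded mean length of the training models, a provisional template is built by averaging linearly resampled models, and dynamic time warping (DTW) is used as the non-linear mapping. All function names (`dtw_path`, `resample`, `build_reference_segment`) are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two characteristic vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(seq, template):
    """Align seq to template with DTW; return (i, j) pairs, seq[i] -> template[j]."""
    n, m = len(seq), len(template)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(seq[i], template[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best = inf
            if i > 0:
                best = min(best, cost[i - 1][j])
            if j > 0:
                best = min(best, cost[i][j - 1])
            if i > 0 and j > 0:
                best = min(best, cost[i - 1][j - 1])
            cost[i][j] = d + best
    # Backtrack from the end to recover the warping path.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path


def resample(seq, length):
    """Pick `length` vectors from seq at evenly spaced positions."""
    if length == 1:
        return [seq[0]]
    return [seq[round(k * (len(seq) - 1) / (length - 1))] for k in range(length)]


def build_reference_segment(training_models):
    """Return one list of characteristic vectors per average time window."""
    avg_len = max(1, round(sum(len(m) for m in training_models) / len(training_models)))
    resampled = [resample(m, avg_len) for m in training_models]
    # Provisional template: component-wise mean of the resampled models.
    template = [
        [sum(vals) / len(vals) for vals in zip(*(r[k] for r in resampled))]
        for k in range(avg_len)
    ]
    windows = [[] for _ in range(avg_len)]
    for model in training_models:
        for i, j in dtw_path(model, template):
            windows[j].append(model[i])  # keep the original vector, not an average
    return windows


# Three pronunciations of one speech module, using 1-D vectors for brevity.
m1 = [[0.0], [1.0], [2.0]]
m2 = [[0.0], [0.5], [1.5], [2.0]]
m3 = [[0.0], [0.0], [1.0], [2.0], [2.0]]
windows = build_reference_segment([m1, m2, m3])
print(len(windows), [len(w) for w in windows])
```

Because every DTW path visits each template position at least once, each time window of the resulting reference segment receives at least one vector from every training model, which matches the requirement that each window contain a plurality of characteristic vectors.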

16. A method for producing reference segments for a voice recognition system, comprising:
- segmenting a training voice signal into speech modules in accordance with a predefined transcription;
  
  analyzing the training voice signal in predetermined time windows in order to obtain at least one characteristic vector for each time window, as a result of which training models are formed which in each case contain characteristic vectors in the time sequence of the training voice signal;
  
  determining an average time structure, which is an average of change duration and time sequence characteristics, for each speech module;
  
  assigning the characteristic vectors to the average time structure by a temporally non-linear mapping to produce a reference segment; and
  
  storing the reference segment.
- Dependent claims (17, 18, 19, 20, 21, 22, 30, 31, 32, 33, 34)
  - 17. The method according to claim 16, wherein the training voice signal is segmented into speech modules to separate phonemes, diphthongs, diphones, triphones or syllables.
  - 18. The method according to claim 16, wherein the characteristic vectors of the training models represent spectral characteristics, autocorrelation characteristics, LPC characteristics, MFCC characteristics or CC characteristics.
  - 19. The method according to claim 16, wherein the average time sequence is obtained by performing non-linear mappings of the training models on the speech module to one another and by averaging the mappings.
  - 20. The method according to claim 16, further comprising clustering the characteristic vectors of the time windows.
  - 21. The method according to claim 20, wherein the number of characteristic vectors per time window is limited to a particular number.
  - 22. The method according to claim 20, wherein the number of characteristic vectors corresponds to a variance in the characteristic vectors for the training models, such that if there is a greater variance, more characteristic vectors are used.
  - 30. The method according to claim 17, wherein the characteristic vectors of the training models represent spectral characteristics, autocorrelation characteristics, LPC characteristics, MFCC characteristics or CC characteristics.
  - 31. The method according to claim 30, wherein the average time sequence is obtained by performing non-linear mappings of the training models on the speech module to one another and by averaging the mappings.
  - 32. The method according to claim 31, further comprising clustering the characteristic vectors of the time windows.
  - 33. The method according to claim 32, wherein the number of characteristic vectors per time window is limited to a particular number.
  - 34. The method according to claim 33, wherein the number of characteristic vectors corresponds to a variance in the characteristic vectors for the training models, such that if there is a greater variance, more characteristic vectors are used.
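
The clustering described in the dependent claims (clustering per time window, a cap on the number of vectors, and more vectors where the variance is greater) could look roughly like the sketch below. It swaps in a simple agglomerative merge of nearest centroids, which the claims do not prescribe; `cluster_window`, `merge_threshold` and `max_alternatives` are hypothetical names and parameters chosen for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two characteristic vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_window(vectors, merge_threshold, max_alternatives):
    """Reduce the vectors of one time window to a few representative centroids.

    Centroids closer than merge_threshold are merged, so a window with little
    variance collapses to few vectors while a spread-out window keeps more.
    Merging also continues while more than max_alternatives centroids remain.
    """
    clusters = [(list(v), 1) for v in vectors]  # (centroid, number of members)
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(clusters[a][0], clusters[b][0])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d >= merge_threshold and len(clusters) <= max_alternatives:
            break
        (ca, wa), (cb, wb) = clusters[a], clusters[b]
        merged = [(x * wa + y * wb) / (wa + wb) for x, y in zip(ca, cb)]
        clusters[a] = (merged, wa + wb)
        del clusters[b]
    return [centroid for centroid, _ in clusters]


tight = cluster_window([[1.0], [1.1], [0.9]], merge_threshold=0.5, max_alternatives=3)
spread_vectors = [[0.0], [5.0], [10.0], [5.2]]
spread = cluster_window(spread_vectors, merge_threshold=0.5, max_alternatives=3)
print(len(tight), len(spread))
```

The threshold ties the number of stored alternatives to the spread of the training data, and the cap bounds memory and comparison cost per time window.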

23. A method for modeling speech units of a spoken test model in a voice recognition system, comprising:
- producing reference segments describing speech modules for a voice recognition system, comprising:
  
  segmenting a spoken training voice signal into speech modules in accordance with a predefined transcription;
  
  subdividing each speech module into a sequence of time windows;
  
  analyzing the spoken training voice signal in each time window to obtain a characteristic vector for each time window and obtain a training model from a sequence of characteristic vectors corresponding to the sequence of time windows, each speech module having a plurality of training models corresponding to a plurality of different pronunciations for the speech module;
  
  forming an average time structure for each speech module, the average time structure being formed by comparing the plurality of training models for the speech module, the average time structure containing information regarding an average pronunciation speed and style, the average time structure having a plurality of time windows, the average time structure being formed by mapping the characteristic vectors of the different training models onto the time windows of the average time structure such that each time window of the average time structure contains a plurality of characteristic vectors, the characteristic vectors being mapped using a non-linear mapping; and
  
  saving the plurality of time windows for the average time structure as a reference segment;
  
  forming a plurality of reference models, each reference model being formed by combining a plurality of reference segments, each reference model representing a speech unit;
  
  performing a non-linear comparison of the reference models with the test model and determining in each case a distance between the reference model and the test model; and
  
  selecting the reference model having the smallest distance from the test model, whereby the speech unit represented by the reference segments is assigned to the test model.
- Dependent claims (24, 25, 26, 27, 28, 29, 35, 36, 37, 38)
  - 24. The method according to claim 23, wherein each reference model represents a word to be recognized.
  - 25. The method according to claim 24, wherein each reference model is formed from a concatenation of the reference segments in accordance with the transcription.
  - 26. The method according to claim 23, wherein the non-linear comparison is effected by a non-linear time adjustment of the test model to the reference models for the words to be recognized.
  - 27. The method according to claim 26, wherein the non-linear time adjustment is restricted to a defined working range.
  - 28. The method according to claim 23, wherein each reference segment has a characteristic vector, the test model has a characteristic vector, in performing the non-linear comparison, a distance is determined between the characteristic vector of the test model and each of the characteristic vectors of the reference segment, and the distance is determined to be the minimum of the distances between the characteristic vector of the test model and the characteristic vectors of the reference segments.
  - 29. The method according to claim 23, wherein distortion is limited in the non-linear mapping.
  - 35. The method according to claim 25, wherein the non-linear comparison is effected by a non-linear time adjustment of the test model to the reference models for the words to be recognized.
  - 36. The method according to claim 35, wherein the non-linear time adjustment is restricted to a defined working range.
  - 37. The method according to claim 36, wherein each reference segment has a characteristic vector, the test model has a characteristic vector, in performing the non-linear comparison, a distance is determined between the characteristic vector of the test model and each of the characteristic vectors of the reference segment, and the distance is determined to be the minimum of the distances between the characteristic vector of the test model and the characteristic vectors of the reference segments.
  - 38. The method according to claim 37, wherein distortion is limited in the non-linear mapping.
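
The recognition side of claim 23 and its dependent claims can be sketched as below, under several assumptions: reference models are built by concatenating reference segments according to a transcription, the non-linear comparison is a DTW restricted to a band around the diagonal (a Sakoe-Chiba-style band standing in for the "defined working range"), and the local distance is the minimum over the alternative vectors in a window. The names `build_reference_model`, `dtw_distance`, `recognize` and `band`, as well as the toy data, are hypothetical.

```python
import math


def local_distance(test_vector, alternatives):
    """Minimum Euclidean distance to any alternative vector in one time window."""
    return min(
        math.sqrt(sum((x - y) ** 2 for x, y in zip(test_vector, alt)))
        for alt in alternatives
    )


def build_reference_model(transcription, segments):
    """Concatenate the reference segments named in the transcription."""
    model = []
    for module in transcription:
        model.extend(segments[module])
    return model


def dtw_distance(test_model, reference_model, band):
    """Band-limited DTW distance; returns inf if no path fits inside the band."""
    n, m = len(test_model), len(reference_model)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        center = round(i * m / n)  # position of the diagonal for this test frame
        lo, hi = max(1, center - band), min(m, center + band)
        for j in range(lo, hi + 1):
            d = local_distance(test_model[i - 1], reference_model[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(test_model, vocabulary, segments, band=2):
    """Return the word whose reference model is closest to the test model."""
    scores = {
        word: dtw_distance(test_model, build_reference_model(tr, segments), band)
        for word, tr in vocabulary.items()
    }
    return min(scores, key=scores.get), scores


# Toy reference segments: each is a list of time windows, each window a list
# of alternative 1-D characteristic vectors.
segments = {
    "a": [[[0.0], [0.2]], [[1.0]]],
    "b": [[[5.0]], [[6.0], [6.5]]],
    "c": [[[9.0]]],
}
vocabulary = {"ab": ["a", "b"], "ba": ["b", "a"], "ac": ["a", "c"]}
test_model = [[0.1], [1.0], [5.1], [6.4]]
best, scores = recognize(test_model, vocabulary, segments)
print(best)
```

Restricting the warping to a band keeps the comparison cheap and rules out implausibly distorted alignments; a band that is too narrow for the length ratio of the two models yields an infinite distance, which effectively rejects that reference model.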

Current Assignee
Cerence Operating Company (Cerence Inc.)
Original Assignee
Siemens AG
Inventors
Kammerer, Bernhard

Granted Patent

US 7,398,208 B2
Field of Search
US Class Current

704/249
CPC Class Codes

G10L 15/063   Training

G10L 15/12   using dynamic programming t...

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 2015/0631   Creating reference template...
