Method for producing reference segments describing voice modules and method for modelling voice units of a spoken test model
Abstract
A method models voice units and produces reference segments for modeling voice units. The reference segments describe voice modules by characteristic vectors, the characteristic vectors being stored in the order in which they are found in a training voice signal. Alternative characteristic vectors are associated with each characteristic vector. The reference segments for describing the voice modules are combined during the modeling of larger voice units. In the event of identification, the respectively best adapted characteristic vector alternatives are used to determine the distance between a test utterance and the larger voice units.
38 Claims
1-14. (cancelled)
15. A method for producing reference segments describing speech modules, for a voice recognition system, comprising:
segmenting a spoken training voice signal into speech modules in accordance with a predefined transcription;
subdividing each speech module into a sequence of time windows;
analyzing the spoken training voice signal in each time window to obtain a characteristic vector for each time window and obtain a training model from a sequence of characteristic vectors corresponding to the sequence of time windows, each speech module having a plurality of training models corresponding to a plurality of different pronunciations for the speech module;
forming an average time structure for each speech module, the average time structure being formed by comparing the plurality of training models for the speech module, the average time structure containing information regarding an average pronunciation speed and style, the average time structure having a plurality of time windows, the average time structure being formed by mapping the characteristic vectors of the different training models onto the time windows of the average time structure such that each time window of the average time structure contains a plurality of characteristic vectors, the characteristic vectors being mapped using a non-linear mapping; and
saving the plurality of time windows for the average time structure as a reference segment.
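The steps of claim 15 can be illustrated with a minimal Python sketch. It is not taken from the patent: it assumes that the non-linear mapping is a plain dynamic time warping (DTW), that the average time structure has as many windows as the training model whose length is closest to the mean length, and that characteristic vectors are tuples of floats. The names `dtw_path` and `build_reference_segment` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two characteristic vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(ref, seq):
    """Align seq to ref with DTW; return (total cost, list of (i, j) pairs)."""
    n, m = len(ref), len(seq)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(ref[i], seq[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            candidates = []
            if i > 0:
                candidates.append((cost[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((cost[i][j - 1], (i, j - 1)))
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            best, prev = min(candidates, key=lambda c: c[0])
            cost[i][j] = best + d
            back[i][j] = prev
    # Trace the optimal path back from the last cell to the first.
    path, node = [], (n - 1, m - 1)
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]]
    path.reverse()
    return cost[n - 1][m - 1], path


def build_reference_segment(training_models):
    """Map every training model onto a shared time structure.

    Assumption: the model whose length is closest to the mean length
    stands in for the average time structure.
    """
    avg_len = sum(len(m) for m in training_models) / len(training_models)
    template = min(training_models, key=lambda m: abs(len(m) - avg_len))
    windows = [[] for _ in template]
    for model in training_models:
        _, path = dtw_path(template, model)
        for i, j in path:
            # Window i collects alternative vectors from every pronunciation.
            windows[i].append(model[j])
    return windows
```

Each element of the returned list corresponds to one time window of the average time structure and holds the alternative characteristic vectors mapped onto it.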
16. A method for producing reference segments for a voice recognition system, comprising:
segmenting a training voice signal into speech modules in accordance with a predefined transcription;
analyzing the training voice signal in predetermined time windows in order to obtain at least one characteristic vector for each time window, as a result of which training models are formed which in each case contain characteristic vectors in the time sequence of the training voice signal;
determining an average time structure, which is an average of change duration and time sequence characteristics, for each speech module;
assigning the characteristic vectors to the average time structure by a temporally non-linear mapping to produce a reference segment; and
storing the reference segment.
Dependent claims: 17, 18, 19, 20, 21, 22, 30, 31, 32, 33, 34
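The analysis step of claim 16, which yields at least one characteristic vector per predetermined time window, can be sketched as follows. The claim does not say which features make up a characteristic vector, so the two used here (log energy and zero-crossing rate) are assumptions chosen only for illustration, as are the function name `frame_features` and its parameters `window_len` and `hop`.

```python
import math


def frame_features(samples, window_len, hop):
    """Return one (log_energy, zero_crossing_rate) tuple per time window.

    Assumption: windows have a fixed length and start every `hop` samples;
    trailing samples that do not fill a whole window are ignored.
    """
    features = []
    for start in range(0, len(samples) - window_len + 1, hop):
        frame = samples[start:start + window_len]
        energy = sum(s * s for s in frame) / window_len
        log_energy = math.log(energy + 1e-12)  # small offset avoids log(0)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append((log_energy, crossings / (window_len - 1)))
    return features
```

The resulting list keeps the vectors in the time order of the signal, which is the form a training model takes in the claim.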
23. A method for modeling speech units of a spoken test model in a voice recognition system, comprising:
producing reference segments describing speech modules for a voice recognition system, comprising:
segmenting a spoken training voice signal into speech modules in accordance with a predefined transcription;
subdividing each speech module into a sequence of time windows;
analyzing the spoken training voice signal in each time window to obtain a characteristic vector for each time window and obtain a training model from a sequence of characteristic vectors corresponding to the sequence of time windows, each speech module having a plurality of training models corresponding to a plurality of different pronunciations for the speech module;
forming an average time structure for each speech module, the average time structure being formed by comparing the plurality of training models for the speech module, the average time structure containing information regarding an average pronunciation speed and style, the average time structure having a plurality of time windows, the average time structure being formed by mapping the characteristic vectors of the different training models onto the time windows of the average time structure such that each time window of the average time structure contains a plurality of characteristic vectors, the characteristic vectors being mapped using a non-linear mapping; and
saving the plurality of time windows for the average time structure as a reference segment;
forming a plurality of reference models, each reference model being formed by combining a plurality of reference segments, each reference model representing a speech unit;
performing a non-linear comparison of the reference models with the test model and determining in each case a distance between the reference model and the test model; and
selecting the reference model having the smallest distance from the test model, whereby the speech unit represented by the reference segments is assigned to the test model.
Dependent claims: 24, 25, 26, 27, 28, 29, 35, 36, 37, 38
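The recognition steps of claim 23 can be sketched in Python under a few stated assumptions. A reference segment is taken to be a list of time windows, each holding alternative characteristic vectors; a reference model is the concatenation of such segments; the non-linear comparison is a plain DTW whose local distance uses the best-matching alternative in each window, as the abstract describes. The names `recognize`, `lexicon` and `dtw_distance` are hypothetical and do not come from the patent.

```python
import math


def dist(a, b):
    """Euclidean distance between two characteristic vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_distance(window, vector):
    """Distance to the best-matching alternative vector in a window."""
    return min(dist(alt, vector) for alt in window)


def dtw_distance(reference_model, test_model):
    """Non-linear (DTW) distance between a reference model and a test model."""
    n, m = len(reference_model), len(test_model)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            d = local_distance(reference_model[i - 1], test_model[j - 1])
            cur[j] = d + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def recognize(reference_segments, lexicon, test_model):
    """Return (speech unit, distance) for the closest reference model.

    `lexicon` (hypothetical) maps each speech unit to the ordered names
    of the speech modules whose reference segments are concatenated.
    """
    best_unit, best_distance = None, float("inf")
    for unit, module_names in lexicon.items():
        reference_model = [
            window
            for name in module_names
            for window in reference_segments[name]
        ]
        d = dtw_distance(reference_model, test_model)
        if d < best_distance:
            best_unit, best_distance = unit, d
    return best_unit, best_distance
```

A speech unit whose reference model has fewer or more windows than the test model can still match, because the DTW recursion allows windows to be repeated on either side.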
Specification