ACCENT INVARIANT SPEECH RECOGNITION
First Claim
1. A method for accent invariant speech recognition comprising:
maintaining a database for storing a set of language units in a given language, wherein for each language unit, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers;
extracting and storing in the database a feature vector for locating each of the audio samples in a feature space;
identifying two types of distances:
(i) pronunciation variation distances, which are distances between locations of audio samples of the same language unit with different pronunciations, in the feature space; and
(ii) inter-unit distances, which are distances between locations of audio samples of different language units in the feature space;
calculating a transformation applicable on the feature space to reduce the pronunciation variation distances relative to the inter-unit distances; and
based on the calculated transformation, training a processor to classify, as a same language unit, pronunciation variations of the same language unit.
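By way of illustration only, and not as a limitation of the claim, the two types of distances recited above can be sketched as follows. The sketch assumes that each audio sample has already been reduced to a numeric feature vector and that distance in the feature space is Euclidean; the claim specifies neither. The function name `distance_types`, the unit labels and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def distance_types(samples):
    """Split all pairwise distances into the two types recited in the claim.

    samples: list of (language_unit, feature_vector) pairs (assumed layout).
    Returns (pronunciation_variation_distances, inter_unit_distances).
    """
    variation, inter_unit = [], []
    for (unit_a, vec_a), (unit_b, vec_b) in combinations(samples, 2):
        d = math.dist(vec_a, vec_b)  # Euclidean distance is an assumption
        if unit_a == unit_b:
            variation.append(d)   # same unit, different pronunciation
        else:
            inter_unit.append(d)  # different units
    return variation, inter_unit


# Toy feature vectors: the second coordinate stands in for accent variation.
samples = [
    ("cat", [0.0, 0.0]),
    ("cat", [0.0, 2.0]),
    ("dog", [3.0, 0.0]),
    ("dog", [3.0, 2.0]),
]
variation, inter_unit = distance_types(samples)
```

In this toy layout the pronunciation variation distances are comparable in size to the inter-unit distances, which is the situation the recited transformation is meant to improve.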
Abstract
A system and method for accent invariant speech recognition comprising: maintaining a database storing a set of language units in a given language, and for each of the language units, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers; extracting and storing in the database a feature vector for locating each of the audio samples in a feature space; identifying pronunciation variation distances, which are distances between locations of audio samples of the same language unit in the feature space, and inter-unit distances, which are distances between locations of audio samples of different language units in the feature space; calculating a transformation applicable on the feature space to reduce the pronunciation variation distances relative to the inter-unit distances; and based on the calculated transformation, training a processor to classify, as a same language unit, pronunciation variations of the same language unit.
26 Citations
11 Claims
1. A method for accent invariant speech recognition comprising:
maintaining a database for storing a set of language units in a given language, wherein for each language unit, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers;
extracting and storing in the database a feature vector for locating each of the audio samples in a feature space;
identifying two types of distances:
(i) pronunciation variation distances, which are distances between locations of audio samples of the same language unit with different pronunciations, in the feature space; and
(ii) inter-unit distances, which are distances between locations of audio samples of different language units in the feature space;
calculating a transformation applicable on the feature space to reduce the pronunciation variation distances relative to the inter-unit distances; and
based on the calculated transformation, training a processor to classify, as a same language unit, pronunciation variations of the same language unit.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
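As a non-limiting sketch of a transformation that reduces the pronunciation variation distances relative to the inter-unit distances, the example below rescales each feature dimension by the inverse of its within-unit standard deviation. This diagonal scaling is a simple stand-in chosen for illustration; claim 1 does not name a particular transformation, and the function names, Euclidean metric and toy data are all assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def fit_diagonal_transform(samples, eps=1e-9):
    """Per-dimension weights: 1 / sqrt(within-unit variance + eps)."""
    by_unit = defaultdict(list)
    for unit, vec in samples:
        by_unit[unit].append(vec)
    dims = len(samples[0][1])
    within = [0.0] * dims
    for vectors in by_unit.values():
        centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
        for vec in vectors:
            for d in range(dims):
                within[d] += (vec[d] - centroid[d]) ** 2
    n = len(samples)
    return [1.0 / math.sqrt(within[d] / n + eps) for d in range(dims)]


def apply_transform(weights, vec):
    """Apply the diagonal transformation to one feature vector."""
    return [w * x for w, x in zip(weights, vec)]


def variation_to_inter_unit_ratio(samples):
    """Mean same-unit distance divided by mean different-unit distance."""
    variation, inter_unit = [], []
    for (ua, va), (ub, vb) in combinations(samples, 2):
        (variation if ua == ub else inter_unit).append(math.dist(va, vb))
    return (sum(variation) / len(variation)) / (sum(inter_unit) / len(inter_unit))


# Toy data: dimension 0 separates units, dimension 1 mostly reflects accent.
samples = [
    ("cat", [0.0, 0.0]),
    ("cat", [0.2, 2.0]),
    ("dog", [3.0, 0.0]),
    ("dog", [3.2, 2.0]),
]
weights = fit_diagonal_transform(samples)
transformed = [(unit, apply_transform(weights, vec)) for unit, vec in samples]

ratio_before = variation_to_inter_unit_ratio(samples)
ratio_after = variation_to_inter_unit_ratio(transformed)
```

Here `ratio_after` is smaller than `ratio_before` because the accent-dominated dimension is down-weighted relative to the unit-discriminating one. A full linear transformation, or a learned non-linear one, could be substituted without changing the surrounding steps.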
9. A method for accent invariant speech recognition comprising:
maintaining a database storing a set of language units in a given language, and for each language unit, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers with known accents, wherein the audio samples are indexed according to the language unit and the accent integrated in the audio sample; and
training a processor to classify an audio signal as a corresponding language unit for a given accent.
Dependent claims: 10.
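For illustration only, the accent-indexed database and per-accent classification of claim 9 might be sketched as below. The sketch assumes that samples are already represented as feature vectors and uses a nearest-centroid rule as the classifier; the claim prescribes neither. The class `AccentIndexedStore`, the function `classify` and the accent labels are hypothetical.

```python
import math
from collections import defaultdict


class AccentIndexedStore:
    """Stores feature vectors indexed by (language unit, accent)."""

    def __init__(self):
        self._samples = defaultdict(list)

    def add(self, unit, accent, features):
        self._samples[(unit, accent)].append(list(features))

    def centroids(self, accent):
        """Mean feature vector of every language unit stored for one accent."""
        result = {}
        for (unit, stored_accent), vectors in self._samples.items():
            if stored_accent == accent:
                result[unit] = [sum(col) / len(vectors) for col in zip(*vectors)]
        return result


def classify(store, features, accent):
    """Return the language unit whose centroid, for this accent, is nearest."""
    centroids = store.centroids(accent)
    if not centroids:
        raise KeyError(f"no samples stored for accent {accent!r}")
    return min(centroids, key=lambda unit: math.dist(features, centroids[unit]))


# Toy data: the same units sit at different locations for each accent.
store = AccentIndexedStore()
store.add("cat", "accent_a", [0.0, 0.0])
store.add("cat", "accent_a", [0.2, 0.0])
store.add("dog", "accent_a", [2.0, 0.0])
store.add("dog", "accent_a", [2.2, 0.0])
store.add("cat", "accent_b", [1.0, 1.0])
store.add("cat", "accent_b", [1.2, 1.0])
store.add("dog", "accent_b", [3.0, 1.0])
store.add("dog", "accent_b", [3.2, 1.0])

signal = [1.8, 0.5]
unit_for_a = classify(store, signal, "accent_a")
unit_for_b = classify(store, signal, "accent_b")
```

With these toy values the same signal is assigned to different language units depending on the accent supplied, which is the reason for indexing the samples by accent as well as by language unit.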
11. A method for accent invariant speech recognition comprising:
maintaining a database for storing a set of language units in a given language, wherein for each of the language units, storing a standard pronunciation audio sample and a plurality of variant audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers;
for each audio sample, extracting a descriptor and storing the descriptor in the database, thus obtaining at least one standard descriptor and a group of variant descriptors;
training a processor to produce a transformation procedure for transforming the variant descriptors to the standard descriptor and a discriminative procedure to distinguish between the standard descriptor and the transformed variant descriptors, until the transformed variant descriptors are indistinguishable from the standard descriptor;
receiving an input audio signal; and
by the trained transformation procedure, transforming the input audio signal to a modified signal indistinguishable from the respective standard pronunciation sample.
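The joint training of a transformation procedure and a discriminative procedure recited in claim 11 resembles adversarial training, and a heavily simplified sketch under that reading is given below. It is not the claimed implementation: the transformation is assumed to be a single learned shift, the discriminator a logistic model with weight decay added for stability, and a fixed number of steps replaces the "until indistinguishable" stopping condition. All names, hyperparameters and toy descriptors are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(w, c, z):
    return sum(wi * zi for wi, zi in zip(w, z)) + c


def apply_shift(descriptor, shift):
    """Transformation procedure (assumed form): T(x) = x + shift."""
    return [x + b for x, b in zip(descriptor, shift)]


def train_adversarial(standard, variants, steps=2000, lr=0.05, decay=1.0):
    dims = len(standard)
    shift = [0.0] * dims      # parameters of the transformation procedure
    w, c = [0.0] * dims, 0.0  # discriminator: D(z) = sigmoid(w.z + c)
    n = len(variants)
    for _ in range(steps):
        fakes = [apply_shift(v, shift) for v in variants]
        # Discriminator step: push D(standard) toward 1 and D(fake) toward 0.
        p_real = sigmoid(logit(w, c, standard))
        p_fake = [sigmoid(logit(w, c, z)) for z in fakes]
        grad_w = [(p_real - 1.0) * standard[d]
                  + sum(p * z[d] for p, z in zip(p_fake, fakes)) / n
                  + decay * w[d]  # weight decay: a stabilising assumption
                  for d in range(dims)]
        grad_c = (p_real - 1.0) + sum(p_fake) / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        c -= lr * grad_c
        # Transformation step: move the fakes so that D(fake) rises toward 1.
        p_fake = [sigmoid(logit(w, c, z)) for z in fakes]
        push = sum(1.0 - p for p in p_fake) / n
        shift = [b + lr * push * wi for b, wi in zip(shift, w)]
    return shift, (w, c)


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Toy descriptors: one standard pronunciation and three accented variants.
standard = [0.0, 0.0]
variants = [[2.0, 1.0], [2.2, 0.8], [1.8, 1.2]]

shift, discriminator = train_adversarial(standard, variants)
transformed = [apply_shift(v, shift) for v in variants]

gap_before = math.dist(mean_vector(variants), standard)
gap_after = math.dist(mean_vector(transformed), standard)

# A new input descriptor passes through the trained transformation.
modified_input = apply_shift([2.1, 0.9], shift)
```

A shift can only align the average of the variant descriptors with the standard descriptor; a practical system would presumably use a richer transformation and stop once the discriminator performs no better than chance.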
Specification