Accent invariant speech recognition
Abstract
A system and method for accent invariant speech recognition comprising: maintaining a database storing a set of language units in a given language, and for each of the language units, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers; extracting and storing in the database a feature vector for locating each of the audio samples in a feature space; identifying pronunciation variation distances, which are distances between locations of audio samples of the same language unit in the feature space, and inter-unit distances, which are distances between locations of audio samples of different language units in the feature space; calculating a transformation applicable on the feature space to reduce the pronunciation variation distances relative to the inter-unit distances; and based on the calculated transformation, training a processor to classify as a same language unit pronunciation variations of the same language unit.
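The two distance types named in the abstract can be illustrated with a small sketch. It is not taken from the patent: the toy samples, the two-dimensional feature vectors, the Euclidean metric, and the helper name `mean_distances` are all assumptions made for illustration.

```python
import math
from itertools import combinations

# Hypothetical toy database: (language unit, speaker, feature vector).
# Real feature vectors would be extracted from audio; these are made up.
SAMPLES = [
    ("yes", "speaker_a", (0.0, 0.0)),
    ("yes", "speaker_b", (0.0, 3.0)),
    ("no", "speaker_a", (4.0, 0.0)),
    ("no", "speaker_b", (4.0, 3.0)),
]


def mean_distances(samples):
    """Return (mean pronunciation variation distance, mean inter-unit distance).

    Assumes Euclidean distance and at least one pair of each kind.
    """
    intra, inter = [], []
    for (unit_a, _, vec_a), (unit_b, _, vec_b) in combinations(samples, 2):
        d = math.dist(vec_a, vec_b)
        # Same unit, different speakers: pronunciation variation distance.
        # Different units: inter-unit distance.
        (intra if unit_a == unit_b else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)


variation, inter_unit = mean_distances(SAMPLES)
print(variation, inter_unit)  # prints "3.0 4.5"
```

In this toy data the pronunciation variation distance is already smaller than the inter-unit distance; the transformation described in the abstract aims to shrink that ratio further.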
8 Claims
1. A method for accent invariant speech recognition comprising:
maintaining a database for storing a set of language units in a given language, wherein for each language unit, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers;
extracting and storing in the database a feature vector for locating each of the audio samples in a feature space;
identifying two types of distances:
(i) pronunciation variation distances, which are distances between locations of audio samples of the same language unit with different pronunciations, in the feature space; and
(ii) inter-unit distances, which are distances between locations of audio samples of different language units in the feature space;
calculating a transformation applicable on the feature space to reduce the pronunciation variation distances relative to the inter-unit distances, wherein the transformation is configured to make various pronunciation variations of the same language unit indistinguishable by a classification processor;
when receiving an input audio signal, transforming the received signal to an accent-invariant audio signal by applying the calculated transformation on the input audio signal, wherein language units included in the accent-invariant audio signal are indistinguishable by the classification processor from other pronunciation variations of the same language units; and
recognizing a language unit in said input audio signal, by applying classification by said classification processor.
Dependent claims: 2, 3, 4, 5, 6, 7.
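The steps of claim 1 can be sketched end to end under strong simplifying assumptions. The claim does not say how the transformation is calculated; the sketch below swaps in a simple per-dimension scaling (each feature dimension divided by its within-unit standard deviation) and a nearest-centroid rule as a stand-in for the classification processor. The toy data and every function name are hypothetical. The scaling only reduces the distance ratio; it does not make pronunciations strictly indistinguishable.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical toy database: (language unit, feature vector). Dimension 1 varies
# mostly with accent, dimension 0 mostly with the language unit.
SAMPLES = [
    ("yes", (0.0, 0.0)), ("yes", (1.0, 4.0)),
    ("no", (5.0, 0.0)), ("no", (6.0, 4.0)),
]


def centroids(samples):
    """Mean feature vector per language unit."""
    groups = defaultdict(list)
    for unit, vec in samples:
        groups[unit].append(vec)
    return {u: tuple(sum(col) / len(vecs) for col in zip(*vecs))
            for u, vecs in groups.items()}


def fit_weights(samples):
    """Assumed transformation: weight = 1 / within-unit standard deviation."""
    means = centroids(samples)
    dims = len(samples[0][1])
    within = [0.0] * dims
    for unit, vec in samples:
        for d in range(dims):
            within[d] += (vec[d] - means[unit][d]) ** 2 / len(samples)
    # The floor avoids division by zero when a dimension never varies.
    return [1.0 / math.sqrt(max(v, 1e-12)) for v in within]


def transform(vec, weights):
    """Apply the diagonal transformation to one feature vector."""
    return tuple(w * v for w, v in zip(weights, vec))


def distance_ratio(samples):
    """Mean pronunciation variation distance divided by mean inter-unit distance."""
    intra, inter = [], []
    for (ua, va), (ub, vb) in combinations(samples, 2):
        (intra if ua == ub else inter).append(math.dist(va, vb))
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


def recognize(vec, weights, unit_centroids):
    """Transform the input, then pick the nearest unit centroid."""
    z = transform(vec, weights)
    return min(unit_centroids, key=lambda u: math.dist(z, unit_centroids[u]))


WEIGHTS = fit_weights(SAMPLES)
TRANSFORMED = [(u, transform(v, WEIGHTS)) for u, v in SAMPLES]
CENTROIDS = centroids(TRANSFORMED)
ratio_before = distance_ratio(SAMPLES)
ratio_after = distance_ratio(TRANSFORMED)
result = recognize((0.8, 3.5), WEIGHTS, CENTROIDS)
print(WEIGHTS, ratio_after < ratio_before, result)
```

On this data the accent-heavy dimension is scaled down and the unit-bearing dimension is scaled up, so the ratio of pronunciation variation distance to inter-unit distance drops after the transformation. A full linear transformation would be a natural next step beyond this diagonal sketch.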
8. A method for accent invariant speech recognition comprising:
maintaining a database storing a set of language units in a given language, and for each language unit, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers with known accents, wherein the audio samples are indexed according to the language unit and accent integrated in the audio sample;
for each known accent:
identifying two types of distances:
(i) pronunciation variation distances, which are distances between locations of audio samples of the same language unit with different pronunciations, in the feature space; and
(ii) inter-unit distances, which are distances between locations of audio samples of different language units in the feature space;
calculating a transformation applicable on the feature space to reduce the pronunciation variation distances relative to the inter-unit distances, wherein the transformation is configured to make various pronunciation variations of the same language unit and accent indistinguishable by a classification processor; and
when receiving an input audio signal:
in case an accent of the received audio signal is recognized, applying classification for the recognized accent by said processor, thus recognizing a language unit in said input audio signal; and
in case an accent of the received audio signal is not recognized, applying a separate classification for each of the known accents, thus recognizing a language unit in said input audio signal for each of the known accents, and selecting the most probable recognized language unit,
wherein applying classification for the recognized accent comprises transforming the received signal by applying on the input audio signal the corresponding calculated transformation, wherein language units included in the transformed audio signal are indistinguishable by the classification processor from other pronunciation variations of the same language units and accent.
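The accent-dependent branching in claim 8 can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the per-accent models, their weights and centroids, the accent labels, and the probability proxy (normalized `exp(-distance)` scores) are all hypothetical, and accent recognition itself is taken as given through the `accent` argument.

```python
import math

# Hypothetical per-accent models. Each holds a diagonal transformation
# ("weights") and unit centroids in the transformed feature space.
MODELS = {
    "accent_a": {"weights": (1.0, 1.0),
                 "centroids": {"yes": (0.0, 0.0), "no": (4.0, 0.0)}},
    "accent_b": {"weights": (1.0, 1.0),
                 "centroids": {"yes": (2.0, 0.0), "no": (6.0, 0.0)}},
}


def classify(model, features):
    """Apply the accent's transformation, then return (unit, probability proxy)."""
    z = [w * v for w, v in zip(model["weights"], features)]
    # Assumed scoring: closer centroids get exponentially higher scores.
    scores = {unit: math.exp(-math.dist(z, c))
              for unit, c in model["centroids"].items()}
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


def recognize(features, accent=None, models=MODELS):
    """Use the recognized accent's classifier, or fall back to trying all accents."""
    if accent in models:
        # Accent recognized: a single classification for that accent.
        return classify(models[accent], features)[0]
    # Accent not recognized: classify once per known accent and keep the
    # language unit with the highest probability proxy.
    results = [classify(model, features) for model in models.values()]
    return max(results, key=lambda r: r[1])[0]


print(recognize((3.9, 0.1), "accent_b"))  # prints "yes"
print(recognize((3.9, 0.1)))              # prints "no"
```

The same input yields different units depending on whether the accent is known, which shows why the claim selects the most probable result only after running every per-accent classification in the fallback case.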
Specification