Accent invariant speech recognition

US 10,446,136 B2
Filed: 05/11/2017
Issued: 10/15/2019
Est. Priority Date: 05/11/2017
Status: Active Grant

First Claim

1. A method for accent invariant speech recognition comprising:

maintaining a database for storing a set of language units in a given language, wherein for each language unit, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers;

extracting and storing in the database a feature vector for locating each of the audio samples in a feature space;

identifying two types of distances:

(i) pronunciation variation distances, which are distances between locations of audio samples of the same language unit with different pronunciations, in the feature space; and

(ii) inter-unit distances, which are distances between locations of audio samples of different language units in the feature space;

calculating a transformation applicable on the feature space to reduce the pronunciation variation distances relative to the inter-unit distances, the transformation is configured to make various pronunciation variations of the same language unit indistinguishable by a classification processor;

when receiving an input audio signal:

transforming the received signal to an accent-invariant audio signal by applying the calculated transformation on the input audio signal, wherein language units included in the accent-invariant audio signal are indistinguishable by the classification processor from other pronunciation variations of the same language units; and

recognizing a language unit in said input audio signal, by applying classification by said classification processor.
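
The two distance types named in the claim can be illustrated with a short sketch. It assumes Euclidean distance and a toy in-memory list in place of the database; the claim fixes neither, and the names `mean_distances` and `samples` and the feature values are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_distances(samples):
    """samples: list of (language_unit, feature_vector) pairs.

    Returns (mean pronunciation variation distance, mean inter-unit distance).
    """
    within, between = [], []
    for (unit_a, vec_a), (unit_b, vec_b) in combinations(samples, 2):
        d = euclidean(vec_a, vec_b)
        # Same unit -> pronunciation variation; different units -> inter-unit.
        (within if unit_a == unit_b else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical toy database: two language units, two pronunciations each.
samples = [
    ("yes", (0.0, 0.0)),
    ("yes", (0.0, 4.0)),
    ("no", (3.0, 0.0)),
    ("no", (3.0, 4.0)),
]
variation, inter_unit = mean_distances(samples)
```

In this toy data the two averages are equal, which is the situation the claimed transformation is meant to improve: pronunciation differences separate samples as much as the identity of the language unit does.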

Abstract

A system and method for accent invariant speech recognition comprising: maintaining a database storing a set of language units in a given language, and for each of the language units, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers; extracting and storing in the database a feature vector for locating each of the audio samples in a feature space; identifying pronunciation variation distances, which are distances between locations of audio samples of the same language unit in the feature space, and inter-unit distances, which are distances between locations of audio samples of different language units in the feature space; calculating a transformation applicable on the feature space to reduce the pronunciation variation distances relative to the inter-unit distances; and based on the calculated transformation, training a processor to classify as a same language unit pronunciation variations of the same language unit.
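
One common way to formalize "reducing the pronunciation variation distances relative to the inter-unit distances" is the Fisher criterion used in Linear Discriminant Analysis, which claim 5 names as one possible transformation. The notation below is an assumption for illustration and does not come from the patent text: $X_u$ is the set of feature vectors stored for language unit $u$, $n_u$ its size, $\mu_u$ its mean, $\mu$ the overall mean, and $W$ the transformation matrix.

```latex
S_W = \sum_{u} \sum_{x \in X_u} (x - \mu_u)(x - \mu_u)^{\top}, \qquad
S_B = \sum_{u} n_u \, (\mu_u - \mu)(\mu_u - \mu)^{\top}

W^{*} = \arg\max_{W} \; \operatorname{tr}\!\left[ \left( W^{\top} S_W W \right)^{-1} W^{\top} S_B W \right]
```

Here $S_W$ summarizes the spread of different pronunciations of the same unit and $S_B$ the spread between units, so maximizing the criterion shrinks the former relative to the latter.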

8 Claims

1. A method for accent invariant speech recognition comprising:
- maintaining a database for storing a set of language units in a given language, wherein for each language unit, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers;
  
  extracting and storing in the database a feature vector for locating each of the audio samples in a feature space;
  
  identifying two types of distances:
  
  (i) pronunciation variation distances, which are distances between locations of audio samples of the same language unit with different pronunciations, in the feature space; and
  
  (ii) inter-unit distances, which are distances between locations of audio samples of different language units in the feature space;
  
  calculating a transformation applicable on the feature space to reduce the pronunciation variation distances relative to the inter-unit distances, the transformation is configured to make various pronunciation variations of the same language unit indistinguishable by a classification processor;
  
  when receiving an input audio signal:
  
  transforming the received signal to an accent-invariant audio signal by applying the calculated transformation on the input audio signal, wherein language units included in the accent-invariant audio signal are indistinguishable by the classification processor from other pronunciation variations of the same language units; and
  
  recognizing a language unit in said input audio signal, by applying classification by said classification processor.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the language units are words or phonemes.
  - 3. The method of claim 1, wherein recognizing a language unit comprises adjusting classification based on language statistics.
  - 4. The method of claim 1, wherein said method further comprises applying the calculated transformation to the samples of pronunciation variations stored in the database.
  - 5. The method of claim 1, wherein said calculated transformation comprises a Linear Discriminant Analysis (LDA) transformation.
  - 6. The method of claim 1, wherein said calculated transformation is performed by an appropriately trained neural network.
  - 7. The method of claim 1, wherein the stored audio samples are of pronunciation variations of the language unit pronounced by a plurality of speakers of different ethnic groups.
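
Claim 5 names Linear Discriminant Analysis as one possible transformation. The sketch below shows the simplest case, a two-class Fisher discriminant on 2-D features, and checks that the ratio of pronunciation variation distance to inter-unit distance drops after projection. It is an illustration under stated assumptions, not the patented implementation: the feature values, the nearest-centroid classifier, and all function names are hypothetical.

```python
import math


def mean(vectors):
    """Component-wise mean of a list of 2-D vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]


def scatter(vectors, mu):
    """2x2 scatter matrix of 2-D vectors around their mean mu."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = (v[0] - mu[0], v[1] - mu[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Two-class Fisher LDA direction w = Sw^-1 (mu_b - mu_a)."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    diff = (mu_b[0] - mu_a[0], mu_b[1] - mu_a[1])
    # Explicit inverse of the 2x2 matrix applied to diff.
    return (
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det,
    )


def project(v, w):
    """Apply the transformation: project a 2-D vector onto direction w."""
    return v[0] * w[0] + v[1] * w[1]


def distance_ratio(class_a, class_b):
    """Mean same-unit distance divided by mean inter-unit distance."""
    within = [math.dist(x, y) for grp in (class_a, class_b)
              for i, x in enumerate(grp) for y in grp[i + 1:]]
    between = [math.dist(x, y) for x in class_a for y in class_b]
    return (sum(within) / len(within)) / (sum(between) / len(between))


# Hypothetical features: three pronunciations each of two language units.
yes = [(0.0, 0.0), (1.0, 4.0), (2.0, 2.0)]
no = [(6.0, 0.0), (7.0, 4.0), (8.0, 2.0)]

w = fisher_direction(yes, no)
yes_t = [(project(v, w),) for v in yes]
no_t = [(project(v, w),) for v in no]

ratio_before = distance_ratio(yes, no)
ratio_after = distance_ratio(yes_t, no_t)

# Nearest-centroid classification in the transformed space (an assumption).
centroids = {
    "yes": sum(p[0] for p in yes_t) / len(yes_t),
    "no": sum(p[0] for p in no_t) / len(no_t),
}


def classify(v):
    z = project(v, w)
    return min(centroids, key=lambda unit: abs(z - centroids[unit]))
```

With more than two language units or higher-dimensional features, the same idea generalizes to a multi-class LDA solved as an eigenvalue problem; a neural network trained for the same objective, as in claim 6, would replace `fisher_direction` and `project`.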

8. A method for accent invariant speech recognition comprising:
- maintaining a database storing a set of language units in a given language, and for each language unit, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers with known accents, wherein the audio samples are indexed according to the language unit and accent integrated in the audio sample;
  
  for each known accent:
  
  identifying two types of distances:
  
  (i) pronunciation variation distances, which are distances between locations of audio samples of the same language unit with different pronunciations, in the feature space; and
  
  (ii) inter-unit distances, which are distances between locations of audio samples of different language units in the feature space;
  
  calculating a transformation applicable on the feature space to reduce the pronunciation variation distances relative to the inter-unit distances, the transformation is configured to make various pronunciation variations of the same language unit and accent indistinguishable by a classification processor; and
  
  when receiving an input audio signal, in case an accent of the received audio signal is recognized, applying classification for the recognized accent by said processor, thus recognizing a language unit in said input audio signal; and
  
  in case an accent of the received audio signal is not recognized:
  
  applying a separate classification for each of the known accents, thus recognizing a language unit in said input audio signal for each of the known accents; and
  
  selecting the most probable recognized language unit, wherein applying classification for the recognized accent comprises transforming the received signal by applying on the input audio signal the corresponding calculated transformation, wherein language units included in the transformed audio signal are indistinguishable by the classification processor from other pronunciation variations of the same language units and accent.
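
The control flow of claim 8 can be sketched as follows: use the classifier for the recognized accent when there is one, otherwise run every per-accent classifier and keep the most probable result. Everything concrete here is a hypothetical stand-in: the linear projection weights, the centroids, the softmax-style probability score, and the names `make_classifier` and `recognize_unit`. The claim does not specify how probabilities are computed.

```python
import math


def make_classifier(weights, centroids):
    """Build a classifier for one accent.

    weights: hypothetical per-accent linear transformation (one direction).
    centroids: language unit -> centroid position in the transformed space.
    """
    def classify(features):
        # Apply the transformation calculated for this accent.
        z = sum(w * x for w, x in zip(weights, features))
        # Normalized scores from distances, used here as probabilities.
        scores = {unit: math.exp(-abs(z - c)) for unit, c in centroids.items()}
        total = sum(scores.values())
        unit = max(scores, key=scores.get)
        return unit, scores[unit] / total
    return classify


def recognize_unit(features, classifiers, accent=None):
    """Return (language_unit, probability) for the input features."""
    if accent is not None and accent in classifiers:
        # Accent recognized: apply only the matching classifier.
        return classifiers[accent](features)
    # Accent not recognized: classify once per known accent, keep the best.
    candidates = [clf(features) for clf in classifiers.values()]
    return max(candidates, key=lambda c: c[1])


# Hypothetical classifiers for two known accents.
classifiers = {
    "accent_a": make_classifier((1.0, 0.0), {"yes": 0.0, "no": 4.0}),
    "accent_b": make_classifier((0.0, 1.0), {"yes": 0.0, "no": 4.0}),
}

features = (0.2, 3.0)
unit_unknown, p_unknown = recognize_unit(features, classifiers)
unit_known, p_known = recognize_unit(features, classifiers, accent="accent_b")
```

In this toy setup the same input is recognized as different units depending on whether the accent is known, which shows why the claim applies the accent-specific transformation when the accent can be identified.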

Current Assignee: KAMI Vision Incorporated
Original Assignee: ANTS TECHNOLOGY (HK) LIMITED
Inventors: Fridental, Ron; Blayvas, Ilya; Nosko, Pavel
Primary Examiner(s): Roberts, Shaun
Application Number: US15/592,222
Publication Number: US 20180330719A1
Time in Patent Office: 887 Days
Field of Search: 704244, 704254
CPC Class Codes

G10L 15/063   Training

G10L 15/065   Adaptation

G10L 15/10   using distance or distortio...

G10L 15/14   using statistical models, e...

G10L 15/16   using artificial neural net...

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 2015/0631   Creating reference template...
