ACCENT INVARIANT SPEECH RECOGNITION

US 20180330719A1
Filed: 05/11/2017
Published: 11/15/2018
Est. Priority Date: 05/11/2017
Status: Active Grant

Abstract

A system and method for accent invariant speech recognition comprising: maintaining a database storing a set of language units in a given language, and for each of the language units, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers; extracting and storing in the database a feature vector for locating each of the audio samples in a feature space; identifying pronunciation variation distances, which are distances between locations of audio samples of the same language unit in the feature space, and inter-unit distances, which are distances between locations of audio samples of different language units in the feature space; calculating a transformation applicable on the feature space to reduce the pronunciation variation distances relative to the inter-unit distances; and based on the calculated transformation, training a processor to classify, as a same language unit, pronunciation variations of the same language unit.
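
The two kinds of distance named in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and hand-made two-dimensional feature vectors; the function names (`euclidean`, `distance_summary`) and the sample data are hypothetical and are not taken from the patent, which does not prescribe a particular metric or feature set.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_summary(samples):
    """Return (mean pronunciation-variation distance, mean inter-unit distance).

    `samples` is a list of (language_unit, feature_vector) pairs. Pairs that
    share a language unit contribute to the pronunciation-variation distances;
    all other pairs contribute to the inter-unit distances.
    """
    variation, inter_unit = [], []
    for (unit_a, vec_a), (unit_b, vec_b) in combinations(samples, 2):
        d = euclidean(vec_a, vec_b)
        (variation if unit_a == unit_b else inter_unit).append(d)
    return sum(variation) / len(variation), sum(inter_unit) / len(inter_unit)


# Toy data (assumption): two language units, two pronunciations each.
samples = [
    ("yes", (0.0, 0.0)),
    ("yes", (0.0, 2.0)),
    ("no", (3.0, 0.0)),
    ("no", (3.0, 2.0)),
]
variation, inter_unit = distance_summary(samples)
ratio = variation / inter_unit  # the quantity a transformation would try to shrink
```

A smaller ratio means samples of the same language unit sit closer together relative to samples of different units, which is the property the calculated transformation is meant to improve.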

11 Claims

1. A method for accent invariant speech recognition comprising:
- maintaining a database for storing a set of language units in a given language, wherein for each language unit, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers;
  
  extracting and storing in the database a feature vector for locating each of the audio samples in a feature space;
  
  identifying two types of distances:
  
  (i) pronunciation variation distances, which are distances between locations of audio samples of the same language unit with different pronunciations, in the feature space; and
  
  (ii) inter-unit distances, which are distances between locations of audio samples of different language units in the feature space;
  
  calculating a transformation applicable on the feature space to reduce the pronunciation variation distances relative to the inter-unit distances; and
  
  based on the calculated transformation, training a processor to classify, as a same language unit, pronunciation variations of the same language unit.
  - 2. The method of claim 1, wherein the language units are words or phonemes.
  - 3. The method of claim 1, comprising:
    - receiving an input audio signal;
      
      applying the calculated transformation on the input signal; and
      
      recognizing a language unit in said input audio signal, by applying classification by said processor.
  - 4. The method of claim 1, wherein recognizing a language unit comprises adjusting classification based on language statistics.
  - 5. The method of claim 1, wherein said training comprises applying the calculated transformation to the samples of pronunciation variations stored in the database.
  - 6. The method of claim 1, wherein said calculated transformation comprises a Linear Discriminant Analysis (LDA) transformation.
  - 7. The method of claim 1, wherein said calculated transformation is performed by an appropriately trained neural network.
  - 8. The method of claim 1, wherein the stored audio samples are of pronunciation variations of the language unit pronounced by a plurality of speakers of different ethnic groups.
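
Claims 1, 3 and 6 together describe calculating a transformation (optionally LDA), applying it, and classifying. The sketch below is one possible illustration, not the patented implementation: it uses the two-class Fisher discriminant (the two-class special case of LDA) on toy two-dimensional feature vectors, then classifies by nearest projected centroid. All names and data are hypothetical, and the nearest-centroid classifier is an assumption standing in for the trained processor.

```python
import math
from itertools import combinations


def mean(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def scatter_2d(vectors, centre):
    """Entries (a, b, d) of the symmetric 2x2 scatter matrix [[a, b], [b, d]]."""
    a = b = d = 0.0
    for x, y in vectors:
        dx, dy = x - centre[0], y - centre[1]
        a += dx * dx
        b += dx * dy
        d += dy * dy
    return a, b, d


def fisher_direction(class_a, class_b):
    """Unit vector w proportional to Sw^-1 (mean_a - mean_b)."""
    ma, mb = mean(class_a), mean(class_b)
    a1, b1, d1 = scatter_2d(class_a, ma)
    a2, b2, d2 = scatter_2d(class_b, mb)
    a, b, d = a1 + a2, b1 + b2, d1 + d2  # pooled within-class scatter
    det = a * d - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    wx = (d * diff[0] - b * diff[1]) / det
    wy = (-b * diff[0] + a * diff[1]) / det
    norm = math.hypot(wx, wy)
    return wx / norm, wy / norm


def project(vec, w):
    """Apply the transformation: project a 2-D feature vector onto w."""
    return vec[0] * w[0] + vec[1] * w[1]


def variation_ratio(groups, dist):
    """Mean same-unit distance divided by mean different-unit distance."""
    labelled = [(u, s) for u, samples in groups.items() for s in samples]
    intra, inter = [], []
    for (ua, sa), (ub, sb) in combinations(labelled, 2):
        (intra if ua == ub else inter).append(dist(sa, sb))
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


# Toy data (assumption): accent variation mostly along y, units separated along x.
units = {
    "yes": [(0.0, -2.0), (0.5, 0.0), (-0.5, 2.0)],
    "no": [(3.0, -2.0), (3.5, 0.0), (2.5, 2.0)],
}
w = fisher_direction(units["yes"], units["no"])
projected = {u: [project(s, w) for s in v] for u, v in units.items()}

ratio_before = variation_ratio(units, math.dist)
ratio_after = variation_ratio(projected, lambda p, q: abs(p - q))

# Nearest-centroid classification in the transformed space.
centroids = {u: sum(p) / len(p) for u, p in projected.items()}


def classify(sample):
    """Transform an input feature vector and return the closest language unit."""
    z = project(sample, w)
    return min(centroids, key=lambda u: abs(centroids[u] - z))
```

On this toy data the ratio of pronunciation-variation distance to inter-unit distance drops after projection, and `classify((0.3, 1.5))` lands on the "yes" centroid. Real acoustic features would have many more dimensions and language units, where a multi-class LDA or a trained neural network (claim 7) would take the place of this two-class direction.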

9. A method for accent invariant speech recognition comprising:
- maintaining a database storing a set of language units in a given language, and for each language unit, storing audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers with known accents, wherein the audio samples are indexed according to the language unit, and accent integrated in the audio sample; and
  
  training a processor to classify an audio signal as a corresponding language unit for a given accent.
  - 10. The method of claim 9, the method further comprising:
    - receiving an input audio signal;
      
      analyzing the audio signal to recognize an accent;
      
      in case an accent of the received audio signal is recognized, applying classification for the recognized accent by said processor, thus recognizing a language unit in said input audio signal; and
      
      in case an accent of the received audio signal is not recognized:
      
      applying separate classification for each of the known accents, thus recognizing a language unit in said input audio signal for each of the known accents; and
      
      selecting the most probable recognized language unit.
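
The branching in claim 10 can be sketched as a small dispatch function. This is an illustration under assumptions: the accent detector and the per-accent classifiers are hypothetical callables, each classifier is assumed to return a `(language_unit, probability)` pair, and the detector is assumed to return `None` when no known accent is recognized. None of these interfaces come from the patent.

```python
def recognize(signal, detect_accent, classifiers):
    """Return (language_unit, probability) for an input audio signal.

    `detect_accent` maps a signal to an accent label or None (hypothetical).
    `classifiers` maps each known accent label to a callable that returns a
    (language_unit, probability) pair for the signal (hypothetical).
    """
    accent = detect_accent(signal)
    if accent in classifiers:
        # Accent recognized: use the classifier trained for that accent.
        return classifiers[accent](signal)
    # Accent not recognized: classify once per known accent,
    # then keep the most probable language unit.
    candidates = [classify(signal) for classify in classifiers.values()]
    return max(candidates, key=lambda pair: pair[1])


# Toy stand-ins (assumptions) that ignore the signal and return fixed results.
classifiers = {
    "accent_a": lambda signal: ("tomato", 0.62),
    "accent_b": lambda signal: ("tomorrow", 0.81),
}
known = recognize("clip-1", lambda s: "accent_a", classifiers)
unknown = recognize("clip-2", lambda s: None, classifiers)
```

With these stand-ins, the recognized-accent path returns the result of the `accent_a` classifier, while the fallback path compares both classifiers and keeps the higher-probability result.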

11. A method for accent invariant speech recognition comprising:
- maintaining a database for storing a set of language units in a given language, wherein for each of the language units, storing a standard pronunciation audio sample and a plurality of variant audio samples of pronunciation variations of the language unit pronounced by a plurality of speakers;
  
  for each audio sample, extracting a descriptor and storing the descriptor in the database, thus obtaining at least one standard descriptor and a group of variant descriptors;
  
  training a processor to produce a transformation procedure for transforming the variant descriptors to the standard descriptor and a discriminative procedure to distinguish between the standard descriptor and the transformed variant descriptors, until the transformed variant descriptors are indistinguishable from the standard descriptor;
  
  receiving an input audio signal; and
  
  by the trained transformation procedure, transforming the input audio signal to a modified signal indistinguishable from the respective standard pronunciation sample.

Current Assignee: KAMI Vision Incorporated
Original Assignee: ANTS TECHNOLOGY (HK) LIMITED
Inventors: FRIDENTAL, Ron; BLAYVAS, Ilya; NOSKO, Pavel
Granted Patent: US 10,446,136 B2

CPC Class Codes

G10L 15/063   Training

G10L 15/065   Adaptation

G10L 15/10   using distance or distortio...

G10L 15/14   using statistical models, e...

G10L 15/16   using artificial neural net...

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 2015/0631   Creating reference template...
