METHOD AND APPARATUS FOR IMPROVING SPEECH RECOGNITION PROCESSING PERFORMANCE

US 20160322059A1
Filed: 04/29/2015
Published: 11/03/2016
Est. Priority Date: 04/29/2015
Status: Active Grant

Abstract

Computing the feature Maximum Mutual Information (fMMI) method requires multiplication of vectors with a huge matrix. The huge matrix is subdivided into block sub-matrices. The sub-matrices are quantized into different values and compressed by replacing the quantized element values with 1 or 2 bit indices. Fast multiplication with those compressed matrices with far fewer multiply/accumulate operations compared to standard matrix computation is enabled and additionally obviates a de-compression method for decompressing the sub-matrices before use.
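The compression step described in the abstract can be illustrated with a minimal Python sketch. It assumes a nearest-value quantizer against a small, already chosen codebook, which the abstract does not specify. The names `quantize_to_codebook` and `pack_indices` and the bit layout (first element in the lowest-order bits) are hypothetical choices made only for illustration.

```python
def quantize_to_codebook(values, codebook):
    """Return, for each value, the index of the nearest codebook entry.

    Assumption: nearest-value quantization. The source only states that
    sub-matrices are quantized into a small set of different values.
    """
    return [
        min(range(len(codebook)), key=lambda k: abs(v - codebook[k]))
        for v in values
    ]


def pack_indices(codes, bits_per_code):
    """Pack small integer codes into one integer.

    The first code occupies the lowest-order bits (an assumed layout).
    """
    packed = 0
    for pos, code in enumerate(codes):
        if not 0 <= code < (1 << bits_per_code):
            raise ValueError("code does not fit in the requested bit width")
        packed |= code << (pos * bits_per_code)
    return packed


# Two quantized values need 1 bit per element; four values need 2 bits.
codes_1bit = quantize_to_codebook([0.4, -0.6, 0.1, -0.2], [-0.5, 0.5])
codes_2bit = quantize_to_codebook([0.9, -0.3, 0.2, -1.2],
                                  [-1.0, -0.25, 0.25, 1.0])
packed_1bit = pack_indices(codes_1bit, 1)
packed_2bit = pack_indices(codes_2bit, 2)
```

With two quantized values each matrix element is stored as a single bit, and with four values as two bits, so a row of coefficients shrinks to a short bit string that can later serve directly as a lookup index.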

20 Claims

1. A method for improving computation time of speech recognition processing in an electronic device, the method comprising:
- by a processor:
  
  obtaining a table value, from a plurality of table values each corresponding to a unique summation of vector element values of a first vector, via an index corresponding to an encoded form of a combination of quantized element values of a second vector; and
  
  computing a dot product value of the first and second vectors using the table value obtained, the vector element values, and the quantized element values, the processor using fewer mathematical operations to compute the dot product value relative to a standard dot product computation of the first and second vectors, the speech recognition processing performing multiple dot product computations.
  - 2. The method of claim 1, wherein the first vector is a feature vector and the vector element values correspond to speech segments extracted from an audio stream for the speech recognition processing, the audio stream received via an audio interface of the electronic device.
  - 3. The method of claim 1, wherein the speech recognition processing employs a feature Maximum Mutual Information (fMMI) method and the quantized element values of the second vector correspond to quantized coefficients of a matrix computed by the fMMI method.
  - 4. The method of claim 1, further comprising pre-computing the table.
  - 5. The method of claim 4, wherein the speech recognition processing uses the pre-computed table multiple times for a subset of the multiple dot product computations performed.
  - 6. The method of claim 4, wherein the quantized element values of the second vector correspond to consecutive matrix elements of a given row of a plurality of rows of a given block matrix of an fMMI matrix, and wherein the method of claim 1 further comprises reusing the pre-computed table multiple times, once for each row of the given block matrix.
  - 7. The method of claim 1, wherein the table value is a first table value, the table value obtained is a first table value obtained, the index is a first index, and computing the dot product value of the first and second vectors using the table value obtained, the vector element values, and the quantized element values includes:
    - obtaining a second table value from the plurality of table values via a second index, the second table value corresponding to a sum value of each vector element value of the vector element values, the second index being a pre-determined index;
      
      computing a first variable value by multiplying a first quantized element value of the quantized element values with the second table value obtained; and
      
      computing a second variable value by subtracting the first quantized element value from a second quantized element value of the quantized element values.
  - 8. The method of claim 7, wherein the encoded form is a binary encoding of the quantized element values of the second vector, each 1-bit value in the binary encoding corresponding to a respective quantized element value of the quantized element values, and further wherein the computing further includes:
    - computing the dot product value by adding the first variable value to a product of the second variable value and the first table value obtained.
  - 9. The method of claim 7, wherein the encoded form is a binary encoding of the quantized element values of the second vector, each 2-bit value in the binary encoding corresponding to a respective quantized element value of the quantized element values, the first index corresponds to lower order bits from the binary encoding, and further wherein the computing further includes:
    - computing a third variable value by subtracting the first quantized element value from a third quantized element value of the quantized element values;
      
      computing a fourth variable value by adding the first quantized element value to a fourth quantized element value of the quantized element values, subtracting the second quantized element value from the fourth quantized element value, and subtracting the third quantized element value from the fourth quantized element value;
      
      obtaining a third table value via a third index;
      
      obtaining a fourth table value via a fourth index; and
      
      computing the dot product value by summing:
      
      the first variable value;
      
      a first product value computed by multiplying the second variable value and the first table value obtained;
      
      a second product value computed by multiplying the third variable value and the third table value obtained; and
      
      a third product value computed by multiplying the fourth variable value and the fourth table value obtained.
  - 10. The method of claim 9, wherein the third index corresponds to higher order bits from the binary encoding and the fourth index corresponds to a result of a bitwise AND operation between the higher order bits and the lower order bits and further wherein the computing of the fourth variable, the obtaining of the fourth table value, and the summing and the computing of the third product value are each omitted, in an event a total number of quantized element values is three instead of four.
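The arithmetic recited in claims 7 through 10 can be sketched in Python as follows. This is an illustrative reading of the claim language, not the patentee's implementation. All function names are hypothetical. The sketch assumes the table is indexed by a bitmask whose bit i selects element i of the first vector, so the all-ones index holds the sum of every element (the "pre-determined index" of claim 7).

```python
def build_subset_sum_table(a):
    """table[mask] = sum of a[i] for every bit i set in mask."""
    table = [0.0] * (1 << len(a))
    for mask in range(1, len(table)):
        low = mask & -mask                  # lowest set bit of mask
        table[mask] = table[mask ^ low] + a[low.bit_length() - 1]
    return table


def split_bit_planes(codes):
    """Gather the low and high bits of 2-bit codes into two index masks."""
    low_idx = high_idx = 0
    for pos, code in enumerate(codes):
        low_idx |= (code & 1) << pos
        high_idx |= ((code >> 1) & 1) << pos
    return low_idx, high_idx


def dot_1bit(table, q, idx):
    """Claims 7-8: two quantized values q[0], q[1]; one bit per element."""
    first_var = q[0] * table[len(table) - 1]    # q0 times sum of all elements
    second_var = q[1] - q[0]
    return first_var + second_var * table[idx]


def dot_2bit(table, q, low_idx, high_idx):
    """Claims 9-10: three or four quantized values; two bits per element."""
    first_var = q[0] * table[len(table) - 1]
    second_var = q[1] - q[0]
    third_var = q[2] - q[0]
    result = (first_var
              + second_var * table[low_idx]
              + third_var * table[high_idx])
    if len(q) == 4:                             # omitted for three values
        fourth_var = q[0] + q[3] - q[1] - q[2]
        result += fourth_var * table[low_idx & high_idx]
    return result


a = [1.0, 2.0, 3.0, 4.0]                        # first vector (features)
table = build_subset_sum_table(a)
q = [-1.0, 0.5, 2.0, 3.0]                       # quantized values
codes = [3, 0, 2, 1]                            # 2-bit code per element
low_idx, high_idx = split_bit_planes(codes)
fast = dot_2bit(table, q, low_idx, high_idx)
plain = sum(x * q[c] for x, c in zip(a, codes))
```

Once the table exists, the four-value case costs four lookups and a handful of multiply-adds regardless of the vector length, whereas the plain dot product needs one multiply-add per element. The identity holds because an element with code 0, 1, 2 or 3 accumulates exactly q0, q1, q2 or q3 times its value when the four terms are summed.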

11. An apparatus configured to perform speech recognition processing, the apparatus comprising:
- by a processor:
  
  an obtaining unit to obtain a table value, from a plurality of table values each corresponding to a unique summation of vector element values of a first vector, via an index corresponding to an encoded form of a combination of quantized element values of a second vector; and
  
  a computation unit to compute a dot product value of the first and second vectors using the table value obtained, the vector element values, and the quantized element values, the processor using fewer mathematical operations to compute the dot product value relative to a standard dot product computation of the first and second vectors, the speech recognition processing performing multiple dot product computations.
  - 12. The apparatus of claim 11, further comprising an audio interface and wherein the first vector is a feature vector and the vector element values correspond to speech segments extracted from an audio stream for the speech recognition processing, the audio stream received via the audio interface.
  - 13. The apparatus of claim 11, further comprising, by the processor:
    - a feature Maximum Mutual Information (fMMI) unit, and the quantized element values of the second vector correspond to quantized coefficients of a matrix computed by the fMMI unit.
  - 14. The apparatus of claim 11, further comprising, by the processor:
    - a pre-computation unit to pre-compute a table including the plurality of table values and use the pre-computed table multiple times for a subset of the multiple dot product computations performed.
  - 15. The apparatus of claim 14, wherein the quantized element values of the second vector correspond to consecutive matrix elements of a given row of a plurality of rows of a given block matrix of an fMMI matrix, and wherein the processor is further configured to reuse the pre-computed table multiple times, once for each row of the given block matrix.
  - 16. The apparatus of claim 11, wherein the table value is a first table value, the table value obtained is a first table value obtained, the index is a first index, and wherein the computation unit is further configured to:
    - obtain a second table value from the plurality of table values via a second index, the second table value corresponding to a sum value of each vector element value of the vector element values, the second index being a pre-determined index;
      
      compute a first variable value by multiplying a first quantized element value of the quantized element values with the second table value obtained; and
      
      compute a second variable value by subtracting the first quantized element value from a second quantized element value of the quantized element values.
  - 17. The apparatus of claim 16, wherein the encoded form is a binary encoding of the quantized element values of the second vector, each 1-bit value in the binary encoding corresponding to a respective quantized element value of the quantized element values, and further wherein the computation unit is further configured to:
    - compute the dot product value by adding the first variable value to a product of the second variable value and the first table value obtained.
  - 18. The apparatus of claim 16, wherein the encoded form is a binary encoding of the quantized element values of the second vector, each 2-bit value in the binary encoding corresponding to a respective quantized element value of the quantized element values, the first index corresponds to lower order bits from the binary encoding, and further wherein the computation unit is further configured to:
    - compute a third variable value by subtracting the first quantized element value from a third quantized element value of the quantized element values;
      
      compute a fourth variable value by adding the first quantized element value to a fourth quantized element value of the quantized element values, subtracting the second quantized element value from the fourth quantized element value, and subtracting the third quantized element value from the fourth quantized element value;
      
      obtain a third table value via a third index;
      
      obtain a fourth table value via a fourth index; and
      
      compute the dot product value by summing:
      
      the first variable value;
      
      a first product value computed by multiplying the second variable value and the first table value obtained;
      
      a second product value computed by multiplying the third variable value and the third table value obtained; and
      
      a third product value computed by multiplying the fourth variable value and the fourth table value obtained.
  - 19. The apparatus of claim 18, wherein the third index corresponds to higher order bits from the binary encoding and the fourth index corresponds to a result of a bitwise AND operation between the higher order bits and the lower order bits and further wherein the computing of the fourth variable, the obtaining of the fourth table value, and the summing and the computing of the third product value are each omitted, in an event a total number of quantized element values is three instead of four.
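The table reuse recited in claims 5, 6, 14 and 15 can be sketched in Python for the 1-bit case. The sketch rests on assumptions the claims do not state: each row of a block matrix is stored as one bitmask over the same chunk of the feature vector, and the two quantized values `q0` and `q1` are shared by the whole block. The function names are hypothetical.

```python
def subset_sum_table(chunk):
    """table[mask] = sum of chunk[i] for every bit i set in mask."""
    table = [0.0] * (1 << len(chunk))
    for mask in range(1, len(table)):
        low = mask & -mask                  # lowest set bit of mask
        table[mask] = table[mask ^ low] + chunk[low.bit_length() - 1]
    return table


def block_times_chunk(chunk, row_masks, q0, q1):
    """Multiply a 1-bit-compressed block by a chunk of the feature vector.

    The table is built once and then reused, once for each row.
    """
    table = subset_sum_table(chunk)
    base = q0 * table[len(table) - 1]       # q0 times sum of all elements
    step = q1 - q0
    return [base + step * table[mask] for mask in row_masks]


def plain_block_times_chunk(chunk, row_masks, q0, q1):
    """Reference: decode every element and use a standard dot product."""
    out = []
    for mask in row_masks:
        row = [q1 if (mask >> i) & 1 else q0 for i in range(len(chunk))]
        out.append(sum(r * x for r, x in zip(row, chunk)))
    return out


chunk = [1.0, 2.0, 3.0]
row_masks = [0b101, 0b010, 0b111, 0b000]    # one compressed row per mask
fast = block_times_chunk(chunk, row_masks, -1.0, 2.0)
plain = plain_block_times_chunk(chunk, row_masks, -1.0, 2.0)
```

Building the table costs one addition per entry, so the saving appears only when the block has enough rows to amortize that cost. After the table is built, each additional row needs a single lookup, one multiplication and one addition, and the compressed row is never decoded.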

20. A non-transitory computer-readable medium having encoded thereon a sequence of instructions which, when executed by a processor, causes the processor to:
- obtain a table value, from a plurality of table values each corresponding to a unique summation of vector element values of a first vector, via an index corresponding to an encoded form of a combination of quantized element values of a second vector; and
  
  compute a dot product value of the first and second vectors using the table value obtained, the vector element values, and the quantized element values, the processor using fewer mathematical operations to compute the dot product value relative to a standard dot product computation of the first and second vectors, the processor performing multiple dot product computations.

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Vlietinck, Jan; Kanthak, Stephan

Granted Patent

US 9,792,910 B2
US Class Current

1/1
CPC Class Codes

G10L 15/02   Feature extraction for spee...

G10L 15/144   Training of HMMs

G10L 15/28   Constructional details of s...

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 2015/085   Methods for reducing search...
