Method and system for Gaussian probability data bit reduction and computation

US 7,970,613 B2
Filed: 11/12/2005
Issued: 06/28/2011
Est. Priority Date: 11/12/2005
Status: Active Grant

First Claim

Patent Images

1. A speech recognition apparatus, comprising:

a signal processor configured to observe N different features of an observed speech signal and set up M different probability distribution functions of the N different observable features, each probability distribution function representing a probability of a different one of M possible Gaussians of a portion of the observed speech signal, wherein each Gaussian is characterized by a corresponding uncompressed mean a corresponding uncompressed variancewherein the signal processor is configured to process the observed signal to determine the observable features for a time window and represent the one or more different states of the features with the M Gaussian probability distribution functions, wherein the uncompressed mean and variance values are represented by α

-bit floating point numbers, where α

is an integer greater than 1;

wherein the signal processor is configured to convert the probability distribution functions to compressed probability functions having compressed mean and/or variance values represented as β

-bit integers, where β

is less than α

, whereby the compressed mean and/or variance values occupy less memory than the uncompressed mean and/or variance values,wherein the signal processor is configured to calculate a probability for each of the M possible Gaussians using the compressed probability functions wherein each compressed mean value is equal to a function of a quantity, wherein the quantity is product of a difference between the uncompressed variance and a centroid of the means for a given observable feature for all possible Gaussians with a variance for the given observable feature for all possible Gaussians, wherein the function is equal to 2^β−

1, if the quantity is greater than 2^β−

1, wherein the function is equal to −

(2^β−

1) if the quantity is less than −

(2^β−

1), and wherein the function is equal to a fixed point representation of the quantity otherwise,wherein the signal processor is configured to determine a most probable state from the calculated probabilities for the M possible Gaussians, andwherein the signal processor is configured to recognize a recognizable pattern within the observed speech signal for the time window using the most probable state.

View all claims

4 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Use of runtime memory may be reduced in a data processing algorithm that uses one or more probability distribution functions. Each probability distribution function may be characterized by one or more uncompressed mean values and one or more variance values. The uncompressed mean and variance values may be represented by α-bit floating point numbers, where α is an integer greater than 1. The probability distribution functions are converted to compressed probability functions having compressed mean and/or variance values represented as β-bit integers, where β is less than α, whereby the compressed mean and/or variance values occupy less memory space than the uncompressed mean and/or variance values. Portions of the data processing algorithm can be performed with the compressed mean and variance values.

Citations

34 Claims

1. A speech recognition apparatus, comprising:
- a signal processor configured to observe N different features of an observed speech signal and set up M different probability distribution functions of the N different observable features, each probability distribution function representing a probability of a different one of M possible Gaussians of a portion of the observed speech signal, wherein each Gaussian is characterized by a corresponding uncompressed mean a corresponding uncompressed variancewherein the signal processor is configured to process the observed signal to determine the observable features for a time window and represent the one or more different states of the features with the M Gaussian probability distribution functions, wherein the uncompressed mean and variance values are represented by α
  
  -bit floating point numbers, where α
  
  is an integer greater than 1;
  
  wherein the signal processor is configured to convert the probability distribution functions to compressed probability functions having compressed mean and/or variance values represented as β
  
  -bit integers, where β
  
  is less than α
  
  , whereby the compressed mean and/or variance values occupy less memory than the uncompressed mean and/or variance values,wherein the signal processor is configured to calculate a probability for each of the M possible Gaussians using the compressed probability functions wherein each compressed mean value is equal to a function of a quantity, wherein the quantity is product of a difference between the uncompressed variance and a centroid of the means for a given observable feature for all possible Gaussians with a variance for the given observable feature for all possible Gaussians, wherein the function is equal to 2^β−
  
  1, if the quantity is greater than 2^β−
  
  1, wherein the function is equal to −
  
  (2^β−
  
  1) if the quantity is less than −
  
  (2^β−
  
  1), and wherein the function is equal to a fixed point representation of the quantity otherwise,wherein the signal processor is configured to determine a most probable state from the calculated probabilities for the M possible Gaussians, andwherein the signal processor is configured to recognize a recognizable pattern within the observed speech signal for the time window using the most probable state.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
- - 2. The apparatus of claim 1 wherein the centroid of the means for a given observable feature for all possible Gaussians is equal to a sum of the uncompressed means for the M Gaussians divided by M.
  - 3. The apparatus of claim 1 wherein the variance for a given observable feature for all possible Gaussians is equal to an integer global scaling factor multiplied by a square root, wherein the square root is a square root of a quotient, wherein the quotient is a sum divided by M, wherein the sum is a sum of inverses of the uncompressed variances for the M Gaussians for the given observable feature.
  - 4. The apparatus of claim 1 wherein converting the probability distribution functions to compressed probability functions includes compressing values for the one or more observable features from α
    - -bit floating point type to compressed observable feature values of β
      
      -bit integer type.
  - 5. The apparatus of claim 4 wherein each compressed observable feature value is equal to a function of a quantity, wherein the quantity is a product of a difference between a given observable feature value and the centroid of the means for a given observable feature for all possible Gaussians with the variance for the given observable feature for all possible Gaussians, wherein the function is equal to 2^β
    - −
      
      1, if the quantity is greater than 2^β−
      
      1, wherein the function is equal to −
      
      (2^β−
      
      1) if the quantity is less than −
      
      (2^β−
      
      1), and wherein the function is equal to a fixed point representation of the quantity otherwise.
  - 6. The apparatus of claim 5 wherein the centroid of the means for a given observable feature for all possible Gaussians is equal to a sum of the uncompressed means for the M Gaussians divided by M.
  - 7. The apparatus of claim 5 wherein the variance for a given observable feature for all possible Gaussians is equal to an integer global scaling factor chosen to optimize recognition accuracy multiplied by a square root, wherein the square root is a square root of a quotient, wherein the quotient is a sum divided by M, wherein the sum is a sum of inverses of the uncompressed variances for the M Gaussians for the given observable feature.
  - 8. The apparatus of claim 1 wherein each compressed variance for a given Gaussian of a given observable feature is equal to a function of an inverse of a first product, wherein the first product is a product of 2 with a corresponding uncompressed variance and a factor, wherein the factor is equal to a sum divided by a second product, wherein the sum is a sum of inverses of the uncompressed variances for the M Gaussians for the given observable feature where the second product is a product of M and a global scaling factor chosen to optimize recognition accuracy, wherein the function is equal to 2^β
    - −
      
      1, if the quantity is greater than 2^β−
      
      1, wherein the function is equal to −
      
      (2^β−
      
      1) if the quantity is less than −
      
      (2^β−
      
      1), and wherein the function is equal to a fixed point representation of the quantity otherwise.
  - 9. The apparatus of claim 8 wherein the global scaling factor chosen to optimize recognition accuracy is equal to about 30.
  - 10. The apparatus of claim 8 wherein the global scaling factor chosen to optimize recognition accuracy is the argmin of the sum over all M Gaussians of the sum over all features of the integral with respect to each given feature of the difference between the Gaussian probability distribution functions and the compressed probability distribution functions.
  - 11. The apparatus of claim 1 wherein each compressed probability distribution function includes an exponential of a product of a universal scalar and a sum, wherein the sum is a sum over all features of a product of a square of a difference multiplied by a compressed variance value for an observable feature, wherein the difference is a difference between a compressed value of an observable feature and a corresponding compressed mean value for the observable feature, wherein the universal scaling factor is equal to −
    - 2 divided by the cube of a global scaling factor chosen to optimize recognition accuracy, wherein the compressed variance value is equal to a function of an inverse of a first product, wherein the first product is a product of 2 with a corresponding uncompressed variance and a factor, wherein the factor is equal to a sum divided by a second product, wherein the sum is a sum of inverses of the uncompressed variances for the M Gaussians for the given observable feature where the second product is a product of M and the global scaling factor chosen to optimize recognition accuracy, wherein the function is equal to 2^β−
      
      1, if the quantity is greater than 2^β−
      
      1, wherein the function is equal to −
      
      (2^β−
      
      1) if the quantity is less than −
      
      (2^β−
      
      1), and wherein the function is equal to a fixed point representation of the inverse of the first product otherwise.
  - 12. The apparatus of claim 11, further comprising, a value of the global scaling factor has been empirically determined to optimize recognition accuracy.
  - 13. The method of claim 1 wherein the recognizable pattern is a most likely, phoneme or word for the time window.

14. An apparatus for reducing use of runtime memory in a data processing algorithm that uses one or more Gaussian probability distribution functions for one or more different states of features x_ithat make up portions of an observed speech signal, wherein the Gaussian probability distribution functions include M Gaussian functions of N different observable features, each Gaussian function representing the probability distribution for a different one of M possible Gaussians, each Gaussian function being characterized by an uncompressed mean and an uncompressed variance, the apparatus comprising:
- means for processing the observed speech signal to determine the observable features for a time window;
  
  means for representing the one or more different states of the features with the M Gaussian probability distribution functions, wherein the uncompressed mean and variance values are represented by α
  
  -bit floating point numbers, where α
  
  is an integer greater than 1;
  
  means for converting the probability distribution functions to compressed probability functions having compressed mean and/or variance values represented as β
  
  -bit integers, where β
  
  is less than α
  
  , whereby the compressed mean and/or variance values occupy less memory than the uncompressed mean and/or variance values, wherein each compressed mean value is equal to a function of a quantity, wherein the quantity is product of a difference between the uncompressed variance and a centroid of the means for a given observable feature for all possible Gaussians with a variance for the given observable feature for all possible Gaussians, wherein the function is equal to 2^β−
  
  1, if the quantity is greater than 2^β−
  
  1, wherein the function is equal to −
  
  (2^β−
  
  1) if the quantity is less than −
  
  (2^β−
  
  1), and wherein the function is equal to a fixed point representation of the quantity otherwise; and
  
  means for determining a most likely state of the features from the M Gaussian functions with the compressed mean and variance values; and
  
  means for recognizing a recognizable pattern within the observed speech signal for the time window using the most likely state.

15. A speech signal recognition method, comprising:
- observing N different features of an observed speech signal representing a real-world process;
  
  setting up M different probability distribution functions of the N different observable features with a signal processor, wherein each probability distribution function represents a probability of a different one of M possible Gaussians of a portion of the observed speech signal, wherein each Gaussian is characterized by a corresponding uncompressed mean a corresponding uncompressed variance,processing the observed speech signal with the signal processor to determine the observable features for a time window and represent the one or more different states of the features with the M Gaussian probability distribution functions, wherein the uncompressed mean and variance values are represented by α
  
  -bit floating point numbers, where α
  
  is an integer greater than 1;
  
  converting the probability distribution functions with the signal processor to compressed probability functions having compressed mean and/or variance values represented as β
  
  -bit integers, where β
  
  is less than α
  
  , whereby the compressed mean and/or variance values occupy less memory than the uncompressed mean and/or variance values,calculating a probability for each of the M possible Gaussians with the signal processor using the compressed probability functions wherein each compressed mean value is equal to a function of a quantity, wherein the quantity is product of a difference between the uncompressed variance and a centroid of the means for a given observable feature for all possible Gaussians with a variance for the given observable feature for all possible Gaussians,wherein the function is equal to 2^β−
  
  1, if the quantity is greater than 2^β−
  
  1, wherein the function is equal to −
  
  (2^β−
  
  1) if the quantity is less than −
  
  (2^β−
  
  1), and wherein the function is equal to a fixed point representation of the quantity otherwise,determining a most probable state with the processor from the calculated probabilities for the M possible Gaussians; and
  
  recognizing a recognizable pattern within the observed speech signal with the signal processor for the time window using the most probable state.
- View Dependent Claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34)
- - 16. The method of claim 15 wherein converting the probability distribution functions to compressed probability functions includes compressing values for one or more observable features from α
    - -bit floating point type to β
      
      -bit integer type.
  - 17. The method of claim 15 wherein α
    - is 32 and β
      
      is 8.
  - 18. The method of claim 15 wherein the centroid of the means for a given observable feature for all possible Gaussians is equal to a sum of the uncompressed means for the M Gaussians divided by M.
  - 19. The method of claim 15 wherein the variance for a given observable feature for all possible Gaussians is equal to an integer global scaling factor chosen to optimize recognition accuracy multiplied by a square root, wherein the square root is a square root of a quotient, wherein the quotient is a sum divided by M, wherein the sum is a sum of inverses of the uncompressed variances for the M Gaussians.
  - 20. The method of claim 15 wherein converting the probability distribution functions to compressed probability functions includes compressing values for the one or more observable features from α
    - -bit floating point type to compressed observable feature values of β
      
      -bit integer type.
  - 21. The method of claim 20 wherein each compressed observable feature value is equal to a function of a quantity, wherein the quantity is a product of a difference between a given observable feature value and the centroid of the means for a given observable feature for all possible Gaussians with the variance for the given observable feature for all possible Gaussians, wherein the function is equal to 2^β
    - −
      
      1, if the quantity is greater than 2^β−
      
      1, wherein the function is equal to −
      
      (2^β−
      
      1) if the quantity is less than −
      
      (2^β−
      
      1), and wherein the function is equal to a fixed point representation of the quantity otherwise.
  - 22. The method of claim 21 wherein the centroid of the means for a given observable feature for all possible Gaussians is equal to a sum of the uncompressed means for the M Gaussians divided by M.
  - 23. The method of claim 21 wherein the variance for a given observable feature for all possible Gaussians is equal to an integer global scaling factor chosen to optimize recognition accuracy multiplied by a square root, wherein the square root is a square root of a quotient, wherein the quotient is a sum divided by M, wherein the sum is a sum of inverses of the uncompressed variances for the M Gaussians for the given observable feature.
  - 24. The method of claim 15 wherein each compressed variance for a given Gaussian of a given observable feature is equal to a function of an inverse of a first product, wherein the first product is a product of 2 with a corresponding uncompressed variance and a factor, wherein the factor is equal to a sum divided by a second product, wherein the sum is a sum of inverses of the uncompressed variances for the M Gaussians for the given observable feature where the second product is a product of M and a global scaling factor chosen to optimize recognition accuracy, wherein the function is equal to 2^β
    - −
      
      1, if the quantity is greater than 2^β−
      
      1, wherein the function is equal to −
      
      (2^β−
      
      1) if the quantity is less than −
      
      (2^β−
      
      1), and wherein the function is equal to a fixed point representation of the quantity otherwise.
  - 25. The method of claim 24 wherein the global scaling factor chosen to optimize recognition accuracy is equal to about 30.
  - 26. The method of claim 24 wherein the global scaling factor chosen to optimize recognition accuracy is the argmin of the sum over all M Gaussians of the sum over all features of the integral with respect to each given feature of the difference between the Gaussian probability distribution functions and the compressed probability distribution functions.
  - 27. The method of claim 21 wherein each compressed probability distribution function includes an exponential of a product of a universal scalar and a sum, wherein the sum is a sum over all features of a product of a square of a difference multiplied by a compressed variance value for an observable feature, wherein the difference is a difference between a compressed value of an observable feature and a corresponding compressed mean value for the observable feature, wherein the universal scaling factor is equal to −
    - 2 divided by the cube of a global scaling factor chosen to optimize recognition accuracy, wherein the compressed variance value is equal to a function of an inverse of a first product, wherein the first product is a product of 2 with a corresponding uncompressed variance and a factor, wherein the factor is equal to a sum divided by a second product, wherein the sum is a sum of inverses of the uncompressed variances for the M Gaussians for the given observable feature where the second product is a product of M and the global scaling factor chosen to optimize recognition accuracy, wherein the function is equal to 2^β−
      
      1, if the quantity is greater than 2^β−
      
      1, wherein the function is equal to −
      
      (2^β−
      
      1) if the quantity is less than −
      
      (2^β−
      
      1), and wherein the function is equal to a fixed point representation of the inverse of the first product.
  - 28. The method of claim 27, further comprising, empirically determining a value of the global scaling factor that optimizes recognition accuracy.
  - 29. The method of claim 27 wherein the global scaling factor is an integer between 1 and 100.
  - 30. The method of claim 27 wherein α
    - =32, β
      
      =8 and the global scaling factor is equal to 30.
  - 31. The method of claim 15 wherein determining the most probable state includes selecting a maximum probability from among the probabilities for the M possible Gaussians.
  - 32. The method of claim 15 wherein observing the N different features includes converting a time domain signal to a cepstrum characterized by a plurality of cepstral coefficients wherein the N features include selected cepstral coefficients.
  - 33. The method of claim 15 wherein the recognition method is a speech recognition method or optical character recognition method.
  - 34. The method of claim 15 wherein the recognition method uses a hidden Markov model.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Sony Interactive Entertainment Inc. (Sony Group Corp.)
Original Assignee
Sony Computer Entertainment Incorporated (Sony Group Corp.)
Inventors
Chen, Ruxin
Primary Examiner(s)
Dorvil; Richemond
Assistant Examiner(s)
Adesanya; Olujimi A.

Application Number

US11/273,223
Publication Number

US 20070112566A1
Time in Patent Office

2,054 Days
Field of Search

704/240, 704/245, 704/254
US Class Current

704/256
CPC Class Codes

G10L 15/14 using statistical models, e...

G10L 15/285 Memory allocation or algori...

Method and system for Gaussian probability data bit reduction and computation

First Claim

4 Assignments

0 Petitions

Accused Products

Abstract

Citations

34 Claims

Specification

Solutions

Use Cases

Quick Links

Method and system for Gaussian probability data bit reduction and computation

First Claim

4 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

34 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links