System and method of pattern recognition in very high-dimensional space
Abstract
A system and method of recognizing speech comprises an audio-receiving element and a computer server, which together perform the process steps of the method. The method trains a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once converted, the stored phonemes are transformed using singular-value decomposition to conform the data generally into a hypersphere. Phonemes received from the audio-receiving element are likewise converted into n-dimensional space and transformed using singular-value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance, from the center of the hypersphere to a point associated with the transformed received phoneme, with a second distance, from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
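One possible reading of "conform the data generally into a hypersphere" can be worked out as follows. This is a sketch under assumptions the abstract does not state: the m training vectors of a class are stacked as the rows of a matrix X, and the decomposition is a thin SVD that keeps the r nonzero singular values, so that the rows u_k of U are the transformed vectors.

```latex
X = U \Lambda V^{t}, \qquad U^{t} U = I_r
\quad\Longrightarrow\quad
\sum_{k=1}^{m} \lVert u_k \rVert^{2}
  = \operatorname{tr}\!\left(U U^{t}\right)
  = \operatorname{tr}\!\left(U^{t} U\right)
  = r ,
\qquad
\frac{1}{m} \sum_{k=1}^{m} \lVert u_k \rVert^{2} = \frac{r}{m} .
```

Under these assumptions the transformed vectors have mean squared length r/m. They lie near a hypersphere of radius sqrt(r/m) only to the extent that the individual lengths stay close to that mean, which is a property of the data rather than of the decomposition.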
36 Claims
1. A method of recognizing a received phoneme using a stored plurality of phoneme classes, each of the plurality of phoneme classes comprising class phonemes, the method comprising:
(A) training the class phonemes, the training comprising, for each class phoneme:
(1) determining a phoneme vector as a time-frequency representation of the class phoneme;
(2) dividing the phoneme vector into phoneme segments;
(3) assigning each phoneme segment into a plurality of phoneme parameters;
(4) expanding each phoneme segment and plurality of phoneme parameters into an expanded stored-phoneme vector with expanded vector parameters;
(5) transforming the expanded stored-phoneme vector into an orthogonal form using singular-value decomposition wherein:
$[x_1\ x_2\ \ldots\ x_m] = [u_1\ u_2\ \ldots\ u_m]\,\Lambda V^{t}$,
where $x_k$ is a $k$th acoustic vector for a corresponding stored phoneme, $u_k$ is the corresponding orthogonal vector, and $\Lambda$ and $V$ are diagonal and unitary matrices, respectively; and
(B) recognizing the received phoneme by:
(1) receiving an analog acoustic signal;
(2) converting the analog acoustic signal into a digital signal;
(3) determining a received-signal vector as a time-frequency representation of the received digital signal;
(4) dividing the received-signal vector into received-signal segments;
(5) assigning each received-signal segment into a plurality of received-signal parameters;
(6) expanding each received-signal segment and plurality of received-signal parameters into an expanded received-signal vector;
(7) transforming the expanded received-signal vector into an orthogonal form using singular-value decomposition wherein:
$[y_k] = [z_k]\,\Lambda V^{t}$,
where $y_k$ is a $k$th acoustic vector for a corresponding received phoneme, $z_k$ is the corresponding orthogonal vector, and $\Lambda$ and $V$ are diagonal and unitary matrices, respectively;
(8) determining a first distance associated with the orthogonal form of the expanded received-signal vector and a second distance associated respectively with each orthogonal form of the expanded stored-phoneme vectors; and
(9) recognizing the received phoneme according to a comparison of the first distance with the second distance.
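The transformation steps (A)(5) and (B)(7) might be sketched as below. This is a minimal illustration, not the patented implementation. It assumes that the expanded vectors of one class are stacked as rows of a matrix, so that the same Λ and V obtained in training can be applied to a received vector; the claim does not state this layout. The function names `train_class` and `to_orthogonal_form` and the tolerance `rel_tol` are hypothetical.

```python
import numpy as np


def train_class(X, rel_tol=1e-10):
    """Fit the thin SVD X = U @ diag(s) @ Vt for one phoneme class.

    X has shape (m, n): m expanded training vectors of dimension n,
    one per row (an assumed layout). Row k of the returned U plays the
    role of the orthogonal vector u_k for training vector x_k.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > rel_tol * s[0]  # drop numerically null directions
    return U[:, keep], s[keep], Vt[keep]


def to_orthogonal_form(y, s, Vt):
    """Solve y = z @ diag(s) @ Vt for z, i.e. z = y @ V @ diag(1 / s)."""
    return (Vt @ np.asarray(y, dtype=float)) / s
```

Applied to one of the training rows, `to_orthogonal_form` returns the matching row of `U`, which is a quick consistency check on the assumed layout. Components of a received vector that lie outside the row space of the training matrix are discarded by this projection.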
16. A method of recognizing speech patterns, the method using stored phonemes, the method comprising:
converting each stored phoneme into n-dimensional space having a center;
sampling speech patterns to obtain at least one sampled phoneme;
converting each of the at least one sampled phonemes into the n-dimensional space; and
comparing a distance from the center of the n-dimensional space to the sampled phoneme with a distance from the center of the n-dimensional space to each of the phonemes of the converted plurality of phonemes.
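The comparing step of claim 16 could look roughly like the following, assuming the phonemes have already been converted to points in the n-dimensional space. The Euclidean norm, the nearest-radius selection rule, and the names `radial_distance` and `closest_by_radius` are illustrative assumptions; the claim only recites that the two distances are compared.

```python
import numpy as np


def radial_distance(point, center):
    """Euclidean distance from the center of the space to a point."""
    diff = np.asarray(point, dtype=float) - np.asarray(center, dtype=float)
    return float(np.linalg.norm(diff))


def closest_by_radius(sampled, stored, center):
    """Return the label of the stored phoneme whose distance from the
    center is nearest to the sampled phoneme's distance from the center.

    stored maps a label to a converted point (hypothetical structure).
    """
    r_sampled = radial_distance(sampled, center)
    return min(
        stored,
        key=lambda label: abs(radial_distance(stored[label], center) - r_sampled),
    )
```

For example, with the center at the origin and stored points `{"aa": [1, 0, 0], "iy": [0, 3, 0]}`, a sampled point `[0, 0, 2.8]` is matched to `"iy"`, because its radius 2.8 is closer to 3 than to 1. A rule of this kind looks only at radii, so two stored points at the same distance from the center cannot be told apart by it.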
25. A method of recognizing speech using a database of stored phonemes converted into n-dimensional space, the method comprising:
receiving a received phoneme;
converting the received phoneme to n-dimensional space;
comparing the received phoneme to each of the stored phonemes in n-dimensional space; and
recognizing the received phoneme according to the comparison of the received phoneme to each of the stored phonemes.
30. A system for recognizing phonemes, the system using a database of stored phonemes for comparison with received phonemes, the stored phonemes having been converted into n-dimensional space, the system comprising:
a recording element that receives a phoneme;
a computer that converts the received phoneme into n-dimensional space, wherein the computer compares in the n-dimensional space the received phoneme with each phoneme in the database of stored phonemes.
35. A medium storing a program for instructing a computer device to recognize a received speech signal using a database of stored phonemes converted into n-dimensional space, the program comprising instructing the computer device to perform the following steps:
receiving a received phoneme;
converting the received phoneme to n-dimensional space;
comparing the received phoneme to each of the stored phonemes in n-dimensional space; and
recognizing the received phoneme according to the comparison of the received phoneme to each of the stored phonemes.
36. A medium storing a program for instructing a computer device to recognize a received speech signal using a database of stored phonemes converted into n-dimensional space, the database of stored phonemes formed by training the stored phonemes according to the following steps:
(1) determining a phoneme vector as a time-frequency representation of the stored phoneme;
(2) dividing the phoneme vector into phoneme segments;
(3) assigning each phoneme segment into a plurality of phoneme parameters;
(4) expanding each phoneme segment and plurality of phoneme parameters into an expanded stored-phoneme vector with expanded vector parameters;
(5) transforming the expanded stored-phoneme vector into an orthogonal form using singular-value decomposition wherein:
$[x_1\ x_2\ \ldots\ x_m] = [u_1\ u_2\ \ldots\ u_m]\,\Lambda V^{t}$,
where $x_k$ is a $k$th acoustic vector for a corresponding stored phoneme, $u_k$ is the corresponding orthogonal vector, and $\Lambda$ and $V$ are diagonal and unitary matrices, respectively,
the program stored on the medium instructing the computer device to perform the following steps:
(1) receiving an analog acoustic signal;
(2) converting the analog acoustic signal into a digital signal;
(3) determining a received-signal vector as a time-frequency representation of the received digital signal;
(4) dividing the received-signal vector into received-signal segments;
(5) assigning each received-signal segment into a plurality of received-signal parameters;
(6) expanding each received-signal segment and plurality of received-signal parameters into an expanded received-signal vector;
(7) transforming the expanded received-signal vector into an orthogonal form using singular-value decomposition wherein:
$[y_k] = [z_k]\,\Lambda V^{t}$,
where $y_k$ is a $k$th acoustic vector for a corresponding received phoneme, $z_k$ is the corresponding orthogonal vector, and $\Lambda$ and $V$ are diagonal and unitary matrices, respectively;
(8) determining a first distance associated with the orthogonal form of the expanded received-signal vector and a second distance associated respectively with each orthogonal form of the expanded stored-phoneme vectors; and
(9) recognizing the received phoneme according to a comparison of the first distance with the second distance.
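The front-end steps (3) through (6), which turn a digital signal into one expanded vector, might be sketched as follows. The claim does not specify the time-frequency representation, the segment parameters, or the expansion, so every choice here is an assumption made for illustration: a Hann-windowed log-magnitude short-time spectrum, segments formed from consecutive frames, the per-segment mean spectrum as the parameters, and concatenation as the expansion. The names `time_frequency` and `expand` and all default sizes are hypothetical.

```python
import numpy as np


def time_frequency(signal, frame_len=256, hop=128):
    """Log-magnitude short-time spectrum, shape (num_frames, frame_len // 2 + 1).

    Assumes len(signal) >= frame_len.
    """
    signal = np.asarray(signal, dtype=float)
    num_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(num_frames)]
    )
    # The small constant avoids log(0) for silent bins.
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)


def expand(spectrogram, num_segments=4):
    """Split the frames into segments, average each segment over time,
    and concatenate the per-segment spectra into one long vector.

    Assumes the spectrogram has at least num_segments frames.
    """
    segments = np.array_split(spectrogram, num_segments, axis=0)
    return np.concatenate([seg.mean(axis=0) for seg in segments])
```

With the default sizes, a 2048-sample signal yields 15 frames of 129 frequency bins, and `expand` returns a vector of 4 × 129 = 516 components. Larger frames or more segments raise the dimension n accordingly.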
Specification