System and method of pattern recognition in very high-dimensional space
First Claim
1. A method of recognizing a received phoneme using a stored plurality of phoneme classes, each of the plurality of phoneme classes comprising class phonemes, the method comprising:
(A) training the class phonemes, the training comprising, for each class phoneme:
(1) determining a phoneme vector as a time-frequency representation of the class phoneme;
(2) dividing the phoneme vector into phoneme segments;
(3) assigning each phoneme segment into a plurality of phoneme parameters;
(4) expanding each phoneme segment and plurality of phoneme parameters into an expanded stored-phoneme vector with expanded vector parameters;
(5) transforming the expanded stored-phoneme vector into an orthogonal form using singular-value decomposition wherein:
$[x_1\ x_2\ \ldots\ x_m] = [u_1\ u_2\ \ldots\ u_m]\,\Lambda V^{t}$,
where $x_k$ is a kth acoustic vector for a corresponding stored phoneme, $u_k$ is the corresponding orthogonal vector, and $\Lambda$ and $V$ are diagonal and unitary matrices, respectively; and
(B) recognizing the received phoneme by:
(1) receiving an analog acoustic signal;
(2) converting the analog acoustic signal into a digital signal;
(3) determining a received-signal vector as a time-frequency representation of the received digital signal;
(4) dividing the received-signal vector into received-signal segments;
(5) assigning each received-signal segment into a plurality of received-signal parameters;
(6) expanding each received-signal segment and plurality of received-signal parameters into an expanded received-signal vector;
(7) transforming the expanded received-signal vector into an orthogonal form using singular-value decomposition wherein:
$[y_k] = [z_k]\,\Lambda V^{t}$,
where $y_k$ is a kth acoustic vector for a corresponding received phoneme, $z_k$ is the corresponding orthogonal vector, and $\Lambda$ and $V$ are diagonal and unitary matrices, respectively;
(8) determining a first distance associated with the orthogonal form of the expanded received-signal vector and a second distance associated respectively with each orthogonal form of the expanded stored-phoneme vectors; and
(9) recognizing the received phoneme according to a comparison of the first distance with the second distance.
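One possible reading of steps (A)(5) and (B)(7) can be sketched in Python with NumPy. The sketch assumes that the acoustic vectors are stacked as rows of a matrix X, so that each row satisfies x_k = u_k Λ V^t, and that a received vector is mapped using the Λ and V obtained during training. The claim states neither point explicitly. The synthetic data, the sizes m and n, and the helper name to_orthogonal_form are illustrative and do not come from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: m stored acoustic vectors of dimension n, stacked as ROWS of X.
m, n = 200, 32
mixing = rng.normal(size=(n, n))          # makes the synthetic features correlated
X = rng.normal(size=(m, n)) @ mixing

# Thin SVD: X = U @ diag(s) @ Vt, with orthonormal columns in U.
U, s, Vt = np.linalg.svd(X, full_matrices=False)


def to_orthogonal_form(y, s, Vt):
    """Solve y = z @ diag(s) @ Vt for z, reusing the trained s and Vt.

    Hypothetical helper: this is one reading of [y_k] = [z_k] Λ V^t.
    """
    return (y @ Vt.T) / s


# Mapping a stored vector recovers its row of U.
z = to_orthogonal_form(X[3], s, Vt)
print(np.allclose(z, U[3]))
```

Under these assumptions the transform removes the correlation between features, so the rows of U have comparable scale in every direction. Whether the patented method reuses Λ and V in exactly this way is not settled by the claim text.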
Abstract
A system and method of recognizing speech comprises an audio receiving element and a computer server. The audio receiving element and the computer server perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using singular value decomposition to conform the data generally into a hypersphere. The received phonemes from the audio receiving element are also converted into n-dimensional space and transformed using singular value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from a center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
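The hypersphere picture in the abstract can be illustrated numerically. The following Python sketch uses synthetic Gaussian points with uncorrelated, unit-variance coordinates as a stand-in for transformed phoneme vectors; that substitution is an assumption, as are the function name and the chosen dimensions. It shows a general property of high-dimensional data: as n grows, the distances from the center cluster ever more tightly around a common radius, so the points lie close to the surface of a hypersphere.

```python
import numpy as np


def relative_radius_spread(n, samples=2000, seed=0):
    """Ratio of std to mean of the distances from the origin for
    `samples` synthetic points in n dimensions (illustrative data only)."""
    rng = np.random.default_rng(seed)
    points = rng.standard_normal((samples, n))
    radii = np.linalg.norm(points, axis=1)
    return float(radii.std() / radii.mean())


# The spread shrinks as the dimension grows.
spreads = {n: relative_radius_spread(n) for n in (4, 64, 1024)}
for n, spread in spreads.items():
    print(n, round(spread, 3))
```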
168 Citations
33 Claims
1. A method of recognizing a received phoneme using a stored plurality of phoneme classes, each of the plurality of phoneme classes comprising class phonemes (recited in full under First Claim).
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A method of recognizing speech patterns, the method using stored phonemes, the method comprising:
converting each stored phoneme into n-dimensional space having a center;
sampling speech patterns to obtain at least one sampled phoneme;
converting each of the at least one sampled phonemes into the n-dimensional space; and
comparing a distance from the center of the n-dimensional space to the sampled phoneme with a distance from the center of the n-dimensional space to each of the phonemes of the converted plurality of phonemes.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24
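The distance comparison recited in claim 16 can be sketched as follows in Python. The claim does not say how the comparison is scored, so choosing the stored phoneme whose distance from the center differs least from the sample's distance is an assumption. The function names, the phoneme labels, the three-dimensional points, and the use of the origin as the center are likewise hypothetical and chosen only to keep the example small.

```python
import numpy as np


def radial_distance(point, center):
    """Euclidean distance from the center of the space to a point."""
    return float(np.linalg.norm(point - center))


def closest_by_radius(sample, stored, center):
    """Return the label of the stored point whose distance from the
    center is nearest to the sample's distance (assumed scoring rule)."""
    d_sample = radial_distance(sample, center)
    gaps = {
        label: abs(d_sample - radial_distance(point, center))
        for label, point in stored.items()
    }
    return min(gaps, key=gaps.get)


center = np.zeros(3)                      # assumption: center at the origin
stored = {                                # hypothetical converted phonemes
    "aa": np.array([1.0, 0.0, 0.0]),
    "iy": np.array([0.0, 2.0, 0.0]),
    "uw": np.array([0.0, 0.0, 3.0]),
}
sample = np.array([1.5, 1.5, 0.0])        # distance from center is about 2.12

label = closest_by_radius(sample, stored, center)
print(label)
```

Here the sample's distance is closest to that of the point labeled "iy", so that label is returned. A practical system would use far more dimensions than three.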
25. A method of recognizing speech using a database of stored phonemes converted into n-dimensional space, the method comprising:
receiving a received phoneme;
converting the received phoneme to n-dimensional space;
comparing the received phoneme to each of the stored phonemes in n-dimensional space by comparing a first distance from a center of the n-dimensional space to a first point associated with the received phoneme with a second distance from the center of the n-dimensional space to a second point associated in turn with each of the stored phonemes; and
recognizing the received phoneme according to the comparison of the received phoneme to each of the stored phonemes.
Dependent claims: 26, 27, 28
29. A system for recognizing phonemes, the system using a database of stored phonemes for comparison with received phonemes, the stored phonemes having been converted into n-dimensional space, the system comprising:
a recording element that receives a phoneme;
a computer that:
converts the received phoneme into n-dimensional space, wherein the computer compares in the n-dimensional space the received phoneme with each phoneme in the database of stored phonemes by comparing a first distance from a center of the n-dimensional space to a first point associated with the received phoneme with a second distance from the center of the n-dimensional space to a second point associated with each respective stored phoneme from the database of stored phonemes; and
recognizes the received phoneme using the comparison in the n-dimensional space of the received phoneme with each phoneme in the database of stored phonemes.
Dependent claims: 30, 31
32. A medium storing a program for instructing a computer device to recognize a received speech signal using a database of stored phonemes converted into n-dimensional space, the program comprising instructing the computer device to perform the following steps:
receiving a received phoneme;
converting the received phoneme to n-dimensional space;
comparing the received phoneme to each of the stored phonemes in n-dimensional space by comparing a first distance from a center of the n-dimensional space to a first point associated with the received phoneme with a second distance from the center of the n-dimensional space to a second point associated with each respective stored phoneme from the database of stored phonemes; and
recognizing the received phoneme according to the comparison of the received phoneme to each of the stored phonemes.
33. A medium storing a program for instructing a computer device to recognize a received speech signal using a database of stored phonemes converted into n-dimensional space, the database of stored phonemes formed by training the stored phonemes according to the following steps:
(1) determining a phoneme vector as a time-frequency representation of the stored phoneme;
(2) dividing the phoneme vector into phoneme segments;
(3) assigning each phoneme segment into a plurality of phoneme parameters;
(4) expanding each phoneme segment and plurality of phoneme parameters into an expanded stored-phoneme vector with expanded vector parameters;
(5) transforming the expanded stored-phoneme vector into an orthogonal form using singular-value decomposition wherein:
$[x_1\ x_2\ \ldots\ x_m] = [u_1\ u_2\ \ldots\ u_m]\,\Lambda V^{t}$,
where $x_k$ is a kth acoustic vector for a corresponding stored phoneme, $u_k$ is the corresponding orthogonal vector, and $\Lambda$ and $V$ are diagonal and unitary matrices, respectively,
the program stored on the medium instructing the computer device to perform the following steps:
(1) receiving an analog acoustic signal;
(2) converting the analog acoustic signal into a digital signal;
(3) determining a received-signal vector as a time-frequency representation of the received digital signal;
(4) dividing the received-signal vector into received-signal segments;
(5) assigning each received-signal segment into a plurality of received-signal parameters;
(6) expanding each received-signal segment and plurality of received-signal parameters into an expanded received-signal vector;
(7) transforming the expanded received-signal vector into an orthogonal form using singular-value decomposition wherein:
$[y_k] = [z_k]\,\Lambda V^{t}$,
where $y_k$ is a kth acoustic vector for a corresponding received phoneme, $z_k$ is the corresponding orthogonal vector, and $\Lambda$ and $V$ are diagonal and unitary matrices, respectively;
(8) determining a first distance associated with the orthogonal form of the expanded received-signal vector and a second distance associated respectively with each orthogonal form of the expanded stored-phoneme vectors; and
(9) recognizing the received phoneme according to a comparison of the first distance with the second distance.
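The front-end steps recited in the claims (time-frequency representation, segmentation, parameter assignment, expansion into one vector) admit many implementations. The Python sketch below is one loose interpretation, not the patented procedure: it assumes a short-time Fourier magnitude as the time-frequency representation, equal time segments, band-averaged energies as the parameters, and simple concatenation as the expansion. All function names, frame sizes, and the test tone are hypothetical.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))


def expanded_vector(signal, num_segments=4, num_bands=8):
    """Split the spectrogram into time segments, reduce each segment to
    band-averaged parameters, and concatenate them into one vector."""
    spec = spectrogram(signal)
    params = []
    for segment in np.array_split(spec, num_segments, axis=0):
        mean_spectrum = segment.mean(axis=0)
        bands = np.array_split(mean_spectrum, num_bands)
        params.append([band.mean() for band in bands])
    return np.asarray(params).ravel()


# A synthetic 440 Hz tone sampled at 8 kHz stands in for a digitized phoneme.
fs = 8000
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 440 * t)

vec = expanded_vector(tone)
print(vec.shape)
```

With these illustrative settings the result has num_segments × num_bands entries. A very high-dimensional representation, as the title suggests, would use many more segments and parameters.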
Specification