Sub-audible speech recognition based upon electromyographic signals
First Claim
1. A method for training and using a system to identify a sub-audible signal formed by a source of sub-audible sounds, the method comprising providing a computer that is programmed to execute, and does execute, the following actions:
(1) receiving R signal sequences, numbered $r = 1, \ldots, R$ ($R \geq 2$), with each sequence comprising an instance of a sub-audible speech pattern (“SASP”), uttered by a user, and each SASP including at least one word drawn from a selected database of Q words, numbered $q = 1, \ldots, Q$ with $Q \geq 2$;
(2) estimating where each of the R SASPs begins and ends in the sequences;
for each of the signal sequences, numbered $r = 1, \ldots, R$;
(3) providing signal values of a received signal, number r, within a temporal window having a selected window width $\Delta t(\mathrm{win})$; and
(4) transforming each of the R SASPs, using a Signal Processing Transform (“SPT”) operation to obtain an SPT value that is expressed in terms of at least first and second transform parameters comprising at least a signal frequency and a signal energy associated with the SASP;
(5) providing a first matrix M with first matrix entries equal to the SPT values for the R SASPs, ordered according to the at least first and second transform parameters along a first matrix axis and along a second matrix axis, respectively, of the matrix M;
(6) tessellating the matrix M into a sequence of exhaustive and mutually exclusive cells of matrix entries, referred to as M-cells, with each M-cell containing a collection of contiguous matrix entries, where each M-cell is characterized according to at least one selected M-cell criterion;
(7) providing, for each M-cell, an M-cell representative value, depending upon at least one of the first matrix entries within the M-cell;
(8) formatting the M-cell representative values as a vector V with vector entry values $v_k(q;r)$, numbered $k = 1, \ldots, K$ ($K \geq 2$);
(9) analyzing the vector entry values $v_k(q;r)$ using a neural net classifier, having a neural net architecture, and a sequence of estimated weight coefficient values associated with at least one of the neural net classifier layers, where the neural net classifier provides a sequence of output values dependent upon the weight coefficient values and upon the vector entry values $v_k(q;r)$;
(10) receiving the vector entries $v_k(q;r)$ and forming a first sum
$$S_1(q;r)_h = \sum_k w_{1,k,h}(q;r) \cdot v_k(q;r),$$
where $\{w_{1,k,h}(q;r)\}$ is a first selected set of adjustable weight coefficients that are estimated by a neural net procedure;
(11) forming a first activation function $A_1\{S_1(q;r)_h\}$, that is monotonically increasing as the value $S_1(q;r)_h$ increases;
(12) forming a second sum
$$S_2(q;r)_g = \sum_h w_{2,h,g}(q;r) \cdot A_1\{S_1(q;r)_h\} \quad (g = 1, \ldots, G;\ G \geq 1),$$
where $w_{2,h,g}(q;r)$ is a second selected set of adjustable weight coefficients that are estimated by the neural net procedure;
(13) forming a second activation function $A_2\{S_2(q;r)_g\}$ that depends upon the second sum $S_2(q;r)_g$, that is monotonically increasing as the value $S_2(q;r)_g$ increases;
(14) providing a set of reference output values $\{A(q;\mathrm{ref})_g\}$ as an approximation for the sum $A_2\{S_2(q;r)_g\}$ for the R instances of the SASP;
(15) forming a difference
$$\Delta_1(q) = \frac{1}{R \cdot G} \sum_{r,g} \left| A_2\{S_2(q;r)_g\} - A(q;\mathrm{ref})_g \right|^{p_1},$$
where $p_1$ is a selected positive exponent;
(16) comparing the difference $\Delta_1(q)$ with a selected threshold value $\varepsilon(\mathrm{thr};1)$;
(17) when $\Delta_1(q)$ is greater than $\varepsilon(\mathrm{thr};1)$, adjusting at least one of the weight coefficients $w_{1,k,h}(q;r)$ and the weight coefficients $w_{2,h,g}(q;r)$, returning to step (10), and repeating the procedures of steps (10)-(16); and
(18) when $\Delta_1(q)$ is no greater than $\varepsilon(\mathrm{thr};1)$, interpreting this condition as indicating that at least one of an optimum first set of weight coefficients $\{w_{1,k,h}(q;r;\mathrm{opt})\}$ and an optimum second set of weight coefficients $\{w_{2,h,g}(q;r;\mathrm{opt})\}$ has been obtained, and using the at least one of the first set and second set of optimum weight coefficients to receive and process a new SASP signal and to estimate whether the received new SASP signal corresponds to a reference word or reference phrase in the selected database.
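Steps (10) through (16) of the claim can be illustrated with a small forward pass. The following is a minimal sketch, not the patented implementation: the logistic sigmoid is assumed for both activation functions (the claim only requires them to be monotonically increasing), a single weight set is shared across the R instances, and the names `sigmoid`, `forward`, and `difference` are hypothetical.

```python
import math
from typing import List

def sigmoid(x: float) -> float:
    """One possible monotonically increasing activation (assumed for A1 and A2)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(v: List[float], w1: List[List[float]], w2: List[List[float]]) -> List[float]:
    """Return A2{S2_g} for one feature vector v; w1 is indexed [k][h], w2 is indexed [h][g]."""
    hidden, outputs = len(w2), len(w2[0])
    # Step (10): first sum S1_h = sum_k w1[k][h] * v[k]
    s1 = [sum(w1[k][h] * v[k] for k in range(len(v))) for h in range(hidden)]
    # Step (11): first activation
    a1 = [sigmoid(s) for s in s1]
    # Step (12): second sum S2_g = sum_h w2[h][g] * A1{S1_h}
    s2 = [sum(w2[h][g] * a1[h] for h in range(hidden)) for g in range(outputs)]
    # Step (13): second activation
    return [sigmoid(s) for s in s2]

def difference(instances, w1, w2, ref, p1: float = 2.0) -> float:
    """Step (15): mean over r and g of |A2 - A_ref|^p1."""
    total = 0.0
    for v in instances:
        a2 = forward(v, w1, w2)
        total += sum(abs(a2[g] - ref[g]) ** p1 for g in range(len(ref)))
    return total / (len(instances) * len(ref))

# Two instances (R = 2) of a K = 3 feature vector, H = 2 hidden sums, G = 1 output.
instances = [[0.2, 0.7, 0.1], [0.3, 0.6, 0.2]]
w1 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
w2 = [[0.0], [0.0]]
ref = [1.0]
d1 = difference(instances, w1, w2, ref)
# Step (16)/(17): weights need adjusting while the difference exceeds the threshold.
needs_adjustment = d1 > 0.01
```

With all-zero weights every output is 0.5, so the difference is far above a threshold of 0.01 and the weights would be adjusted before returning to step (10).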
Abstract
Method and system for processing and identifying a sub-audible signal formed by a source of sub-audible sounds. Sequences of samples of sub-audible sound patterns (“SASPs”) for known words/phrases in a selected database are received for overlapping time intervals, and Signal Processing Transforms (“SPTs”) are formed for each sample, as part of a matrix of entry values. The matrix is decomposed into contiguous, non-overlapping two-dimensional cells of entries, and neural net analysis is applied to estimate reference sets of weight coefficients that provide sums with optimal matches to reference sets of values. The reference sets of weight coefficients are used to determine a correspondence between a new (unknown) word/phrase and a word/phrase in the database.
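The decomposition of the matrix into contiguous, non-overlapping cells can be sketched as follows. This is an illustration under stated assumptions: the cells are fixed-size rectangles, the representative value of each cell is its mean, and the function name `tessellate_mean` is hypothetical. The abstract does not prescribe any of these choices.

```python
from typing import List

Matrix = List[List[float]]

def tessellate_mean(m: Matrix, cell_rows: int, cell_cols: int) -> List[float]:
    """Split m into non-overlapping rectangular cells and return one mean per cell.

    Cells at the right and bottom edges are truncated, so every matrix entry
    belongs to exactly one cell.
    """
    n_rows, n_cols = len(m), len(m[0])
    features = []
    for r0 in range(0, n_rows, cell_rows):
        for c0 in range(0, n_cols, cell_cols):
            cell = [m[r][c]
                    for r in range(r0, min(r0 + cell_rows, n_rows))
                    for c in range(c0, min(c0 + cell_cols, n_cols))]
            features.append(sum(cell) / len(cell))
    return features

# A 4x4 matrix of transform values split into four 2x2 cells.
m = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0],
     [13.0, 14.0, 15.0, 16.0]]
v = tessellate_mean(m, 2, 2)  # feature vector, one entry per cell
```

The resulting list plays the role of the feature vector that is passed to the neural net analysis.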
18 Claims
where weight coefficients $w'_{1,k,h}(q'';r;\mathrm{opt})$ are said optimized first weight coefficients found for a candidate word or phrase ($q''$) in the database;
(30) forming a first new word activation function $A_1'\{S_1'(q';q'';r)_h\}$ that depends upon the first sum $S_1'(q';q'';r)_h$;
(31) forming a second sum
$$S_2'(q';q'';r)_g = \sum_h w'_{2,h,g}(q'';r;\mathrm{opt}) \cdot A_1'\{S_1'(q';q'';r)_h\} \quad (g = 1, \ldots, G;\ G \geq 1),$$
where weight coefficients $w'_{2,h,g}(q'';r;\mathrm{opt})$ are said optimized second weight coefficients found for a candidate word or phrase ($q''$) in the database;
(32) forming a second new word activation function $A_2'\{S_2'(q';q'';r)_g\}$ that depends upon the second sum $S_2'(q';q'';r)_g$;
(33) providing a set of reference output values $\{A'(q'';\mathrm{ref})_g\}$ associated with each candidate word or phrase ($q''$) in the database;
(34) forming a comparison difference
$$\Delta_1'(q'';q') = \frac{1}{R \cdot G} \sum_{r,g} \left| A_2'\{S_2'(q';q'';r)_g\} - A'(q'';\mathrm{ref})_g \right|^{p_2},$$
where $p_2$ is a selected positive exponent;
(35) comparing the difference $\Delta_1'(q'';q')$ with a selected threshold value $\varepsilon(\mathrm{thr};2)$;
(36) when the difference $\Delta_1'(q'';q')$ is greater than $\varepsilon(\mathrm{thr};2)$, returning to step (28) and repeating the procedures of steps (28)-(35) with another candidate word or phrase ($q''$) in the database; and
(37) when $\Delta_1'(q'';q')$ is no greater than $\varepsilon(\mathrm{thr};2)$, interpreting this condition as indicating that the present candidate word or phrase ($q''$) is the “new” word ($q'$), and indicating that the present candidate word or phrase $q''$ is likely to be the “new” word $q'$.
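The candidate loop of steps (30) through (37) might look like the sketch below. It assumes a sigmoid activation, one stored weight set and one reference output set per database word, and iteration in dictionary order; `identify`, `Model`, and the toy weights are hypothetical and serve only to show the control flow.

```python
import math
from typing import Dict, List, Optional, Tuple

# (optimized w1 indexed [k][h], optimized w2 indexed [h][g], reference outputs indexed [g])
Model = Tuple[List[List[float]], List[List[float]], List[float]]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def forward(v: List[float], w1: List[List[float]], w2: List[List[float]]) -> List[float]:
    a1 = [sigmoid(sum(w1[k][h] * v[k] for k in range(len(v)))) for h in range(len(w2))]
    return [sigmoid(sum(w2[h][g] * a1[h] for h in range(len(w2)))) for g in range(len(w2[0]))]

def identify(new_instances: List[List[float]], models: Dict[str, Model],
             eps2: float = 0.01, p2: float = 2.0) -> Optional[str]:
    """Return the first candidate word whose comparison difference is within eps2."""
    for word, (w1, w2, ref) in models.items():
        total = 0.0
        for v in new_instances:
            a2 = forward(v, w1, w2)
            total += sum(abs(a - b) ** p2 for a, b in zip(a2, ref))
        diff = total / (len(new_instances) * len(ref))  # step (34)
        if diff <= eps2:                                 # step (37)
            return word
        # step (36): otherwise continue with the next candidate
    return None

# Toy database: zero weights make every output 0.5, so only "stop" matches its reference.
zero_w1 = [[0.0], [0.0]]
zero_w2 = [[0.0]]
models = {"go": (zero_w1, zero_w2, [1.0]), "stop": (zero_w1, zero_w2, [0.5])}
result = identify([[0.4, 0.6]], models)
```

Returning `None` when no candidate passes the threshold is a design choice of this sketch; the claim text shown here does not say what happens when the database is exhausted.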
-
3. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
replacing at least one of said matrix cell features by a normalized feature for each of said cells corresponding to said matrix M.
-
4. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
when at least two distinct words, number $q_1$ and $q_2$, in said database satisfy $\Delta_1'(q';q''=q_1) \approx \Delta_1'(q';q''=q_2)$, and $\Delta_1'(q';q_1)$ and $\Delta_1'(q';q_2)$ are substantially less than $\Delta_1'(q';q'')$ for any word $q'' \neq q_1$ and $q'' \neq q_2$ in said database, interpreting this condition as indicating that said new word included in said new SASP cannot be unambiguously identified.
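The ambiguity condition of this claim can be expressed as a short check over the per-candidate comparison differences. The claim does not quantify "approximately equal" or "substantially less", so the tolerances `tie_tol` and `gap_factor` below are assumptions made for illustration, as is the function name.

```python
from typing import Dict

def is_ambiguous(deltas: Dict[str, float], tie_tol: float = 0.05,
                 gap_factor: float = 0.5) -> bool:
    """True when the two best candidates are nearly tied and well ahead of the rest.

    tie_tol: relative tolerance for treating the two smallest differences as equal.
    gap_factor: the second-best difference must be at most this fraction of the third.
    """
    ranked = sorted(deltas.values())
    if len(ranked) < 2:
        return False
    best, second = ranked[0], ranked[1]
    tied = abs(best - second) <= tie_tol * max(best, second, 1e-12)
    if len(ranked) == 2:
        return tied
    return tied and second <= gap_factor * ranked[2]

ambiguous = is_ambiguous({"left": 0.010, "right": 0.0102, "stop": 0.30})
clear = is_ambiguous({"left": 0.010, "right": 0.20, "stop": 0.30})
```

In the first call "left" and "right" are nearly tied and far below "stop", so the new word would be reported as not unambiguously identifiable; in the second call "left" is a clear winner.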
-
5. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
choosing said weighting for said weighted points from the group of weighting consisting of (i) substantially uniform weighting and (ii) a weighting that decreases monotonically as said magnitude of said comparison difference increases.
-
6. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
determining said reference set of said weight coefficients to be independent of said word number q in said database.
-
7. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
determining said reference set of said weight coefficients so that at least one reference set weight coefficient for a first selected word number q1 in said database differs from a corresponding reference set weight coefficient for a second selected word number q2 in said database.
-
8. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
selecting said window width $\Delta t(\mathrm{win})$ in a range 1-4 sec.
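A window width given in seconds has to be converted to a sample count before the signal can be cut into windows. The sketch below assumes a known sampling rate and a fractional overlap between consecutive windows (the abstract mentions overlapping time intervals); `frames`, `sample_rate_hz`, and `overlap` are hypothetical names, and the tiny signal is chosen only to keep the example readable.

```python
from typing import List, Sequence

def frames(signal: Sequence[float], sample_rate_hz: float, width_s: float,
           overlap: float = 0.5) -> List[Sequence[float]]:
    """Cut signal into windows of width_s seconds that overlap by the given fraction."""
    size = int(round(width_s * sample_rate_hz))       # samples per window
    hop = max(1, int(round(size * (1.0 - overlap))))  # step between window starts
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

# Ten samples at 2 Hz with a 2-second window: 4 samples per window, hop of 2 samples.
signal = list(range(10))
windows = frames(signal, sample_rate_hz=2.0, width_s=2.0)
```

Trailing samples that do not fill a complete window are dropped in this sketch; padding them instead would be an equally reasonable choice.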
-
9. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
selecting each of said matrix cells to be rectangularly shaped.
-
10. The method of claim 9, wherein said computer is further programmed to execute, and does execute, the following actions:
selecting at least two of said matrix cells to have different sizes.
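Rectangular cells of different sizes can be described by lists of row and column boundaries. The following sketch assumes that representation; the name `tessellate` and the boundary lists are hypothetical.

```python
from typing import List

def tessellate(m: List[List[float]], row_edges: List[int],
               col_edges: List[int]) -> List[List[float]]:
    """Return the entries of each rectangular cell bounded by consecutive edges.

    row_edges and col_edges are increasing index lists that start at 0 and end
    at the matrix height and width, so the cells cover the matrix without overlap.
    """
    cells = []
    for r0, r1 in zip(row_edges, row_edges[1:]):
        for c0, c1 in zip(col_edges, col_edges[1:]):
            cells.append([m[r][c] for r in range(r0, r1) for c in range(c0, c1)])
    return cells

# A 3x4 matrix split into cells of 1x3, 1x1, 2x3 and 2x1 entries.
m = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11]]
cells = tessellate(m, row_edges=[0, 1, 3], col_edges=[0, 3, 4])
sizes = [len(c) for c in cells]
```

Because each entry falls between exactly one pair of row edges and one pair of column edges, the cells are exhaustive and mutually exclusive.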
-
11. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
choosing said SPT operations from the group of SPT operations consisting of (i) a windowed short time interval Fourier Transform (STFT);
(ii) discrete wavelets (DWTs) and continuous wavelets (CWTs) using Daubechies 5 and 7 bases;
(iii) dual tree wavelets (DTWTs) with a near sym_a 5,7 tap filter and a Q-shift 14,14 tap filter;
(iv) Hartley Transform;
(v) Linear Predictive Coding (LPC) coefficients;
(vi) a moving average of a selected number of said sample values with uniform weighting; and
(vii) a moving average of a selected number of said sample values with non-uniform weighting.
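Options (vi) and (vii) differ only in the weights applied to the sample values. A minimal sketch, assuming the weights are normalized by their sum and only full windows are produced (`moving_average` is a hypothetical name):

```python
from typing import List, Sequence

def moving_average(samples: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Weighted moving average over len(weights) consecutive samples."""
    n = len(weights)
    total = sum(weights)
    return [sum(w * x for w, x in zip(weights, samples[i:i + n])) / total
            for i in range(len(samples) - n + 1)]

samples = [1.0, 2.0, 3.0, 4.0]
uniform = moving_average(samples, [1.0, 1.0])      # option (vi): equal weights
non_uniform = moving_average(samples, [1.0, 3.0])  # option (vii): recent sample weighted more
```

Here `uniform` is `[1.5, 2.5, 3.5]` and `non_uniform` is `[1.75, 2.75, 3.75]`.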
-
12. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
selecting said database to include at least one of the words “stop”, “go”, “left”, “right”, “alpha”, “omega”, “one”, “two”, “three”, “four”, “five”, “six”, “seven”, “eight”, “nine” and “ten”.
-
13. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
selecting said error threshold number to lie in a range $\varepsilon(\mathrm{thr};1) \leq 0.01$.
-
14. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
applying a backpropagation of error method in said neural net classifier analysis of said features of said cells of said matrix M.
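A backpropagation-of-error loop for a two-layer network might be sketched as below. Everything beyond the general technique is an assumption: sigmoid activations, a squared-error difference (exponent 2), per-instance gradient steps with the constant factor folded into the learning rate, random initial weights, and the names `train` and `delta`.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(v, w1, w2):
    """Return (hidden activations, output activations); w1 is [k][h], w2 is [h][g]."""
    a1 = [sigmoid(sum(w1[k][h] * v[k] for k in range(len(v)))) for h in range(len(w2))]
    a2 = [sigmoid(sum(w2[h][g] * a1[h] for h in range(len(w2)))) for g in range(len(w2[0]))]
    return a1, a2

def delta(samples, ref, w1, w2):
    """Mean squared difference between network outputs and reference outputs."""
    G = len(ref)
    return sum((forward(v, w1, w2)[1][g] - ref[g]) ** 2
               for v in samples for g in range(G)) / (len(samples) * G)

def train(samples, ref, hidden=3, rate=0.5, eps=0.01, max_iter=5000, seed=0):
    rng = random.Random(seed)
    K, G = len(samples[0]), len(ref)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(K)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(G)] for _ in range(hidden)]
    for _ in range(max_iter):
        if delta(samples, ref, w1, w2) <= eps:  # stop once within the threshold
            break
        for v in samples:
            a1, a2 = forward(v, w1, w2)
            # Output-layer error terms: (output - reference) * sigmoid derivative
            d2 = [(a2[g] - ref[g]) * a2[g] * (1 - a2[g]) for g in range(G)]
            # Hidden-layer error terms, propagated back through w2
            d1 = [sum(d2[g] * w2[h][g] for g in range(G)) * a1[h] * (1 - a1[h])
                  for h in range(hidden)]
            for h in range(hidden):
                for g in range(G):
                    w2[h][g] -= rate * d2[g] * a1[h]
            for k in range(K):
                for h in range(hidden):
                    w1[k][h] -= rate * d1[h] * v[k]
    return w1, w2

# Two instances of the same word, both trained toward a reference output of 0.9.
samples = [[0.2, 0.9], [0.25, 0.85]]
ref = [0.9]
w1, w2 = train(samples, ref)
```

The hidden-layer error terms are computed before either weight matrix is updated, which is what makes this a backpropagation step rather than two independent updates.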
-
15. A method for training and using a system to identify a sub-audible signal formed by a source of sub-audible sounds, the method comprising providing a computer that is programmed to execute, and does execute, the following actions:
-
(1) receiving R signal sequences, numbered $r = 1, \ldots, R$ ($R \geq 2$), with each sequence comprising an instance of a specified sub-audible speech pattern (“SASP”), uttered by the user, and each SASP including at least one word drawn from a selected database of Q words, numbered $q = 1, \ldots, Q$ ($Q \geq 2$);
(2) estimating where each SASP begins and ends for each of the signal sequences;
(3) providing signal values of the received signal, number r, within a temporal window having a selected window width $\Delta t(\mathrm{win})$;
(4) transforming each of the R SASPs, using a Signal Processing Transform (“SPT”) operation to obtain an SPT value that is expressed in terms of at least one transform parameter having a sequence of parameter values, including a signal frequency and a signal energy associated with the SASP;
(5) providing a first matrix M with first matrix entries equal to the SPT values for the R SASPs, ordered according to each of the at least first and second transform parameters along a first matrix axis and along a second matrix axis, respectively, of the matrix M;
(6) tessellating the matrix M into a sequence of exhaustive and mutually exclusive cells of the matrix entries, referred to as M-cells, with each M-cell containing a collection of contiguous matrix entries, where each M-cell is characterized according to at least one selected M-cell criterion;
(7) providing, for each M-cell, an M-cell representative value depending upon at least one of the first matrix entries within the M-cell;
(8) formatting the cell representative values as a vector V with vector entry values $v_k(q;r)$, numbered $k = 1, \ldots, K$ ($K \geq 2$);
(9) analyzing the vector entry values $v_k(q;r)$ using a neural net classifier, having a neural net architecture with at least one neural net hidden layer, and a sequence of estimated weight coefficient values $w_k(q;r)$ associated with that at least one neural net hidden layer, where the neural net classifier provides a sequence of neural net output values $A(q;r)$, equal to a sum over the index k of each of the vector entry values $v_k(q;r)$ multiplied by a corresponding weight coefficient value $w_k(q;r)$;
(10) providing a set of neural net reference output values $\{A(q;\mathrm{ref})\}$ as an approximation for the sum $A(q;r)$ for the R instances of the SASP ($r = 1, \ldots, R$);
(11) forming a difference
$$\Delta(q) = \sum_r \left| A(q;r) - A(q;\mathrm{ref}) \right|^{p},$$
where p is a selected positive exponent;
(12) comparing the difference $\Delta(q)$ with a first threshold value $\varepsilon(\mathrm{thr};1)$;
(13) when $\Delta(q)$ is greater than a first positive threshold value $\varepsilon(\mathrm{thr};1)$, adjusting at least one of the weight coefficients $w_k(q;r)$, returning to step (9), and repeating the procedures of steps (9)-(12); and
(14) when $\Delta(q)$ is no greater than $\varepsilon(\mathrm{thr};1)$, interpreting this condition as indicating that at least one of an optimum set of weight coefficients $\{w_k(q;r;\mathrm{opt})\}$ has been obtained, and using the set of optimum weight coefficients to receive and process a new SASP signal and to estimate whether the received new SASP signal corresponds to a reference word or reference phrase in the selected database.
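Steps (9) through (14) describe an output formed as a weighted sum of the vector entries, compared against a reference, with the weights adjusted until the difference falls below the threshold. The sketch below covers only that weighted-sum loop. The update rule (a per-instance least-mean-squares step), the exponent of 2, a single weight set shared across instances, and the names `output`, `delta`, and `fit` are all assumptions; the claim does not specify how the weights are adjusted.

```python
from typing import List

def output(v: List[float], w: List[float]) -> float:
    """Step (9): A = sum_k w_k * v_k."""
    return sum(wk * vk for wk, vk in zip(w, v))

def delta(samples: List[List[float]], w: List[float], a_ref: float, p: float = 2.0) -> float:
    """Step (11): sum over instances of |A - A_ref|^p."""
    return sum(abs(output(v, w) - a_ref) ** p for v in samples)

def fit(samples: List[List[float]], a_ref: float, rate: float = 0.1,
        eps: float = 1e-4, max_iter: int = 10000) -> List[float]:
    w = [0.0] * len(samples[0])
    for _ in range(max_iter):
        if delta(samples, w, a_ref) <= eps:  # step (14): weights accepted
            break
        for v in samples:                    # step (13): adjust and repeat
            err = output(v, w) - a_ref
            w = [wk - rate * err * vk for wk, vk in zip(w, v)]
    return w

samples = [[1.0, 0.0], [0.0, 1.0]]
w_opt = fit(samples, a_ref=1.0)
```

For these two orthogonal instances each weight moves toward 1.0 independently, so the loop stops after a few dozen passes.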
-
Specification