Sub-audible speech recognition based upon electromyographic signals

US 8,200,486 B1
Filed: 06/05/2003
Issued: 06/12/2012
Est. Priority Date: 06/05/2003
Status: Expired due to Fees

Abstract

Method and system for processing and identifying a sub-audible signal formed by a source of sub-audible sounds. Sequences of samples of sub-audible sound patterns (“SASPs”) for known words/phrases in a selected database are received for overlapping time intervals, and Signal Processing Transforms (“SPTs”) are formed for each sample, as part of a matrix of entry values. The matrix is decomposed into contiguous, non-overlapping two-dimensional cells of entries, and neural net analysis is applied to estimate reference sets of weight coefficients that provide sums with optimal matches to reference sets of values. The reference sets of weight coefficients are used to determine a correspondence between a new (unknown) word/phrase and a word/phrase in the database.
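
The cell decomposition described in the abstract can be illustrated with a minimal sketch. It assumes equal-sized rectangular cells and uses the cell mean as the representative value; the function names `tessellate` and `feature_vector`, the cell size, and the sample matrix are all hypothetical and are not taken from the patent.

```python
from statistics import mean

def tessellate(matrix, cell_rows, cell_cols):
    """Split a 2-D list into contiguous, non-overlapping rectangular cells.

    Edge cells are smaller when the matrix size is not a multiple of the
    cell size, so every entry lands in exactly one cell.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    cells = []
    for r0 in range(0, n_rows, cell_rows):
        for c0 in range(0, n_cols, cell_cols):
            cell = [matrix[r][c]
                    for r in range(r0, min(r0 + cell_rows, n_rows))
                    for c in range(c0, min(c0 + cell_cols, n_cols))]
            cells.append(cell)
    return cells

def feature_vector(matrix, cell_rows, cell_cols):
    """One representative value per cell (assumed here: the mean), as a flat vector."""
    return [mean(cell) for cell in tessellate(matrix, cell_rows, cell_cols)]

# Hypothetical 4x4 matrix of transform values, split into four 2x2 cells.
M = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
V = feature_vector(M, 2, 2)
```

Other representative values (a maximum, a median, an energy sum) would fit the same structure; the mean is only one possible choice.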

18 Claims

1. A method for training and using a system to identify a sub-audible signal formed by a source of sub-audible sounds, the method comprising providing a computer that is programmed to execute, and does execute, the following actions:
- (1) receiving R signal sequences, numbered r=1, ..., R (R≥2), with each sequence comprising an instance of a sub-audible speech pattern (“SASP”), uttered by a user, and each SASP including at least one word drawn from a selected database of Q words, numbered q=1, ..., Q with Q≥2;

  (2) estimating where each of the R SASPs begins and ends in the sequences;

  for each of the signal sequences, numbered r=1, ..., R:

  (3) providing signal values of a received signal, number r, within a temporal window having a selected window width Δt(win); and

  (4) transforming each of the R SASPs, using a Signal Processing Transform (“SPT”) operation to obtain an SPT value that is expressed in terms of at least first and second transform parameters comprising at least a signal frequency and a signal energy associated with the SASP;

  (5) providing a first matrix M with first matrix entries equal to the SPT values for the R SASPs, ordered according to the at least first and second transform parameters along a first matrix axis and along a second matrix axis, respectively, of the matrix M;

  (6) tessellating the matrix M into a sequence of exhaustive and mutually exclusive cells of matrix entries, referred to as M-cells, with each M-cell containing a collection of contiguous matrix entries, where each M-cell is characterized according to at least one selected M-cell criterion;

  (7) providing, for each M-cell, an M-cell representative value, depending upon at least one of the first matrix entries within the M-cell;

  (8) formatting the M-cell representative values as a vector V with vector entry values v_k(q;r), numbered k=1, ..., K (K≥2);

  (9) analyzing the vector entry values v_k(q;r) using a neural net classifier, having a neural net architecture, and a sequence of estimated weight coefficient values associated with at least one of the neural net classifier layers, where the neural net classifier provides a sequence of output values dependent upon the weight coefficient values and upon the vector entry values v_k(q;r);

  (10) receiving the vector entries v_k(q;r) and forming a first sum S1(q;r)_h = Σ_k w_1,k,h(q;r)·v_k(q;r), where {w_1,k,h(q;r)} is a first selected set of adjustable weight coefficients that are estimated by a neural net procedure;

  (11) forming a first activation function A1{S1(q;r)_h}, that is monotonically increasing as the value S1(q;r)_h increases;

  (12) forming a second sum S2(q;r)_g = Σ_h w_2,h,g(q;r)·A1{S1(q;r)_h} (g=1, ..., G; G≥1), where w_2,h,g(q;r) is a second selected set of adjustable weight coefficients that are estimated by the neural net procedure;

  (13) forming a second activation function A2{S2(q;r)_g} that depends upon the second sum S2(q;r), that is monotonically increasing as the value S2(q;r) increases;

  (14) providing a set of reference output values {A(q;ref)_g} as an approximation for the sum A2{S2(q;r)_g} for the R instances of the SASP;

  (15) forming a difference Δ1(q) = (1/(R·G)) Σ_r,g |A2{S2(q;r)_g} − A(q;ref)_g|^p1, where p1 is a selected positive exponent;

  (16) comparing the difference Δ1(q) with a selected threshold value ε(thr;1);

  (17) when Δ1(q) is greater than ε(thr;1), adjusting at least one of the weight coefficients w_1,k,h(q;r) and the weight coefficients w_2,h,g(q;r), returning to step (10), and repeating the procedures of steps (10)-(16); and

  (18) when Δ1(q) is no greater than ε(thr;1), interpreting this condition as indicating that at least one of an optimum first set of weight coefficients {w_1,k,h(q;r;opt)} and an optimum second set of weight coefficients {w_2,h,g(q;r;opt)} has been obtained, and using the at least one of the first set and second set of optimum weight coefficients to receive and process a new SASP signal and to estimate whether the received new SASP signal corresponds to a reference word or reference phrase in the selected database.
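
Steps (10)-(18) of claim 1 can be sketched as a small two-layer network trained until the difference Δ1 falls to the threshold. This is only an illustration under stated assumptions: the claim requires monotonically increasing activations but does not name them, so `tanh` and the logistic function are assumed here; p1 = 2 is assumed; only the second-layer weights are adjusted, by plain gradient descent; and every name and number (`forward`, `delta1`, `train`, the sample vectors, the learning rate) is hypothetical.

```python
import math

def forward(v, w1, w2):
    """Steps (10)-(13): two weighted sums, each followed by a monotonic activation."""
    hidden = [math.tanh(sum(w1[k][h] * v[k] for k in range(len(v))))
              for h in range(len(w1[0]))]
    out = []
    for g in range(len(w2[0])):
        s2 = sum(w2[h][g] * hidden[h] for h in range(len(hidden)))
        out.append(1.0 / (1.0 + math.exp(-s2)))  # logistic activation (assumed)
    return hidden, out

def delta1(samples, ref, w1, w2, p1=2):
    """Step (15): mean of |A2 - A_ref|^p1 over the R instances and G outputs."""
    total = 0.0
    for v in samples:
        _, out = forward(v, w1, w2)
        total += sum(abs(o - a) ** p1 for o, a in zip(out, ref))
    return total / (len(samples) * len(ref))

def train(samples, ref, w1, w2, eps=0.01, rate=1.0, max_iter=10000):
    """Steps (16)-(18): adjust second-layer weights until delta1 <= eps."""
    n = len(samples) * len(ref)
    for _ in range(max_iter):
        if delta1(samples, ref, w1, w2) <= eps:
            break
        # Gradient of delta1 (with p1 = 2) with respect to each w2[h][g].
        grad = [[0.0] * len(ref) for _ in w2]
        for v in samples:
            hidden, out = forward(v, w1, w2)
            for g, (o, a) in enumerate(zip(out, ref)):
                slope = 2.0 * (o - a) * o * (1.0 - o) / n
                for h, a1 in enumerate(hidden):
                    grad[h][g] += slope * a1
        for h in range(len(w2)):
            for g in range(len(ref)):
                w2[h][g] -= rate * grad[h][g]
    return w2

# Hypothetical data: R = 2 feature vectors (K = 2), H = 2 hidden units, G = 1 output.
samples = [[0.9, 0.1], [1.0, 0.2]]
ref = [0.9]
w1 = [[0.5, -0.3], [0.2, 0.4]]
w2 = [[0.0], [0.0]]
before = delta1(samples, ref, w1, w2)
train(samples, ref, w1, w2)
after = delta1(samples, ref, w1, w2)
```

The `max_iter` cap is an added safeguard, not part of the claim, which simply loops back to step (10) until the threshold test passes.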
2. The method of claim 1, wherein said computer is further programmed to execute, and does execute, said step (18) by a procedure comprising the following actions:
- (19) receiving a new sub-audible speech pattern SASP signal uttered by said user containing an instance of at least one unknown word, referred to as a “new” word, indexed with an index q′ that may be in said database of Q words;

  (20) estimating where the new word begins and ends in the new SASP;

  (21) providing signal values for the new SASP within each of said temporal windows, numbered j=1, ..., J with J≥2, that are shifted in time relative to each other by selected multiples of a selected displacement time Δt(displ);

  (22) for the signal values within each of the time-shifted windows, numbered j=1, ..., J;

  (23) transforming each of the signal values of the new SASP, using said Signal Processing Transform (SPT) operation to obtain new SASP SPT values with said at least first and second transform SPT values;

  (24) providing a second matrix M′ with second matrix entries equal to the new SASP SPT values, ordered according to said at least first and second transform parameters along first and second matrix axes, respectively, of the second matrix M′;

  (25) tessellating the second matrix M′ into a sequence of exhaustive and mutually exclusive M′-cells that correspond to said M-cells for said tessellated matrix M, where each M′-cell is characterized according to at least one selected M′-cell criterion;

  (26) providing, for each M′-cell in the second matrix M′, an M′-cell representative value depending upon at least one of the second matrix entries within the M′-cell;

  (27) formatting the M′-cell representative values as a vector V′ with vector entry values v′_k(q′;r), where q′ refers to the new word or phrase index (k=1, ..., K);

  (28) applying said neural net classifier and said reference set of said optimum first set and said optimum second set of weight coefficients to compute said neural net classifier output values for each of the time-shifted sequences of the new SASP;

  (29) receiving the vector entries v′_k(q;r) and forming a first sum S1′(q′;q″;r)_h = Σ_k w′_1,k,h(q″;r;opt)·v′_k(q′;r),
3. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
- replacing at least one of said matrix cell features by a normalized feature for each of said cells corresponding to said matrix M.
4. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
- when at least two distinct words, number q1 and q2, in said database satisfy Δ1′(q′;q″=q1) ≈ Δ1′(q′;q″=q2), and Δ1′(q′;q1) and Δ1′(q′;q2) are substantially less than Δ1′(q′;q″) for any word q″≠q1 and q″≠q2 in said database, interpreting this condition as indicating that said new word included in said new SASP cannot be unambiguously identified.
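
The ambiguity test of claim 4 can be sketched as follows. The claim does not quantify "≈" or "substantially less", so the tolerance `tie_tol` and the factor `margin` below are assumptions chosen purely for illustration, as are the function name `classify` and the sample values.

```python
def classify(deltas, tie_tol=0.05, margin=2.0):
    """Return the best-matching word, or None when the match is ambiguous.

    deltas maps each database word to its comparison difference; it must
    contain at least two words.
    """
    ranked = sorted(deltas, key=deltas.get)
    best, second = ranked[0], ranked[1]
    # The two smallest differences are approximately equal (assumed tolerance).
    tied = abs(deltas[best] - deltas[second]) <= tie_tol
    # Both are well below every other word's difference (assumed factor).
    others = [deltas[w] for w in ranked[2:]]
    well_separated = all(d >= margin * deltas[second] for d in others)
    if tied and well_separated:
        return None  # two candidates fit equally well
    return best

ambiguous = classify({"stop": 0.02, "go": 0.03, "left": 0.5})
clear = classify({"stop": 0.02, "go": 0.4, "left": 0.5})
```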
5. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
- choosing said weighting for said weighted points from the group of weighting consisting of (i) substantially uniform weighting and (ii) a weighting that decreases monotonically as said magnitude of said comparison difference increases.
6. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
- determining said reference set of said weight coefficients to be independent of said word number q in said database.
7. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
- determining said reference set of said weight coefficients so that at least one reference set weight coefficient for a first selected word number q1 in said database differs from a corresponding reference set weight coefficient for a second selected word number q2 in said database.
8. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
- selecting said window width Δt(win) in a range 1-4 sec.
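
A minimal sketch of cutting a sampled signal into time-shifted windows of width Δt(win), each displaced by Δt(displ). The sampling rate, the function name `sliding_windows`, and the sample values are hypothetical; the claims do not specify a sampling rate.

```python
def sliding_windows(samples, rate_hz, width_s, shift_s):
    """Return overlapping windows of width_s seconds, each shifted by shift_s seconds."""
    width = int(round(width_s * rate_hz))
    shift = int(round(shift_s * rate_hz))
    if width <= 0 or shift <= 0:
        raise ValueError("window width and shift must cover at least one sample")
    # Only full-width windows are kept; a trailing partial window is dropped.
    return [samples[start:start + width]
            for start in range(0, len(samples) - width + 1, shift)]

# Ten samples at an assumed 1 Hz, 4 s windows displaced by 2 s.
samples = list(range(10))
windows = sliding_windows(samples, rate_hz=1, width_s=4, shift_s=2)
```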
9. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
- selecting each of said matrix cells to be rectangularly shaped.
10. The method of claim 9, wherein said computer is further programmed to execute, and does execute, the following actions:
- selecting at least two of said matrix cells to have different sizes.
11. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
- choosing said SPT operations from the group of SPT operations consisting of (i) a windowed short time interval Fourier Transform (STFT);
  
  (ii) discrete wavelets (DWTs) and continuous wavelets (CWTs) using Daubechies 5 and 7 bases;
  
  (iii) dual tree wavelets (DTWTs) with a near sym_a 5,7 tap filter and a Q-shift 14,14 tap filter;
  
  (iv) Hartley Transform;
  
  (v) Linear Predictive Coding (LPC) coefficients;
  
  (vi) a moving average of a selected number of said sample values with uniform weighting; and
  
  (vii) a moving average of a selected number of said sample values with non-uniform weighting.
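
Options (vi) and (vii) differ only in the weights applied to the samples. The sketch below shows both with one function; the name `moving_average`, the three-sample span, and the triangular weights are illustrative assumptions rather than values from the patent.

```python
def moving_average(samples, weights):
    """Weighted moving average; weights are normalized so they sum to 1."""
    n = len(weights)
    total = float(sum(weights))
    return [sum(w * x for w, x in zip(weights, samples[i:i + n])) / total
            for i in range(len(samples) - n + 1)]

signal = [0.0, 0.0, 6.0, 0.0, 0.0]
uniform = moving_average(signal, [1, 1, 1])      # option (vi): equal weights
triangular = moving_average(signal, [1, 2, 1])   # option (vii): center-weighted
```

With uniform weights the single spike is spread evenly across all three outputs, while the center-weighted version keeps more of its shape.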
12. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
- selecting said database to include at least one of the words “stop”, “go”, “left”, “right”, “alpha”, “omega”, “one”, “two”, “three”, “four”, “five”, “six”, “seven”, “eight”, “nine” and “ten”.
13. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
- selecting said error threshold number to lie in a range ε(thr;1) ≤ 0.01.
14. The method of claim 2, wherein said computer is further programmed to execute, and does execute, the following actions:
- applying a backpropagation of error method in said neural net classifier analysis of said features of said cells of said matrix M.

15. A method for training and using a system to identify a sub-audible signal formed by a source of sub-audible sounds, the method comprising providing a computer that is programmed to execute, and does execute, the following actions:
- (1) receiving R signal sequences, numbered r=1, ..., R (R≥2), with each sequence comprising an instance of a specified sub-audible speech pattern (“SASP”), uttered by the user, and each SASP including at least one word drawn from a selected database of Q words, numbered q=1, ..., Q (Q≥2);

  (2) estimating where each SASP begins and ends for each of the signal sequences;

  (3) providing signal values of the received signal, number r, within a temporal window having a selected window width Δt(win);

  (4) transforming each of the R SASPs, using a Signal Processing Transform (“SPT”) operation to obtain an SPT value that is expressed in terms of at least one transform parameter having a sequence of parameter values, including a signal frequency and a signal energy associated with the SASP;

  (5) providing a first matrix M with first matrix entries equal to the SPT values for the R SASPs, ordered according to each of the at least first and second transform parameters along a first matrix axis and along a second matrix axis, respectively, of the matrix M;

  (6) tessellating the matrix M into a sequence of exhaustive and mutually exclusive cells of the matrix entries, referred to as M-cells, with each M-cell containing a collection of contiguous matrix entries, where each M-cell is characterized according to at least one selected M-cell criterion;

  (7) providing, for each M-cell, an M-cell representative value depending upon at least one of the first matrix entries within the M-cell;

  (8) formatting the cell representative values as a vector V with vector entry values v_k(q;r), numbered k=1, ..., K (K≥2);

  (9) analyzing the vector entry values v_k(q;r) using a neural net classifier, having a neural net architecture with at least one neural net hidden layer, and a sequence of estimated weight coefficient values w_k(q;r) associated with that at least one neural net hidden layer, where the neural net classifier provides a sequence of neural net output values A(q;r), equal to a sum over the index k of each of the vector entry values v_k(q;r) multiplied by a corresponding weight coefficient value w_k(q;r);

  (10) providing a set of neural net reference output values {A(q;ref)} as an approximation for the sum A(q;r) for the R instances of the SASP (r=1, ..., R);

  (11) forming a difference Δ(q) = Σ_r |A(q;r) − A(q;ref)|^p, where p is a selected positive exponent;

  (12) comparing the difference Δ(q) with a first threshold value ε(thr;1);

  (13) when Δ(q) is greater than a first positive threshold value ε(thr;1), adjusting at least one of the weight coefficients w_k(q;r), returning to step (9), and repeating the procedures of steps (9)-(12); and

  (14) when Δ(q) is no greater than ε(thr;1), interpreting this condition as indicating that at least one of an optimum set of weight coefficients {w_k(q;r;opt)} has been obtained, and using the set of optimum weight coefficients to receive and process a new SASP signal and to estimate whether the received new SASP signal corresponds to a reference word or reference phrase in the selected database.
16. The method of claim 15, wherein said computer is further programmed to execute, and does execute, the following actions:
- choosing said SPT operations from a group of SPT operations consisting of (i) a windowed short time interval Fourier Transform (STFT);

  (ii) discrete wavelets (DWTs) and continuous wavelets (CWTs) using Daubechies 5 and 7 bases;

  (iii) dual tree wavelets (DTWTs) with a near sym_a 5,7 tap filter and a Q-shift 14,14 tap filter;

  (iv) Hartley Transform;

  (v) Linear Predictive Coding (LPC) coefficients;

  (vi) a moving average of a selected number of said sample values with uniform weighting; and

  (vii) a moving average of a selected number of said sample values with non-uniform weighting.
17. The method of claim 15, wherein said computer is further programmed to execute, and does execute, the following actions:
- selecting at least first and second of said matrix cells to have a cell dimension, measured along a corresponding matrix axis of said matrix M, that is different for the first cell and for the second cell.
18. The method of claim 15, wherein said computer is further programmed to execute, and does execute, the following actions:
- (15) receiving a new sub-audible speech pattern SASP1 uttered by said user, comprising an instance of at least one unknown word, referred to as a “new” word, identified with an index q1, that may be but is not necessarily drawn from said database of Q words;

  (16) estimating where the new word begins and ends in the new SASP1;

  (17) providing signal values of the received SASP1 within each of said temporal windows;

  (18) transforming each of the signal values of the new SASP1, using said Signal Processing Transform (SPT) operation to obtain new SASP1 SPT values, where each SASP1 SPT value is expressed in terms of said at least first and second transform parameters, including a signal frequency and a signal energy associated with the SASP1;

  (19) providing a second matrix M1 with second matrix entries equal to SPT values for the SASP1, ordered according to each of said at least first and second transform parameters along first and second matrix axes of the second matrix M1;

  (20) tessellating the matrix M1 into a sequence of exhaustive and mutually exclusive M1-cells that correspond to said sequence of said M-cells for said matrix M, where each M1-cell is characterized according to said one or more cell criteria for said M-cells;

  (21) providing, for each M1-cell, an M1-cell representative value depending upon at least one of the second matrix entry values within the M1-cell;

  (22) formatting the M1-cell representative values as a vector V1 with vector entries v1_k(q1), numbered k=1, ..., K (K≥2), where q1 refers to said index associated with said new word;

  (23) analyzing the vector entry values v1_k(q1) using said neural net classifier, having said neural net architecture with said at least one neural net hidden layer, and a sequence w_k(q1;opt) of said optimum weight coefficients w_k(q1;r1;opt), associated with said at least one neural net hidden layer, and averaged over said R instances (r1=1, ..., R) of said SASP uttered by said user in claim 15;

  (24) providing a neural net output value A1(q1), equal to a sum over the index k of each of the vector entry values v1_k(q1) multiplied by the corresponding averaged optimum weight coefficient value w_k(q1;opt);

  (25) providing a set of neural net reference output values {A1(q′;ref)} as an approximation for the sum A1(q1) for the R1 instances of the SASP1, where q′ is one of said indices corresponding to said database of Q words;

  (26) forming a comparison difference Δ1(q1,q′) = |A1(q1) − A1(q′;ref)|^p, where said quantities A1(q′;ref) and p are determined as in claim 15;

  (27) comparing the difference Δ1(q1,q′) with said first threshold value ε(thr;1);

  (28) when Δ1(q1,q′) is greater than said first threshold value ε(thr;1), interpreting this condition as indicating that said sub-audible speech pattern SASP1 received is not a sub-audible speech pattern from said database with the corresponding number q1=q′; and

  (29) when Δ1(q1,q′) is no greater than ε(thr;1), interpreting this condition as indicating that said sub-audible speech pattern SASP1 received is likely to be a sub-audible speech pattern from said database, indexed by q′, with the corresponding index q1.
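
The threshold test of steps (24)-(29) can be sketched as below. This is a simplified illustration, not the claimed procedure: it assumes one stored weight vector and one reference output per database word, with p = 2, and the function name `match_words` and all numbers are hypothetical.

```python
def match_words(v_new, weights, references, eps=0.01, p=2):
    """Return the database words whose comparison difference is no greater than eps."""
    matches = []
    for word, ref in references.items():
        # Output value: sum over k of weight times vector entry.
        output = sum(wk * vk for wk, vk in zip(weights[word], v_new))
        diff = abs(output - ref) ** p
        if diff <= eps:
            matches.append(word)
    return matches

# Hypothetical two-word database with K = 2 features.
weights = {"stop": [1.0, 0.0], "go": [0.0, 1.0]}
references = {"stop": 0.5, "go": 0.5}
found = match_words([0.5, 0.0], weights, references)
```

An empty result corresponds to step (28) holding for every candidate word, meaning the new pattern is treated as not drawn from the database.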

Current Assignee
The United States of America As Represented By The Secretary of Agriculture, U.S.A. as represented by the Administrator of the National Aeronautics and Space Administration
Original Assignee
The United States of America As Represented By The Secretary of Agriculture
Inventors
Jorgensen, Charles C., Lee, Diana D., Agabon, Shane T.
Primary Examiner(s)
YEN, ERIC L

Application Number

US10/457,696
Time in Patent Office

3,295 Days
Field of Search

704/231-233, 704/236, 704/246
US Class Current

704/233
CPC Class Codes

G10L 15/16 using artificial neural net...

G10L 15/24 Speech recognition using no...
