Speech activated control system and related methods

US 7,774,202 B2
Filed: 06/12/2006
Issued: 08/10/2010
Est. Priority Date: 06/12/2006
Status: Expired due to Fees

Abstract

A speech activated control system for controlling aerial vehicle components, program product, and associated methods are provided. The system can include a host processor adapted to develop speech recognition models and to provide speech command recognition. The host processor can be positioned in communication with a database for storing and retrieving speech recognition models. The system can include an avionic computer in communication with the host processor and adapted to provide command function management, a display and control processor in communication with the avionic computer adapted to provide a user interface between a user and the avionic computer, and a data interface positioned in communication with the avionic computer and the host processor provided to divorce speech command recognition functionality from vehicle or aircraft-related speech-command functionality. The system can also include speech actuated command program product at least partially stored in the memory of the host processor and adapted to provide the speech recognition model training and speech recognition model recognition functionality.

247 Citations

31 Claims

1. A speech activated control system for controlling aerial vehicle components, comprising:
- a host processor having memory and positioned in communication with a database for storing speech recognition models; and
  
  speech actuated command program product at least partially stored in the memory of the host processor and including instructions that when executed by the host processor cause the processor to perform the operations of:
  
  forming a digitized user-speech template representing a command annunciation,
  
  dividing the user-speech template into a plurality of time slices,
  
  subdividing each separate one of the plurality of time slices into a plurality of bins each associated with a corresponding different one of a plurality of frequency ranges,
  
  performing a noise reduction and speech enhancement on the digitized user-speech template to include:
  
  estimating noise power for each separate set of bins having a same frequency range across the plurality of time slices to thereby provide a plurality of frequency range-specific noise power estimates,
  
  equalizing energy values of each set of bins having a same frequency range across the plurality of time slices responsive to the respective frequency range-specific noise power estimate, and
  
  thresholding each equalized bin by a predetermined value to remove noise from within and around speech formants of the user-speech template,
  
  developing a set of feature vectors representing energy of a frequency content of the digitized user-speech template to thereby determine a unique pattern identifying the command annunciation,
  
  applying a speech recognition engine to the set of feature vectors to form at least one speech recognition model associated with the command annunciation,
  
  associating an index with the at least one speech recognition model associated with the command annunciation, and
  
  storing the at least one speech recognition model and the associated index.
- - 2. The system as defined in claim 1, wherein the speech actuated command program product further includes instructions to perform the operations of:
    - receiving the command associated annunciation real-time in-flight indicating a request for recognition;
      
      returning the index associated with a stored model determined to match the command annunciation;
      
      returning a confidence score indicating likelihood the match is correct; and
      
      executing an assigned function or forming an assigned state when the confidence score is above a preselected or selected threshold value.
  - 3. The system as defined in claim 1, wherein the speech actuated command program product further includes instructions to perform the operations of:
    - recording during aircraft operation a series of switch engagements to form a speech-activated switch macro describing a command function or system state; and
      
      receiving the associated command annunciation real-time in-flight; and
      
      associating the speech-activated switch macro with a representation of the command annunciation.
  - 4. The system as defined in claim 1, wherein the speech actuated command program product further includes instructions to perform the operations of:
    - performing a dynamic range utilization analysis on the digitized user-speech template to determine if the command annunciation is below a preselected minimum threshold level; and
      
      performing a plurality of post-noise removal integrity checks, to include:
      
      performing a clipping analysis on the digitized user-speech template to determine if the command annunciation has exceeded a preselected threshold value,
      
      performing a cropping analysis on the digitized user-speech template to determine if the command annunciation is missing portions of energy,
      
      performing a misalignment analysis on the digitized user-speech template to determine if the command annunciation was successfully aligned during noise removal, and
      
      returning an integrity score grading the quality of the command annunciation to determine if the command annunciation is acceptable for training or recognition.
  - 5. The system as defined in claim 1, wherein the speech actuated command program product further includes instructions to perform the operation of aligning the command annunciation in the digitized user-speech template to enhance analysis of the command annunciation, the alignment including the operations of:
    - determining a geometric mean of each of the plurality of bins for each of the plurality of time slices to thereby form an alignment vector;
      
      auto convolving the alignment vector to form a convolution of the alignment vector with itself;
      
      determining a mean position of peaks of the convolution to identify the center of the speech; and
      
      cyclically shifting the sampled data to center the speech in the observation window.
  - 6. The system as defined in claim 1, wherein the operation of developing a set of feature vectors representing energy of a frequency content of the digitized user-speech template includes the operations of:
    - transforming a spectrum comprising normalized post-noise-reduction speech data using Mel frequency bands to form a plurality of Mel coefficients;
      
      applying a Fourier transform to the Mel coefficients to form a Mel Spaced Cepstrum; and
      
      determining first and second derivatives of the Mel Spaced Cepstrum.
  - 7. The system as defined in claim 1,
    - wherein the operation of estimating noise power includes estimating noise power for each of the plurality of bins for each of the plurality of time slices, located on either side of the command annunciation near and outside boundaries of the command annunciation for each of the plurality of frequency ranges, to thereby determine a background noise contour for noise within the speech template; and
      
      wherein the operation of thresholding bins includes the operations of thresholding each bin by a predetermined parameterized value to remove the noise from the speech template.
  - 8. The system as defined in claim 7, wherein the operation of the thresholding further includes performing the operations of:
    - comparing each of the plurality of bins for each of the plurality of time slices to the parameterized value; and
      
      setting each bin having a value below the parameterized value to approximately zero to thereby remove the noise from the sampled data from within and around speech formants.
  - 9. The system as defined in claim 1, further comprising:
    - an avionic computer in communication with the host processor and having memory and a portion of the speech actuated command program product defining a command function manager stored in the memory; and
      
      a display and control processor in communication with the avionic computer and having memory and a portion of the speech actuated command program product defining a user interface stored in the memory.
  - 10. The system as defined in claim 9, wherein the speech actuated command program product further includes instructions to perform the operations of:
    - receiving a command function or system state;
      
      receiving the command associated annunciation real-time in-flight; and
      
      associating the command function or system state with a representation of the command annunciation.
  - 11. The system as defined in claim 9, further comprising:
    - a data interface in communication with the avionic computer and the host processor positioned to divorce speech command recognition functionality from aircraft-related speech-command functionality; and
      
      a mobile storage device interface to allow an operator to retrieve trained speech models, associated index numbers, and associated functions or vehicle system states from the database and to load previously trained speech models, associated index numbers, and associated functions or vehicle system states to the database.
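
The noise reduction operations recited in claims 1, 7, and 8 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the spectrogram is already available as a list of time slices, that the speech boundaries are known, that "equalizing" means dividing each bin by its frequency-range noise estimate, and that the threshold value is chosen by the caller. All function and parameter names are hypothetical.

```python
from typing import List

# spec[t][f] is the energy of frequency bin f in time slice t (assumed layout).
Spectrogram = List[List[float]]


def estimate_noise_power(spec: Spectrogram, speech_start: int, speech_end: int) -> List[float]:
    """Mean energy per frequency range over slices outside [speech_start, speech_end)."""
    noise_slices = spec[:speech_start] + spec[speech_end:]
    if not noise_slices:
        raise ValueError("need at least one noise-only time slice")
    n_bins = len(spec[0])
    return [sum(s[f] for s in noise_slices) / len(noise_slices) for f in range(n_bins)]


def equalize(spec: Spectrogram, noise_power: List[float], floor: float = 1e-12) -> Spectrogram:
    """Scale every bin by the noise estimate of its frequency range (assumed meaning of 'equalizing')."""
    return [[e / max(noise_power[f], floor) for f, e in enumerate(s)] for s in spec]


def threshold(spec: Spectrogram, value: float) -> Spectrogram:
    """Set every bin whose equalized value is below `value` to zero."""
    return [[e if e >= value else 0.0 for e in s] for s in spec]


def reduce_noise(spec: Spectrogram, speech_start: int, speech_end: int,
                 threshold_value: float = 3.0) -> Spectrogram:
    """Estimate noise per frequency range, equalize, then threshold."""
    noise = estimate_noise_power(spec, speech_start, speech_end)
    return threshold(equalize(spec, noise), threshold_value)
```

With three noise-only slices of `[1.0, 4.0]` and one speech slice of `[9.0, 5.0]`, the per-range noise estimates are `[1.0, 4.0]`, so only the first bin of the speech slice survives a threshold of 3.0. Estimating noise separately for each frequency range lets a loud but stationary band (for example, engine noise) be suppressed without discarding quieter bands that carry speech.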

12. A method to provide speech-activated control of aerial vehicle components, the method comprising the steps of:
- (a) sampling a speech signal representing speech to define sampled data;
  
  (b) performing an integrity check on the sampled data to identify when the speech is below a preselected standard;
  
  (c) aligning the sampled data in an observation window to enhance analysis of the speech;
  
  (d) performing noise reduction processing to remove noise from within and around speech formants, to include:
  
  estimating noise power for each separate set of a plurality of bins having a same frequency range across a plurality of time slices of the sampled data to thereby provide a plurality of frequency range-specific noise power estimates,
  
  equalizing energy values of each set of bins having a same frequency range across the plurality of time slices responsive to the respective frequency range-specific noise power estimate, and
  
  thresholding each equalized bin by a parameterized threshold value to remove noise from within and around speech formants of the sampled data;
  
  (e) developing a set of feature vectors representing energy of a frequency content of the sampled data to thereby determine a unique pattern; and
  
  (f) applying a speech recognition engine to the set of feature vectors to perform one of the following functions:
  
  forming at least one speech recognition model associated with the speech signal, and matching the speech signal to the at least one speech recognition model.
- - 13. The method as defined in claim 12,
    - wherein step (b) includes the step of performing a dynamic range utilization analysis on the sampled data to determine if the speech is at or below a preselected minimum threshold level indicating the speech was too quiet; and
      
      wherein the method further comprises the steps of:
      
      performing a clipping analysis on the sampled data to determine if the speech met or has exceeded a preselected maximum value indicating that clipping has occurred,
      
      providing an integrity score indicating quality of the speech, and
      
      requesting repeat of the speech responsive to the integrity score when the integrity score is below a preselected or selected value to thereby enhance speech-model development when in a training mode or to thereby enhance recognition accuracy when in a recognition mode.
  - 14. The method as defined in claim 12, wherein step (c) includes the steps of:
    - dividing the sampled data into a plurality of time slices;
      
      performing a short time Fourier transformation on each time slice to form Fourier transformed data defining a spectrograph;
      
      converting spectrograph amplitude values of each time slice to decibels;
      
      thresholding the amplitude values by a centering threshold to normalize the energy values within each time slice;
      
      determining a geometric mean of each of a plurality of bins for each time slice to thereby form an alignment vector;
      
      auto convolving the alignment vector to form a convolution of the alignment vector with itself;
      
      determining a mean position of peaks of the convolution to identify the center of the speech; and
      
      cyclically shifting the sampled data to center the speech in the observation window.
  - 15. The method as defined in claim 12, wherein step (f) includes relaxing constraints on a minimum value, a maximum value, or both the minimum and the maximum value within a Hidden Markov model when performing speech training.
  - 16. The method as defined in claim 12,
    - wherein step (e) includes developing a first set of feature vectors for Hidden Markov Model (HMM) modeling and a second set of feature vectors for Neural Network (NN) modeling; and
      
      wherein step (f) includes applying a HMM speech recognition engine to the first set of feature vectors and a NN speech recognition engine to the second set of feature vectors; and
      
      wherein the method further comprises executing a voting scheme between a plurality of speech recognition engines including at least one of the following:
      
      the HMM speech recognition engine and the NN speech recognition engine.
  - 17. The method as defined in claim 12, further comprising the step of iteratively forming a plurality of speech recognition models for each of a separate plurality of operational profiles having differing environmental characteristics including a substantially different background noise contour.
  - 18. The method as defined in claim 12, wherein the step of developing a set of feature vectors includes developing one or more sets of feature vectors, and wherein the method further comprises:
    - applying each of a plurality of speech recognition engines separately to at least one of the one or more sets of feature vectors; and
      
      executing a voting scheme between the plurality of speech recognition engines.
  - 19. The method as defined in claim 12, wherein the step of developing a set of feature vectors representing energy of a frequency content of the sampled data includes the operations of:
    - transforming a spectrum comprising normalized post-noise-reduction speech data using Mel frequency bands to form a plurality of Mel coefficients;
      
      applying a Fourier transform to the Mel coefficients to form a Mel Spaced Cepstrum; and
      
      determining first and second derivatives of the Mel Spaced Cepstrum.
  - 20. The method as defined in claim 12, wherein the step of estimating noise power includes the step of estimating noise power for each of a plurality of time slices on either side of the speech near and outside boundaries of the speech for each of a plurality of frequency ranges to thereby determine a background noise contour for noise within the observation window.
  - 21. The method as defined in claim 20,
    - wherein the parameterized threshold value is a predetermined parameterized value to remove the noise from the speech; and
      
      wherein the step of thresholding includes setting each equalized bin having a value at or below the predetermined parameterized threshold value to a nominal value.
  - 22. The method as defined in claim 21, wherein the step of the thresholding includes the steps of:
    - comparing each of the plurality of bins for each of the plurality of time slices to the parameterized value; and
      
      setting each bin having a value below the parameterized value to approximately zero to thereby remove from the sampled data the noise within and around speech formants.
  - 23. The method as defined in claim 21, further comprising the step of:
    - performing a cropping analysis on the sampled data to determine if the speech is potentially missing portions of energy.
  - 24. The method as defined in claim 21, further comprising the step of:
    - performing a misalignment analysis on the sampled data to determine if the speech was successfully aligned during noise removal.
  - 25. The method as defined in claim 21,
    - wherein when performing speech training the method further includes the steps of assigning an index to the at least one speech recognition model associated with the speech and storing the at least one speech recognition model and the assigned index; and
      
      wherein when performing speech recognition the method further includes the steps of:
      
      returning the index associated with a stored speech recognition model determined to match sampled data received during speech recognition,
      
      displaying a confidence score indicating likelihood the match is correct, and
      
      executing an assigned function or forming an assigned state when the confidence score is above a preselected or selected threshold value.
  - 26. The method as defined in claim 25,
    - wherein the speech recognition models are stored in one of a plurality of vocabulary templates;
      
      wherein each vocabulary template is associated with a different one of a plurality of predetermined operational profiles; and
      
      wherein the parameterized value is preselected responsive to analysis of noise conditions of a selected one of the plurality of predetermined operational profiles.
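
The alignment steps recited in claim 14 (also claim 5) can be sketched as follows. This is an illustrative reading, not the patented implementation. It starts from an already computed spectrogram of positive magnitudes, so the short time Fourier transformation, decibel conversion, and centering threshold are not shown. It treats the "peaks" of the convolution as its global maxima and shifts time slices rather than raw samples. All names are hypothetical.

```python
import math
from typing import List

Spectrogram = List[List[float]]  # spec[t][f], positive magnitudes (assumed layout)


def alignment_vector(spec: Spectrogram, eps: float = 1e-12) -> List[float]:
    """Geometric mean of the bins of each time slice."""
    return [math.exp(sum(math.log(max(e, eps)) for e in s) / len(s)) for s in spec]


def autoconvolve(v: List[float]) -> List[float]:
    """Full convolution of v with itself (length 2 * len(v) - 1)."""
    out = [0.0] * (2 * len(v) - 1)
    for i, a in enumerate(v):
        for j, b in enumerate(v):
            out[i + j] += a * b
    return out


def speech_center(v: List[float]) -> float:
    """Mean position of the convolution maxima, mapped back to a slice index."""
    conv = autoconvolve(v)
    peak = max(conv)
    peaks = [k for k, c in enumerate(conv) if math.isclose(c, peak, rel_tol=1e-9)]
    # Convolution index k collects products v[i] * v[j] with i + j == k,
    # so a peak at k corresponds to a center near slice k / 2.
    return (sum(peaks) / len(peaks)) / 2.0


def center_speech(spec: Spectrogram) -> Spectrogram:
    """Cyclically shift the time slices so the speech center lands mid-window."""
    n = len(spec)
    shift = (n // 2 - round(speech_center(alignment_vector(spec)))) % n
    return spec[-shift:] + spec[:-shift] if shift else list(spec)
```

For a six-slice window whose only energetic slice is at index 1, the convolution peaks at index 2, the estimated center is slice 1, and a cyclic shift of two slices moves the speech to index 3, the middle of the observation window.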

27. A method to provide speech-activated control of aerial vehicle components, the method comprising the steps of:
- (a) performing noise reduction processing on sampled speech data representing a command annunciation to remove noise from within and around speech formants of the sampled speech, to include:
  
  estimating noise power for each separate set of a plurality of bins having a same frequency range across a plurality of time slices of the sampled speech data to thereby provide a plurality of frequency range-specific noise power estimates,
  
  equalizing energy values of each set of bins having a same frequency range across the plurality of time slices responsive to the respective frequency range-specific noise power estimate, and
  
  thresholding each equalized bin by a parameterized threshold value to remove noise from within and around speech formants of the sampled speech data;
  
  (b) developing a set of feature vectors representing energy of a frequency content of the sampled speech data to thereby determine a unique pattern identifying the command annunciation;
  
  (c) applying a speech recognition engine to the set of feature vectors to thereby form at least one speech recognition model;
  
  (d) associating an index with the at least one speech recognition model associated with the command annunciation; and
  
  (e) storing the at least one speech recognition model and the assigned index.
- - 28. The method as defined in claim 27, further comprising the steps of:
    - receiving a command function or system state;
      
      receiving the command associated annunciation real-time in-flight; and
      
      associating the command function or system state with a representation of the command annunciation.
  - 29. The method as defined in claim 27, further comprising the steps of:
    - recording during aircraft operation a series of switch engagements to form a speech-activated switch macro describing a command function or system state;
      
      receiving the associated command annunciation real-time in-flight; and
      
      associating the speech-activated switch macro with a representation of the command annunciation.
  - 30. The method as defined in claim 27, wherein the sampled speech data is a first sampled data processed to form a stored speech model during a speech recognition model training event, wherein the command annunciation is a first command annunciation, and wherein the method further comprises the steps of:
    - responsive to a second command annunciation received during a recognition event, returning the index associated with the stored speech model determined to match a second sampled data;
      
      returning a confidence score indicating likelihood the match is correct; and
      
      executing an assigned function or forming an assigned state when the confidence score is above a preselected or selected threshold value.
  - 31. The method as defined in claim 27, wherein the sampled speech data is a first sampled data processed to form a stored speech model during a speech recognition model training event, wherein the command annunciation is a first command annunciation, and wherein the method further comprises the steps of:
    - responsive to a second command annunciation received during a recognition event, performing a dynamic range utilization analysis on a second sampled data to determine if the speech is below a preselected minimum threshold level indicating the speech was too quiet;
      
      performing a clipping analysis on the second sampled data to determine if the speech has exceeded a preselected maximum value indicating that clipping has occurred; and
      
      performing a cropping analysis on the second sampled data to determine if the speech is potentially missing portions of energy.
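
The three integrity analyses recited in claim 31 (see also claims 4 and 13) can be sketched in a few lines. This is an assumption-laden illustration, not the patented implementation. It assumes 16-bit signed integer samples, treats "too quiet" as a peak amplitude below a fraction of full scale, treats clipping as any sample at full scale, and treats cropping as significant energy touching either edge of the observation window. All thresholds and names are hypothetical.

```python
from typing import Dict, List

FULL_SCALE = 32767  # assumed 16-bit signed samples


def too_quiet(samples: List[int], min_fraction: float = 0.1) -> bool:
    """Dynamic range utilization: peak amplitude below a minimum fraction of full scale."""
    return max(abs(s) for s in samples) < min_fraction * FULL_SCALE


def is_clipped(samples: List[int], max_clipped: int = 0) -> bool:
    """Clipping: more than `max_clipped` samples reach full scale."""
    return sum(1 for s in samples if abs(s) >= FULL_SCALE) > max_clipped


def is_cropped(samples: List[int], edge: int = 160, edge_fraction: float = 0.05) -> bool:
    """Cropping: noticeable energy in the first or last `edge` samples of the window."""
    limit = edge_fraction * FULL_SCALE
    head, tail = samples[:edge], samples[-edge:]
    return max(abs(s) for s in head) > limit or max(abs(s) for s in tail) > limit


def integrity_report(samples: List[int]) -> Dict[str, bool]:
    """Run all three checks on one utterance."""
    if not samples:
        raise ValueError("no samples to analyze")
    return {
        "too_quiet": too_quiet(samples),
        "clipped": is_clipped(samples),
        "cropped": is_cropped(samples),
    }
```

A caller could request a repeat of the command annunciation whenever any flag in the report is set, which corresponds loosely to the integrity-score gating described in claim 13.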

Current Assignee: Lockheed Martin Corporation (Martin Marietta Corporation)
Original Assignee: Lockheed Martin Corporation (Martin Marietta Corporation)
Inventors: Russo, Jon C.; Armbruster, Kermit L.; Spengler, Richard P.; Barnett, Gregory W.
Primary Examiner(s): Opsasnick; Michael N
Application Number: US11/451,217
Publication Number: US 20070288242A1
Time in Patent Office: 1,520 Days
Field of Search: 704/236, 704/241
US Class Current: 704/236
CPC Class Codes:
G10L 15/06   Creation of reference templ...
G10L 15/20   Speech recognition techniqu...
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 15/32   Multiple recognisers used i...
G10L 21/0208   Noise filtering
