System and device for advanced voice recognition word spotting

US 6,006,185 A
Filed: 05/09/1997
Issued: 12/21/1999
Est. Priority Date: 05/09/1997
Status: Expired due to Fees

Abstract

A speaker independent, continuous speech, word spotting voice recognition system and method. The edges of phonemes in an utterance are quickly and accurately isolated. The utterance is broken into wave segments based upon the edges of the phonemes. A voice recognition engine is consulted multiple times for several wave segments and the results are analyzed to correctly identify the words in the utterance.
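
The abstract's idea of consulting a recognition engine several times on different wave segment groupings, then analyzing the results, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: wave segments are represented as phoneme labels, `toy_engine` is a hypothetical stand-in for a real recognition engine, and the "longest match wins" score is a placeholder rather than anything the patent specifies.

```python
def spot_words(segments, engine, max_group_size=3):
    """Consult the engine once per candidate grouping and keep the best hit."""
    words = []
    i = 0
    while i < len(segments):
        best = None  # (word, score, group_size) of the best hit at position i
        for size in range(1, max_group_size + 1):
            if i + size > len(segments):
                break
            word, score = engine(tuple(segments[i:i + size]))
            if word is not None and (best is None or score > best[1]):
                best = (word, score, size)
        if best is None:
            i += 1          # nothing recognized here; skip one segment
        else:
            words.append(best[0])
            i += best[2]    # jump past the recognized group
    return words


# Hypothetical vocabulary keyed by phoneme-label groups (illustrative only).
VOCABULARY = {("k", "ao", "l"): "call", ("hh", "ow", "m"): "home"}


def toy_engine(group):
    """Stand-in engine: exact lookup, scored by group length (an assumption)."""
    word = VOCABULARY.get(group)
    return (word, float(len(group))) if word else (None, 0.0)


utterance = ["ah", "k", "ao", "l", "hh", "ow", "m"]
spotted = spot_words(utterance, toy_engine)
```

In this toy run the filler segment `"ah"` yields no hit and is skipped, while the two vocabulary entries are spotted in order.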

20 Claims

1. A speech recognition system that pre-processes speech data and inputs the pre-processed speech data in several forms to a voice recognition engine for determination of the most likely spoken word, comprising:
- audio input means to input audio data comprised of phonemes;
  
  phoneme identification means to detect individual phonemes;
  
  wave segment grouping means to group phonemes into wave segments;
  
  a wave segment pre-processor to select groups of wave segments having at least one wave segment and output the selected wave segment groups to a speech recognition engine; and
  
  a speech recognition engine having means to compare wave segment groups output by the wave segment pre-processor with a predetermined list of words;
  
  means to determine which wave segment groups match entries in the predetermined list of words;
  
  whereby wave segments are pre-processed into wave segment groups and analyzed to determine which of the wave segment groups most likely represent words that are input to the speech recognition engine.
  - 2. A system, as in claim 1, wherein the phoneme identification means further comprises:
    - means to select a portion of the input audio data with a first preselected sampling window length, the first preselected sampling window length having a length less than a phoneme;
      
      means to select a portion of the input audio data with a second preselected sampling window length, the second preselected sampling window length having a length greater than the first sampling window;
      
      means to select a portion of the input audio data with a third preselected sampling window length, the third preselected sampling window having a length greater than the second sampling window length;
      
      means to analyze the audio data using each sampling window length to determine which sampling window length most closely approximates a complete phoneme cycle and selecting the sampling window length that most closely approximates a complete phoneme cycle; and
      
      means to analyze the audio data using the selected sampling window length to select individual phonemes by advancing through the input audio data from a first detected phoneme to a subsequent phoneme by advancing a window length;
      
      whereby the audio data can be more rapidly analyzed by stepping through the input data using the sampling window length.
  - 3. A system, as in claim 2, further comprising means to select the sampling window length that most closely approximates the phoneme length by detecting a change in pitch, the change in pitch determined by selecting the sampling window length that results in the lowest difference value when two adjacent windows are subtracted from one another.
  - 4. A system, as in claim 2, wherein:
    - the first preselected sampling window is less than ten milliseconds; and
      
      at least one of the second and third preselected sampling windows is greater than ten milliseconds.
  - 5. A system, as in claim 4, further comprising means to select the sampling window length that most closely approximates the phoneme length by selecting the sampling window length that results in the lowest difference value when two adjacent windows are subtracted from one another.
  - 6. A system, as in claim 1, wherein the wave segment pre-processor further comprises:
    - a table of valid wave segments;
      
      means to compare the wave segments with the table of valid wave segments; and
      
      means to output to the speech recognition engine wave segments which correspond to wave segments that are stored in the table of valid wave segments.
  - 7. A system, as in claim 6, wherein the phoneme identification means further comprises:
    - means to select a portion of the input audio data with a first preselected sampling window, the first preselected sampling window having a length less than a phoneme;
      
      means to select a portion of the input audio data with a second preselected sampling window, the second preselected sampling window having a length greater than the first sampling window;
      
      means to select a portion of the input audio data with a third preselected sampling window, the third preselected sampling window having a length greater than the second sampling window;
      
      means to examine the audio data in each sampling window to determine which sampling window most closely approximates a complete phoneme cycle and selecting the sampling window length of the sampling window that most closely approximates a complete phoneme cycle; and
      
      means to examine the audio data using the selected sampling window length to select individual phonemes by advancing through the input audio data from a first detected phoneme to a subsequent phoneme by advancing a window length;
      
      whereby the audio data can be more rapidly analyzed by stepping through the input data using the sampling window length.
  - 8. A system, as in claim 7, further comprising means to select the sampling window length that most closely approximates the phoneme length by detecting a change in pitch, the change in pitch determined by selecting the sampling window length that results in the lowest difference value when two adjacent windows are subtracted from one another.
  - 9. A system, as in claim 7, wherein:
    - the first preselected sampling window is less than ten milliseconds; and
      
      at least one of the second and third preselected sampling windows is greater than ten milliseconds.
  - 10. A system, as in claim 9, further comprising means to select the sampling window length that most closely approximates the phoneme length by selecting the sampling window length that results in the lowest difference value when two adjacent windows are subtracted from one another.
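
The window selection described in claims 2 through 5 (try several sampling window lengths, keep the one whose two adjacent windows differ least, then step through the audio one window at a time) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the 8 kHz sample rate, the mean absolute difference measure, the synthetic sine input, and all function names are hypothetical.

```python
import math


def adjacent_window_difference(samples, window_len, start=0):
    """Mean absolute difference between two adjacent windows of equal length."""
    first = samples[start:start + window_len]
    second = samples[start + window_len:start + 2 * window_len]
    if len(second) < window_len:
        raise ValueError("not enough samples for two adjacent windows")
    return sum(abs(a - b) for a, b in zip(first, second)) / window_len


def select_window_length(samples, candidate_lengths, start=0):
    """Pick the candidate length with the lowest adjacent-window difference."""
    return min(candidate_lengths,
               key=lambda n: adjacent_window_difference(samples, n, start))


def step_boundaries(samples, window_len, start=0):
    """Start offsets reached by advancing one full window at a time."""
    return list(range(start, len(samples) - window_len + 1, window_len))


SAMPLE_RATE = 8000  # assumed; at this rate 10 ms is 80 samples
# Synthetic input that repeats every 80 samples (10 ms).
signal = [math.sin(2 * math.pi * i / 80) for i in range(800)]
# Candidates of 5 ms, 10 ms and 15 ms: the first is under ten milliseconds
# and the third is over, in the spirit of claim 4.
candidates = [40, 80, 120]
best = select_window_length(signal, candidates)
boundaries = step_boundaries(signal, best)
```

For this synthetic input the 80-sample window wins because consecutive 80-sample windows are nearly identical, whereas the 40- and 120-sample windows land half a cycle out of phase with their neighbors.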

11. A method of recognizing continuous speech voice input with a computer, including the steps of:
- inputting audio data comprised of phonemes;
  
  identifying individual phonemes;
  
  grouping phonemes into wave segments;
  
  selecting groups of wave segments that each contain at least one wave segment with a wave segment pre-processor by first comparing the wave segments with a table of preselected valid words and then only outputting those wave segment groups which correspond to wave segments that are stored in a table of valid words to a speech recognition engine;
  
  whereby wave segments are pre-processed into wave segment groups and analyzed to determine which of the wave segment groups most likely represent words that are input to the speech recognition engine.
  - 12. A method, as in claim 11, including the further steps of:
    - selecting a portion of the input audio data with a first preselected sampling window, the first preselected sampling window having a length less than a phoneme;
      
      selecting a portion of the input audio data with a second preselected sampling window, the second preselected sampling window having a length greater than the first sampling window;
      
      selecting a portion of the input audio data with a third preselected sampling window, the third preselected sampling window having a length greater than the second sampling window;
      
      examining the audio data in each sampling window to determine which sampling window most closely approximates a complete phoneme cycle and selecting the sampling window length of the sampling window that most closely approximates a complete phoneme cycle; and
      
      advancing through the audio data using the selected sampling window length to select the subsequent phonemes.
  - 13. A method, as in claim 12, including the further steps of selecting the sampling window length that most closely approximates the phoneme length by detecting a change in pitch, the change in pitch determined by selecting the sampling window length that results in the lowest difference value when two adjacent windows are subtracted from one another.
  - 14. A method, as in claim 12, including the further steps of:
    - setting the first preselected sampling window to a value less than ten milliseconds; and
      
      setting at least one of the second and third preselected sampling windows to a value greater than ten milliseconds.
  - 15. A method, as in claim 14, including the further steps of selecting the sampling window length that most closely approximates the phoneme length by selecting the sampling window length that results in the lowest difference value when two adjacent windows are subtracted from one another.
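
The pre-processing step of claim 11 (form groups of one or more wave segments, compare them with a table of valid entries, and output only the matching groups) can be sketched as a table filter. The phoneme-label representation of wave segments, the table contents, and the function names are assumptions made for illustration only.

```python
def candidate_groups(segments, max_group_size):
    """Yield (start, end, group) for every run of 1..max_group_size segments."""
    for start in range(len(segments)):
        for size in range(1, max_group_size + 1):
            end = start + size
            if end > len(segments):
                break
            yield start, end, tuple(segments[start:end])


def preprocess(segments, valid_table, max_group_size=3):
    """Keep only the groups found in the table of valid entries."""
    return [(start, end, valid_table[group])
            for start, end, group in candidate_groups(segments, max_group_size)
            if group in valid_table]


# Hypothetical table of valid entries keyed by phoneme-label groups.
VALID_TABLE = {("k", "ao", "l"): "call", ("hh", "ow", "m"): "home"}

segments = ["p", "l", "iy", "z", "k", "ao", "l", "hh", "ow", "m"]
selected = preprocess(segments, VALID_TABLE)
```

Only the two groups present in the table survive the filter, so a downstream recognition engine would be handed two candidates instead of every possible grouping.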

16. A method of word spotting in a speech recognition system, including the steps of:
- identifying groups of phonemes which represent possible valid words;
  
  comparing the possible valid words with a preselected list of valid words;
  
  selecting words which correspond to an entry in the list of valid words; and
  
  outputting the selected words to a speech recognition engine;
  
  whereby groups of phonemes are pre-processed such that only selected groups of phonemes are input to the voice recognition engine.
  - 17. A method, as in claim 16, including the further steps of:
    - identifying groups of phonemes which represent possible valid words by;
      
      identifying individual phonemes; and
      
      grouping phonemes into wave segments;
      
      selecting wave segments with a wave segment pre-processor;
      
      comparing the wave segments with a table of preselected valid wave segments;
      
      outputting wave segments which correspond to wave segments that are stored in the table of valid wave segments as selected words; and
      
      whereby only wave segments that match pre-selected wave segments are input to the speech recognition engine.
  - 18. A method, as in claim 17, including the further steps of:
    - identifying phonemes by selecting a portion of the input audio data with a first preselected sampling window, the first preselected sampling window having a length less than a phoneme;
      
      selecting a portion of the input audio data with a second preselected sampling window, the second preselected sampling window having a length greater than the first sampling window;
      
      selecting a portion of the input audio data with a third preselected sampling window, the third preselected sampling window having a length greater than the second sampling window;
      
      examining the audio data in each sampling window to determine which sampling window most closely approximates a complete phoneme cycle and selecting the sampling window length of the sampling window that most closely approximates a complete phoneme cycle; and
      
      advancing through the audio data using the selected sampling window length to select subsequent phonemes.
  - 19. A method, as in claim 18, including the additional steps of:
    - examining each sampling window to determine the signal to noise ratio for that sampling window;
      
      calculating the difference in signal to noise ratio between adjacent sampling windows; and
      
      selecting the sampling window that most closely approximates the phoneme length by selecting the sampling window length that results in the lowest signal to noise ratio.
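
The first two steps of claim 19 (a signal to noise ratio per sampling window, then the difference between adjacent windows) can be sketched as below. The final selection rule is worded ambiguously in the claim; this sketch adopts one possible reading, namely choosing the window length with the lowest mean difference in signal to noise ratio between adjacent windows. That reading, the fixed `noise_power` estimate, the synthetic input, and the function names are all assumptions.

```python
import math


def snr_db(window, noise_power):
    """Signal to noise ratio of one window in dB, given an assumed noise power."""
    power = sum(x * x for x in window) / len(window)
    return 10.0 * math.log10(max(power, 1e-12) / noise_power)


def mean_adjacent_snr_difference(samples, window_len, noise_power):
    """Average absolute SNR change between consecutive non-overlapping windows."""
    snrs = [snr_db(samples[i:i + window_len], noise_power)
            for i in range(0, len(samples) - window_len + 1, window_len)]
    if len(snrs) < 2:
        raise ValueError("need at least two full windows")
    diffs = [abs(b - a) for a, b in zip(snrs, snrs[1:])]
    return sum(diffs) / len(diffs)


def select_by_snr(samples, candidate_lengths, noise_power):
    """One reading of claim 19: lowest mean adjacent-window SNR difference."""
    return min(candidate_lengths,
               key=lambda n: mean_adjacent_snr_difference(samples, n, noise_power))


# Synthetic input: a loud half and a quiet half repeating every 80 samples.
signal = ([1.0] * 40 + [0.5] * 40) * 6
best = select_by_snr(signal, [40, 80, 120], noise_power=0.01)
```

With this input every 80-sample window contains one loud and one quiet half, so its signal to noise ratio never changes from window to window, while 40-sample windows alternate by about 6 dB.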

20. A method of using rules based responses in a speech recognition system, including the steps of:
- inputting audio data from a user into a computer;
  
  identifying a keyword in the data input by the user;
  
  comparing the keyword with a list of valid words in a database;
  
  selecting the keyword when the keyword corresponds to only a valid word in the database;
  
  entering the selected keyword to a speech recognition engine only when the keyword matches an entry in the list of valid words;
  
  querying the user with a list of choices when the keyword corresponds to more than one valid word in the database;
  
  querying the user to reenter the audio data when the keyword does not correspond to a valid word in the database.
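
The three outcomes of claim 20 (exactly one match is passed on, several matches prompt a list of choices, no match prompts re-entry) can be sketched as a small decision function. The claim does not say how a keyword comes to correspond to more than one valid word; matching the keyword against whole words of each database entry is an assumption here, as are the action labels and the sample database.

```python
def respond(keyword, valid_words):
    """Return an (action, payload) pair following the three cases of claim 20."""
    # Assumed matching rule: the keyword equals one whole word of an entry.
    matches = [entry for entry in valid_words if keyword in entry.split()]
    if len(matches) == 1:
        return ("recognize", matches[0])   # pass the single match to the engine
    if len(matches) > 1:
        return ("choose", matches)         # query the user with the choices
    return ("reenter", None)               # ask the user to speak again


# Hypothetical database of valid words and phrases.
DATABASE = ["savings account", "checking account", "balance"]

unique = respond("balance", DATABASE)
ambiguous = respond("account", DATABASE)
unknown = respond("weather", DATABASE)
```

A caller would branch on the returned action: forward the payload to the recognition engine, read the choices back to the user, or prompt for new audio.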

Current Assignee: Peter Immarco
Original Assignee: Peter Immarco
Inventors: Immarco, Peter
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Sax, Robert L.
Application Number: US08/853,959
Time in Patent Office: 956 Days
Field of Search: 704/251, 704/254, 704/252, 704/243, 704/255
US Class Current: 704/251
CPC Class Codes: G10L 15/04 Segmentation; Word boundary...
