Speech recognition method and system using compressed speech data

US 6,003,004 A
Filed: 01/08/1998
Issued: 12/14/1999
Est. Priority Date: 01/08/1998
Status: Expired due to Term

5 Assignments

0 Petitions

Abstract

A vocoder-based voice recognizer recognizes a spoken word using linear prediction coding (LPC)-based vocoder data without completely reconstructing the voice data. The recognizer generates at least one energy estimate per frame of the vocoder data and searches for word boundaries in the vocoder data using the associated energy estimates. If a word is found, the LPC word parameters are extracted from the vocoder data associated with the word and recognition features are calculated from the extracted LPC word parameters. Finally, the recognition features are matched with previously stored recognition features of other words, thereby to recognize the spoken word.
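
The boundary search in the abstract can be illustrated with a minimal sketch. It assumes one energy estimate per frame has already been computed and applies a fixed energy threshold plus a minimum word length in frames. The function name `find_word_boundaries` and the parameters `threshold` and `min_frames` are hypothetical; the patent text reproduced here does not state which decision rule is used.

```python
from typing import List, Optional, Tuple


def find_word_boundaries(
    frame_energies: List[float],
    threshold: float,
    min_frames: int = 3,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first run of at least
    `min_frames` consecutive frames whose energy exceeds `threshold`.

    `end` is exclusive. Returns None if no such run exists.
    (Hypothetical rule for illustration only.)
    """
    start = None
    for i, energy in enumerate(frame_energies):
        if energy > threshold:
            if start is None:
                start = i  # candidate word onset
        else:
            if start is not None and i - start >= min_frames:
                return start, i  # run ended and was long enough
            start = None  # run too short: treat it as a noise burst
    # The word may continue to the end of the buffer.
    if start is not None and len(frame_energies) - start >= min_frames:
        return start, len(frame_energies)
    return None
```

A fixed threshold is the simplest possible rule; a practical endpointer would typically adapt the threshold to the background noise level.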

26 Claims

1. A method for recognizing a spoken word, the method comprising the steps of:
- receiving compressed speech data that has been compressed using linear prediction coding (LPC) techniques;
  
  extracting at least one set of LPC parameters from said compressed speech data without completely decompressing said compressed speech data;
  
  calculating at least one recognition feature from said at least one set of LPC parameters; and
  
  utilizing said at least one recognition feature and at least one previously stored recognition feature to recognize the spoken word.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8):
  - 2. A method according to claim 1 and wherein said compressed speech data is of the type produced by any of the following vocoders:
    - Regular Pulse Excitation-Long Term Prediction (RPE-LTP) full and half rate, Qualcomm Code Excited Linear Prediction (QCELP) 8 and 13 Kbps, Enhanced Variable Rate Codec (EVRC), Low Delay Code Excited Linear Prediction (LD CELP), Vector Sum Excited Linear Prediction (VSELP), Conjugate Structure Algebraic Code Excited Linear Prediction (CS ACELP), Enhanced Full Rate Vocoder and 10 Linear Prediction Coefficients (LPC10).
  - 3. A method according to claim 1, the method further comprising the steps of:
    - generating at least one energy estimate for at least a portion of said compressed speech data without completely decompressing said compressed speech data; and
      
      searching for at least two word boundaries in said compressed speech data in accordance with said at least one energy estimate, and said at least one set of LPC parameters is from an area between said at least two boundaries.
  - 4. A method according to claim 3, wherein said portion includes at least one frame.
  - 5. A method according to claim 4 and wherein said step of generating comprises the step of estimating the energy from residual data found in said compressed speech data.
  - 6. A method according to claim 5 and wherein said step of estimating comprises the steps of reconstructing residual data from said compressed speech data and generating the norm of said residual data.
  - 7. A method according to claim 5 and wherein said step of estimating comprises the steps of extracting a pitch-gain value from said compressed speech data and using said extracted pitch-gain value as said energy estimate.
  - 8. A method according to claim 5 and wherein said step of generating comprises the steps of:
    - extracting pitch-gain values, lag values and remnant data from said compressed speech data;
      
      reconstructing a remnant signal from said remnant data;
      
      generating an energy estimate of said remnant signal;
      
      generating an energy estimate of a non-remnant portion of said residual by using said pitch-gain value and a previous energy estimate defined by said lag value; and
      
      combining said remnant and non-remnant energy estimates.
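
Claims 6 to 8 describe three ways to estimate energy from the residual carried in the compressed data. The sketch below assumes a CELP-style excitation model in which each residual sample is a pitch-gain-scaled copy of the sample one lag earlier plus a remnant (fixed-codebook) contribution. All class and function names are hypothetical, and extracting the pitch gain, lag and remnant from a real bitstream is codec-specific and not shown.

```python
from typing import List, Sequence


def residual_norm_energy(residual: Sequence[float]) -> float:
    """Claim 6 style: energy of a reconstructed residual segment,
    taken here as its squared Euclidean norm."""
    return sum(x * x for x in residual)


def pitch_gain_energy(pitch_gain: float) -> float:
    """Claim 7 style: reuse the transmitted pitch gain directly as a
    crude energy indicator, with no residual reconstruction."""
    return pitch_gain


class CombinedEnergyEstimator:
    """Claim 8 style, assuming a CELP-like excitation model

        e[n] = g * e[n - L] + c[n]

    with pitch gain g, lag L (in samples) and remnant signal c[n].
    Dropping the cross term gives the per-sample estimate

        E[n] ~= g**2 * E[n - L] + c[n]**2

    i.e. a non-remnant part from the earlier estimate at the lag plus
    the energy of the reconstructed remnant.
    """

    def __init__(self, max_lag: int) -> None:
        self.max_lag = max_lag
        # Per-sample energy estimates, newest last; zero before speech.
        self.history: List[float] = [0.0] * max_lag

    def subframe_energy(
        self, pitch_gain: float, lag: int, remnant: Sequence[float]
    ) -> float:
        if not 1 <= lag <= self.max_lag:
            raise ValueError("lag outside the supported range")
        total = 0.0
        for c in remnant:
            non_remnant = pitch_gain ** 2 * self.history[-lag]  # g^2 * E[n - L]
            estimate = non_remnant + c * c                      # combine both parts
            self.history.append(estimate)
            total += estimate
        # Retain only the history that the largest lag can reach.
        self.history = self.history[-self.max_lag:]
        return total
```

Because the estimate for each sample is appended before the next one is computed, lags shorter than the subframe correctly reach back into estimates produced earlier in the same subframe.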

9. A method for preparing to recognize a spoken word, the method comprising the steps of:
- receiving compressed speech data that has been compressed using linear prediction coding (LPC) techniques;
  
  extracting at least one set of LPC parameters from said compressed speech data without completely decompressing said compressed speech data; and
  
  calculating at least one recognition feature from said at least one set of LPC parameters.
- Dependent Claims (10, 11, 12):
  - 10. A method according to claim 9 and wherein said compressed speech data is of the type produced by any of the following vocoders:
    - RPE-LTP full and half rate, QCELP 8 and 13 Kbps, EVRC, LD CELP, VSELP, CS ACELP, Enhanced Full Rate Vocoder and LPC10.
  - 11. A method according to claim 9, the method further comprising the steps of:
    - generating at least one energy estimate for at least a portion of said compressed speech data without completely decompressing said compressed speech data; and
      
      searching for at least two word boundaries in said compressed speech data in accordance with said at least one energy estimate, and said at least one set of LPC parameters is from an area between said at least two boundaries.
  - 12. A method according to claim 11, wherein said portion includes at least one frame.
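
Claim 9 leaves the recognition features unspecified. One common choice of feature derived from LPC parameters is the LPC cepstrum, obtained by a simple recursion on the predictor coefficients. The sketch below assumes that choice and the predictor convention A(z) = 1 - sum_k a_k z^-k; the function name is hypothetical.

```python
from typing import List, Sequence


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Convert predictor coefficients a_1..a_p (passed as a[0..p-1]) of
        A(z) = 1 - sum_{k=1}^{p} a_k z^{-k}
    into cepstral coefficients c_1..c_{n_ceps} of 1/A(z).
    The gain term c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] unused so indices match c_1, c_2, ...
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # c_n += sum over k of (k / n) * c_k * a_{n-k}, with 1 <= n-k <= p
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For a single-pole model with a_1 = α, the recursion gives c_n = α^n / n, which makes a quick sanity check.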

13. A digital cellular telephone comprising:
- a mobile telephone operating system;
  
  a vocoder which compresses a voice signal using at least linear prediction coding (LPC) thereby to produce compressed speech data; and
  
  a speech recognizer comprising:
  
  a front end processor, operating on said compressed speech data without completely decompressing said compressed speech data, which determines when a word was spoken and generates recognition features of said spoken word; and
  
  a recognition unit which at least recognizes said spoken word as one of a set of reference words.
- Dependent Claims (14, 15, 16):
  - 14. A digital cellular telephone according to claim 13 and wherein said front end processor includes:
    - an energy estimator which uses residual information forming part of said compressed speech data to estimate the energy of a voice signal;
      
      an LPC parameter extractor which extracts the LPC parameters of said compressed speech data; and
      
      a recognition feature generator which generates said recognition features from said LPC parameters.
  - 15. A cellular telephone according to claim 13 and wherein said vocoder is any of the following vocoders:
    - Regular Pulse Excitation-Long Term Prediction (RPE-LTP) full and half rate, Qualcomm Code Excited Linear Prediction (QCELP) 8 and 13 Kbps, Enhanced Variable Rate Codec (EVRC), Low Delay Code Excited Linear Prediction (LD CELP), Vector Sum Excited Linear Prediction (VSELP), Conjugate Structure Algebraic Code Excited Linear Prediction (CS ACELP), Enhanced Full Rate Vocoder and 10 Linear Prediction Coefficients (LPC10).
  - 16. A digital cellular telephone according to claim 13, wherein said recognition unit further comprises a training unit for storing recognition features of said set of reference words.
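
The LPC parameter extractor of claim 14 may need to convert whatever spectral parameters the codec transmits into predictor coefficients. Some codecs send reflection coefficients or quantities derived from them (the GSM full-rate RPE-LTP codec, for instance, transmits log-area ratios), so the sketch below shows the standard step-up recursion from reflection coefficients to direct-form coefficients. The sign convention is an assumption, since codec specifications differ on it, and the function name is hypothetical.

```python
from typing import List, Sequence


def reflection_to_lpc(k: Sequence[float]) -> List[float]:
    """Step-up (Levinson) recursion: reflection coefficients k_1..k_p
    -> predictor coefficients a_1..a_p of A(z) = 1 - sum_j a_j z^{-j}.

    Assumed sign convention: a_i^(i) = k_i and
    a_j^(i) = a_j^(i-1) - k_i * a_{i-j}^(i-1) for 1 <= j < i.
    Check the target codec's specification before relying on it.
    """
    a: List[float] = []
    for i, ki in enumerate(k, start=1):
        prev = a  # order i-1 coefficients, prev[j-1] holds a_j
        a = [prev[j - 1] - ki * prev[i - j - 1] for j in range(1, i)] + [ki]
    return a
```

Each pass raises the predictor order by one, so p reflection coefficients yield p direct-form coefficients without reconstructing any speech samples.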

17. A speech recognizer operable with compressed speech data which has been compressed using linear prediction coding (LPC) techniques by a vocoder, the speech recognizer comprising:
- a front end processor which processes said compressed speech data without completely decompressing said compressed speech data to determine when a word was spoken and to generate recognition features of said spoken word; and
  
  a recognition unit which at least recognizes said spoken word as one of a set of reference words.
- Dependent Claims (18, 19, 20, 21, 22, 23, 24):
  - 18. A speech recognizer according to claim 17 and wherein said front end processor comprises:
    - an energy estimator which uses residual information forming part of said compressed speech data to estimate the energy of a voice signal;
      
      an LPC parameter extractor which extracts the LPC parameters of said compressed speech data; and
      
      a recognition feature generator which generates said recognition features from said LPC parameters.
  - 19. A speech recognizer according to claim 18 and wherein said energy estimator comprises a residual energy estimator which estimates the energy from residual data found in said compressed speech data.
  - 20. A speech recognizer according to claim 19 and wherein said residual energy estimator comprises a residual reconstructor which reconstructs residual data from said compressed speech data and a norm generator which generates the norm of said residual data thereby to produce said energy estimate.
  - 21. A speech recognizer according to claim 19 and wherein said residual energy estimator comprises an extractor which extracts a pitch-gain value from said compressed speech data thereby to produce said energy estimate.
  - 22. A speech recognizer according to claim 19 and wherein said residual energy estimator comprises:
    - an extractor which extracts pitch-gain values, lag values and remnant data from said compressed speech data;
      
      a reconstructor which reconstructs a remnant signal from said remnant data;
      
      a remnant energy estimator which generates an energy estimate of said remnant signal;
      
      a non-remnant energy estimator which generates an energy estimate of a non-remnant portion of said residual by using said pitch-gain value and a previous energy estimate defined by said lag value; and
      
      a combiner which combines said remnant and non-remnant energy estimates thereby to produce said energy estimate.
  - 23. A speech recognizer according to claim 17 and wherein said vocoder is any of the following vocoders:
    - Regular Pulse Excitation-Long Term Prediction (RPE-LTP) full and half rate, Qualcomm Code Excited Linear Prediction (QCELP) 8 and 13 Kbps, Enhanced Variable Rate Codec (EVRC), Low Delay Code Excited Linear Prediction (LD CELP), Vector Sum Excited Linear Prediction (VSELP), Conjugate Structure Algebraic Code Excited Linear Prediction (CS ACELP), Enhanced Full Rate Vocoder and 10 Linear Prediction Coefficients (LPC10).
  - 24. A speech recognizer according to claim 17, wherein said recognition unit further comprises a training unit for storing recognition features of said set of reference words.
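
The claims do not say how the recognition unit compares new features with the stored reference words. A minimal sketch, assuming dynamic time warping (DTW) template matching, a typical method for small-vocabulary isolated-word recognizers, with a training unit that simply stores one template per word as in claim 24. All names here are hypothetical.

```python
from typing import Dict, List, Sequence

Feature = Sequence[float]


def frame_distance(x: Feature, y: Feature) -> float:
    # Squared Euclidean distance between two feature vectors.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def dtw_distance(seq_a: Sequence[Feature], seq_b: Sequence[Feature]) -> float:
    """Classic DTW with horizontal, vertical and diagonal steps."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(seq_a[i - 1], seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


class TemplateRecognizer:
    """Recognition unit plus a training unit that stores one feature
    sequence (template) per reference word."""

    def __init__(self) -> None:
        self.templates: Dict[str, List[Feature]] = {}

    def train(self, word: str, features: Sequence[Feature]) -> None:
        self.templates[word] = list(features)

    def recognize(self, features: Sequence[Feature]) -> str:
        if not self.templates:
            raise ValueError("no reference words trained")
        # Pick the reference word whose template warps most cheaply.
        return min(self.templates,
                   key=lambda w: dtw_distance(features, self.templates[w]))
```

DTW absorbs differences in speaking rate between the training and test utterances, which is why it suits the stored-template arrangement the claims describe.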

25. A digital cellular telephone comprising:
- a mobile telephone operating system;
  
  a plurality of vocoders which compress a voice signal using at least linear prediction coding (LPC) thereby to produce compressed speech data, each vocoder operable with one of a corresponding plurality of vocoder types; and
  
  a speech recognizer comprising:
  
  a corresponding plurality of front end processors, one for each of said vocoder types, each said processor operable on said compressed speech data without completely decompressing said compressed speech data, which determine when a word was spoken and generate recognition features of said spoken word; and
  
  a recognition unit which at least recognizes said spoken word as one of a set of reference words.
- Dependent Claims (26):
  - 26. A digital cellular telephone according to claim 25, wherein said recognition unit further comprises a training unit for storing recognition features of said set of reference words.
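
Claim 25 pairs each supported vocoder type with its own front-end processor. That arrangement can be sketched as a registry keyed by vocoder type. The class name, the type strings and the `FrontEnd` signature (one compressed frame in, one feature vector out) are assumptions for illustration; real front ends would contain the codec-specific parsing described in the earlier claims.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: one compressed frame in, one feature vector out.
FrontEnd = Callable[[bytes], List[float]]


class MultiCodecFrontEnds:
    """One front-end processor per vocoder type, selected at run time."""

    def __init__(self) -> None:
        self._front_ends: Dict[str, FrontEnd] = {}

    def register(self, vocoder_type: str, front_end: FrontEnd) -> None:
        self._front_ends[vocoder_type] = front_end

    def features(self, vocoder_type: str,
                 frames: Sequence[bytes]) -> List[List[float]]:
        try:
            front_end = self._front_ends[vocoder_type]
        except KeyError:
            raise ValueError(
                f"no front end for vocoder type {vocoder_type!r}") from None
        # Each frame is processed directly; no full decompression step.
        return [front_end(frame) for frame in frames]
```

Because every front end emits features in the same format, a single recognition unit can serve all vocoder types, consistent with the one recognition unit recited in claim 25.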

Current Assignee
Advanced Recognition Technologies, Inc.
Original Assignee
Advanced Recognition Technologies, Inc.
Inventors
Ilan, Gabriel; Hershkovits, Yehudah
Primary Examiner(s)
Dorvil, Richemond

Application Number

US09/002,616
Time in Patent Office

705 Days
Field of Search

704/270, 704/251, 704/253, 704/255, 704/219
US Class Current

704/253
CPC Class Codes

G10L 15/02   Feature extraction for spee...

G10L 15/30   Distributed recognition, e....

G10L 19/04   using predictive techniques

G10L 2025/783   based on threshold decision
