Method for robust voice recognition by analyzing redundant features of source signal

US 6,957,183 B2
Filed: 03/20/2002
Issued: 10/18/2005
Est. Priority Date: 03/20/2002
Status: Active Grant

First Claim

1. A method of processing speech signals comprising:

applying a primary transformation to a digital input speech signal to extract primary features therefrom;

applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;

applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features; and

generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates, the at least one predetermined function utilizing at least one of linear discriminant analysis, principal component transfer, and concatenation.

Abstract

A method for processing digitized speech signals by analyzing redundant features to provide more robust voice recognition. A primary transformation is applied to a source speech signal to extract primary features therefrom. Each of at least one secondary transformation is applied to the source speech signal or extracted primary features to yield at least one set of secondary features statistically dependent on the primary features. At least one predetermined function is then applied to combine the primary features with the secondary features. A recognition answer is generated by pattern matching this combination against predetermined voice recognition templates.
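
The flow described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the primary transformation is assumed to be a log-magnitude DFT spectrum, the secondary transformation is assumed to be the differences between adjacent spectral bins (so it is computed from, and dependent on, the primary features), the predetermined function is plain concatenation, and pattern matching is a nearest-template search by squared Euclidean distance. All function names (`primary_features`, `secondary_features`, `extract_combined`, `recognize`, `tone`) are hypothetical.

```python
import cmath
import math


def primary_features(frame):
    """Assumed primary transformation: log-magnitude DFT spectrum of one frame."""
    n = len(frame)
    features = []
    for k in range(n // 2 + 1):
        # Direct DFT for bin k (O(n^2); fine for a short illustrative frame).
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        features.append(math.log(abs(s) + 1e-6))  # small floor avoids log(0)
    return features


def secondary_features(primary):
    """Assumed secondary transformation: differences between adjacent bins."""
    return [b - a for a, b in zip(primary, primary[1:])]


def extract_combined(frame):
    """Combine primary and secondary features by concatenation."""
    primary = primary_features(frame)
    return primary + secondary_features(primary)


def recognize(frame, templates):
    """Return the label of the template closest to the combined features."""
    combined = extract_combined(frame)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(combined, templates[label]))

    return min(templates, key=distance)


def tone(cycles, n=16):
    """Toy training/test signal: a sine with `cycles` periods per frame."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]


# Templates are formed by running the same extraction on training signals.
templates = {"low": extract_combined(tone(2)), "high": extract_combined(tone(5))}
answer = recognize([0.9 * x for x in tone(2)], templates)
```

In this toy setup `answer` is the label of the template nearest to an attenuated copy of the low tone. A real recognizer would replace the nearest-template search with a statistical model, which the sketch does not attempt.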

41 Claims

1. A method of processing speech signals comprising:
- applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features; and
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates, the at least one predetermined function utilizing at least one of linear discriminant analysis, principal component transfer, and concatenation.
- Dependent Claims (2, 3, 4, 5, 6)
- - 2. The method of claim 1, where the primary transformation comprises a spectral transformation.
  - 3. The method of claim 1, where the primary transformation comprises production of a time-frequency representation of the input speech signal.
  - 4. The method of claim 1, where the primary transformation comprises a spectral transformation and the secondary transformation comprises a cepstral transformation.
  - 5. The method of claim 1, further comprising:
    - forming voice recognition templates by performing each of the applying and generating operations to predetermined training signals.
  - 6. The method of claim 5, further comprising:
    - forming the voice recognition templates by performing each of the applying and generating operations to predetermined training signals.
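
As one way to picture claim 4, the sketch below pairs a spectral primary transformation with a cepstral secondary transformation computed from it. It is an assumption-laden illustration: the spectral step is taken to be a log-magnitude DFT, and the cepstral step is taken to be a DCT-II of that log spectrum (a common practical choice; the classical cepstrum instead uses an inverse Fourier transform). The names `log_spectrum` and `cepstral_coefficients` are hypothetical.

```python
import cmath
import math


def log_spectrum(frame):
    """Assumed spectral (primary) transformation: log-magnitude DFT."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        out.append(math.log(abs(s) + 1e-6))  # small floor avoids log(0)
    return out


def cepstral_coefficients(log_spec, num_coeffs):
    """Assumed cepstral (secondary) transformation: unnormalized DCT-II of the log spectrum."""
    m = len(log_spec)
    return [
        sum(v * math.cos(math.pi * q * (i + 0.5) / m) for i, v in enumerate(log_spec))
        for q in range(num_coeffs)
    ]


# One 16-sample frame of a toy sine signal.
frame = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
primary = log_spectrum(frame)                       # spectral features
secondary = cepstral_coefficients(primary, 4)       # cepstral features derived from them
combined = primary + secondary                      # combination by concatenation
```

Because the cepstral coefficients are a deterministic function of the spectral features, the two sets are redundant by construction, which is the situation the claims describe.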

7. A method of processing speech signals comprising:
- applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features; and
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates, where the at least one predetermined function utilizes at least one of linear discriminant analysis, principal component transfer, and concatenation; and
  
  separately modifies at least one of the primary features and the secondary features, the at least one predetermined function being used to form a combined signal comprising a combination of the primary features including any modifications with the secondary features including any modifications.
- Dependent Claims (8, 9, 10, 11)
- - 8. The method of claim 7, where each of the separately modifying operations comprises at least one of the following:
    - scaling, power change, self-multiplying, exponentiation.
  - 9. The method of claim 7, where the primary transformation comprises a spectral transformation.
  - 10. The method of claim 7, where the primary transformation comprises production of a time-frequency representation of the input speech signal.
  - 11. The method of claim 7, where the primary transformation comprises a spectral transformation and the secondary transformation comprises a cepstral transformation.
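
The separate modification step of claims 7 and 8 can be illustrated as follows. This is only a sketch: it assumes scaling for the primary features and integer exponentiation for the secondary features, followed by concatenation. The function name `modify_and_concatenate` and its parameters are hypothetical, and the numeric values are made up for illustration.

```python
def modify_and_concatenate(primary, secondary, primary_scale=1.0, secondary_exponent=1):
    """Modify each feature set on its own, then combine the results by concatenation.

    primary_scale: factor applied to every primary feature (scaling).
    secondary_exponent: integer power applied to every secondary feature
        (exponent 2 corresponds to self-multiplication).
    """
    modified_primary = [primary_scale * v for v in primary]
    modified_secondary = [v ** secondary_exponent for v in secondary]
    return modified_primary + modified_secondary


combined = modify_and_concatenate(
    [1.0, 2.0], [3.0, -2.0], primary_scale=0.5, secondary_exponent=2
)
```

An integer exponent is used here because raising a negative feature value to a fractional power would yield a complex number in Python.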

12. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform operations for processing speech signals, the operations comprising:
- applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features; and
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates, the at least one predetermined function utilizing at least one of linear discriminant analysis, principal component transfer, and concatenation.
- Dependent Claims (13, 14, 15, 16)
- - 13. The medium of claim 12, where the primary transformation comprises a spectral transformation.
  - 14. The medium of claim 12, where the primary transformation comprises production of a time-frequency representation of the input speech signal.
  - 15. The medium of claim 12, where the primary transformation comprises a spectral transformation and the secondary transformation comprises a cepstral transformation.
  - 16. The medium of claim 12, further comprising:
    - forming the voice recognition templates by performing each of the applying and generating operations to predetermined training signals.

17. A signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus to perform operations for processing speech signals, the operations comprising:
- applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features; and
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates;
  
  where the at least one predetermined function utilizes at least one of linear discriminant analysis, principal component transfer, and concatenation; and
  
  separately modifies at least one of the primary features and the secondary features, the at least one predetermined function being used to form a combined signal comprising a combination of the primary features including any modifications with the secondary features including any modifications.
- Dependent Claims (18, 19, 20, 21, 22)
- - 18. The medium of claim 17, where each of the separately modifying operations comprises at least one of the following:
    - scaling, power change, self-multiplication, exponentiation.
  - 19. The medium of claim 17, where the primary transformation comprises a spectral transformation.
  - 20. The medium of claim 17, where the primary transformation comprises production of a time-frequency representation of the input speech signal.
  - 21. The medium of claim 17, where the primary transformation comprises a spectral transformation and the secondary transformation comprises a cepstral transformation.
  - 22. The medium of claim 17, further comprising forming the voice recognition templates by performing each of the applying and generating operations to predetermined training signals.

23. Circuitry of multiple interconnected electrically conductive elements configured to perform operations to process speech signals, the operations comprising:
- applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features; and
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates, the at least one predetermined function utilizing at least one of linear discriminant analysis, principal component transfer, and concatenation.
- Dependent Claims (24, 25, 26, 27)
- - 24. The circuitry of claim 23, where the primary transformation comprises a spectral transformation.
  - 25. The circuitry of claim 23, where the primary transformation comprises production of a time-frequency representation of the input speech signal.
  - 26. The circuitry of claim 23, where the primary transformation comprises a spectral transformation and the secondary transformation comprises a cepstral transformation.
  - 27. The circuitry of claim 23, further comprising:
    - forming the voice recognition templates by performing each of the applying and generating operations to predetermined training signals.

28. Circuitry of multiple interconnected electrically conductive elements configured to perform operations to process speech signals, the operations comprising:
- applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features;
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates; and
  
  where the at least one predetermined function utilizes at least one of linear discriminant analysis, principal component transfer, and concatenation; and
  
  separately modifies at least one of the primary features and the secondary features, the at least one predetermined function being used to form a combined signal comprising a combination of the primary features including any modifications with the secondary features including any modifications.
- Dependent Claims (29, 30, 31, 32, 33)
- - 29. The circuitry of claim 28, where each of the separately modifying operations comprises at least one of the following:
    - scaling, power change, self-multiplication, exponentiation.
  - 30. The circuitry of claim 28, where the primary transformation comprises a spectral transformation.
  - 31. The circuitry of claim 28, where the primary transformation comprises production of a time-frequency representation of the input speech signal.
  - 32. The circuitry of claim 28, where the primary transformation comprises a spectral transformation and the secondary transformation comprises a cepstral transformation.
  - 33. The circuitry of claim 28, further comprising forming the voice recognition templates by performing each of the applying and generating operations to predetermined training signals.

34. A voice recognition system comprising:
- a primary feature extractor applying a primary function to extract primary features from a digital input speech signal;
  
  at least one secondary transformation module each producing secondary features statistically dependent on the primary features by applying a secondary function to an input comprising one of the following: the input speech signal, the primary features;
  
  a feature combination module coupled to the primary feature extractor and each of the secondary transformation modules to apply one or more predetermined functions to combine the primary features with the secondary features forming a combined signal; and
  
  a statistical modeling engine, coupled to the feature combination module to generate a recognition answer by pattern matching the combined signal against predetermined voice recognition templates, the at least one predetermined function utilizing at least one of linear discriminant analysis, principal component transfer, and concatenation.
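
The module arrangement of claim 34 can be pictured as a small object that wires the four parts together. The sketch below is hypothetical throughout: the class name `VoiceRecognizer`, its field names, and the helper `concatenate` are not from the patent, the secondary modules are assumed to take the primary features as input (the claim also allows the input speech signal), and a nearest-template search stands in for the statistical modeling engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def concatenate(primary: Vector, secondaries: List[Vector]) -> Vector:
    """Feature combination by concatenation."""
    combined = list(primary)
    for features in secondaries:
        combined.extend(features)
    return combined


@dataclass
class VoiceRecognizer:
    primary_extractor: Callable[[Sequence[float]], Vector]
    secondary_modules: List[Callable[[Vector], Vector]]
    combiner: Callable[[Vector, List[Vector]], Vector]
    templates: Dict[str, Vector]

    def recognize(self, signal: Sequence[float]) -> str:
        primary = self.primary_extractor(signal)
        # Each secondary module derives its features from the primary features.
        secondaries = [module(primary) for module in self.secondary_modules]
        combined = self.combiner(primary, secondaries)
        # Stand-in for the statistical modeling engine: nearest template.
        return min(
            self.templates,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(combined, self.templates[label])
            ),
        )


# Toy wiring: identity primary extractor, one secondary module (sum of features).
recognizer = VoiceRecognizer(
    primary_extractor=lambda signal: list(signal),
    secondary_modules=[lambda primary: [sum(primary)]],
    combiner=concatenate,
    templates={"a": [0.0, 0.0, 0.0], "b": [1.0, 1.0, 2.0]},
)
label = recognizer.recognize([0.9, 1.0])
```

Keeping each stage as a separate callable mirrors the claim's separation between extractor, secondary modules, combination module, and modeling engine, so any one stage can be swapped without touching the others.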

35. A voice recognition system comprising:
- primary feature extractor means for applying a primary function to extract primary features from a digital input speech signal;
  
  secondary transformation means for producing secondary features statistically dependent on the primary features by applying at least one secondary function to an input comprising one of the following: the input speech signal, the primary features;
  
  feature combination means for applying one or more predetermined functions to combine the primary features with the secondary features forming a combined signal; and
  
  statistical modeling means for generating a recognition answer by pattern matching the combined features against predetermined voice recognition templates, the at least one predetermined function utilizing at least one of linear discriminant analysis, principal component transfer, and concatenation.

36. A wireless communications device having:
- a transceiver coupled to an antenna;
  
  a speaker;
  
  a microphone;
  
  a user interface;
  
  a manager coupled to components including the transceiver, speaker, microphone, and user interface to manage operation of the components, the manager including a voice recognition system configured to perform operations comprising:
  
  applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features; and
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates, the at least one predetermined function utilizing at least one of linear discriminant analysis, principal component transfer, and concatenation.

37. A wireless communications device having:
- a transceiver coupled to an antenna, a speaker;
  
  a microphone;
  
  a user interface;
  
  means for managing operation of the transceiver, speaker, microphone, and user interface, and for performing voice recognition by:
  
  applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features; and
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates, the at least one predetermined function utilizing at least one of linear discriminant analysis, principal component transfer, and concatenation.

38. A voice recognition system comprising:
- a primary feature extractor applying a primary function to extract primary features from a digital input speech signal;
  
  at least one secondary transformation module each producing secondary features statistically dependent on the primary features by applying a secondary function to an input comprising one of the following: the input speech signal, the primary features;
  
  a feature combination module coupled to the primary feature extractor and each of the secondary transformation modules to apply one or more predetermined functions to combine the primary features with the secondary features forming a combined signal;
  
  a statistical modeling engine, coupled to the feature combination module to generate a recognition answer by pattern matching the combined signal against predetermined voice recognition templates;
  
  the predetermined function utilizing at least one of the following to combine the primary features and the secondary features: linear discriminant analysis, principal component transfer, concatenation;
  
  applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features; and
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates, where the at least one predetermined function utilizes at least one of linear discriminant analysis, principal component transfer, and concatenation; and
  
  separately modifies at least one of the primary features and the secondary features, the at least one predetermined function being used to form a combined signal comprising a combination of the primary features including any modifications with the secondary features including any modifications.

39. A voice recognition system comprising:
- primary feature extractor means for applying a primary function to extract primary features from a digital input speech signal;
  
  secondary transformation means for producing secondary features statistically dependent on the primary features by applying at least one secondary function to an input comprising one of the following: the input speech signal, the primary features;
  
  feature combination means for applying one or more predetermined functions to combine the primary features with the secondary features forming a combined signal;
  
  statistical modeling means for generating a recognition answer by pattern matching the combined features against predetermined voice recognition templates;
  
  applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features; and
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates, where the at least one predetermined function utilizes at least one of linear discriminant analysis, principal component transfer, and concatenation; and
  
  separately modifies at least one of the primary features and the secondary features, the at least one predetermined function being used to form a combined signal comprising a combination of the primary features including any modifications with the secondary features including any modifications.

40. A wireless communications device having:
- a transceiver coupled to an antenna;
  
  a speaker;
  
  a microphone;
  
  a user interface;
  
  a manager coupled to components including the transceiver, speaker, microphone, and the user interface to manage operation of the components, and a voice recognition system configured to perform operations comprising:
  
  applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features;
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates;
  
  applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features; and
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates, where the at least one predetermined function utilizes at least one of linear discriminant analysis, principal component transfer, and concatenation; and
  
  separately modifies at least one of the primary features and the secondary features, the at least one predetermined function being used to form a combined signal comprising a combination of the primary features including any modifications with the secondary features including any modifications.

41. A wireless communications device having:
- a transceiver coupled to an antenna;
  
  a speaker;
  
  a microphone;
  
  a user interface;
  
  means for managing operation of the transceiver, speaker, microphone, and user interface, and for performing voice recognition by:
  
  applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features;
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates;
  
  applying a primary transformation to a digital input speech signal to extract primary features therefrom;
  
  applying each of at least one secondary transformation to one of the input speech signal and the primary features to yield secondary features statistically dependent on the primary features;
  
  applying at least one predetermined function to form a combined signal comprising a combination of the primary features with the secondary features; and
  
  generating a recognition answer by pattern matching the combined signal against predetermined voice recognition templates;
  
  where the at least one predetermined function utilizes at least one of linear discriminant analysis, principal component transfer, and concatenation; and
  
  separately modifies at least one of the primary features and the secondary features, the at least one predetermined function being used to form a combined signal comprising a combination of the primary features including any modifications with the secondary features including any modifications.

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Malayath, Narendranath; Garudadri, Harinath
Primary Examiner(s): ABEBE, DANIEL DEMELASH

Application Number: US10/104,178
Publication Number: US 20030182115A1
Time in Patent Office: 1,308 Days
Field of Search: 704/236, 704/243, 704/246
US Class Current: 704/246
CPC Class Codes:
G10L 15/02 Feature extraction for spee...
G10L 15/20 Speech recognition techniqu...
