Gaussian model-based dynamic time warping system and method for speech processing

US 20040122672A1
Filed: 12/18/2002
Published: 06/24/2004
Est. Priority Date: 12/18/2002
Status: Abandoned Application

Abstract

The Gaussian Dynamic Time Warping model provides a hierarchical statistical model for representing an acoustic pattern. The first layer of the model represents the general acoustic space, the second layer represents each speaker space, and the third layer represents the temporal structure information contained in each enrollment speech utterance, based on equally spaced time intervals. These three layers are hierarchically developed: the second layer is derived from the first, and the third layer is derived from the second. The model is useful in speech processing applications, particularly word and speaker recognition using a spotting recognition mode.

30 Citations

33 Claims

1. A method for constructing a speech model, comprising:
- constructing an acoustic space model from a plurality of utterances obtained from a plurality of speakers;
  
  constructing a speaker model by adapting the acoustic space model using enrollment speech from at least one speaker;
  
  identifying a temporal structure associated with said enrollment speech; and
  
  constructing a speech model based on said speaker model and on the enrollment speech while preserving the temporal structure of said enrollment speech in said speech model.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
- - 2. The method of claim 1 wherein the temporal structure of said enrollment speech is preserved in said speech model by constructing a set of frame dependent models that are mapped to a set of frames.
  - 3. The method of claim 2 wherein said set of frames has an associated timing reference that is established from and directly preserves the timing of said enrollment speech.
  - 4. The method of claim 1 wherein said acoustic space model, said speaker model and said temporal structure share a common hierarchical relationship.
  - 5. The method of claim 1 wherein said acoustic space model is constructed by statistical modeling.
  - 6. The method of claim 1 wherein said acoustic space model is constructed by obtaining speech from a plurality of speakers, extracting features from said obtained speech and representing said extracted features as Gaussian parameters.
  - 7. The method of claim 1 wherein said acoustic space model is represented using a Hidden Markov Model.
  - 8. The method of claim 1 wherein said acoustic space model is represented using a Gaussian Mixture Model.
  - 9. The method of claim 1 wherein said speaker model is constructed by statistical modeling and wherein the step of adapting the acoustic space model is performed by maximum a posteriori adaptation.
  - 10. The method of claim 1 wherein said temporal structure information model is constructed by statistical modeling using said speaker model and said acoustic space model for a plurality of enrollment speech utterances.
  - 11. The method of claim 10 wherein said temporal structure information model is further built by constructing a temporal structure information model for each of a plurality of enrollment speech utterances and then by selecting the best temporal structure information model.
  - 12. The method of claim 10 further comprising adapting said temporal structure information models based on said enrollment speech utterances.
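
Claims 8 and 9 name a Gaussian Mixture Model for the acoustic space and maximum a posteriori (MAP) adaptation for the speaker model, without giving formulas. The following is a minimal sketch of one common form of MAP mean adaptation, assuming a diagonal-covariance mixture and a relevance-factor update. The function names and the `relevance` parameter are illustrative assumptions, not taken from the application.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Return speaker-adapted means; weights and variances stay shared.

    `relevance` is an assumed tuning constant: larger values keep the
    adapted means closer to the acoustic space model.
    """
    k, dim = len(means), len(means[0])
    counts = [0.0] * k
    sums = [[0.0] * dim for _ in range(k)]
    # Accumulate soft counts and first-order statistics per component.
    for x in frames:
        for c, p in enumerate(posteriors(x, weights, means, variances)):
            counts[c] += p
            for d in range(dim):
                sums[c][d] += p * x[d]
    adapted = []
    for c in range(k):
        if counts[c] > 0.0:
            data_mean = [s / counts[c] for s in sums[c]]
        else:
            data_mean = list(means[c])  # no data: keep the prior mean
        alpha = counts[c] / (counts[c] + relevance)
        adapted.append(
            [alpha * dm + (1.0 - alpha) * m for dm, m in zip(data_mean, means[c])]
        )
    return adapted
```

Components that the enrollment speech rarely visits receive a small data weight and therefore stay close to the acoustic space model, which is the usual motivation for adapting rather than retraining.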

13. A method for constructing a speech model, comprising:
- constructing an acoustic space model from a plurality of utterances obtained from a plurality of speakers;
  
  constructing a speaker model by adapting the acoustic space model using enrollment speech from at least one speaker;
  
  constructing a temporal structure information model by representing said speaker model as a plurality of frame dependent models that correspond to sequential time intervals associated with said enrollment speech; and
  
  constructing said speech model by adapting the temporal structure information model using said enrollment speech, said speaker model and said acoustic space model.
- View Dependent Claims (14, 15, 16, 17)
- - 14. The method of claim 13 further comprising representing said acoustic space model as a plurality of Gaussian parameters.
  - 15. The method of claim 13 further comprising representing said acoustic space model as a plurality of parameters that include Gaussian mean parameters and wherein said step of adapting the acoustic space model is performed by adapting said Gaussian mean parameters.
  - 16. The method of claim 13 further comprising representing said acoustic space model as a plurality of parameters that include Gaussian weight parameters and wherein said step of adapting the temporal model is performed by adapting said Gaussian weight parameters.
  - 17. The method of claim 13 wherein said temporal model is further constructed by obtaining plural instances of enrollment speech from at least one single speaker and constructing a frame-based temporal structure information model.
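
Claims 13 and 16 describe frame dependent models tied to sequential time intervals and adapted through Gaussian weight parameters. One way this could look is sketched below, assuming the per-frame component posteriors are already computed, the enrollment utterance is cut into equally spaced segments, and each segment's weights are smoothed toward the shared weights. The helper names and the `relevance` constant are hypothetical.

```python
def segment_bounds(n_frames, n_segments):
    """Split n_frames into n_segments near-equal, contiguous intervals."""
    return [
        (i * n_frames // n_segments, (i + 1) * n_frames // n_segments)
        for i in range(n_segments)
    ]


def frame_dependent_weights(post, n_segments, base_weights, relevance=4.0):
    """Estimate one weight vector per time segment.

    post[t][c] is the posterior of component c for enrollment frame t.
    Each segment's weights are its soft counts smoothed toward
    base_weights; `relevance` (an assumption) controls the smoothing.
    """
    k = len(base_weights)
    models = []
    for start, end in segment_bounds(len(post), n_segments):
        n = end - start
        counts = [sum(post[t][c] for t in range(start, end)) for c in range(k)]
        models.append(
            [(counts[c] + relevance * base_weights[c]) / (n + relevance)
             for c in range(k)]
        )
    return models
```

Because the posteriors of each frame sum to one and the base weights sum to one, every returned weight vector also sums to one, so each segment remains a valid mixture over the same Gaussians.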

18. A hierarchical speech model comprising:
- a first layer for representing an acoustic space;
  
  a second layer for representing a speaker space;
  
  a third layer for representing temporal structure of enrollment speech according to a predetermined frame structure.
- View Dependent Claims (19, 20, 21, 22, 23, 24)
- - 19. The speech model of claim 18 wherein said first layer is a set of Gaussian model parameters.
  - 20. The speech model of claim 18 wherein said second layer is a set of Gaussian model mean parameters.
  - 21. The speech model of claim 18 wherein said third layer is a set of Gaussian model weight parameters.
  - 22. The speech model of claim 18 wherein said second layer is hierarchically related to said first layer.
  - 23. The speech model of claim 18 wherein said third layer is hierarchically related to said second layer.
  - 24. The speech model of claim 23 wherein said third layer is related to said second layer based on an adaptation factor for tuning the degree of influence between said third layer and said second layer.
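
Claims 18 to 24 describe three layers in which the second stores Gaussian mean parameters, the third stores Gaussian weight parameters, and an adaptation factor tunes the influence between the third and second layers. The data layout below is a rough sketch of that hierarchy. The class names, the field names, and the linear interpolation used for the adaptation factor are all assumptions made for illustration; the claims do not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AcousticSpace:
    """Layer 1: full Gaussian parameters shared by every speaker."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]


@dataclass
class SpeakerSpace:
    """Layer 2: speaker-adapted means; everything else comes from layer 1."""
    acoustic: AcousticSpace
    means: List[List[float]]


@dataclass
class TemporalStructure:
    """Layer 3: one weight vector per frame of the enrollment utterance."""
    speaker: SpeakerSpace
    frame_weights: List[List[float]]
    adaptation_factor: float = 0.5  # assumed range 0.0 to 1.0

    def weights_at(self, frame_index: int) -> List[float]:
        """Blend frame-level weights with the weights inherited from above."""
        a = self.adaptation_factor
        inherited = self.speaker.acoustic.weights
        local = self.frame_weights[frame_index]
        return [a * lw + (1.0 - a) * iw for lw, iw in zip(local, inherited)]
```

With this layout each lower layer stores only what differs from its parent, so an adaptation factor of 0.0 falls back to the inherited weights and 1.0 uses the frame-level weights alone.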

25. A speech processing system comprising:
- a speech recognizer having a set of probabilistic models against which an input speech utterance is tested;
  
  said set of probabilistic models being configured to contain:
  
  a first layer for representing an acoustic space;
  
  a second layer for representing a speaker space;
  
  a third layer for representing temporal structure of speech according to a predetermined frame structure.
- View Dependent Claims (26, 27, 28, 29, 30, 31, 32, 33)
- - 26. The speech processing system of claim 25 wherein said set of probabilistic models stores an enrollment utterance and said speech recognizer performs a word spotting function.
  - 27. The speech processing system of claim 25 wherein said set of probabilistic models stores an enrollment utterance and said speech recognizer performs a speaker recognition function.
  - 28. The speech model of claim 25 wherein said first layer is a set of Gaussian model parameters.
  - 29. The speech model of claim 25 wherein said second layer is a set of Gaussian mean parameters.
  - 30. The speech model of claim 25 wherein said third layer is a set of Gaussian weight parameters.
  - 31. The speech model of claim 25 wherein said second layer is hierarchically related to said first layer.
  - 32. The speech model of claim 25 wherein said third layer is hierarchically related to said second layer.
  - 33. The speech model of claim 32 wherein said third layer is related to said second layer based on an adaptation factor for tuning the degree of influence between said third layer and said second layer.
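
Claim 26 refers to a word spotting function, and the title refers to dynamic time warping. As a hedged illustration of how spotting can be posed as a warping problem, the sketch below runs a textbook subsequence dynamic time warping over a precomputed local cost matrix. It assumes `cost[i][j]` is a non-negative mismatch between model frame `i` and input frame `j` (for example, a negated and shifted log-likelihood). The function name and this cost convention are assumptions, not the application's scoring method.

```python
def spot(cost):
    """Find the best match of the whole model inside a longer input.

    cost[i][j]: mismatch between model frame i and input frame j.
    The match may start and end at any input frame.
    Returns (total_cost, end_index) of the cheapest warping path.
    """
    n, m = len(cost), len(cost[0])
    prev = list(cost[0])  # free start: row 0 carries no accumulated cost
    for i in range(1, n):
        cur = [float("inf")] * m
        for j in range(m):
            best = prev[j]  # advance the model frame, stay on input frame j
            if j > 0:
                # diagonal step, or advance the input within model frame i
                best = min(best, prev[j - 1], cur[j - 1])
            cur[j] = cost[i][j] + best
        prev = cur
    end = min(range(m), key=prev.__getitem__)  # free end on the last row
    return prev[end], end


# Usage: a three-frame template hidden inside a six-frame input.
template = [1.0, 2.0, 3.0]
signal = [9.0, 9.0, 1.0, 2.0, 3.0, 9.0]
local_cost = [[abs(t - s) for s in signal] for t in template]
total, end_index = spot(local_cost)
```

A recognizer built this way would compare the returned total cost against a threshold to decide whether the enrolled word occurs in the input.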

Current Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Bonastre, Jean-Francois; Junqua, Jean-Claude; Morin, Philippe
Application Number: US10/323,152
Publication Number: US 20040122672A1
US Class Current: 704/256
CPC Class Codes: G10L 15/063 Training; G10L 15/12 using dynamic programming t...
