SPEECH PROCESSOR, A SPEECH PROCESSING METHOD AND A METHOD OF TRAINING A SPEECH PROCESSOR
First Claim
1. A speech recognition method, said method comprising:
- receiving a speech input from a speaker which comprises a sequence of observations; and
- determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, said acoustic model having a plurality of model parameters describing probability distributions which relate a word or part thereof to an observation, said acoustic model having been trained using first training data and adapted using second training data to said speaker,
- the speech recognition method further comprising determining the likelihood of a sequence of observations occurring in a given language using a language model; and
- combining the likelihoods determined by the acoustic model and the language model and outputting a sequence of words identified from said speech input signal, wherein said acoustic model is context based for said speaker, said context based information being contained in said model using a plurality of decision trees, wherein the structure of said decision trees is based on second training data.
Abstract
A speech recognition method, the method involving:
- receiving a speech input from a known speaker which comprises a sequence of observations; and
- determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, the acoustic model having a plurality of model parameters describing probability distributions which relate a word or part thereof to an observation, the acoustic model having been trained using first training data and adapted using second training data to said speaker,
- the speech recognition method also determining the likelihood of a sequence of observations occurring in a given language using a language model; and
- combining the likelihoods determined by the acoustic model and the language model and outputting a sequence of words identified from said speech input signal, wherein said acoustic model is context based for said speaker, said context based information being contained in said model using a plurality of decision trees, wherein the structure of said decision trees is based on second training data.
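The combining step in the abstract can be illustrated with a minimal sketch. It assumes the acoustic model and the language model have each already produced a log-likelihood for every candidate word sequence, and that the two are combined by adding them in the log domain. The `Hypothesis` class, the `lm_weight` scaling factor and the example scores are all hypothetical; none of them is taken from the patent text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hypothesis:
    """One candidate word sequence with its two model scores (hypothetical type)."""
    words: Sequence[str]
    acoustic_logp: float  # log-likelihood from the acoustic model
    lm_logp: float        # log-likelihood from the language model


def combined_score(h: Hypothesis, lm_weight: float = 1.0) -> float:
    # Multiplying likelihoods corresponds to adding log-likelihoods.
    # lm_weight is an assumed tuning factor, not something the source specifies.
    return h.acoustic_logp + lm_weight * h.lm_logp


def best_hypothesis(hyps: Sequence[Hypothesis], lm_weight: float = 1.0) -> Hypothesis:
    """Return the candidate with the highest combined score."""
    if not hyps:
        raise ValueError("at least one hypothesis is required")
    return max(hyps, key=lambda h: combined_score(h, lm_weight))


# Illustrative scores only; real values would come from trained models.
candidates = [
    Hypothesis(("recognise", "speech"), acoustic_logp=-42.0, lm_logp=-3.0),
    Hypothesis(("wreck", "a", "nice", "beach"), acoustic_logp=-41.0, lm_logp=-9.0),
]
best = best_hypothesis(candidates)
print(" ".join(best.words))
```

With these made-up scores the second candidate fits the acoustics slightly better, but the language model score outweighs that difference, so the first candidate is output.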
20 Claims
1. A speech recognition method, said method comprising:
- receiving a speech input from a speaker which comprises a sequence of observations; and
- determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, said acoustic model having a plurality of model parameters describing probability distributions which relate a word or part thereof to an observation, said acoustic model having been trained using first training data and adapted using second training data to said speaker,
- the speech recognition method further comprising determining the likelihood of a sequence of observations occurring in a given language using a language model; and
- combining the likelihoods determined by the acoustic model and the language model and outputting a sequence of words identified from said speech input signal, wherein said acoustic model is context based for said speaker, said context based information being contained in said model using a plurality of decision trees, wherein the structure of said decision trees is based on second training data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 17
9. A text to speech processing method, said method comprising:
- receiving a text input which comprises a sequence of words; and
- determining the likelihood of a sequence of speech vectors arising from the sequence of words using an acoustic model, said acoustic model having a plurality of model parameters describing probability distributions which relate a word or part thereof to an observation, said acoustic model having been trained using first training data and adapted using second training data to said speaker, wherein said acoustic model is context based for said speaker, said context based information being contained in said model using a plurality of decision trees, wherein the structure of said decision trees is based on second training data.
10. A method of training an acoustic model for a speech processing system, the method comprising:
- receiving first training data, said first training data comprising speech and text corresponding to said speech;
- training a first acoustic model using said first training data;
- receiving second training data from a known speaker;
- adapting said first acoustic model to form a second acoustic model using said second training data, wherein adapting said first model to form said second model comprises constructing decision trees to model context dependency, and wherein the structure of the decision trees is based on the second training data.
Dependent claims: 11, 12, 13, 14, 15, 16
18. A speech recognition apparatus comprising:
- a receiver for receiving a speech input from a speaker which comprises a sequence of observations; and
- a processor configured to:
  - determine the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, said acoustic model having a plurality of model parameters describing probability distributions which relate a word or part thereof to an observation, said acoustic model having been trained using first training data and adapted using second training data to said speaker;
  - determine the likelihood of a sequence of observations occurring in a given language using a language model; and
  - combine the likelihoods determined by the acoustic model and the language model and outputting a sequence of words identified from said speech input signal, wherein said acoustic model is context based for said speaker, said context based information being contained in said model using a plurality of decision trees, wherein the structure of said decision trees is based on second training data.
Dependent claims: 20
19. A text to speech system comprising:
- a receiver for receiving a text input which comprises a sequence of words; and
- a processor, said processor being configured to:
  - determine the likelihood of a sequence of speech vectors arising from the sequence of words using an acoustic model, said acoustic model having a plurality of model parameters describing probability distributions which relate a word or part thereof to an observation, said acoustic model having been trained using first training data and adapted using second training data to said speaker, wherein said acoustic model is context based for said speaker, said context based information being contained in said model using a plurality of decision trees, wherein the structure of said decision trees is based on second training data.
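Claim 10 requires that the structure of the decision trees be based on the second (speaker-specific) training data. The sketch below shows one way such a tree could be grown, purely as an illustration. It assumes a greedy maximum-likelihood split criterion with a single one-dimensional Gaussian per node, which the claims do not specify. The context questions, the `adaptation_data` samples and all function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Sample = Tuple[str, float]        # (left-context phone label, 1-D feature value)
Question = Callable[[str], bool]  # yes/no question about the context


def gaussian_loglik(values: List[float]) -> float:
    """Log-likelihood of values under their own ML-fitted 1-D Gaussian."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, 1e-4)  # variance floor
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def best_split(samples: List[Sample], questions: Dict[str, Question],
               min_count: int = 2) -> Optional[Tuple[str, float]]:
    """Return (question name, log-likelihood gain) of the best split, if any."""
    base = gaussian_loglik([v for _, v in samples])
    best: Optional[Tuple[str, float]] = None
    for name, q in questions.items():
        yes = [v for c, v in samples if q(c)]
        no = [v for c, v in samples if not q(c)]
        if len(yes) < min_count or len(no) < min_count:
            continue  # too little adaptation data on one side
        gain = gaussian_loglik(yes) + gaussian_loglik(no) - base
        if best is None or gain > best[1]:
            best = (name, gain)
    return best


def build_tree(samples: List[Sample], questions: Dict[str, Question],
               min_gain: float = 1.0, min_count: int = 2) -> dict:
    """Grow a tree whose structure depends only on the samples passed in."""
    if not samples:
        raise ValueError("no training samples")
    split = best_split(samples, questions, min_count)
    if split is None or split[1] < min_gain:
        values = [v for _, v in samples]
        return {"leaf": True, "mean": sum(values) / len(values), "count": len(values)}
    q = questions[split[0]]
    return {
        "leaf": False,
        "question": split[0],
        "yes": build_tree([s for s in samples if q(s[0])], questions, min_gain, min_count),
        "no": build_tree([s for s in samples if not q(s[0])], questions, min_gain, min_count),
    }


# Hypothetical context questions and speaker-specific (second) training data.
questions = {
    "left_is_vowel": lambda c: c in {"a", "e", "i", "o", "u"},
    "left_is_a": lambda c: c == "a",
}
adaptation_data = [("a", 1.0), ("a", 1.2), ("i", 1.1),
                   ("n", 5.0), ("m", 5.2), ("n", 4.9)]
tree = build_tree(adaptation_data, questions)
print(tree["question"])
```

Because the split choice and the stopping rule are both evaluated on `adaptation_data`, a different speaker's data could yield a different tree shape. That is the property the claim language describes, although the patent's actual construction procedure may differ from this sketch.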
Specification