Method and configuration for forming classes for a language model based on linguistic classes
Abstract
A method and a configuration for forming classes for a language model based on linguistic classes is described. In order to determine a language model, classes are formed which are based on linguistic classes and minimize a language model entropy. A superset of classes can be prescribed as exemplary text or as an additional language model.
33 Citations
20 Claims
1. A method for forming classes for a language model based on linguistic classes using a computer, which comprises the steps of:
using a first mapping rule to determine N classes using a prescribed vocabulary with associated linguistic properties;
determining K classes from the N classes by minimizing a language model entropy, including:
determining a number M of most probable of the N classes as base classes; and
merging one of remaining classes (N−M) of the classes with one of the base classes for which the language model entropy is minimized; and
using the K classes to represent a second mapping rule for forming the classes of language models onto the linguistic classes.
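The merging step of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes each word belongs to exactly one class, a class bigram model whose probabilities are relative frequencies taken from the text itself, and a greedy pass in which every remaining class is merged into some base class (so K equals M, with M at least 1). The function names and the toy tag set are hypothetical.

```python
import math
from collections import Counter


def class_bigram_entropy(words, word_to_class):
    """Return H(LM) = -(1/n) * log P(W) for a class bigram model.

    Assumption of this sketch: all probabilities are relative
    frequencies counted on `words` itself, so no smoothing is needed.
    """
    classes = [word_to_class[w] for w in words]
    n = len(words)
    word_count = Counter(words)
    class_count = Counter(classes)
    pair_count = Counter(zip(classes, classes[1:]))
    prev_count = Counter(classes[:-1])

    log_p = math.log(class_count[classes[0]] / n)  # P(C_0)
    for i, (w, c) in enumerate(zip(words, classes)):
        log_p += math.log(word_count[w] / class_count[c])  # P(w_i | C_i)
        if i > 0:
            prev = classes[i - 1]
            log_p += math.log(pair_count[(prev, c)] / prev_count[prev])  # P(C_i | C_i-1)
    return -log_p / n


def merge_to_base_classes(words, word_to_class, m):
    """Keep the m most frequent classes as base classes and merge each
    remaining class into the base class that yields the lowest entropy."""
    ranked = [c for c, _ in Counter(word_to_class[w] for w in words).most_common()]
    base, remaining = ranked[:m], ranked[m:]
    mapping = dict(word_to_class)
    for c in remaining:
        def merged(target):
            # Trial mapping: every word of class c is moved to `target`.
            return {w: (target if k == c else k) for w, k in mapping.items()}
        best = min(base, key=lambda b: class_bigram_entropy(words, merged(b)))
        mapping = merged(best)
    return mapping


# Hypothetical toy data: N = 4 classes reduced to M = 3 base classes.
words = "the cat sat the dog ran a cat ran".split()
tags = {"the": "DET_DEF", "a": "DET_INDEF", "cat": "NOUN",
        "dog": "NOUN", "sat": "VERB", "ran": "VERB"}
print(merge_to_base_classes(words, tags, m=3))
```

The returned dictionary plays the role of the second mapping rule: it sends each word to one of the K merged classes.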
2. A method for forming classes for a language model based on linguistic classes using a computer, which comprises the steps of:
using a first mapping rule to determine N classes using a prescribed vocabulary with associated linguistic properties;
determining K classes from the N classes by minimizing a language model entropy;
using the K classes to represent a second mapping rule for forming the classes of language models onto the linguistic classes; and
determining the language model entropy by use of equation
$$H(LM) = -\frac{1}{n} \log P(W),$$
where H(LM) denotes the language model entropy of the language model, n denotes a number of words in a text, W denotes a chain of the words w0, w1, ..., wn, and P(W) denotes a probability of an occurrence of a sequence of at least two of the words.
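As a worked illustration of the entropy in claim 2, the sketch below computes H(LM) from the probabilities a model assigns to the successive words of a chain W, whose product is P(W). The natural logarithm and the function name are assumptions of this sketch; the claim does not fix the base of the logarithm.

```python
import math


def language_model_entropy(step_probs):
    """Return H(LM) = -(1/n) * log P(W).

    `step_probs` holds the probability assigned to each of the n words
    in turn; P(W) is their product, so log P(W) is the sum of their logs.
    """
    n = len(step_probs)
    log_p_w = sum(math.log(p) for p in step_probs)
    return -log_p_w / n


# A model that assigns probability 0.5 to every word has entropy log 2.
print(language_model_entropy([0.5, 0.5, 0.5, 0.5]))
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow on long texts.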
4. The method according to claim 3, which comprises using a predetermined basic language model to determine for the text the probability P(Ci|Ci−1) of the text by taking over the probability P(Ci|Ci−1) from the basic language model.
5. The method according to claim 4, which comprises determining the conditional word probability P(wi|Ci) according to at least one of the following possibilities:
determining the conditional word probability P(wi|Ci) with an aid of the text;
determining the conditional word probability P(wi|Ci) for the word wi with an aid of a prescribed probability P(wi); and
determining the conditional word probability P(wi|Ci) by using a word list.
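The three possibilities in claim 5 can be sketched as follows. The claim does not say how each estimate is computed, so the choices here are assumptions: relative frequencies for the text, renormalization of P(wi) within each class for the prescribed probability, and a uniform distribution over the members of a class for the word list. Each word is assumed to belong to exactly one class, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def from_text(words, word_to_class):
    """P(w|C) as the relative frequency of w among all tokens of class C."""
    word_count = Counter(words)
    class_total = Counter()
    for w, k in word_count.items():
        class_total[word_to_class[w]] += k
    return {w: k / class_total[word_to_class[w]] for w, k in word_count.items()}


def from_word_probability(p_word, word_to_class):
    """P(w|C) by renormalizing a prescribed P(w) within each class."""
    class_mass = defaultdict(float)
    for w, p in p_word.items():
        class_mass[word_to_class[w]] += p
    return {w: p / class_mass[word_to_class[w]] for w, p in p_word.items()}


def from_word_list(word_list, word_to_class):
    """P(w|C) as a uniform distribution over the listed words of class C."""
    class_size = Counter(word_to_class[w] for w in word_list)
    return {w: 1.0 / class_size[word_to_class[w]] for w in word_list}


tags = {"the": "DET", "cat": "NOUN", "dog": "NOUN"}
print(from_text(["the", "cat", "the", "dog"], tags))
print(from_word_probability({"the": 0.6, "cat": 0.1, "dog": 0.3}, tags))
print(from_word_list(["the", "cat", "dog"], tags))
```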
6. The method according to claim 5, which comprises using the conditional word probability P(wi|Ci) determined to adapt the basic language model.
7. The method according to claim 5, which comprises using the conditional word probability P(wi|Ci) to determine a probability P(Ci|wi) as follows:
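The formula that claim 7 refers to is not reproduced in this text. One standard way to obtain P(Ci|wi) from P(wi|Ci) is Bayes' rule with normalization over all classes, which the sketch below shows. It assumes class prior probabilities P(Ci) are available; the function name and the toy numbers are hypothetical, and the patent's own formula may differ.

```python
def class_posterior(word, p_word_given_class, p_class):
    """Return P(C|w) for every class C via Bayes' rule.

    P(C|w) = P(w|C) * P(C) / sum over C' of P(w|C') * P(C')
    """
    joint = {c: p_word_given_class[c].get(word, 0.0) * p_class[c] for c in p_class}
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("word has zero probability under every class")
    return {c: j / total for c, j in joint.items()}


# Hypothetical numbers: "run" is three times as likely as a verb.
p_class = {"NOUN": 0.5, "VERB": 0.5}
p_word_given_class = {"NOUN": {"run": 0.1}, "VERB": {"run": 0.3}}
print(class_posterior("run", p_word_given_class, p_class))
```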
8. The method according to claim 3, which comprises detecting an appropriate sequence of at least one word when the probability P(W) of the occurrence of the sequence of at least one word is above a prescribed bound, otherwise a prescribed action is carried out.
9. The method according to claim 8, which comprises performing the prescribed action by outputting one of an error message and a prompt to stop operating.
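The acceptance test of claims 8 and 9 can be sketched as a threshold check. The comparison is done on log P(W) to avoid underflow, which is a choice of this sketch rather than something the claims state; the function name, return format, and message text are hypothetical.

```python
import math


def check_sequence(step_probs, log_bound):
    """Accept the word sequence if log P(W) exceeds `log_bound`;
    otherwise return an error message as the prescribed action."""
    message = "Error: the word sequence is too improbable."
    if any(p <= 0.0 for p in step_probs):
        return ("reject", message)  # P(W) = 0 is below any bound
    log_p_w = sum(math.log(p) for p in step_probs)
    if log_p_w > log_bound:
        return ("accept", None)
    return ("reject", message)


# Bound of about log(0.1); P(W) = 0.25 passes, P(W) = 0.01 does not.
print(check_sequence([0.5, 0.5], -2.3))
print(check_sequence([0.1, 0.1], -2.3))
```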
10. The method according to claim 4, wherein the text relates to a prescribed application field.
11. A method for forming classes for a language model based on linguistic classes using a computer, which comprises the steps of:
using a first mapping rule to prescribe N classes;
determining K classes from the N classes by minimizing a language model entropy, including:
determining a number M of most probable of the N classes as base classes; and
merging one of remaining classes (N−M) of the classes with one of the base classes for which the language model entropy is minimized; and
using the K classes to represent a second mapping rule for forming the classes of language models onto the linguistic classes.
12. A method for forming classes for a language model based on linguistic classes using a computer, which comprises the steps of:
using a first mapping rule to prescribe N classes;
determining K classes from the N classes by minimizing a language model entropy;
using the K classes to represent a second mapping rule for forming the classes of language models onto the linguistic classes; and
determining the language model entropy by use of equation
$$H(LM) = -\frac{1}{n} \log P(W),$$
where H(LM) denotes the language model entropy of the language model, n denotes a number of words in a text, W denotes a chain of the words w0, w1, ..., wn, and P(W) denotes a probability of an occurrence of a sequence of at least two of the words.
wherein a language has the linguistic classes:
14. The method according to claim 13, which comprises using a predetermined basic language model to determine for the text the probability P(Ci|Ci−1) of the text by taking over the probability P(Ci|Ci−1) from the basic language model.
15. The method according to claim 14, which comprises determining the conditional word probability P(wi|Ci) according to at least one of the following possibilities:
determining the conditional word probability P(wi|Ci) with an aid of the text;
determining the conditional word probability P(wi|Ci) for the word wi with an aid of a prescribed probability P(wi); and
determining the conditional word probability P(wi|Ci) by using a word list.
16. The method according to claim 15, which comprises using the conditional word probability P(wi|Ci) determined to adapt the basic language model.
17. The method according to claim 15, which comprises using the conditional word probability P(wi|Ci) to determine a probability P(Ci|wi) as follows:
18. The method according to claim 13, which comprises detecting an appropriate sequence of at least one word when the probability P(W) of the occurrence of the sequence of at least one word is above a prescribed bound, otherwise a prescribed action is carried out.
19. The method according to claim 18, which comprises performing the prescribed action by outputting one of an error message and a prompt to stop operating.
20. The method according to claim 14, wherein the text relates to a prescribed application field.
Specification