Method for the automatic determination of context-dependent hidden word distributions
First Claim
1. A method for determining a probabilistic, context dependent word distribution for each word in a previously unseen text, the method comprising:
in a training phase, learning for each word of a large corpus of natural language texts a probabilistic context model that describes the context these words typically occur in and learning a hidden-to-observed distribution that describes words with similar meaning and usage;
storing the context model and the hidden-to-observed distribution on a storage device; and
in an inference phase, retrieving the context model and the hidden-to-observed distribution from the storage device and for each word in the previously unseen text determining the probabilistic, context dependent word distribution utilizing the context model and the hidden-to-observed distribution obtained in the training phase.
Abstract
Described is a method, the Latent Words Language Model (LWLM), that automatically determines context-dependent word distributions (called hidden or latent words) for each word of a text. The probabilistic word distributions reflect the probability that another word of the vocabulary of a language would occur at that position in the text. Furthermore, a method is described to use these word distributions in statistical language processing applications, such as information extraction applications (for example, semantic role labeling, named entity recognition), automatic machine translation, textual entailment, paraphrasing, information retrieval, and speech recognition.
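The two phases named in the first claim can be illustrated with a toy sketch. This is not the patented LWLM procedure: it assumes a context made of only the immediately preceding and following observed words, it estimates the hidden-to-observed distribution from shared contexts (a swapped-in heuristic), and it skips the storage step. The function names `train` and `infer` and the smoothing parameter `alpha` are hypothetical.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Training phase (toy): corpus is a list of token lists.

    Returns (context_model, hidden_to_observed).
    """
    context_model = defaultdict(Counter)   # (left, right) -> counts of middle word
    word_contexts = defaultdict(Counter)   # word -> counts of (left, right)
    for sentence in corpus:
        padded = ["<s>"] + sentence + ["</s>"]
        for i in range(1, len(padded) - 1):
            ctx = (padded[i - 1], padded[i + 1])
            context_model[ctx][padded[i]] += 1
            word_contexts[padded[i]][ctx] += 1

    # Assumption: P(observed | hidden) is proportional to the number of
    # contexts the two words share, so words used alike get similar mass.
    vocab = list(word_contexts)
    hidden_to_observed = {}
    for h in vocab:
        scores = {}
        for w in vocab:
            overlap = sum(min(c, word_contexts[w][ctx])
                          for ctx, c in word_contexts[h].items())
            if overlap:
                scores[w] = overlap
        total = sum(scores.values())
        hidden_to_observed[h] = {w: s / total for w, s in scores.items()}
    return dict(context_model), hidden_to_observed


def infer(sentence, context_model, hidden_to_observed, alpha=0.1):
    """Inference phase (toy): one distribution over hidden words per position."""
    vocab = list(hidden_to_observed)
    padded = ["<s>"] + sentence + ["</s>"]
    result = []
    for i in range(1, len(padded) - 1):
        counts = context_model.get((padded[i - 1], padded[i + 1]), Counter())
        total = sum(counts.values()) + alpha * len(vocab)
        # Add-alpha smoothed context prior times hidden-to-observed likelihood.
        prior = {h: (counts[h] + alpha) / total for h in vocab}
        scores = {h: prior[h] * hidden_to_observed[h].get(padded[i], 0.0)
                  for h in vocab}
        if sum(scores.values()) == 0.0:
            # Observed word never seen in training: rely on the context alone.
            scores = prior
        norm = sum(scores.values())
        result.append({h: s / norm for h, s in scores.items() if s > 0})
    return result


corpus = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"],
          ["the", "cat", "eats"], ["the", "dog", "eats"]]
context_model, hidden_to_observed = train(corpus)
distributions = infer(["the", "cat", "runs"], context_model, hidden_to_observed)
```

On this toy corpus, "cat" and "dog" occur in identical contexts, so the distribution at the position of "cat" splits its mass between the two, while the unseen word "runs" receives a distribution driven by its context alone.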
20 Claims
Claim 1 is the independent claim given under First Claim above; claims 2 through 20 depend on it.
Specification