Word disambiguation apparatus and methods
Abstract
Apparatus and methods for determining whether a word/sense pair is proper for a context. Wide contexts (100 words) are employed for both training and testing, and testing is done by adding the weights of vocabulary words from the context. The weights are determined by Bayesian techniques which interpolate between the probability of occurrence of a vocabulary word in a conditional sample of the training text and the probability of its occurrence in the entire training text. A further improvement in testing takes advantage of the fact that a word is generally used in only a single sense in a single discourse. Also disclosed are automated training techniques including training on bilingual bodies of text and training using categories from Roget's Thesaurus.
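The interpolation mentioned in the abstract can be illustrated with a minimal sketch. The function name `interpolated_probability`, the count dictionaries, and the fixed mixing factor `lam` are assumptions made for illustration only; the abstract does not state how the interpolation factor is chosen.

```python
def interpolated_probability(word, sample_counts, corpus_counts, lam=0.5):
    """Blend a word's relative frequency in the conditional sample with its
    relative frequency in the entire training text.

    `lam` is a hypothetical fixed mixing factor (1.0 trusts only the
    conditional sample, 0.0 trusts only the whole corpus).
    """
    sample_total = sum(sample_counts.values())
    corpus_total = sum(corpus_counts.values())
    p_local = sample_counts.get(word, 0) / sample_total if sample_total else 0.0
    p_global = corpus_counts.get(word, 0) / corpus_total if corpus_total else 0.0
    return lam * p_local + (1.0 - lam) * p_global


# Toy counts, invented for this example.
sample_counts = {"money": 3, "river": 1}
corpus_counts = {"money": 10, "river": 10, "the": 80}
p_money = interpolated_probability("money", sample_counts, corpus_counts)
p_the = interpolated_probability("the", sample_counts, corpus_counts)
```

A word that never occurs in the small conditional sample (here `"the"`) still receives a non-zero estimate from the whole-corpus term, which is the practical reason for interpolating rather than using the sample frequency alone.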
37 Claims
1. A method performed in a computer system of determining whether a sense for a word is lexically appropriate to a given position in a text accessible to the computer system, the method comprising the steps of:
- using the computer system to obtain a sequence of words in the text which includes the given position and is predominantly substantially longer than the average length of the sentences of the text and to store the sequence; and
- using the computer system to make a word sense determination of whether a sense specified in a word/sense pair for the word which is stored in the computer system is a sense which is lexically appropriate to the given position by automatically analyzing the sequence.
Dependent claims: 3, 4, 5, 6, 7, 8, 9, 10, 11
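The first step of claim 1, obtaining a sequence of words around the given position that is longer than a typical sentence, might be sketched as follows. The tokenizer, the function names, and the default of 50 words on each side (about 100 words in total, following the abstract) are illustrative assumptions, not the patent's implementation.

```python
import re

TOKEN = re.compile(r"[a-z']+")


def tokenize(text):
    """Lower-case the text and split it into word tokens (a simplification)."""
    return TOKEN.findall(text.lower())


def context_window(text, position, half_width=50):
    """Return the tokens within `half_width` words of the token at `position`.

    `position` is an index into the token list; the target word itself is
    included in the returned sequence.
    """
    tokens = tokenize(text)
    if not 0 <= position < len(tokens):
        raise IndexError("position is outside the text")
    start = max(0, position - half_width)
    end = min(len(tokens), position + half_width + 1)
    return tokens[start:end]


def average_sentence_length(text):
    """Mean number of word tokens per sentence, splitting on . ! and ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)
```

Comparing `len(context_window(text, position))` with `average_sentence_length(text)` gives a rough check that the stored sequence is in fact wider than a sentence-sized context.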
2. A method used in a computer system of determining a probability that a sense for a word is lexically appropriate to a given position in a text accessible to the computer system, the method comprising the steps of:
- using the computer system to obtain a sequence of words in the text which includes the given position and to store the sequence; and
- using the computer system to make a probability determination of the probability that the sense specified in a word/sense pair for the word is a sense which is lexically appropriate to the given position, the word-sense pair being stored in the computer system and the probability determination being made by employing a Bayesian discrimination technique involving the words in the sequence and a sense specified in the word/sense pair.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
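One way to read the probability determination of claim 2 is as a two-way Bayesian discrimination in which per-word log weights are added and the total is converted to a probability. The sketch below assumes that reading; treating the context words as independent (a naive Bayes simplification), the `log_weights` table, and the `prior` parameter are assumptions for illustration and are not taken from the claim.

```python
import math


def sense_probability(context, log_weights, prior=0.5):
    """Estimate the probability that a sense fits, given its context words.

    `log_weights` maps a word to log(P(word | sense) / P(word | other sense));
    words missing from the table contribute nothing. `prior` is the assumed
    prior probability of the sense.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for word in context:
        log_odds += log_weights.get(word, 0.0)
    # Numerically stable logistic function: turns log odds into a probability.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# Hypothetical weights for the "financial institution" sense of "bank".
weights = {"money": math.log(4), "loan": math.log(2)}
p = sense_probability(["money", "loan", "the"], weights)
```

With these toy weights the odds are 4 × 2 = 8 to 1, so `p` is 8/9. Adding log weights rather than multiplying raw ratios keeps the computation stable over a 100-word context.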
19. A method of using a computer system to determine whether a sense for a word is lexically appropriate to a given position in a text accessible to the computer system, the method comprising the steps of:
- using the computer system to make a first determination of whether a sense specified in a word/sense pair for the word which is stored in the computer system is a sense which is lexically appropriate to the given position in the text; and
- using the computer system to make a final determination that the sense in the word/sense pair is lexically appropriate to the given position by determining that the sense in the word sense pair is lexically appropriate to another position in the text.
Dependent claims: 20, 21
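Claim 19 relies on the observation that a word tends to keep one sense within a single discourse. A possible sketch, under the assumption that each occurrence has already received a first-pass sense label with a confidence score, is to let the best-supported sense override the individual decisions. The function name, the data layout, and the confidence-weighted vote are hypothetical choices; the claim itself only requires that the final determination draw on another position in the text.

```python
from collections import Counter


def apply_discourse_constraint(first_pass):
    """Relabel every occurrence with the sense best supported in the discourse.

    `first_pass` is a list of (sense, confidence) pairs, one per occurrence of
    the word in a single discourse. Ties go to the sense seen first.
    """
    if not first_pass:
        return []
    totals = Counter()
    for sense, confidence in first_pass:
        totals[sense] += confidence
    winner = max(totals.items(), key=lambda item: item[1])[0]
    return [winner] * len(first_pass)


# Invented first-pass results for three occurrences of "bank".
first_pass = [("bank/money", 0.9), ("bank/river", 0.55), ("bank/money", 0.8)]
final = apply_discourse_constraint(first_pass)
```

Here the weakly supported "bank/river" decision at the second position is overridden by the evidence from the other two positions.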
22. Apparatus for determining whether a sense for a word is lexically appropriate to a given position in a text, the apparatus comprising:
- means for obtaining a sequence of words in the text which includes the given position and is predominantly substantially longer than the average length of the sentences of the text; and
- means for analyzing the sequence to determine whether a sense specified in a word/sense pair for the word is a sense which is lexically appropriate to the given position.
Dependent claims: 23, 24, 25, 26, 27
28. A method of using a computer to make a probability table for use in apparatus for determining whether a sense for a word is lexically appropriate to a given position in a text, the method of making the table comprising the steps of:
- using the computer system to make and store a conditional sample of a text corpus accessible to the computer system, the conditional sample including contexts from the text corpus which are semantically related to the sense specified in a given word/sense pair;
- using the computer system to determine for each word which occurs in the conditional sample a weight of that word in the conditional sample with regard to the probability that the word of the given word/sense pair has the sense specified in the given word/sense pair, the determination of the weight being done using a Bayesian technique; and
- storing a table entry in the computer system which includes the weight of the word for each of the occurring words which has more than a given weight.
Dependent claims: 29, 30, 31, 32, 33, 34, 35
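The three steps of claim 28 (collect a conditional sample, weight each word, keep entries above a threshold) can be sketched as below. Add-one smoothing is used here only as a simple stand-in Bayesian estimate and is an assumption; the function name, the `min_weight` default, and the toy data are likewise hypothetical.

```python
import math
from collections import Counter


def build_weight_table(sample_contexts, corpus_tokens, min_weight=1.0):
    """Return {word: weight} for words whose weight exceeds `min_weight`.

    `sample_contexts` is the conditional sample: a list of token lists drawn
    from contexts related to one sense. The weight is the log ratio of the
    word's smoothed probability in the sample to that in the whole corpus.
    """
    sample_counts = Counter(w for context in sample_contexts for w in context)
    corpus_counts = Counter(corpus_tokens)
    vocab_size = len(set(sample_counts) | set(corpus_counts))
    sample_total = sum(sample_counts.values())
    corpus_total = sum(corpus_counts.values())

    table = {}
    for word, count in sample_counts.items():
        # Add-one smoothing: an illustrative choice, not the patent's estimator.
        p_sample = (count + 1) / (sample_total + vocab_size)
        p_corpus = (corpus_counts[word] + 1) / (corpus_total + vocab_size)
        weight = math.log(p_sample / p_corpus)
        if weight > min_weight:
            table[word] = weight
    return table


# Invented data: two contexts for one sense and a 100-token corpus.
sample = [["money", "loan", "the"], ["money", "the"]]
corpus = ["the"] * 90 + ["money"] * 5 + ["loan"] + ["river"] * 4
table = build_weight_table(sample, corpus)
```

In this toy run the frequent function word "the" receives a negative weight and is left out of the table, while "money" and "loan" are kept. Storing only entries above the threshold keeps the table small.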
36. A probability table for use in apparatus for determining whether a sense for a word is lexically appropriate to a given position in a text, the probability table being made by a method comprising the steps of:
- making a conditional sample of the text corpus which includes contexts which are semantically related to a sense specified in a given word/sense pair;
- employing a Bayesian technique to determine for each word which occurs in the conditional sample a weight of that word in the conditional sample with regard to the probability that the word of the given word/sense pair has the sense specified in the given word/sense pair; and
- making a table entry including the weight of the word for each of the occurring words which has more than a given weight.
Dependent claims: 37
Specification