Type-based selection of rules for semantically disambiguating words
Abstract
In semantically disambiguating words, where more than one disambiguation rule applies to the context in which a word occurs, a rule can be selected based on the type of information from which it was obtained. The rules can be derived from different types of information in a corpus, such as a dictionary, and rules can be selected in accordance with a prioritization of the types of information.
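The prioritization described in the abstract can be illustrated with a minimal sketch. The type names ("collocation", "example"), their ordering, the `Rule` class, and the sense labels are all hypothetical placeholders chosen for illustration; the source does not name them here.

```python
from dataclasses import dataclass

# Assumed priority order over types of corpus information (lower = preferred).
# The type names and their ranking are illustrative, not taken from the patent.
TYPE_PRIORITY = {"collocation": 0, "example": 1}


@dataclass(frozen=True)
class Rule:
    source_type: str  # type of corpus information the rule was derived from
    sense: str        # sense the rule assigns to the ambiguous word


def select_by_type(applicable: list[Rule]) -> Rule:
    """Return the applicable rule whose source type ranks first."""
    return min(applicable, key=lambda r: TYPE_PRIORITY[r.source_type])


# Two rules that both apply to the same context but come from different types.
rules = [Rule("example", "bank/river"), Rule("collocation", "bank/finance")]
chosen = select_by_type(rules)
```

Here `chosen` is the rule derived from the higher-priority type, regardless of the order in which the rules were listed.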
15 Claims
1. A method of semantically disambiguating words using rules, the rules including rules derived from two or more types of information in a corpus, rules derived from at least two of the types being applicable to words occurring in specified contexts;
the method comprising:
(A) obtaining context information about a context in which a semantically ambiguous word occurs in an input text;
a first rule derived from a first type of corpus information and a second rule derived from a second type of corpus information both being applicable to words occurring in the context; and
(B) selecting the first rule rather than the second rule to disambiguate the semantically ambiguous word according to a selection order based on the types of corpus information from which the rules are derived.
5. A method of semantically disambiguating words using rules, the rules including rules derived from two or more types of information in a corpus, rules derived from at least two of the types being applicable to words occurring in specified contexts, each of the rules in first and second types includes a context descriptor specifying contexts in which the rule is applicable, each context descriptor includes two or more word descriptors and a relation descriptor specifying a type of relation in which words that satisfy the word descriptors can occur;
the method comprising:
(A) using the input text to obtain relation information about a set of relations between the semantically ambiguous word and other words in the input text and, for each relation, word information about words that occur in the relation in the input text;
the context information including the relation information and the word information; and
(B) for each of the rules, comparing the rule's context descriptor with the context information and obtaining match information indicating that the context descriptors of the first and second rules are both satisfied by a relation between the semantically ambiguous word and other words in the input text;
in response to the match information, comparing the types of corpus information from which the first and second rules were derived and determining that the first type of corpus information has higher priority than the second type of corpus information; and
selecting to disambiguate the semantically ambiguous word using the first rule rather than the second rule based on the determination that the first type of corpus information has higher priority.
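The matching and prioritization steps of claim 5 might be sketched as follows. This is a simplified illustration under stated assumptions: word descriptors are treated as exact word matches, relation names such as "verb-object" are invented, and the `ContextDescriptor`, `Rule`, `matches`, and `disambiguate` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextDescriptor:
    relation: str            # relation descriptor (assumed label, e.g. "verb-object")
    words: tuple[str, ...]   # word descriptors, simplified here to literal words


@dataclass(frozen=True)
class Rule:
    source_type: str         # type of corpus information the rule came from
    context: ContextDescriptor
    sense: str


def matches(rule: Rule, relations: list[tuple[str, tuple[str, ...]]]) -> bool:
    """True if some relation in the context information satisfies the descriptor."""
    return any(
        rel == rule.context.relation and words == rule.context.words
        for rel, words in relations
    )


def disambiguate(rules, relations, priority):
    """Compare every rule with the context, then pick by type priority."""
    matched = [r for r in rules if matches(r, relations)]
    if not matched:
        return None
    return min(matched, key=lambda r: priority.index(r.source_type))


# Context information: relations involving the ambiguous word, with their words.
relations = [("verb-object", ("seize", "power"))]
descriptor = ContextDescriptor("verb-object", ("seize", "power"))
rules = [
    Rule("example", descriptor, "power/energy"),
    Rule("collocation", descriptor, "power/authority"),
]
chosen = disambiguate(rules, relations, priority=["collocation", "example"])
```

Both rules are satisfied by the same relation, so the choice falls to the assumed priority list, which ranks "collocation" ahead of "example".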
7. A method of semantically disambiguating words using rules, the rules including rules derived from two or more types of information in a corpus, rules derived from at least two of the types being applicable to words occurring in specified contexts, each of the rules in first and second types includes a context descriptor specifying contexts in which the rule is applicable, each context descriptor includes two or more word descriptors and a relation descriptor specifying a type of relation in which words that satisfy the word descriptors can occur;
the method comprising:
(A) using the input text to obtain relation information about a set of relations between the semantically ambiguous word and other words in the input text and, for each relation, word information about words that occur in the relation in the input text;
the context information including relation information and the word information; and
(B) for each of the rules, comparing the rule's context descriptor with the context information and obtaining match information indicating that the context descriptors of the first and second rules and a third rule are all satisfied by a relation between the semantically ambiguous word and other words in the input text;
the third rule also being derived from the first type of corpus information; and
in response to the match information, comparing the word descriptors of the first and third rules and determining that the first rule's word descriptors are more specific than the third rule's word descriptors; and
selecting to disambiguate the semantically ambiguous word using the first rule rather than the third rule based on the determination that the first rule's word descriptors are more specific.
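The specificity comparison in claim 7 could be sketched as below. The claim text here does not define how specificity is measured, so this sketch assumes a word descriptor is either a literal word or a wildcard `"*"` that matches any word, and counts literal descriptors as the specificity score. All names are hypothetical.

```python
from dataclasses import dataclass

WILDCARD = "*"  # assumed notation for a descriptor that accepts any word


@dataclass(frozen=True)
class Rule:
    source_type: str
    relation: str
    word_descriptors: tuple[str, ...]
    sense: str


def satisfied(rule: Rule, relation: str, words: tuple[str, ...]) -> bool:
    """True if the relation and every word descriptor fit the input relation."""
    return (
        rule.relation == relation
        and len(rule.word_descriptors) == len(words)
        and all(d == WILDCARD or d == w for d, w in zip(rule.word_descriptors, words))
    )


def specificity(rule: Rule) -> int:
    """Assumed measure: number of descriptors that name a literal word."""
    return sum(d != WILDCARD for d in rule.word_descriptors)


def pick_most_specific(rules, relation, words):
    matched = [r for r in rules if satisfied(r, relation, words)]
    return max(matched, key=specificity) if matched else None


# First and third rules come from the same type, so type priority cannot decide.
first = Rule("collocation", "verb-object", ("seize", "power"), "power/authority")
third = Rule("collocation", "verb-object", (WILDCARD, "power"), "power/energy")
chosen = pick_most_specific([third, first], "verb-object", ("seize", "power"))
```

Because both rules share a source type, the tie is broken in favor of the rule whose descriptors constrain more of the words in the relation.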
8. A machine for semantically disambiguating words;
the machine comprising:
an input text;
a set of rules for use in semantically disambiguating words, the set of rules including rules derived from two or more types of information in a corpus, each of the rules derived from at least two of the types of corpus information being applicable to words occurring in specified contexts; and
a processor connected for accessing the input text and the set of rules;
the processor, in using the set of rules to disambiguate words in the input text, operating to:
obtain context information about the context in which a semantically ambiguous word occurs in the input text;
a first rule derived from a first type of corpus information and a second rule derived from a second type of corpus information both being applicable to words occurring in the context; and
select the first rule rather than the second rule to disambiguate the semantically ambiguous word according to a selection order based on the types of corpus information from which the rules are derived.
9. A stored rule set for use by a processor in semantically disambiguating words in text;
the stored rule set comprising:
a storage medium; and
rules data stored by the storage medium;
the rules data being accessible by the processor to obtain a set of semantic disambiguation rules derived from information in a corpus, the rules being useable by the processor in disambiguating a semantically ambiguous word that occurs in a context in an input text;
the corpus including two or more types of information, rules derived from at least first and second types of corpus information being applicable to words occurring in specified contexts, a first rule derived from the first type of corpus information and a second rule derived from the second type of corpus information both being applicable to words occurring in the context in which the semantically ambiguous word occurs in the input text;
the rules data further being accessible by the processor to obtain, for each of a set of the rules, a type of corpus information from which the rule was derived, for use in selecting the first rule rather than the second rule to disambiguate the semantically ambiguous word according to a selection order based on the types of corpus information from which the rules are derived.
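One way to picture rules data that carries, for each rule, the type of corpus information it was derived from is a serialized list of tagged records. The JSON layout, field names, type labels, and selection order below are assumptions made for illustration only; the claim does not prescribe a storage format.

```python
import json

# Hypothetical stored rules data: each record carries its source type.
RULES_DATA = json.dumps([
    {"type": "example", "relation": "verb-object",
     "words": ["seize", "power"], "sense": "power/energy"},
    {"type": "collocation", "relation": "verb-object",
     "words": ["seize", "power"], "sense": "power/authority"},
])

# Assumed selection order over types of corpus information.
SELECTION_ORDER = ["collocation", "example"]


def load_rules(rules_data: str) -> list[dict]:
    """Obtain the rule set, including each rule's type, from the stored data."""
    return json.loads(rules_data)


def select_rule(rules, relation, words):
    """Among rules applicable to the context, pick by type selection order."""
    applicable = [
        r for r in rules
        if r["relation"] == relation and r["words"] == list(words)
    ]
    if not applicable:
        return None
    return min(applicable, key=lambda r: SELECTION_ORDER.index(r["type"]))


selected = select_rule(load_rules(RULES_DATA), "verb-object", ("seize", "power"))
```

Keeping the type tag inside each stored record is what lets the processor apply the selection order after loading, without consulting the corpus again.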
14. A method of operating a first machine to transfer data to a second machine over a network, the second machine including a memory, an input text, and a processor connected for accessing the memory and for accessing the input text;
the method comprising:
establishing a connection between the first and second machines over the network; and
operating the first machine to transfer rules data to the memory of the second machine;
the rules data being accessible by the processor to obtain a set of semantic disambiguation rules derived from information in a corpus, the rules being useable by the processor in disambiguating a semantically ambiguous word that occurs in a context in the input text;
the corpus including two or more types of information, rules derived from at least first and second types of corpus information being applicable to words occurring in specified contexts, a first rule derived from the first type of corpus information and a second rule derived from the second type of corpus information both being applicable to words occurring in the context in which the semantically ambiguous word occurs in the input text;
the rules data further being accessible by the processor to obtain, for each of a set of rules, a type of corpus information from which the rule was derived, for use in selecting the first rule rather than the second rule to disambiguate the semantically ambiguous word according to a selection order based on the types of corpus information from which the rules are derived.
15. A method of semantically disambiguating words using rules, the rules including rules derived from two or more types of information in a corpus, rules derived from at least two of the types being applicable to words occurring in specified contexts;
the method comprising:
(A) obtaining context information about a context in which a semantically ambiguous word occurs in an input text;
a first rule derived from a first type of corpus information and a second rule derived from a second type of corpus information both being applicable to words occurring in the context; and
(B) first selecting the first rule rather than the second rule to disambiguate the semantically ambiguous word according to a selection order based on the types of corpus information from which the rules are derived, but where the first selecting of the first rule according to a selection order based on the types of corpus information is not applicable for selection, then second selecting the first rule rather than the second rule to disambiguate the semantically ambiguous word according to a distance-based selection rule for selection.
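The two-stage selection in claim 15 might look like the sketch below. The claim text here does not define the distance measure, so `distance` is treated as a number supplied by the caller, with smaller values preferred. The sketch also assumes the type-based order is "not applicable" when the two rules have the same priority or a type is missing from the order. All names and values are hypothetical.

```python
from dataclasses import dataclass

# Assumed selection order over types of corpus information (lower = preferred).
TYPE_PRIORITY = {"collocation": 0, "example": 1}


@dataclass(frozen=True)
class Candidate:
    source_type: str
    sense: str
    distance: float  # assumed precomputed distance; its definition is not given here


def select(first: Candidate, second: Candidate) -> Candidate:
    """Try type-based selection first; fall back to distance when it cannot decide."""
    p1 = TYPE_PRIORITY.get(first.source_type)
    p2 = TYPE_PRIORITY.get(second.source_type)
    if p1 is not None and p2 is not None and p1 != p2:
        return first if p1 < p2 else second
    # Type order not applicable (same rank or unknown type): use distance instead.
    return first if first.distance <= second.distance else second


by_type = select(
    Candidate("collocation", "power/authority", distance=3.0),
    Candidate("example", "power/energy", distance=1.0),
)
by_distance = select(
    Candidate("example", "power/authority", distance=1.0),
    Candidate("example", "power/energy", distance=2.5),
)
```

In the first call the type order decides even though the other rule has a smaller distance; in the second call both rules share a type, so the distance-based fallback decides.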
Specification