Method for determining the semantic relatedness of lexical items in a text
First Claim
1. A method of selecting automatically the most appropriate translation, in a given target language, of a given lexical item in a given context in a given source language, comprising the following steps:
a) parsing, with the aid of a parsing system, an original sentence in which said given lexical item appears, in order to determine a syntactic structure of said sentence;
b) identifying, in said syntactic structure, those contextual relations which said given lexical item has in said sentence including identifying other lexical items in said sentence to which the given lexical item is syntactically related, and the syntactic relations involved;
c) retrieving said given lexical item, together with a set of alternative translations thereof, from a bilingual lexicon stored in electronic form, in which each of said alternative translations is associated with at least one contextual relation of said given lexical item in said given source language, with each said at least one contextual relation comprising a further lexical item and a syntactic relation;
d) comparing each of said contextual relations identified in step b) in said original sentence, with each contextual relation associated with one of said alternative translations retrieved in step c), including comparison of the syntactic relations involved;
e) for each of the comparisons performed in step d) in which said syntactic relations are found to be identical, determining a degree of semantic proximity between the given lexical item involved in the contextual relation identified in step b), and the further lexical item involved in the contextual relation retrieved in step c) by means of the following procedure;
1) identifying in a predefined text corpus in said source language a set of sentences in which at least one of said lexical items appear, and retrieving said set of sentences from said text corpus,
2) parsing, with the aid of said parsing system, each of said retrieved sentences in order to determine a syntactic structure of each of said sentences,
3) for each said sentence retrieved, determining from the obtained syntactic structure those contextual relations which said lexical items have in that sentence,
4) determining, for each of said lexical items, a total number of contextual relations found in step 3),
5) determining a number of contextual relations which said lexical items have in common, and
6) determining, on the basis of the results obtained in steps 4) and 5), a degree of overlap between the contextual relations of said lexical items, and thereby a degree of semantic proximity, or a degree of similarity, between said lexical items;
f) for each combination of a contextual relation identified in said original sentence in step b), and a contextual relation retrieved from said bilingual lexicon in step c) together with an associated translation, adding the result obtained in step e) to obtain a score representing the appropriateness of that translation; and
g) selecting from said set of alternative translations retrieved in step c) that translation to which the highest score is attached at the conclusion of step f).
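Steps d) to g) of this claim can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `select_translation`, `Relation`, and `proximity` are hypothetical, a contextual relation is assumed to be a `(syntactic_relation, related_item)` pair, the proximity measure of step e) is passed in as a callable because the claim computes it by a separate corpus procedure, and step e) is read here as comparing the context word found in the sentence with the context word stored in the lexicon. The toy lexicon entry and proximity values are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a contextual relation is a (syntactic_relation, related_item) pair.
Relation = Tuple[str, str]


def select_translation(
    sentence_relations: List[Relation],
    lexicon_entry: Dict[str, List[Relation]],
    proximity: Callable[[str, str], float],
) -> str:
    """Return the alternative translation with the highest accumulated score.

    sentence_relations: relations of the given item in the original sentence (step b).
    lexicon_entry: alternative translations with their associated relations (step c).
    proximity: hypothetical stand-in for the corpus-based procedure of step e).
    """
    scores: Dict[str, float] = {}
    for translation, lexicon_relations in lexicon_entry.items():
        total = 0.0
        for syn_rel, context_item in sentence_relations:
            for lex_rel, lex_item in lexicon_relations:
                # Step d): only relations of the same syntactic type are compared.
                if syn_rel == lex_rel:
                    # Steps e) and f): add the proximity to this translation's score.
                    total += proximity(context_item, lex_item)
        scores[translation] = total
    # Step g): pick the translation with the highest score.
    return max(scores, key=scores.get)


# Toy data (invented): translating "bank" in a sentence where it is modified by "river".
toy_proximity = {("river", "stream"): 0.8, ("river", "investment"): 0.1}


def proximity(a: str, b: str) -> float:
    return toy_proximity.get((a, b), 0.0)


best = select_translation(
    sentence_relations=[("modifier", "river")],
    lexicon_entry={
        "banque": [("modifier", "investment")],
        "rive": [("modifier", "stream")],
    },
    proximity=proximity,
)
print(best)
```

Passing the proximity measure as a parameter keeps the selection logic separate from however the degree of overlap is actually computed.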
Abstract
A method for determining the degree to which two or more lexical items belonging to a predefined corpus of text in any given language are semantically related to each other. The method involves:
a) the retrieval from the said text corpus of a set of sentences in which one or more of the given two or more lexical items appear,
b) the parsing, with the aid of a suitable parsing system, of each of the sentences retrieved, in order to determine the syntactic dependency structure of each of the said sentences,
c) for each sentence retrieved, determining from the obtained syntactic dependency structure the contextual relations which the given lexical items have in that sentence, i.e. identifying those items in the context which have a syntactic relation to those of the given lexical items which appear in the sentence concerned, together with the syntactic relations involved,
d) determining, for each of the given lexical items, the total number of contextual relations found in step c),
e) determining the number of contextual relations which the given lexical items have in common,
f) determining, on the basis of the results obtained in steps d) and e), the degree of overlap between the contextual patterns of the given two or more lexical items.
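Steps c) to f) above can be illustrated with a short sketch. The abstract does not state which formula turns the counts of steps d) and e) into a degree of overlap, so the Dice coefficient used here is an assumption, as are the names `contextual_relations` and `overlap` and the triple format `(item, syntactic_relation, related_item)` standing in for the output of a dependency parser. The example triples are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumption: the parser output has been flattened into
# (item, syntactic_relation, related_item) triples.
Triple = Tuple[str, str, str]
Relation = Tuple[str, str]


def contextual_relations(
    triples: Iterable[Triple], items: Set[str]
) -> Dict[str, Set[Relation]]:
    """Step c): collect, for each given item, its distinct contextual relations."""
    found: Dict[str, Set[Relation]] = defaultdict(set)
    for item, relation, other in triples:
        if item in items:
            found[item].add((relation, other))
    return found


def overlap(relations_a: Set[Relation], relations_b: Set[Relation]) -> float:
    """Steps d) to f): degree of overlap, computed here as a Dice coefficient (assumed)."""
    total = len(relations_a) + len(relations_b)  # step d): totals per item
    if total == 0:
        return 0.0
    shared = len(relations_a & relations_b)      # step e): relations in common
    return 2 * shared / total                    # step f): degree of overlap


# Invented example: "beer" and "wine" share one of their two relations each.
triples = [
    ("beer", "object_of", "drink"),
    ("beer", "modified_by", "cold"),
    ("wine", "object_of", "drink"),
    ("wine", "modified_by", "red"),
]
relations = contextual_relations(triples, {"beer", "wine"})
score = overlap(relations["beer"], relations["wine"])
print(score)  # 0.5
```

Any other set-overlap measure (for example a Jaccard index) could be substituted in `overlap` without changing the rest of the sketch.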
2 Claims
2. A method of selecting automatically the most appropriate translation, in a given target language, of a given lexical item in a given context in a given source language, comprising the following steps:
a) parsing, with the aid of a parsing system, an original sentence in which said given lexical item appears, in order to determine a syntactic structure of said sentence;
b) identifying, in said syntactic structure, those contextual relations which said given lexical item has in said sentence, including identifying other lexical items in said sentence to which the given lexical item is syntactically related, directly or indirectly via another lexical item, together with the syntactic relations involved;
c) retrieving the given lexical item, together with a set of alternative translations thereof, from a bilingual lexicon stored in electronic form;
d) retrieving from said bilingual lexicon each of the other syntactically related lexical items identified in step b), together with a set of alternative translations thereof;
e) identifying in a predefined text corpus in said target language a set of sentences containing at least one of said alternative translations retrieved in steps c) and d), and retrieving said set of sentences from said text corpus;
f) parsing, with the aid of said parsing system, each of said sentences retrieved in step e), in order to determine a syntactic structure for each of said sentences,
g) for each sentence parsed in step f), determining from the determined syntactic structure those contextual relations which said alternative translations have in that sentence;
h) determining, for each of said alternative translations, a total number of contextual relations found in step g);
i) for each combination of one of said alternative translations of said given lexical item, retrieved in step c), with one of said alternative translations of the other lexical items, retrieved in step d), determining a degree of semantic association by means of the following procedure;
1) identifying in said set of sentences retrieved in step e) a subset of sentences which contain said combination, and in which members of said combination are syntactically related to each other directly or indirectly via another lexical item,
2) determining a total number of sentences in said subset identified in step 1), and
3) determining, on the basis of the results obtained in step h), a statistical significance of the result obtained in step 2), and thereby determining the degree of semantic association between the members of said combination,
j) for each combination defined in step i), adding the result obtained in step i) to a score representing the appropriateness of that translation of said given lexical item; and
k) selecting from said set of alternative translations retrieved in step c) that translation to which the highest score is attached at the conclusion of step j).
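Steps i) to k) of claim 2 can be sketched as follows. The claim does not name the statistic used in step i) 3), so this sketch assumes an association ratio in the style of pointwise mutual information. The function names, the `total_relations` parameter (a corpus-wide count of contextual relations used for normalisation), and all counts in the example are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def association(pair_count: int, total_a: int, total_b: int, total_relations: int) -> float:
    """Step i) 3): significance of a co-occurrence count, relative to per-item totals.

    Assumption: log2 of observed over expected co-occurrence (PMI-style).
    Returns 0.0 when any count is missing or zero.
    """
    if min(pair_count, total_a, total_b, total_relations) <= 0:
        return 0.0
    return math.log2(pair_count * total_relations / (total_a * total_b))


def select_by_association(
    candidates: List[str],
    context_translations: List[str],
    pair_counts: Dict[Tuple[str, str], int],
    totals: Dict[str, int],
    total_relations: int,
) -> str:
    """Steps j) and k): sum association scores per candidate and pick the highest."""
    scores: Dict[str, float] = {}
    for candidate in candidates:
        scores[candidate] = sum(
            association(
                pair_counts.get((candidate, ctx), 0),  # step i) 2): subset size
                totals.get(candidate, 0),              # step h): totals per translation
                totals.get(ctx, 0),
                total_relations,
            )
            for ctx in context_translations
        )
    return max(scores, key=scores.get)


# Invented counts: "rive" is syntactically related to "fleuve" far more often than "banque".
chosen = select_by_association(
    candidates=["rive", "banque"],
    context_translations=["fleuve"],
    pair_counts={("rive", "fleuve"): 8, ("banque", "fleuve"): 1},
    totals={"rive": 40, "banque": 50, "fleuve": 20},
    total_relations=1000,
)
print(chosen)
```

With these invented counts, "rive" co-occurs with "fleuve" ten times more often than chance would predict, while "banque" co-occurs at chance level, so "rive" receives the higher score.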
Specification