Using multilingual lexical resources to improve lexical simplification
First Claim
1. An information handling system comprising:
one or more processors;
a memory coupled to at least one of the processors; and
a set of computer program instructions stored in the memory and executed by at least one of the processors in order to perform actions comprising:
creating a multi-language word mapping by a multi-language word mapping generator executing on the information handling system, wherein the creating further comprises:
retrieving a word that belongs to a first natural language;
retrieving a first set of complexity data pertaining to the word in the first natural language, wherein the first set of complexity data comprises a first word length and a first word frequency;
translating the word to one or more translated words, wherein each of the translated words corresponds to one or more second natural languages;
retrieving one or more second sets of complexity data, wherein each of the second sets of complexity data corresponds to a different one of the translated words, wherein the one or more second sets of complexity data comprise one or more second word lengths and one or more second word frequencies; and
computing a complexity of the word in the first natural language based on an overall word length and an overall word frequency, wherein the overall word length is based on the first word length and the one or more second word lengths, and wherein the overall word frequency is based on the first word frequency and the one or more second word frequencies; and
storing the computed complexity of the word in the multi-language word mapping; and
performing, by the information handling system, lexical simplification on a document, the lexical simplification comprising replacing the word in the document with one of the one or more translated words based on the computed complexity of the word stored in the multi-language word mapping.
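The complexity computation recited in claim 1 can be sketched as follows. The claim only says that the overall word length and overall word frequency are "based on" the first-language and second-language values; it gives no formula. This sketch therefore assumes, hypothetically, that both are arithmetic means over all languages, and that complexity rises with length and falls with the logarithm of frequency. `ComplexityData`, `compute_complexity`, `build_mapping`, the weights, and the frequency unit are illustrative names and choices, not taken from the source.

```python
import math
from collections.abc import Iterable
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ComplexityData:
    """Complexity data for one word in one language (hypothetical structure)."""
    word: str
    language: str
    length: int        # word length in characters
    frequency: float   # assumed unit: occurrences per million tokens


def compute_complexity(first: ComplexityData,
                       seconds: list[ComplexityData],
                       length_weight: float = 1.0,
                       frequency_weight: float = 1.0) -> float:
    """Score a first-language word from its own data plus its translations' data.

    Assumption: overall length and overall frequency are plain means over all
    languages; longer words score higher, more frequent words score lower.
    """
    all_data = [first, *seconds]
    overall_length = mean(d.length for d in all_data)
    overall_frequency = mean(d.frequency for d in all_data)
    # log10(f + 1) keeps the frequency term finite when a word is unseen (f == 0).
    return (length_weight * overall_length
            - frequency_weight * math.log10(overall_frequency + 1))


def build_mapping(
    entries: Iterable[tuple[ComplexityData, list[ComplexityData]]],
) -> dict[str, float]:
    """Store each first-language word's computed complexity in a dict."""
    mapping: dict[str, float] = {}
    for first, seconds in entries:
        mapping[first.word] = compute_complexity(first, seconds)
    return mapping


# Placeholder words and made-up numbers, not real corpus statistics.
first = ComplexityData("w_en", "en", 8, 9.0)
seconds = [ComplexityData("w_xx", "xx", 4, 189.0),
           ComplexityData("w_yy", "yy", 6, 99.0)]
print(build_mapping([(first, seconds)]))
```

With these placeholder values the overall length is (8 + 4 + 6) / 3 = 6 and the overall frequency is (9 + 189 + 99) / 3 = 99, so the score is 6 - log10(100) = 4.0. Any other aggregation (weighted by language, median, and so on) fits the claim language equally well; the mean is only the simplest choice.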
Abstract
An approach is provided that receives a word that belongs to a first natural language and retrieves a first set of complexity data pertaining to the word in the first natural language. The approach translates the word to one or more translated words, with each of the translated words corresponding to one or more second natural languages. The approach then retrieves second sets of complexity data, each corresponding to a different one of the translated words. Finally, the approach determines a complexity of the word in the first natural language based on an analysis of the first and second sets of complexity data.
9 Claims
6. A computer program product stored in a non-transitory computer readable storage medium, comprising computer program code that, when executed by an information handling system, performs actions comprising:
creating a multi-language word mapping by a multi-language word mapping generator executing on the information handling system, wherein the creating further comprises:
retrieving a word that belongs to a first natural language;
retrieving a first set of complexity data pertaining to the word in the first natural language, wherein the first set of complexity data comprises a first word length and a first word frequency;
translating the word to one or more translated words, wherein each of the translated words corresponds to one or more second natural languages;
retrieving one or more second sets of complexity data, wherein each of the second sets of complexity data corresponds to a different one of the translated words, wherein the one or more second sets of complexity data comprise one or more second word lengths and one or more second word frequencies; and
computing a complexity of the word in the first natural language based on an overall word length and an overall word frequency, wherein the overall word length is based on the first word length and the one or more second word lengths, and wherein the overall word frequency is based on the first word frequency and the one or more second word frequencies; and
storing the computed complexity of the word in the multi-language word mapping; and
performing, by the information handling system, lexical simplification on a document, the lexical simplification comprising replacing the word in the document with one of the one or more translated words based on the computed complexity of the word stored in the multi-language word mapping.
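The final step, replacing the word in a document based on its stored complexity, can be sketched as below. The claim does not say when a replacement is triggered or which translated word is chosen, so this sketch assumes a hypothetical fixed `threshold` and simply takes the first listed translated word. `simplify`, `threshold`, and the dict-based mapping are illustrative names, not from the source.

```python
import re


def simplify(document: str,
             complexity: dict[str, float],
             translations: dict[str, list[str]],
             threshold: float) -> str:
    """Replace words whose stored complexity exceeds `threshold`.

    Hypothetical policy: a word is replaced only when its stored complexity
    is above `threshold`, using the first listed translated word. Words that
    are missing from the mapping, or have no translations, stay unchanged.
    """
    def replace(match: re.Match) -> str:
        word = match.group(0)
        key = word.lower()  # mapping keys are assumed to be lowercase
        score = complexity.get(key)
        candidates = translations.get(key, [])
        if score is None or score <= threshold or not candidates:
            return word
        return candidates[0]

    # [^\W\d_]+ matches runs of Unicode letters (word characters minus digits and "_").
    return re.sub(r"[^\W\d_]+", replace, document)


print(simplify("alpha gamma omega.",
               {"alpha": 5.0, "gamma": 1.0},
               {"alpha": ["beta", "delta"]},
               threshold=3.0))
```

Passing a callable to `re.sub` leaves spacing and punctuation untouched and only rewrites the matched words. A practical system would also need to handle capitalization, inflection, and multi-word expressions, which this sketch ignores.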
Specification