Machine translation method and system that decomposes complex sentences into two or more sentences
First Claim
1. A machine translation system for translating a complex sentence in a first language into two or more sentences in a second language, wherein said complex sentence comprises one or more adjectival clauses, coordination clauses, or appositions, the system comprising:
a memory; and
a processor implementing:
a grammar analysis module for identifying main clause, embedded clauses, phrases and all possible cohesive ties linking said embedded clauses to said main clause, said grammar analysis module being associated with a database of complex sentences in said first language;
a cohesive tie stripping module for identifying and stripping down said possible cohesive ties;
a punctuation module for applying a weighted punctuation for segmentation to said complex sentence to decompose said complex sentence into simple sentences; and
a translation module for translating said decomposed simple sentences and phrases into said second language;
wherein said cohesive ties are stripped down before applying said weighted punctuation for segmentation;
wherein if one or more words were omitted in said complex sentence by ellipsis, said grammar analysis module is invoked to add said omitted words back to said complex sentence to make said complex sentence grammatically complete before said decomposition of said complex sentence into said simple sentences;
wherein said punctuation module is invoked to change one or more comma to a period;
wherein said punctuation module is invoked to supply one or more comma or periods;
wherein said punctuation module applies a more weighted punctuation to a nonrestrictive clause than to a restrictive clause;
wherein said translation module is invoked after said decomposition of said complex sentence into said simple sentences; and
wherein said possible cohesive ties comprises any of:
time and place relaters, logical connectors, substitution, disclosure reference, comparison, ellipsis and structure parallelism.
Abstract
The present invention discloses a technology for decomposing prose elements in document processing. Grammar analysis of complex sentences can identify main clause, embedded clauses, phrases and cohesive ties that link the embedded clauses to the main clause. Cohesive ties are stripped down and a weighted punctuation for segmentation is applied to decompose complex sentences into simple sentences.
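The order of operations described in the abstract (strip cohesive ties first, then segment on punctuation) can be illustrated with a minimal sketch. This is not the patented implementation: the connector list, the function names, and the rule that every remaining comma is promoted to a period are all assumptions made for illustration, and no real grammar analysis is performed.

```python
import re

# Hypothetical, deliberately tiny list of logical connectors treated as cohesive ties.
CONNECTORS = ("and", "but", "so")


def strip_cohesive_ties(sentence: str) -> str:
    """Remove a connector that directly follows a comma (toy heuristic)."""
    pattern = r",\s+(?:" + "|".join(CONNECTORS) + r")\b\s*"
    return re.sub(pattern, ", ", sentence, flags=re.IGNORECASE)


def segment(sentence: str) -> list:
    """Promote each remaining comma to a period and split into simple sentences."""
    parts = [p.strip() for p in sentence.rstrip(".").split(",")]
    return [p[0].upper() + p[1:] + "." for p in parts if p]


def decompose(sentence: str) -> list:
    """Strip cohesive ties before segmenting, in the order the abstract gives."""
    return segment(strip_cohesive_ties(sentence))


if __name__ == "__main__":
    for simple in decompose("The engine failed, and the pilot landed safely."):
        print(simple)
```

Each resulting simple sentence would then be handed to a translation step, which is outside the scope of this sketch.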
4 Claims
3. A method for machine translation of a complex sentence in a first language into two or more sentences in a second language, comprising the steps of:
performing, using a processor, grammar analysis on a complex sentence to identify main clause, embedded clauses, phrases and all possible cohesive ties linking said embedded clauses to said main clause;
identifying and stripping down said possible cohesive ties from said complex sentence;
decomposing said complex sentence into simple sentences and phrases by applying a weighted punctuation to said complex sentence; and
translating said simple sentences and phrases into a second language;
wherein said cohesive ties are stripped down before applying said weighted punctuation to said complex sentence;
wherein if one or more words were omitted in said complex sentence by ellipsis, said grammar analysis is performed to add said omitted words back to said complex sentence to make said complex sentence grammatically complete before said decomposing said complex sentence into said simple sentences;
wherein said decomposing of said complex sentence changes one or more comma to a period;
wherein said decomposing of said complex sentence supplies one or more comma or periods;
wherein a more weighted punctuation is applied to a nonrestrictive clause than to a restrictive clause;
wherein said translating to said second language is performed after said decomposing of said complex sentence into said simple sentences, and
wherein said possible cohesive ties comprise any of:
time and place relaters, logical connectors, substitution, disclosure reference, comparison, ellipsis and structure parallelism, in the forms of noun phrases, verb phrases, adverbial phrases, prepositional phrases, and adjunct-head.
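The limitation that a nonrestrictive clause receives a more weighted punctuation than a restrictive clause can be pictured with a small sketch. The claims do not define numeric weights or a classification rule, so the weight table, the mapping from weight to punctuation mark, and the comma-based heuristic below are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical weights: a larger number means a stronger break.
CLAUSE_WEIGHT = {"nonrestrictive": 2, "restrictive": 1}
# Hypothetical mapping from weight to the punctuation mark applied at the clause boundary.
MARK_FOR_WEIGHT = {2: ".", 1: ","}


def classify_relative_clause(sentence: str) -> Optional[str]:
    """Toy heuristic: a comma before 'which'/'who' signals a nonrestrictive clause."""
    if re.search(r",\s+(?:which|who)\b", sentence):
        return "nonrestrictive"
    if re.search(r"\b(?:that|which|who)\b", sentence):
        return "restrictive"
    return None


def boundary_mark(sentence: str) -> Optional[str]:
    """Return the punctuation mark chosen for the clause boundary, if any."""
    kind = classify_relative_clause(sentence)
    return None if kind is None else MARK_FOR_WEIGHT[CLAUSE_WEIGHT[kind]]


if __name__ == "__main__":
    print(boundary_mark("The report, which was late, surprised nobody."))
    print(boundary_mark("The report that was late surprised nobody."))
```

Under these assumptions the nonrestrictive clause is split off with a period and can become its own sentence, while the restrictive clause gets at most a comma and stays attached to its head noun.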
Specification