Parsing and translating natural language sentences automatically
Abstract
A sentence of a natural language is automatically parsed by: defining a set of coded tags each corresponding to a respective part-of-speech of a word in a sentence of the language; building a small corpus of sentence dependency structures by annotating with the corresponding coded tags the words in each sentence of a set of sentences having respective exemplary dependency structures to obtain a set of sequences of coded tags representing the dependency structures; determining the statistical parameters of a corpus-based statistical tool, such as a hidden Markov model, from the corpus of sentence dependency structures; tagging the words of an input sentence to be parsed with the corresponding coded tags to obtain a sequence of coded tags representing the input sentence; applying the corpus-based statistical tool to the sequence of coded tags to derive therefrom the most probable dependency structure for the input sentence; and generating a dependency parse tree of the sentence from the derived dependency structure and the words of the input sentence. The invention finds particular application in the automatic translation of a first natural language into a second language.
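The parameter-determination step can be illustrated with a minimal sketch, assuming the statistical tool is the hidden Markov model the abstract names as an example. The encoding is a hypothetical one chosen for illustration, not taken from the patent: each word carries a part-of-speech tag (the observation) and a dependency tag (the hidden state) giving the signed offset to its head, with "0" marking the root. The names `CORPUS` and `estimate_hmm` and the add-alpha smoothing are likewise assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical annotation: (pos_tag, dep_tag) per word, where dep_tag is the
# signed offset to the word's head and "0" marks the root of the sentence.
CORPUS = [
    [("DET", "+1"), ("NOUN", "+1"), ("VERB", "0")],                  # the dog barks
    [("DET", "+1"), ("NOUN", "+1"), ("VERB", "0"), ("ADV", "-1")],   # the dog barks loudly
    [("NOUN", "+1"), ("VERB", "0"), ("DET", "+1"), ("NOUN", "-2")],  # dogs chase the cat
]


def estimate_hmm(corpus, alpha=1.0):
    """Estimate start, transition and emission probabilities from smoothed counts."""
    states = sorted({dep for sent in corpus for _, dep in sent})
    symbols = sorted({pos for sent in corpus for pos, _ in sent})
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sent in corpus:
        start[sent[0][1]] += 1                      # first dependency tag of the sentence
        for pos, dep in sent:
            emit[dep][pos] += 1                     # state emits this part-of-speech
        for (_, prev), (_, cur) in zip(sent, sent[1:]):
            trans[prev][cur] += 1                   # consecutive dependency tags

    def normalize(counts, keys):
        # Add-alpha smoothing keeps unseen events at a small non-zero probability.
        total = sum(counts.values()) + alpha * len(keys)
        return {k: (counts[k] + alpha) / total for k in keys}

    return (
        normalize(start, states),
        {s: normalize(trans[s], states) for s in states},
        {s: normalize(emit[s], symbols) for s in states},
    )


start_p, trans_p, emit_p = estimate_hmm(CORPUS)
```

Smoothing matters here because the abstract describes the corpus as small, so many tag pairs will never be observed in it.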
52 Claims
1. A method of automatically parsing a sentence of a natural language, comprising:
defining a set of coded tags each corresponding to a respective part-of-speech of a word in a sentence of the language;
building a corpus of sentence dependency structures by annotating with the corresponding coded tags the words in each sentence of a set of sentences having respective exemplary dependency structures to obtain a set of sequences of coded tags representing the dependency structures;
determining the statistical parameters of a corpus-based statistical tool from the corpus of sentence dependency structures;
tagging the words of an input sentence to be parsed with the corresponding coded tags to obtain a sequence of coded tags representing the input sentence;
applying the corpus-based statistical tool to the sequence of coded tags to derive therefrom the most probable dependency structure for the input sentence; and
generating a dependency parse tree of the sentence from the derived dependency structure and the words of the input sentence.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
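The step of deriving the most probable dependency structure from a tag sequence can be sketched as Viterbi decoding, assuming the statistical tool is a hidden Markov model (the example the abstract gives). Everything concrete below is an assumption made for illustration: the dependency tags ("+1" for "head is the next word", "-1" for "head is the previous word", "0" for the root), the part-of-speech symbols, and the hand-set probabilities, which in practice would come from the annotated corpus.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # Each column maps a state to (best log-probability so far, back-pointer).
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Start from the best final state and follow the back-pointers.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical, hand-set parameters (all non-zero so the logarithms are defined).
STATES = ["+1", "0", "-1"]
START_P = {"+1": 0.8, "0": 0.1, "-1": 0.1}
TRANS_P = {
    "+1": {"+1": 0.4, "0": 0.5, "-1": 0.1},
    "0": {"+1": 0.3, "0": 0.1, "-1": 0.6},
    "-1": {"+1": 0.3, "0": 0.2, "-1": 0.5},
}
EMIT_P = {
    "+1": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.05, "ADV": 0.05},
    "0": {"DET": 0.05, "NOUN": 0.1, "VERB": 0.8, "ADV": 0.05},
    "-1": {"DET": 0.05, "NOUN": 0.35, "VERB": 0.1, "ADV": 0.5},
}

# Part-of-speech tags for "the dog barks loudly".
path = viterbi(["DET", "NOUN", "VERB", "ADV"], STATES, START_P, TRANS_P, EMIT_P)
print(path)
```

Working in log space avoids numeric underflow on longer sentences, since the raw path probability shrinks with every word.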
14. A machine for parsing a sentence of a natural language, comprising:
a coder defining a set of coded tags each corresponding to a respective part-of-speech of a word in a sentence of the language;
a corpus-based statistical tool having statistical parameters determined from a corpus of sentence dependency structures formed by annotating with the corresponding coded tags the words in each sentence of a set of sentences having respective exemplary dependency structures;
a tagger for tagging the words of a sentence to be parsed with the corresponding coded tags to obtain a sequence of coded tags representing the input sentence;
a processor for applying the corpus-based statistical tool to the sequence of coded tags to derive therefrom the most probable dependency structure for the input sentence and to generate a dependency parse tree of the sentence from the derived dependency structure and the words of the input sentence.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
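The processor's final task, generating a dependency parse tree from the derived structure and the words, can be sketched as follows. This is a minimal illustration under an assumed encoding that does not come from the patent: the derived structure is one tag per word holding the signed offset to that word's head, with "0" marking the root. The function name `build_tree` and the nested-dictionary output are hypothetical.

```python
def build_tree(words, dep_tags):
    """Turn per-word head offsets into a nested dependency tree."""
    if len(words) != len(dep_tags):
        raise ValueError("need exactly one dependency tag per word")
    heads = []
    for i, tag in enumerate(dep_tags):
        offset = int(tag)                     # "+1" -> 1, "-2" -> -2, "0" -> 0
        heads.append(None if offset == 0 else i + offset)
    roots = [i for i, h in enumerate(heads) if h is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root word")
    children = {i: [] for i in range(len(words))}
    for i, h in enumerate(heads):
        if h is None:
            continue
        if not 0 <= h < len(words):
            raise ValueError(f"head of word {i} falls outside the sentence")
        children[h].append(i)                 # children stay in sentence order

    def subtree(i):
        return {"word": words[i], "children": [subtree(c) for c in children[i]]}

    # Words caught in a cycle are unreachable from the root and are left out;
    # a fuller implementation would detect and report them.
    return subtree(roots[0])


tree = build_tree(["the", "dog", "barks", "loudly"], ["+1", "+1", "0", "-1"])
print(tree)
```

Here "barks" becomes the root with "dog" and "loudly" as its dependents, and "the" hangs under "dog".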
27. A method of automatically translating a first natural language into a second language, comprising the steps of:
converting a spoken or written text in the first language into electrical signals representing the spoken or written words;
rendering the text into sentences;
processing each sentence by an automatic parsing method to generate a dependency parse tree for the sentence by defining a set of coded tags each corresponding to a respective part-of-speech of a word in a sentence of the language;
building a corpus of sentence dependency structures by annotating with the corresponding coded tags the words in each sentence of a set of sentences having respective exemplary dependency structures to obtain a set of sequences of coded tags representing the dependency structures;
determining the statistical parameters of a corpus-based statistical tool from the corpus of sentence dependency structures;
tagging the words of an input sentence to be parsed with the corresponding coded tags to obtain a sequence of coded tags representing the input sequence;
applying the corpus-based statistical tool to the sequence of coded tags to derive therefrom the most probable dependency structure for the input sentence; and
generating a dependency parse tree of the sentence from the derived dependency structure and the words of the input sentence;
synthesizing a sentence in a second language based on the words and the derived dependency tree of the sentence in the first language; and
audibly or visually reproducing a synthesized sentence in the second language.
Dependent claims: 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39
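Two of the translation steps, rendering text into sentences and synthesizing a target sentence from the words and the dependency tree, can be sketched in a toy form. The claim does not say how either step is carried out, so all of the following is an assumption for illustration: the punctuation-based splitter, the tree shape (nested dictionaries carrying each word's original position in `index`), the word-for-word `LEXICON`, and the optional `head_final` ordering rule.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def linearize(node, head_final=False):
    """Flatten a dependency tree into a list of source words.

    By default dependents keep their side of the head; with head_final=True
    every dependent is placed before its head.
    """
    kids = node["children"]
    left = [c for c in kids if head_final or c["index"] < node["index"]]
    right = [c for c in kids if not head_final and c["index"] > node["index"]]
    words = []
    for child in left:
        words += linearize(child, head_final)
    words.append(node["word"])
    for child in right:
        words += linearize(child, head_final)
    return words


def synthesize(tree, lexicon, head_final=False):
    """Word-for-word transfer; words missing from the lexicon pass through."""
    return " ".join(lexicon.get(w, w) for w in linearize(tree, head_final))


# Hypothetical parse of "the dog barks loudly" and a toy English-German lexicon.
TREE = {"word": "barks", "index": 2, "children": [
    {"word": "dog", "index": 1, "children": [
        {"word": "the", "index": 0, "children": []}]},
    {"word": "loudly", "index": 3, "children": []},
]}
LEXICON = {"the": "der", "dog": "Hund", "barks": "bellt", "loudly": "laut"}

print(split_sentences("The dog barks. Dogs chase the cat!"))
print(synthesize(TREE, LEXICON))
print(synthesize(TREE, LEXICON, head_final=True))
```

The tree earns its place once the target language orders words differently from the source: the ordering rule operates on heads and dependents rather than on surface positions.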
40. An apparatus for automatically translating a first natural language into a second language, comprising:
a converter for converting a spoken or written text in the first language into electrical signals representing the spoken or written words;
a means for rendering the text into sentences;
an automatic parsing machine for processing each sentence to generate a dependency parse tree for the sentence comprising a coder defining a set of coded tags each corresponding to a respective part-of-speech of a word in a sentence of the language;
a corpus-based statistical tool having statistical parameters determined from a corpus of sentence dependency structures formed by annotating with the corresponding coded tags the words in each sentence of a set of sentences having respective exemplary dependency structures;
a tagger for tagging the words of a sentence to be parsed with the corresponding coded tags to obtain a sequence of coded tags representing the input sentence;
a processor for applying the corpus-based statistical tool to the sequence of coded tags to derive therefrom the most probable dependency structure for the input sentence and to generate a dependency parse tree of the sentence from the derived dependency structure and the words of the input sentence;
a synthesizer for synthesizing a sentence in the second language based on the words and the derived dependency tree of the sentence in the first language; and
a reproduction device for audibly or visually reproducing the synthesized sentence in the second language.
Dependent claims: 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52
Specification