Method and apparatus for providing syntactic analysis and data structure for translation knowledge in example-based language translation
Abstract
Syntactic analysis is performed on an input and on entries of a bilingual example database using at least one parse tree. The parse trees, formed using a context-free grammar, comprise a number of nodes and each node comprises at least one production rule. Furthermore, at least one node comprises nested production rules. The nested production rules comprise production rules for different combinations of the linguistic constituents of the input. The syntactic analysis comprises recognizing linguistic constituents, ordering the linguistic constituents, representing the linguistic constituents using an adapted feature structure analysis representation, and manipulating the adapted feature structure analysis representation using a natural language parser. The syntactic analysis further comprises generalizing surface variations in the input and the entries of the example database in order to increase the translation efficiency. Linguistic constituents of the input are determined, and a pragmatic type and a syntactic type of the linguistic constituents are determined. The order of the linguistic constituents in the input is retained. An output is provided comprising an identification of the input.
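The analysis described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the lexicon, the rule set, the `Constituent` class, and the pragmatic labels are all hypothetical, chosen only to show production rules in which one rule (PP) nests another (NP), and constituents that carry a syntactic type and a pragmatic type while keeping their input order.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon (word -> part-of-speech tag); not from the patent.
LEXICON = {
    "i": "PRON", "want": "V", "a": "DET", "the": "DET", "cheap": "ADJ",
    "room": "N", "hotel": "N", "in": "P", "please": "INTJ",
}

# Production rules: each nonterminal lists alternative right-hand sides.
# The PP rule nests the NP rules, giving one level of nested production rules.
RULES = {
    "NP": [["DET", "ADJ", "N"], ["DET", "N"], ["ADJ", "N"], ["N"], ["PRON"]],
    "PP": [["P", "NP"]],
    "VP": [["V"]],
}

# Hypothetical pragmatic labels keyed by word; anything else is "content".
PRAGMATIC = {"please": "politeness"}


@dataclass
class Constituent:
    syntactic_type: str
    pragmatic_type: str
    words: list


def match(symbol, tags, i):
    """Return the end index if `symbol` matches tags starting at i, else None."""
    if symbol not in RULES:  # terminal symbol: compare against one tag
        return i + 1 if i < len(tags) and tags[i] == symbol else None
    for alternative in RULES[symbol]:
        j = i
        for sym in alternative:
            j = match(sym, tags, j)
            if j is None:
                break
        else:
            return j  # every symbol of this alternative matched
    return None


def analyze(sentence):
    """Split a sentence into constituents, left to right, preserving order."""
    words = sentence.lower().split()
    tags = [LEXICON.get(w, "UNK") for w in words]
    constituents, i = [], 0
    while i < len(words):
        for label in ("PP", "NP", "VP"):
            j = match(label, tags, i)
            if j is not None:
                break
        else:
            label, j = "X", i + 1  # unrecognized word kept as its own unit
        span = words[i:j]
        pragmatic = next((PRAGMATIC[w] for w in span if w in PRAGMATIC), "content")
        constituents.append(Constituent(label, pragmatic, span))
        i = j
    return constituents
```

Calling `analyze("I want a cheap room in the hotel please")` yields constituents in the same order as the words of the input, so the original word sequence can be rebuilt by concatenating their `words` lists.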
52 Claims
-
1. A method for performing spoken language translation, comprising:
-
receiving at least one input;
performing syntactic analysis on the at least one input using at least one parse tree comprising a plurality of nodes, each node comprising at least one production rule, wherein at least one node of the plurality of nodes comprises at least one level of nested production rules;
performing syntactic analysis on at least one entry from at least one example database using the at least one parse tree;
determining at least one linguistic constituent of the at least one input;
determining a pragmatic type and a syntactic type of the at least one linguistic constituent;
retaining an order of the at least one linguistic constituent in the at least one input; and
providing at least one output comprising an identification of the at least one input.
receiving at least one speech input comprising at least one source language expression;
performing the syntactic analysis on the at least one source language expression and the at least one example database to recognize linguistic constituents;
searching the at least one example database to find an expression pair having a source language portion most similar to the at least one source language expression;
generating at least one target language expression using a target language portion of the expression pair; and
providing at least one speech output comprising the at least one target language expression.
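The search step above can be sketched as a nearest-example lookup. The claims do not name a similarity measure, so this sketch assumes token-level matching with Python's standard `difflib.SequenceMatcher`; the `EXAMPLES` list, the function names, and the language pair are hypothetical illustrations rather than anything taken from the patent.

```python
from difflib import SequenceMatcher

# Hypothetical example database of (source, target) expression pairs.
EXAMPLES = [
    ("where is the station", "eki wa doko desu ka"),
    ("i want a cheap room", "yasui heya ga hoshii desu"),
    ("how much is this", "kore wa ikura desu ka"),
]


def similarity(a, b):
    """Token-level similarity in [0, 1]; an assumed stand-in measure."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def best_match(source, examples=EXAMPLES):
    """Return the expression pair whose source portion is most similar."""
    return max(examples, key=lambda pair: similarity(source, pair[0]))
```

For the input `"i want a quiet room"`, `best_match` selects the pair whose source portion is `"i want a cheap room"`; the target portion of that pair would then be the starting point for generating the target language expression.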
-
-
4. The method of claim 3, wherein generating at least one target language expression comprises accessing and using at least one target language generation grammar set.
-
5. The method of claim 1, wherein performing syntactic analysis further comprises generalizing at least one surface variation in the at least one input and the at least one example database, wherein efficiency of the spoken language translation is increased.
-
6. The method of claim 1, further comprising:
-
determining at least one syntactic constituent of the at least one input; and
combining entries of the example database based on the at least one syntactic constituent.
-
-
7. The method of claim 1, wherein the example database is a bilingual example database, and wherein the expression pair is a bilingual expression pair.
-
8. The method of claim 1, wherein performing syntactic analysis further comprises:
-
recognizing linguistic constituents selected from a group comprising noun phrases, verb phrases, and prepositional phrases;
ordering the linguistic constituents;
representing the linguistic constituents using an adapted feature structure analysis representation; and
manipulating the adapted feature structure analysis representation using at least one natural language parser.
-
-
9. The method of claim 1, wherein a separation is provided between domain-independent linguistic knowledge and domain-dependent linguistic knowledge.
-
10. The method of claim 1, wherein the at least one example database comprises entries having an adapted feature structure representation comprising at least one sub-feature structure for corresponding source language expressions and target language expressions, wherein correspondence between constituents in the source language expression and the target language expression is indicated by indexes.
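One way to picture the indexed correspondence in claim 10 is the sketch below. The dictionary layout, the key names (`index`, `type`, `text`), and the sample entry are assumptions made for illustration; the claim only requires that source and target sub-feature structures be linked by indexes.

```python
# Hypothetical database entry: source and target sub-feature structures,
# with shared "index" values marking corresponding constituents.
ENTRY = {
    "source": [
        {"index": 1, "type": "VP", "text": "want"},
        {"index": 2, "type": "NP", "text": "a cheap room"},
    ],
    "target": [
        {"index": 2, "type": "NP", "text": "yasui heya"},
        {"index": 1, "type": "VP", "text": "hoshii"},
    ],
}


def aligned_constituents(entry):
    """Pair each source constituent with the target constituent sharing its index."""
    target_by_index = {c["index"]: c for c in entry["target"]}
    return [
        (s["text"], target_by_index[s["index"]]["text"])
        for s in entry["source"]
        if s["index"] in target_by_index
    ]
```

Because the pairing goes through the indexes rather than list positions, the correspondence survives a different constituent order in the target language.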
-
11. The method of claim 1, further comprising performing statistical processing to resolve lexical ambiguities and local ambiguities.
-
12. The method of claim 1, wherein performing syntactic analysis further comprises:
-
accessing and using at least one source language dictionary; and
accessing and using at least one source language shallow syntactic grammar set.
-
-
13. The method of claim 1, wherein the at least one parse tree is a context-free parse tree, wherein the context-free parse tree is formed using a context-free grammar, wherein the method further comprises the step of mapping the context-free parse tree into at least one feature structure.
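The mapping step in claim 13 can be sketched as a recursive conversion from a parse tree to a nested attribute-value structure. The tuple encoding of the tree and the feature names `cat`, `lex`, and `daughters` are hypothetical; the claim does not specify the layout of the feature structure.

```python
def tree_to_feature_structure(tree):
    """Map a (label, children) parse tree into a nested feature structure.

    A leaf is (tag, word); an inner node is (label, [subtrees]).
    """
    label, children = tree
    if isinstance(children, str):  # leaf node: children is the word itself
        return {"cat": label, "lex": children}
    return {
        "cat": label,
        "daughters": [tree_to_feature_structure(child) for child in children],
    }


# Parse tree for "a room" under the context-free rule NP -> DET N.
NP_TREE = ("NP", [("DET", "a"), ("N", "room")])
NP_FEATURES = tree_to_feature_structure(NP_TREE)
```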
-
14. The method of claim 1, wherein the at least one input comprises spoken language.
-
15. An apparatus for spoken language translation comprising:
-
at least one processor;
an input coupled to the at least one processor, the input capable of receiving speech signals, the at least one processor configured to identify constituents of the received speech signals by, performing syntactic analysis on the at least one input using at least one parse tree comprising a plurality of nodes, each node comprising at least one production rule, wherein at least one node of the plurality of nodes comprises at least one level of nested production rules;
performing syntactic analysis on at least one entry from at least one example database using the at least one parse tree;
determining at least one linguistic constituent of the at least one input;
determining a pragmatic type and a syntactic type of the at least one linguistic constituent;
retaining an order of the at least one linguistic constituent in the at least one input; and
an output coupled to the at least one processor, the output capable of providing an identification of the at least one input.
receiving at least one speech input comprising at least one source language expression;
performing the syntactic analysis on the at least one source language expression and the at least one example database to recognize linguistic constituents;
searching the at least one example database to find an expression pair having a source language portion most similar to the at least one source language expression;
generating at least one target language expression using a target language portion of the expression pair; and
providing at least one speech output comprising the at least one target language expression.
-
-
18. The apparatus of claim 17, wherein generating at least one target language expression comprises accessing and using at least one target language generation grammar set.
-
19. The apparatus of claim 15, wherein the syntactic analysis comprises generalizing at least one surface variation in the at least one input and the at least one example database, wherein efficiency of the spoken language translation is increased.
-
20. The apparatus of claim 15, wherein the at least one processor is further configured to identify by:
-
determining at least one syntactic constituent of the at least one input; and
combining entries of the example database based on the at least one syntactic constituent.
-
-
21. The apparatus of claim 15, wherein the example database is a bilingual example database, and wherein the expression pair is a bilingual expression pair.
-
22. The apparatus of claim 15, wherein the syntactic analysis comprises:
-
recognizing linguistic constituents selected from a group comprising noun phrases, verb phrases, and prepositional phrases;
ordering the linguistic constituents;
representing the linguistic constituents using an adapted feature structure analysis representation; and
manipulating the adapted feature structure analysis representation using at least one natural language parser.
-
-
23. The apparatus of claim 15, wherein a separation is provided between domain-independent linguistic knowledge and domain-dependent linguistic knowledge.
-
24. The apparatus of claim 15, wherein the at least one example database comprises entries having an adapted feature structure representation comprising at least one sub-feature structure for corresponding source language expressions and target language expressions, wherein correspondence between constituents in the source language expression and the target language expression is indicated by indexes.
-
25. The apparatus of claim 15, wherein the at least one processor is further configured to identify by performing statistical processing to resolve lexical ambiguities and local ambiguities.
-
26. The apparatus of claim 15, wherein the syntactic analysis further comprises:
-
accessing and using at least one source language dictionary; and
accessing and using at least one source language shallow syntactic grammar set.
-
-
27. The apparatus of claim 15, wherein the at least one parse tree is a context-free parse tree, wherein the context-free parse tree is formed using a context-free grammar, wherein the method further comprises the step of mapping the context-free parse tree into at least one feature structure.
-
28. The apparatus of claim 15, wherein the at least one input comprises spoken language.
-
29. A computer readable medium containing executable instructions which, when executed in a processing system, causes the system to perform a method for spoken language translation, the method comprising:
-
receiving at least one input;
performing syntactic analysis on the at least one input using at least one parse tree comprising a plurality of nodes, each node comprising at least one production rule, wherein at least one node of the plurality of nodes comprises at least one level of nested production rules;
performing syntactic analysis on at least one entry from at least one example database using the at least one parse tree;
determining at least one linguistic constituent of the at least one input;
determining a pragmatic type and a syntactic type of the at least one linguistic constituent;
retaining an order of the at least one linguistic constituent in the at least one input; and
providing at least one output comprising an identification of the at least one input.
receiving at least one speech input comprising at least one source language expression;
performing the syntactic analysis on the at least one source language expression and the at least one example database to recognize linguistic constituents;
searching the at least one example database to find an expression pair having a source language portion most similar to the at least one source language expression;
generating at least one target language expression using a target language portion of the expression pair; and
providing at least one speech output comprising the at least one target language expression.
-
-
32. The computer readable medium of claim 31, wherein generating at least one target language expression comprises accessing and using at least one target language generation grammar set.
-
33. The computer readable medium of claim 29, wherein performing syntactic analysis further comprises generalizing at least one surface variation in the at least one input and the at least one example database, wherein efficiency of the spoken language translation is increased.
-
34. The computer readable medium of claim 29, wherein the method further comprises:
-
determining at least one syntactic constituent of the at least one input; and
combining entries of the example database based on the at least one syntactic constituent.
-
-
35. The computer readable medium of claim 29, wherein the example database is a bilingual example database, and wherein the expression pair is a bilingual expression pair.
-
36. The computer readable medium of claim 29, wherein performing syntactic analysis further comprises:
-
recognizing linguistic constituents selected from a group comprising noun phrases, verb phrases, and prepositional phrases;
ordering the linguistic constituents;
representing the linguistic constituents using an adapted feature structure analysis representation; and
manipulating the adapted feature structure analysis representation using at least one natural language parser.
-
-
37. The computer readable medium of claim 29, wherein a separation is provided between domain-independent linguistic knowledge and domain-dependent linguistic knowledge.
-
38. The computer readable medium of claim 29, wherein the at least one example database comprises entries having an adapted feature structure representation comprising at least one sub-feature structure for corresponding source language expressions and target language expressions, wherein correspondence between constituents in the source language expression and the target language expression is indicated by indexes.
-
39. The computer readable medium of claim 29, wherein the method further comprises performing statistical processing to resolve lexical ambiguities and local ambiguities.
-
40. The computer readable medium of claim 29, wherein performing syntactic analysis further comprises:
-
accessing and using at least one source language dictionary; and
accessing and using at least one source language shallow syntactic grammar set.
-
-
41. The computer readable medium of claim 29, wherein the at least one parse tree is a context-free parse tree, wherein the context-free parse tree is formed using a context-free grammar, wherein the method further comprises the step of mapping the context-free parse tree into at least one feature structure.
-
42. The computer readable medium of claim 29, wherein the at least one input comprises spoken language.
-
43. A spoken language translation system, comprising:
-
a means for receiving at least one input;
a means for performing syntactic analysis on the at least one input using at least one parse tree comprising a plurality of nodes, each node comprising at least one production rule, wherein at least one node of the plurality of nodes comprises at least one level of nested production rules;
a means for performing syntactic analysis on at least one entry from at least one example database using the at least one parse tree;
a means for determining at least one linguistic constituent of the at least one input;
a means for determining a pragmatic type and a syntactic type of the at least one linguistic constituent;
a means for retaining an order of the at least one linguistic constituent in the at least one input; and
a means for providing at least one output comprising an identification of the at least one input.
a means for receiving at least one speech input comprising at least one source language expression;
a means for performing the syntactic analysis on the at least one source language expression and the at least one example database to recognize linguistic constituents;
a means for searching the at least one example database to find an expression pair having a source language portion most similar to the at least one source language expression;
a means for generating at least one target language expression using a target language portion of the expression pair; and
a means for providing at least one speech output comprising the at least one target language expression.
-
-
46. The system of claim 43, wherein the means for performing syntactic analysis further comprises a means for generalizing at least one surface variation in the at least one input and the at least one example database.
-
47. The system of claim 43, further comprising:
-
a means for determining at least one syntactic constituent of the at least one input; and
a means for combining entries of the example database based on the at least one syntactic constituent.
-
-
48. The system of claim 43, wherein the means for performing syntactic analysis further comprises:
-
a means for recognizing linguistic constituents selected from a group comprising noun phrases, verb phrases, and prepositional phrases;
a means for ordering the linguistic constituents;
a means for representing the linguistic constituents using an adapted feature structure analysis representation; and
a means for manipulating the adapted feature structure analysis representation using at least one natural language parser.
-
-
49. The system of claim 43, wherein a separation is provided between domain-independent linguistic knowledge and domain-dependent linguistic knowledge.
-
50. The system of claim 43, wherein the at least one example database comprises entries having an adapted feature structure representation comprising at least one sub-feature structure for corresponding source language expressions and target language expressions, wherein correspondence between constituents in the source language expression and the target language expression is indicated by indexes.
-
51. The system of claim 43, further comprising a means for performing statistical processing to resolve lexical ambiguities and local ambiguities.
-
52. The system of claim 43, wherein the at least one parse tree is a context-free parse tree, wherein the context-free parse tree is formed using a context-free grammar, wherein the method further comprises the step of mapping the context-free parse tree into at least one feature structure.
Specification