Method and apparatus for providing syntactic analysis and data structure for translation knowledge in example-based language translation

US 6,243,669 B1
Filed: 01/29/1999
Issued: 06/05/2001
Est. Priority Date: 01/29/1999
Status: Expired due to Fees

Abstract

Syntactic analysis is performed on an input and on entries of a bilingual example database using at least one parse tree. The parse trees, formed using a context-free grammar, comprise a number of nodes and each node comprises at least one production rule. Furthermore, at least one node comprises nested production rules. The nested production rules comprise production rules for different combinations of the linguistic constituents of the input. The syntactic analysis comprises recognizing linguistic constituents, ordering the linguistic constituents, representing the linguistic constituents using an adapted feature structure analysis representation, and manipulating the adapted feature structure analysis representation using a natural language parser. The syntactic analysis further comprises generalizing surface variations in the input and the entries of the example database in order to increase the translation efficiency. Linguistic constituents of the input are determined, and a pragmatic type and a syntactic type of the linguistic constituents are determined. The order of the linguistic constituents in the input is retained. An output is provided comprising an identification of the input.
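The analysis described in the abstract can be pictured with a small sketch. What follows is only an illustration under stated assumptions, not the patented implementation: the lexicon, the grammar, and the names `LEXICON`, `GRAMMAR`, `Constituent`, `match`, and `recognize` are all hypothetical. The rule for a prepositional phrase refers to the noun-phrase rule, which is one simple way to picture a nested production rule, and each recognized constituent records its position so that the input order is retained. Only the syntactic type is modeled here; the pragmatic type is left out because this record does not define it.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: word -> part of speech (not taken from the patent).
LEXICON = {
    "i": "PRON", "want": "VERB", "a": "DET", "the": "DET",
    "ticket": "NOUN", "station": "NOUN", "to": "PREP",
}

# Hypothetical shallow grammar. Each constituent type lists alternative
# right-hand sides; a right-hand side may name another constituent type
# (NP inside PP), which stands in for a nested production rule.
GRAMMAR = {
    "PP": [("PREP", "NP")],
    "NP": [("DET", "NOUN"), ("PRON",), ("NOUN",)],
    "VP": [("VERB",)],
}

@dataclass
class Constituent:
    syntactic_type: str   # e.g. "NP", "VP", "PP"
    words: list
    position: int         # index of the first word; preserves input order

def match(symbol, tags, i):
    """Return the end index if `symbol` matches the tags starting at i, else None."""
    if symbol not in GRAMMAR:            # terminal: a part-of-speech tag
        return i + 1 if i < len(tags) and tags[i] == symbol else None
    for rhs in GRAMMAR[symbol]:          # nonterminal: try each production
        j = i
        for part in rhs:
            j = match(part, tags, j)
            if j is None:
                break
        else:
            return j
    return None

def recognize(sentence):
    """Split a sentence into constituents, left to right, keeping their order."""
    words = sentence.lower().split()
    tags = [LEXICON.get(w, "UNK") for w in words]
    out, i = [], 0
    while i < len(words):
        for ctype in GRAMMAR:            # PP is tried first, then NP, then VP
            end = match(ctype, tags, i)
            if end is not None:
                out.append(Constituent(ctype, words[i:end], i))
                i = end
                break
        else:
            i += 1                       # skip words the toy grammar cannot cover
    return out
```

Calling `recognize("I want a ticket to the station")` yields four constituents in input order, with the final prepositional phrase covering "to the station" as a single unit.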

260 Citations

52 Claims

1. A method for performing spoken language translation, comprising:
- receiving at least one input;
  
  performing syntactic analysis on the at least one input using at least one parse tree comprising a plurality of nodes, each node comprising at least one production rule, wherein at least one node of the plurality of nodes comprises at least one level of nested production rules;
  
  performing syntactic analysis on at least one entry from at least one example database using the at least one parse tree;
  
  determining at least one linguistic constituent of the at least one input;
  
  determining a pragmatic type and a syntactic type of the at least one linguistic constituent;
  
  retaining an order of the at least one linguistic constituent in the at least one input; and
  
  providing at least one output comprising an identification of the at least one input.
2. The method of claim 1, wherein each level of the at least one level of nested production rules comprises a production rule for a combination of the at least one linguistic constituent of the at least one input.
3. The method of claim 1, further comprising:
4. The method of claim 3, wherein generating at least one target language expression comprises accessing and using at least one target language generation grammar set.
5. The method of claim 1, wherein performing syntactic analysis further comprises generalizing at least one surface variation in the at least one input and the at least one example database, wherein efficiency of the spoken language translation is increased.
6. The method of claim 1, further comprising:
- determining at least one syntactic constituent of the at least one input; and
  
  combining entries of the example database based on the at least one syntactic constituent.
7. The method of claim 1, wherein the example database is a multilingual example database, and wherein the expression pair is a multilingual expression group.
8. The method of claim 1, wherein performing syntactic analysis further comprises:
- recognizing linguistic constituents selected from a group comprising noun phrases, verb phrases, and prepositional phrases;
  
  ordering the linguistic constituents;
  
  representing the linguistic constituents using an adapted feature structure analysis representation; and
  
  manipulating the adapted feature structure analysis representation using at least one natural language parser.
9. The method of claim 1, wherein a separation is provided between domain-independent linguistic knowledge and domain-dependent linguistic knowledge.
10. The method of claim 1, wherein the at least one example database comprises entries having an adapted feature structure representation comprising at least one sub-feature structure for corresponding source language expressions and target language expressions, wherein correspondence between constituents in the source language expression and the target language expression is indicated by indexes.
11. The method of claim 1, further comprising performing statistical processing to resolve lexical ambiguities and local ambiguities.
12. The method of claim 1, wherein performing syntactic analysis further comprises:
- accessing and using at least one source language dictionary; and
  
  accessing and using at least one source language shallow syntactic grammar set.
13. The method of claim 1, wherein the at least one parse tree is a context-free parse tree, wherein the context-free parse tree is formed using a context-free grammar, wherein the method further comprises the step of mapping the context-free parse tree into at least one feature structure.
14. The method of claim 1, wherein the at least one input comprises spoken language.
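Claims 10 and 13 refer to feature structures and to index-based correspondence between source and target constituents. The sketch below shows one possible reading, assuming nested dictionaries as the feature structure representation. The tree encoding, the feature names (`cat`, `words`, `parts`, `index`), the sample entry, and the language pair are all hypothetical and chosen only for illustration; the record itself does not specify them.

```python
# A context-free parse tree as nested tuples: (label, child, child, ...).
# Leaves are plain strings. This encoding is an assumption for the sketch.
def tree_to_features(tree):
    """Map a parse tree into a nested-dictionary feature structure."""
    label, *children = tree
    if all(isinstance(c, str) for c in children):
        return {"cat": label, "words": list(children)}
    return {"cat": label, "parts": [tree_to_features(c) for c in children]}

# One hypothetical example-database entry: a source and a target
# sub-feature structure whose constituents carry matching "index" values.
ENTRY = {
    "source": {"lang": "en", "constituents": [
        {"index": 1, "cat": "NP", "words": ["I"]},
        {"index": 2, "cat": "VP", "words": ["want"]},
        {"index": 3, "cat": "NP", "words": ["a", "ticket"]},
    ]},
    "target": {"lang": "ja", "constituents": [
        {"index": 1, "cat": "NP", "words": ["watashi", "wa"]},
        {"index": 3, "cat": "NP", "words": ["kippu", "ga"]},
        {"index": 2, "cat": "VP", "words": ["hoshii", "desu"]},
    ]},
}

def aligned_pairs(entry):
    """Pair source and target constituents that share an index, in source order."""
    target = {c["index"]: c for c in entry["target"]["constituents"]}
    return [
        (" ".join(s["words"]), " ".join(target[s["index"]]["words"]))
        for s in entry["source"]["constituents"]
        if s["index"] in target
    ]
```

Because the pairing goes through the index rather than through position, the target side is free to order its constituents differently from the source, as the sample entry does.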

15. An apparatus for spoken language translation comprising:
- at least one processor;
  
  an input coupled to the at least one processor, the input capable of receiving speech signals, the at least one processor configured to identify constituents of the received speech signals by, performing syntactic analysis on the at least one input using at least one parse tree comprising a plurality of nodes, each node comprising at least one production rule, wherein at least one node of the plurality of nodes comprises at least one level of nested production rules;
  
  performing syntactic analysis on at least one entry from at least one example database using the at least one parse tree;
  
  determining at least one linguistic constituent of the at least one input;
  
  determining a pragmatic type and a syntactic type of the at least one linguistic constituent;
  
  retaining an order of the at least one linguistic constituent in the at least one input; and
  
  an output coupled to the at least one processor, the output capable of providing an identification of the at least one input.
16. The apparatus of claim 15, wherein each level of the at least one level of nested production rules comprises a production rule for a combination of the at least one linguistic constituent of the at least one input.
17. The apparatus of claim 15, wherein the at least one processor is further configured to identify by:
18. The apparatus of claim 17, wherein generating at least one target language expression comprises accessing and using at least one target language generation grammar set.
19. The apparatus of claim 15, wherein the syntactic analysis comprises generalizing at least one surface variation in the at least one input and the at least one example database, wherein efficiency of the spoken language translation is increased.
20. The apparatus of claim 15, wherein the at least one processor is further configured to identify by:
- determining at least one syntactic constituent of the at least one input; and
  
  combining entries of the example database based on the at least one syntactic constituent.
21. The apparatus of claim 15, wherein the example database is a bilingual example database, and wherein the expression pair is a bilingual expression pair.
22. The apparatus of claim 15, wherein the syntactic analysis comprises:
- recognizing linguistic constituents selected from a group comprising noun phrases, verb phrases, and prepositional phrases;
  
  ordering the linguistic constituents;
  
  representing the linguistic constituents using an adapted feature structure analysis representation; and
  
  manipulating the adapted feature structure analysis representation using at least one natural language parser.
23. The apparatus of claim 15, wherein a separation is provided between domain-independent linguistic knowledge and domain-dependent linguistic knowledge.
24. The apparatus of claim 15, wherein the at least one example database comprises entries having an adapted feature structure representation comprising at least one sub-feature structure for corresponding source language expressions and target language expressions, wherein correspondence between constituents in the source language expression and the target language expression is indicated by indexes.
25. The apparatus of claim 15, wherein the at least one processor is further configured to identify by performing statistical processing to resolve lexical ambiguities and local ambiguities.
26. The apparatus of claim 15, wherein the syntactic analysis further comprises:
- accessing and using at least one source language dictionary; and
  
  accessing and using at least one source language shallow syntactic grammar set.
27. The apparatus of claim 15, wherein the at least one parse tree is a context-free parse tree, wherein the context-free parse tree is formed using a context-free grammar, wherein the method further comprises the step of mapping the context-free parse tree into at least one feature structure.
28. The apparatus of claim 15, wherein the at least one input comprises spoken language.

29. A computer readable medium containing executable instructions which, when executed in a processing system, causes the system to perform a method for spoken language translation, the method comprising:
- receiving at least one input;
  
  performing syntactic analysis on the at least one input using at least one parse tree comprising a plurality of nodes, each node comprising at least one production rule, wherein at least one node of the plurality of nodes comprises at least one level of nested production rules;
  
  performing syntactic analysis on at least one entry from at least one example database using the at least one parse tree;
  
  determining at least one linguistic constituent of the at least one input;
  
  determining a pragmatic type and a syntactic type of the at least one linguistic constituent;
  
  retaining an order of the at least one linguistic constituent in the at least one input; and
  
  providing at least one output comprising an identification of the at least one input.
30. The computer readable medium of claim 29, wherein each level of the at least one level of nested production rules comprises a production rule for a combination of the at least one linguistic constituent of the at least one input.
31. The computer readable medium of claim 29, wherein the method further comprises:
32. The computer readable medium of claim 31, wherein generating at least one target language expression comprises accessing and using at least one target language generation grammar set.
33. The computer readable medium of claim 29, wherein performing syntactic analysis further comprises generalizing at least one surface variation in the at least one input and the at least one example database, wherein efficiency of the spoken language translation is increased.
34. The computer readable medium of claim 29, wherein the method further comprises:
- determining at least one syntactic constituent of the at least one input; and
  
  combining entries of the example database based on the at least one syntactic constituent.
35. The computer readable medium of claim 29, wherein the example database is a bilingual example database, and wherein the expression pair is a bilingual expression pair.
36. The computer readable medium of claim 29, wherein performing syntactic analysis further comprises:
- recognizing linguistic constituents selected from a group comprising noun phrases, verb phrases, and prepositional phrases;
  
  ordering the linguistic constituents;
  
  representing the linguistic constituents using an adapted feature structure analysis representation; and
  
  manipulating the adapted feature structure analysis representation using at least one natural language parser.
37. The computer readable medium of claim 29, wherein a separation is provided between domain-independent linguistic knowledge and domain-dependent linguistic knowledge.
38. The computer readable medium of claim 29, wherein the at least one example database comprises entries having an adapted feature structure representation comprising at least one sub-feature structure for corresponding source language expressions and target language expressions, wherein correspondence between constituents in the source language expression and the target language expression is indicated by indexes.
39. The computer readable medium of claim 29, wherein the method further comprises performing statistical processing to resolve lexical ambiguities and local ambiguities.
40. The computer readable medium of claim 29, wherein performing syntactic analysis further comprises:
- accessing and using at least one source language dictionary; and
  
  accessing and using at least one source language shallow syntactic grammar set.
41. The computer readable medium of claim 29, wherein the at least one parse tree is a context-free parse tree, wherein the context-free parse tree is formed using a context-free grammar, wherein the method further comprises the step of mapping the context-free parse tree into at least one feature structure.
42. The computer readable medium of claim 29, wherein the at least one input comprises spoken language.

43. A spoken language translation system, comprising:
- a means for receiving at least one input;
  
  a means for performing syntactic analysis on the at least one input using at least one parse tree comprising a plurality of nodes, each node comprising at least one production rule, wherein at least one node of the plurality of nodes comprises at least one level of nested production rules;
  
  a means for performing syntactic analysis on at least one entry from at least one example database using the at least one parse tree;
  
  a means for determining at least one linguistic constituent of the at least one input;
  
  a means for determining a pragmatic type and a syntactic type of the at least one linguistic constituent;
  
  a means for retaining an order of the at least one linguistic constituent in the at least one input; and
  
  a means for providing at least one output comprising an identification of the at least one input.
44. The system of claim 43, wherein each level of the at least one level of nested production rules comprises a production rule for a combination of the at least one linguistic constituent of the at least one input.
45. The system of claim 43, further comprising:
46. The system of claim 43, wherein the means for performing syntactic analysis further comprises a means for generalizing at least one surface variation in the at least one input and the at least one example database.
47. The system of claim 43, further comprising:
- a means for determining at least one syntactic constituent of the at least one input; and
  
  a means for combining entries of the example database based on the at least one syntactic constituent.
48. The system of claim 43, wherein the means for performing syntactic analysis further comprises:
- a means for recognizing linguistic constituents selected from a group comprising noun phrases, verb phrases, and prepositional phrases;
  
  a means for ordering the linguistic constituents;
  
  a means for representing the linguistic constituents using an adapted feature structure analysis representation; and
  
  a means for manipulating the adapted feature structure analysis representation using at least one natural language parser.
49. The system of claim 43, wherein a separation is provided between domain-independent linguistic knowledge and domain-dependent linguistic knowledge.
50. The system of claim 43, wherein the at least one example database comprises entries having an adapted feature structure representation comprising at least one sub-feature structure for corresponding source language expressions and target language expressions, wherein correspondence between constituents in the source language expression and the target language expression is indicated by indexes.
51. The system of claim 43, further comprising a means for performing statistical processing to resolve lexical ambiguities and local ambiguities.
52. The system of claim 43, wherein the at least one parse tree is a context-free parse tree, wherein the context-free parse tree is formed using a context-free grammar, wherein the method further comprises the step of mapping the context-free parse tree into at least one feature structure.
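Several dependent claims (5, 19, 33, and 46) recite generalizing surface variations in the input and in the example database, and claims 6, 20, 34, and 47 recite combining database entries. A rough sketch of the idea follows, assuming that a surface variation can be something like a contraction or a punctuation difference. That assumption, the `VARIANTS` table, and the function names `generalize` and `combine` are hypothetical; this record does not list which variations the patented system handles, and the combining here keys on the generalized string rather than on a syntactic constituent.

```python
import re

# Hypothetical normalization table; the record does not list which surface
# variations are generalized.
VARIANTS = {"i'd": "i would", "i'm": "i am", "can't": "cannot", "pls": "please"}

def generalize(text):
    """Lowercase, drop punctuation, and expand the variants listed above."""
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join(VARIANTS.get(w, w) for w in words)

def combine(examples):
    """Group (source, target) pairs by their generalized source form."""
    merged = {}
    for source, target in examples:
        merged.setdefault(generalize(source), []).append(target)
    return merged
```

With this normalization, "I'd like a ticket, please." and "I would like a ticket please" fall under the same key, so one stored example can serve both inputs and fewer entries need to be searched.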

Current Assignee: Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Original Assignee: Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Inventors: Franz, Alexander M.; Horiguchi, Keiko
Primary Examiner(s): Thomas, Joseph
Application Number: US09/239,641
Time in Patent Office: 858 Days
Field of Search: 704/9, 704/10, 704/1, 704/251, 704/252, 704/255, 704/257, 704/277, 704/270, 704/275, 704/3, 704/7, 704/2, 707/530, 707/531, 707/532, 707/533, 707/536
US Class Current: 704/9
CPC Class Codes:
G06F 40/211   Syntactic parsing, e.g. bas...
G06F 40/216   using statistical methods
G06F 40/268   Morphological analysis
G06F 40/45   Example-based machine trans...
G10L 15/19   Grammatical context, e.g. d...
G10L 15/26   Speech to text systems G10L...
