Method and apparatus for deriving information from written text
Abstract
A method and apparatus for deriving information from a textual representation of a sentence are provided, the sentence having a plurality of words. The apparatus comprises an input, a processing unit and an output. The input is for receiving data elements indicative of the textual representation of the sentence. The processing unit is operative for processing the textual representation of the sentence to derive an information record on the basis of a set of information extraction rules, the information record being indicative of a semantic representation of at least part of the sentence. The information record is then released at the output. A computer readable medium comprising a program element suitable for execution by a computing apparatus for deriving information from a textual representation of a sentence is also provided.
106 Citations
34 Claims
1. A natural language information extraction system for deriving information from a textual representation of a sentence, the sentence having a plurality of words, said system comprising:
an input for receiving data elements indicative of the textual representation of the sentence;
a processing unit coupled to said input, said processing unit being operative for processing the textual representation of the sentence to derive:
  a parse tree group including a plurality of parse trees, each parse tree in said parse tree group including a word of the sentence, at least one parse tree including at least two words of the sentence, said at least one parse tree including a dependency data element describing a syntactic relationship between the at least two words of the sentence; and
  at least one noun phrase associated to a semantic type;
said processing unit being operative for processing said parse tree group and said at least one noun phrase associated to a semantic type on the basis of a set of information extraction rules to derive an information record, the information record being indicative of a semantic representation of at least part of the sentence;
an output for releasing one or more data elements indicative of the information record.
Dependent claims: 2, 3, 4, 5
6. A method for deriving information from a textual representation of a sentence, the sentence having a plurality of words, said method comprising:
receiving data elements indicative of the textual representation of the sentence;
processing the textual representation of the sentence to derive:
  a parse tree group including a plurality of parse trees, each parse tree in said parse tree group including a word of the sentence, at least one parse tree including at least two words of the sentence, said at least one parse tree including a dependency data element describing a syntactic relationship between the at least two words of the sentence; and
  at least one noun phrase associated to a semantic type;
processing the parse tree group on the basis of a set of information extraction rules and the at least one noun phrase associated to a semantic type to derive an information record, the information record being indicative of a semantic representation of at least part of the sentence.
Dependent claims: 7, 8, 9, 10
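To make the last step of the method above concrete, here is a minimal Python sketch of a single information extraction rule applied to dependency data and semantically typed noun phrases. It is an illustration only, not the patented implementation: the `Dependency` and `TypedNounPhrase` classes, the relation labels "subject" and "object", the "COMPANY" semantic type, the trigger verb "acquired" and the shape of the returned record are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    head: str       # governing word
    relation: str   # syntactic relationship label (hypothetical labels)
    dependent: str  # dependent word


@dataclass(frozen=True)
class TypedNounPhrase:
    text: str           # surface text of the noun phrase
    head: str           # head word of the noun phrase
    semantic_type: str  # hypothetical semantic type, e.g. "COMPANY"


def extract_acquisition(dependencies, noun_phrases):
    """Hypothetical rule: 'acquired' with a COMPANY subject and a COMPANY
    object yields an acquisition record; otherwise return None."""
    by_head = {np.head: np for np in noun_phrases}
    subject = obj = None
    for dep in dependencies:
        if dep.head != "acquired":
            continue
        np = by_head.get(dep.dependent)
        if np is None or np.semantic_type != "COMPANY":
            continue
        if dep.relation == "subject":
            subject = np
        elif dep.relation == "object":
            obj = np
    if subject is not None and obj is not None:
        return {"event": "acquisition", "buyer": subject.text, "target": obj.text}
    return None


# Dependencies and typed noun phrases for "Acme acquired Globex".
deps = [
    Dependency("acquired", "subject", "Acme"),
    Dependency("acquired", "object", "Globex"),
]
nps = [
    TypedNounPhrase("Acme", "Acme", "COMPANY"),
    TypedNounPhrase("Globex", "Globex", "COMPANY"),
]
record = extract_acquisition(deps, nps)
print(record)
```

The rule fires only when both the syntactic pattern and the semantic types match, which is one plausible reading of processing the parse tree group "on the basis of" the rules and the typed noun phrase.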
11. A computer readable medium comprising a program element suitable for execution by a computing apparatus for deriving information from a textual representation of a sentence, the sentence having a plurality of words, said computing apparatus comprising:
a processor, said program element when executing on said processor being operative for:
  receiving data elements indicative of the textual representation of the sentence;
  processing the textual representation of the sentence to derive:
    a parse tree group including a plurality of parse trees, each parse tree in said parse tree group including a word of the sentence, at least one parse tree including at least two words of the sentence, said at least one parse tree including a dependency data element describing a syntactic relationship between the at least two words of the sentence; and
    at least one noun phrase associated to a semantic type;
  processing the parse tree group and the at least one noun phrase associated to a semantic type on the basis of a set of information extraction rules to derive an information record, the information record being indicative of a semantic representation of at least part of the sentence;
  releasing one or more data elements indicative of the information record.
Dependent claims: 12, 13, 14, 15
16. An apparatus for parsing a textual representation of a sentence to derive a parse tree group including a plurality of parse trees, the sentence including a plurality of words, the apparatus comprising:
an input for receiving data elements indicative of the textual representation of the sentence;
a processing unit for processing the data elements indicative of the sentence to generate a parse tree group, said processing unit being operative for:
  generating a parse tree for each word in the sentence and adding each generated parse tree to the parse tree group, wherein each parse tree in the parse tree group is formed of at least one node, and wherein all of the nodes that form the parse tree are associated to a word in the sentence;
  generating a new parse tree on the basis of binary dependency rules applied to a given parse tree in the parse tree group, the new parse tree resulting from a combination of the given parse tree and another parse tree from the parse tree group;
  adding the new parse tree to the parse tree group;
  wherein at least one parse tree in the parse tree group includes at least two nodes, each node of said at least two nodes being associated to a respective word of the sentence, said at least one parse tree including a dependency data element describing a syntactic relationship between the words associated to said at least two nodes;
an output for releasing a signal indicative of the parse tree group, said parse tree group being in a format suitable to be processed to derive a semantic representation of at least part of the sentence at least in part on the basis of the parse tree group.
Dependent claims: 17, 18, 19, 20
21. A method for parsing a textual representation of a sentence to derive a parse tree group including a plurality of parse trees, the sentence including a plurality of words, said method comprising:
receiving data elements indicative of the sentence;
processing the data elements indicative of the sentence to generate a parse tree group by:
  generating a parse tree for each word in the sentence and adding each generated parse tree to the parse tree group, wherein each parse tree in the parse tree group is formed of at least one node, and wherein all of the nodes that form the parse tree are associated to a word in the sentence, wherein at least one parse tree in the parse tree group includes at least two nodes, each node of said at least two nodes being associated to a respective word of the sentence, said at least one parse tree including a dependency data element describing a syntactic relationship between the words associated to said at least two nodes;
  generating a new parse tree on the basis of binary dependency rules applied to a given parse tree in the parse tree group, the new parse tree resulting from a combination of the given parse tree and another parse tree from the parse tree group;
  adding the new parse tree to the parse tree group, wherein the parse tree group is suitable to be processed to derive a semantic representation of at least part of the sentence at least in part on the basis of the parse tree group.
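The parsing steps recited above (one single-node tree per word, then repeated combination of two trees under binary dependency rules, each result being added back to the group) can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the method of the specification: the `ParseTree` layout, the part-of-speech tags, the requirement that combined trees be adjacent, and the three entries in `RULES` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParseTree:
    head: str    # word at the root of the tree
    tag: str     # part-of-speech tag of the root word (assumed input)
    start: int   # index of the first word covered
    end: int     # index one past the last word covered
    deps: frozenset = frozenset()  # (head word, relation, dependent word) triples


# Hypothetical binary dependency rules:
# (left root tag, right root tag) -> (relation label, side that becomes the head)
RULES = {
    ("DET", "NOUN"): ("determiner", "right"),
    ("NOUN", "VERB"): ("subject", "right"),
    ("VERB", "NOUN"): ("object", "left"),
}


def combine(left, right):
    """Apply a binary rule to two adjacent trees; return the new tree or None."""
    rule = RULES.get((left.tag, right.tag))
    if rule is None or left.end != right.start:
        return None
    relation, head_side = rule
    head, dep = (left, right) if head_side == "left" else (right, left)
    new_deps = left.deps | right.deps | {(head.head, relation, dep.head)}
    return ParseTree(head.head, head.tag, left.start, right.end, new_deps)


def build_parse_tree_group(tagged_words):
    # Step 1: one single-node parse tree per word.
    group = [ParseTree(w, t, i, i + 1) for i, (w, t) in enumerate(tagged_words)]
    seen = set(group)
    agenda = list(group)
    # Step 2: combine a given tree with every other tree already in the group,
    # adding each new tree to the group until nothing new can be built.
    while agenda:
        tree = agenda.pop()
        for other in list(group):
            for new in (combine(tree, other), combine(other, tree)):
                if new is not None and new not in seen:
                    seen.add(new)
                    group.append(new)
                    agenda.append(new)
    return group


sentence = [("the", "DET"), ("dog", "NOUN"), ("chased", "VERB"), ("cats", "NOUN")]
group = build_parse_tree_group(sentence)
full = max(group, key=lambda t: t.end - t.start)
print(full.head, sorted(full.deps))
```

The group keeps the partial trees as well as the tree that spans the whole sentence, so a later extraction stage can still work on fragments when no full parse is found.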
22. A computer readable medium comprising a program element suitable for execution by a computing apparatus for parsing a textual representation of a sentence to derive a parse tree group including a plurality of parse trees, the sentence including a plurality of words, said computing apparatus comprising:
a processor, said program element when executing on said processor being operative for:
  receiving data elements indicative of the sentence;
  generating a parse tree for each word in the sentence and adding each generated parse tree to a parse tree group, wherein each parse tree in the parse tree group is formed of at least one node, and wherein all of the nodes that form the parse tree are associated to a word in the sentence, wherein at least one parse tree in the parse tree group includes at least two nodes, each node of said at least two nodes being associated to a respective word of the sentence, said at least one parse tree including a dependency data element describing a syntactic relationship between the words associated to said at least two nodes;
  generating a new parse tree on the basis of binary dependency rules applied to a given parse tree in the parse tree group, the new parse tree resulting from a combination of the given parse tree and another parse tree from the parse tree group;
  adding the new parse tree to the parse tree group;
  releasing a signal indicative of the parse tree group in a format suitable to be processed to derive a semantic representation of at least part of the sentence at least in part on the basis of the parse tree group.
23. A natural language information extraction system for deriving information from a textual representation of a sentence, the sentence having a plurality of words, said system comprising:
means for receiving data elements indicative of the textual representation of the sentence;
means for processing the textual representation of the sentence to derive:
  a parse tree group including a plurality of parse trees, each parse tree in said parse tree group including a word of the sentence, at least one parse tree including at least two words of the sentence, said at least one parse tree including a dependency data element describing a syntactic relationship between the at least two words of the sentence; and
  at least one noun phrase associated to a semantic type;
said means for processing being operative for processing said parse tree group and said at least one noun phrase on the basis of a set of information extraction rules to derive an information record, the information record being indicative of a semantic representation of at least part of the sentence;
means for releasing the information record.
24. A natural language information extraction system for deriving information from a textual representation of a sentence, the sentence having a plurality of words, said system comprising:
an input for receiving data elements indicative of the textual representation of the sentence;
a processing unit coupled to said input, said processing unit being operative for processing the textual representation of the sentence to derive a parse tree group including a plurality of parse trees, wherein each parse tree in the parse tree group is formed from at least one node, all of the nodes forming the parse tree being associated to a word in the sentence, at least one parse tree including at least two words of the sentence, said at least one parse tree including a dependency data element describing a syntactic relationship between the at least two words of the sentence;
said processing unit being operative for processing said parse tree group on the basis of a set of information extraction rules to derive an information record, the information record being indicative of a semantic representation of at least part of the sentence;
an output for releasing data elements indicative of the information record.
Dependent claims: 25, 26, 27, 28, 29, 30, 31, 32, 33
34. A natural language information extraction system for deriving information from a textual representation of a sentence, the sentence having a plurality of words, said system comprising:
an input for receiving data elements indicative of the textual representation of the sentence;
a processing unit coupled to said input, said processing unit being operative for:
  generating a parse tree group including a plurality of parse trees, each parse tree in said parse tree group including a word of the sentence, at least some parse trees including at least two words of the sentence and a data element indicative of the syntactic dependencies between the at least two words;
  generating on the basis of the parse tree group a plurality of lexical frames, each lexical frame being associated to a respective word in the sentence, a certain lexical frame being associated to a certain word in the sentence and comprising a list of words of the sentence other than the certain word, each word in the list of words being associated to a dependency data element indicative of the syntactic relationship of each word in the list of words with the certain word;
  processing said plurality of lexical frames on the basis of a set of information extraction rules to derive an information record being indicative of a semantic representation of at least part of the sentence;
an output for releasing data elements indicative of the information record.
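As an illustration of the lexical frames recited in the claim above, the following Python sketch builds one frame per word from a set of dependency triples. It is a simplified reading, not the structure defined in the specification: the function name `build_lexical_frames`, the triple format, the relation labels, and the choice to record the governing word in a dependent's frame with an "-of" suffix are all assumptions made for the example.

```python
def build_lexical_frames(words, dependencies):
    """Return one frame per word: a list of (other word, relation) pairs.

    `dependencies` holds (head word, relation, dependent word) triples taken
    from the parse tree group. Recording the inverse relation with an "-of"
    suffix in the dependent's frame is an assumption of this sketch.
    """
    frames = {word: [] for word in words}
    for head, relation, dependent in dependencies:
        frames[head].append((dependent, relation))
        frames[dependent].append((head, relation + "-of"))
    return frames


# Dependencies for "the dog chased cats" (hypothetical labels).
words = ["the", "dog", "chased", "cats"]
dependencies = [
    ("dog", "determiner", "the"),
    ("chased", "subject", "dog"),
    ("chased", "object", "cats"),
]
frames = build_lexical_frames(words, dependencies)
for word in words:
    print(word, frames[word])
```

Indexing the dependencies by word in this way lets an extraction rule look up everything attached to a trigger word in a single step instead of walking each parse tree.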
Specification