Method and apparatus for extracting entity names and their relations
Abstract
According to one embodiment of the invention, a method includes generating a person-name Information-Gain (IG)-Tree and a relation IG-Tree from annotated data. The method also includes tagging and partial parsing of an input document. The names of persons are extracted from the input document using the person-name IG-Tree. Additionally, names of organizations are extracted from the input document. The method also includes extracting entity names within the input document that are not names of persons or organizations. Further, the relations between the identified entity names are extracted using the relation IG-Tree.
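The text above does not say how an IG-Tree is built. As a rough illustration only, the sketch below follows the approach commonly described for IGTree in memory-based learning: features are ranked by information gain, and the training examples are compressed into a trie whose levels follow that ranking, with a default class stored at every node. The feature names (`prev_word`, `capitalized`), the class labels, and the toy examples are hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(examples, labels, feature):
    """Entropy reduction obtained by splitting the examples on one feature."""
    groups = defaultdict(list)
    for example, label in zip(examples, labels):
        groups[example[feature]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_ig_tree(examples, labels, features):
    """Build a trie whose levels follow the given feature order.

    Each node keeps the majority label as a default and is not expanded
    further once all of its examples share one label.
    """
    node = {"default": Counter(labels).most_common(1)[0][0], "children": {}}
    if not features or len(set(labels)) == 1:
        return node
    feature, rest = features[0], features[1:]
    groups = defaultdict(lambda: ([], []))
    for example, label in zip(examples, labels):
        groups[example[feature]][0].append(example)
        groups[example[feature]][1].append(label)
    node["feature"] = feature
    for value, (sub_examples, sub_labels) in groups.items():
        node["children"][value] = build_ig_tree(sub_examples, sub_labels, rest)
    return node


def classify(tree, example):
    """Walk down the trie; fall back to the last default on an unseen value."""
    node = tree
    while "feature" in node:
        child = node["children"].get(example[node["feature"]])
        if child is None:
            break
        node = child
    return node["default"]


# Hypothetical toy training set: is the current token part of a person name?
examples = [
    {"prev_word": "Mr.", "capitalized": True},
    {"prev_word": "Mr.", "capitalized": True},
    {"prev_word": "the", "capitalized": True},
    {"prev_word": "the", "capitalized": False},
    {"prev_word": "Dr.", "capitalized": True},
    {"prev_word": "in", "capitalized": False},
]
labels = ["PERSON", "PERSON", "OTHER", "OTHER", "PERSON", "OTHER"]

# Rank features by information gain, highest first, then build the tree.
ordered = sorted(
    ["capitalized", "prev_word"],
    key=lambda f: information_gain(examples, labels, f),
    reverse=True,
)
tree = build_ig_tree(examples, labels, ordered)
print(ordered)
print(classify(tree, {"prev_word": "Mr.", "capitalized": True}))
```

In this toy set `prev_word` separates the two classes perfectly, so it ranks above `capitalized` and becomes the first level of the trie. A separate tree of the same shape could in principle be trained on relation examples, but that pairing is an assumption here, not something the abstract spells out.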
12 Claims
1. A method comprising:
generating a number of Information-Gain (IG)-Trees based on a memory learning technique and training sets relating to a document, wherein the training sets are based on global context features, wherein a global context feature offers a broader view of a word or a word sequence with regard to an entirety of the document;
extracting entity names and relations between entity names based on the IG-Trees;
receiving annotated data;
parsing, at least partially, the annotated data, wherein parsing includes identifying syntactic structure of sentences within the annotated data; and
extracting the training sets from the parsed annotated data, wherein the training sets are further based on features including one or more of local context features, surface linguistic features, and deep linguistic features, wherein the global context feature, when included in an IG-Tree, further offers a set of first verbs in a same sentence that appear before or after the word or the word sequence for the entirety of the document or corpus.
Dependent claims: 2, 3, 4
5. A non-transitory machine-readable medium comprising instructions which, when executed by a machine, cause the machine to:
generate a number of Information-Gain (IG)-Trees based on a memory-learning technique and training sets relating to a document, wherein the training sets are based on global context features, wherein a global context feature offers a broader view of a word or a word sequence with regard to an entirety of the document;
extract entity names and relations between entity names based on the IG-Trees;
receive annotated data;
parse, at least partially, the annotated data, wherein parsing includes identifying syntactic structure of sentences within the annotated data; and
extract the training sets from the parsed annotated data, wherein the training sets are further based on features including one or more of local context features, surface linguistic features, and deep linguistic features, wherein the global context feature, when included in an IG-Tree, further offers a set of first verbs in a same sentence that appear before or after the word or the word sequence for the entirety of the document or corpus.
Dependent claims: 6, 7, 8
9. A system having a memory to store instructions, and a processing device to execute the instructions, wherein the instructions cause the processing device to:
generate a number of Information-Gain (IG)-Trees based on a memory-learning technique and training sets relating to a document, wherein the training sets are based on global context features, wherein a global context feature offers a broader view of a word or a word sequence with regard to an entirety of the document;
extract entity names and relations between entity names based on the IG-Trees;
receive annotated data;
parse, at least partially, the annotated data, wherein parsing includes identifying syntactic structure of sentences within the annotated data; and
extract the training sets from the parsed annotated data, wherein the training sets are further based on features including one or more of local context features, global context features, surface linguistic features, and deep linguistic features, wherein the global context feature, when included in an IG-Tree, further offers a set of first verbs in a same sentence that appear before or after the word or the word sequence for the entirety of the document or corpus.
Dependent claims: 10, 11, 12
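The global context feature recited in the independent claims (a set of first verbs appearing before or after a word or word sequence, gathered over the whole document) can be pictured with a small sketch. This is one possible reading, not the patent's implementation: "first verb" is interpreted here as the nearest verb on each side within the same sentence, sentences are assumed to arrive already tagged with Penn Treebank part-of-speech tags (verb tags begin with `VB`), and the function name `first_verbs_around` and the sample sentences are hypothetical.

```python
def first_verbs_around(document, target):
    """Collect the nearest verbs before and after every occurrence of target.

    document: list of sentences, each a list of (token, pos_tag) pairs.
    target:   tuple of tokens, i.e. a single word or a word sequence.
    Returns two sets aggregated over the whole document.
    """
    target = tuple(target)
    n = len(target)
    before, after = set(), set()
    for sentence in document:
        tokens = [tok for tok, _ in sentence]
        for start in range(len(tokens) - n + 1):
            if tuple(tokens[start:start + n]) != target:
                continue
            # Scan leftwards from the occurrence for the nearest verb.
            for tok, tag in reversed(sentence[:start]):
                if tag.startswith("VB"):
                    before.add(tok.lower())
                    break
            # Scan rightwards from the end of the occurrence.
            for tok, tag in sentence[start + n:]:
                if tag.startswith("VB"):
                    after.add(tok.lower())
                    break
    return {"verbs_before": before, "verbs_after": after}


# Hypothetical pre-tagged document with three sentences.
document = [
    [("Smith", "NNP"), ("joined", "VBD"), ("Acme", "NNP"), (".", ".")],
    [("Analysts", "NNS"), ("said", "VBD"), ("Smith", "NNP"),
     ("resigned", "VBD"), ("yesterday", "NN"), (".", ".")],
    [("Acme", "NNP"), ("hired", "VBD"), ("John", "NNP"),
     ("Smith", "NNP"), (".", ".")],
]

smith = first_verbs_around(document, ("Smith",))
john_smith = first_verbs_around(document, ("John", "Smith"))
print(sorted(smith["verbs_before"]), sorted(smith["verbs_after"]))
```

Because the sets are pooled across every sentence in which the target occurs, a name seen only once next to an uninformative verb can still pick up verbs such as "hired" or "resigned" from its other mentions, which is the "broader view" the claims attribute to global context features.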
Specification