Method and apparatus for extracting entity names and their relations
Abstract
According to one embodiment of the invention, a method includes generating a person-name Information Gain (IG)-Tree and a relation IG-Tree from annotated data. The method also includes tagging and partially parsing an input document. Person names within the input document are extracted using the person-name IG-Tree. Names of organizations within the input document are also extracted, as are entity names that are neither person names nor organization names. Further, relations between the identified entity names are extracted using the relation IG-Tree.
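The abstract relies on memory-based IG-Trees without defining them. The following is a minimal sketch, assuming they follow the general IGTree scheme from memory-based learning: features are tested in one fixed order of decreasing information gain, and every node stores its most frequent class as a default. The feature layout (token shape and previous word), the `PER`/`O` labels, the `IGTree` class, and the toy training data are hypothetical illustrations, not taken from the patent.

```python
import math
from collections import Counter

# Assumption: a generic IGTree-style learner for illustration only,
# not the implementation described in the patent.


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(instances, labels, feature):
    """Class entropy minus the weighted entropy after splitting on `feature`."""
    partitions = {}
    for inst, label in zip(instances, labels):
        partitions.setdefault(inst[feature], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


class IGTree:
    """Tests features in one global order of decreasing information gain;
    every node stores its most frequent class as a default."""

    def __init__(self, instances, labels):
        n_features = len(instances[0])
        self.order = sorted(
            range(n_features),
            key=lambda f: information_gain(instances, labels, f),
            reverse=True,
        )
        self.root = self._build(instances, labels, 0)

    def _build(self, instances, labels, depth):
        node = {"default": Counter(labels).most_common(1)[0][0], "children": {}}
        # Stop expanding once the node is pure or every feature has been used.
        if len(set(labels)) == 1 or depth == len(self.order):
            return node
        feature = self.order[depth]
        groups = {}
        for inst, label in zip(instances, labels):
            sub_instances, sub_labels = groups.setdefault(inst[feature], ([], []))
            sub_instances.append(inst)
            sub_labels.append(label)
        for value, (sub_instances, sub_labels) in groups.items():
            node["children"][value] = self._build(sub_instances, sub_labels, depth + 1)
        return node

    def classify(self, instance):
        node = self.root
        for feature in self.order:
            child = node["children"].get(instance[feature])
            if child is None:
                break  # leaf reached or unseen value: use this node's default
            node = child
        return node["default"]


# Hypothetical person-name training data: (token shape, previous word) -> label.
train = [
    (("title", "<s>"), "O"),    # Mr.
    (("cap", "Mr."), "PER"),    # Smith
    (("low", "Smith"), "O"),    # said
    (("title", "<s>"), "O"),    # Dr.
    (("cap", "Dr."), "PER"),    # Jones
    (("low", "Jones"), "O"),    # left
    (("cap", "<s>"), "O"),      # Acme
    (("cap", "in"), "O"),       # Paris
]
tree = IGTree([x for x, _ in train], [y for _, y in train])
print(tree.order)
print(tree.classify(("cap", "Mr.")))
print(tree.classify(("cap", "Ms.")))
```

Because each node keeps its majority class, a feature value never seen in training (such as the previous word "Ms." above) falls back to the default of the deepest node reached instead of failing. Whether the patented person-name and relation IG-Trees order and prune features in exactly this way is an assumption.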
15 Claims
1. A method comprising:
receiving annotated data;
parsing, at least partially, the annotated data, wherein parsing includes identifying syntactic structure of sentences within the annotated data; and
extracting training sets from the parsed annotated data, wherein the training sets are based on a plurality of features, wherein extracting comprises at least one of tagging the annotated data for marking words, and defining and segmenting words based on languages, wherein extracting further comprises extracting entity names and relations between entity names based on the information sets, and wherein extracting further comprises identifying information sets using memory-based Information Gain (IG)-Trees, wherein the IG-Trees are generated based on the plurality of features, wherein the plurality of features comprise one or more of words, phrases, sentences, and objects, and wherein each information set is identified based on a corresponding memory-based IG-Tree including one or more of a person-name IG-Tree, an entity-name IG-Tree, a noun phrase IG-Tree, and a relation IG-Tree.
6. A system having a storage device to store instructions, and a processing device to execute the instructions, wherein the execution of the instructions causes the processing device to perform one or more operations comprising:
receiving annotated data;
parsing, at least partially, the annotated data, wherein parsing includes identifying syntactic structure of sentences within the annotated data; and
extracting training sets from the parsed annotated data, wherein the training sets are based on a plurality of features, wherein extracting comprises at least one of tagging the annotated data for marking words, and defining and segmenting words based on languages, wherein extracting further comprises extracting entity names and relations between entity names based on the information sets, and wherein extracting further comprises identifying information sets using memory-based Information Gain (IG)-Trees, wherein the IG-Trees are generated based on the plurality of features, wherein the plurality of features comprise one or more of words, phrases, sentences, and objects, and wherein each information set is identified based on a corresponding memory-based IG-Tree including one or more of a person-name IG-Tree, an entity-name IG-Tree, a noun phrase IG-Tree, and a relation IG-Tree.
11. A machine-readable medium having stored thereon instructions which, when executed by a processing device, cause the computing device to perform one or more operations comprising:
receiving annotated data;
parsing, at least partially, the annotated data, wherein parsing includes identifying syntactic structure of sentences within the annotated data;
extracting training sets from the parsed annotated data, wherein the training sets are based on a plurality of features, wherein extracting comprises at least one of tagging the annotated data for marking words, and defining and segmenting words based on languages, wherein extracting further comprises extracting entity names and relations between entity names based on the information sets, and wherein extracting further comprises identifying information sets using memory-based Information Gain (IG)-Trees, wherein the IG-Trees are generated based on the plurality of features, wherein the plurality of features comprise one or more of words, phrases, sentences, and objects, and wherein each information set is identified based on a corresponding memory-based IG-Tree including one or more of a person-name IG-Tree, an entity-name IG-Tree, a noun phrase IG-Tree, and a relation IG-Tree.
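Claims 1, 6, and 11 recite extracting training sets from parsed, annotated data based on a plurality of features, but the claims do not say how individual training instances are formed. The sketch below assumes a common memory-based-learning setup in which each word of a tagged sentence becomes one fixed-length instance, described by the words and part-of-speech tags in a small window around it. The token format, tag names, `PER`/`O` labels, padding symbol, and the `extract_instances` function are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical annotation format: (word, part-of-speech tag, entity label).
Token = Tuple[str, str, str]

# Hypothetical padding symbol for context positions outside the sentence.
PAD = ("_", "_")


def extract_instances(sentence: List[Token], window: int = 1) -> List[Tuple[Tuple[str, ...], str]]:
    """Turn one annotated sentence into (features, label) training instances.

    The features are the word and tag of every token within `window`
    positions of the focus token; the label is the focus token's label.
    """
    instances = []
    for i, (_, _, label) in enumerate(sentence):
        features: List[str] = []
        for offset in range(-window, window + 1):
            j = i + offset
            # Positions before the first or after the last token are padded.
            word, tag = (sentence[j][0], sentence[j][1]) if 0 <= j < len(sentence) else PAD
            features.extend([word, tag])
        instances.append((tuple(features), label))
    return instances


sentence = [("Mr.", "NNP", "O"), ("Smith", "NNP", "PER"), ("said", "VBD", "O")]
for features, label in extract_instances(sentence):
    print(label, features)
```

Fixed-length instances of this kind are the input an IG-Tree learner would consume; richer features, such as chunk labels from the partial parse, could be appended to the same tuple under the same assumption.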
Specification