Methods and systems for extracting information from text
First Claim
1. A computer-implemented method for automatically identifying entity-value pairs from text, the method comprising the following operations performed by at least one processor:
receiving an electronic text file including a text corpus comprising a plurality of words;
generating, by parsing the text corpus, a corresponding parse tree structure in memory, including nodes of each of the plurality of words having edges based on the parts of speech of the plurality of words;
identifying a plurality of entity-value pairs in the text corpus that correspond to a predetermined entity and a predetermined value related to the predetermined entity by a predetermined attribute, wherein each of the entity-value pairs comprise an entity and a value;
extracting based on the parse tree structure, a plurality of parse tree paths to traverse the tree structure from a node corresponding the entity to a node corresponding to the value of the plurality of entity-value pairs;
generating a data record including an indication of how accurately the extracted plurality of parse tree paths correspond to the predetermined attribute, based on at least one of the plurality of parse tree paths; and
validating an entity-value pair based on the data record.
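The steps of the claim can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the patented implementation: the parse trees are hand-written toy dependency parses (punctuation omitted) rather than parser output, the edge labels are dependency relations, the names `Node`, `parse_tree_path`, `build_record`, and `validate` are hypothetical, and the "indication of how accurately" a path corresponds to the attribute is approximated by the share of seed pairs that exhibit that path.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    word: str
    pos: str     # part-of-speech tag
    head: int    # index of the parent node, -1 for the root
    label: str   # label of the edge to the parent (assumed dependency relation)


def make_tree(rows):
    """Build a tree (a list of nodes) from (word, pos, head, label) rows."""
    return [Node(*row) for row in rows]


def ancestors(tree, i):
    """Return the node indices from node i up to the root, inclusive."""
    chain = []
    while i != -1:
        chain.append(i)
        i = tree[i].head
    return chain


def parse_tree_path(tree, entity, value):
    """Edge labels walked from the entity node to the value node.

    "^" marks a step up toward the root, "v" a step down toward the value.
    """
    up, down = ancestors(tree, entity), ancestors(tree, value)
    common = next(i for i in up if i in down)  # lowest common ancestor
    rise = ["^" + tree[i].label for i in up[:up.index(common)]]
    fall = ["v" + tree[i].label for i in reversed(down[:down.index(common)])]
    return tuple(rise + fall)


def build_record(examples):
    """Map each observed path to the share of seed pairs that use it."""
    counts = Counter(parse_tree_path(t, e, v) for t, e, v in examples)
    total = sum(counts.values())
    return {path: n / total for path, n in counts.items()}


def validate(record, tree, entity, value, threshold=0.5):
    """Accept a pair when its path score reaches the (assumed) threshold."""
    return record.get(parse_tree_path(tree, entity, value), 0.0) >= threshold


# "Paris is the capital of France"
FRANCE = make_tree([
    ("Paris", "NNP", 1, "nsubj"), ("is", "VBZ", -1, "root"),
    ("the", "DT", 3, "det"), ("capital", "NN", 1, "attr"),
    ("of", "IN", 3, "prep"), ("France", "NNP", 4, "pobj"),
])
# "Rome is the capital of Italy"
ITALY = make_tree([
    ("Rome", "NNP", 1, "nsubj"), ("is", "VBZ", -1, "root"),
    ("the", "DT", 3, "det"), ("capital", "NN", 1, "attr"),
    ("of", "IN", 3, "prep"), ("Italy", "NNP", 4, "pobj"),
])
# "Berlin the capital of Germany is large"
GERMANY = make_tree([
    ("Berlin", "NNP", 5, "nsubj"), ("the", "DT", 2, "det"),
    ("capital", "NN", 0, "appos"), ("of", "IN", 2, "prep"),
    ("Germany", "NNP", 3, "pobj"), ("is", "VBZ", -1, "root"),
    ("large", "JJ", 5, "acomp"),
])

# Seed pairs for the attribute "capital": (tree, entity index, value index).
SEEDS = [(FRANCE, 5, 0), (ITALY, 5, 0), (GERMANY, 4, 0)]
RECORD = build_record(SEEDS)

for path, score in RECORD.items():
    print(" ".join(path), round(score, 2))
print(validate(RECORD, FRANCE, 5, 0), validate(RECORD, GERMANY, 4, 0))
```

In this toy setup the path shared by two of the three seed sentences scores highest, so pairs found along it pass validation while the pair from the appositive construction does not. A real system would likely use a much larger seed set and a more careful accuracy measure.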
Abstract
Information may be extracted from a text corpus. The text corpus may be parsed into a parse tree structure based on the parts of speech of the words of the text corpus. A path in the parse tree structure may be identified as linking an entity and a value, and the path may be applied to the same or other text corpuses to extract other instances of entity-value pairs. Extracted information, associated paths, or both may be validated in some instances.
48 Citations
35 Claims
1. A computer-implemented method for automatically identifying entity-value pairs from text, the method comprising the following operations performed by at least one processor:
receiving an electronic text file including a text corpus comprising a plurality of words;
generating, by parsing the text corpus, a corresponding parse tree structure in memory, including nodes of each of the plurality of words having edges based on the parts of speech of the plurality of words;
identifying a plurality of entity-value pairs in the text corpus that correspond to a predetermined entity and a predetermined value related to the predetermined entity by a predetermined attribute, wherein each of the entity-value pairs comprise an entity and a value;
extracting based on the parse tree structure, a plurality of parse tree paths to traverse the tree structure from a node corresponding the entity to a node corresponding to the value of the plurality of entity-value pairs;
generating a data record including an indication of how accurately the extracted plurality of parse tree paths correspond to the predetermined attribute, based on at least one of the plurality of parse tree paths; and
validating an entity-value pair based on the data record.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for automatically identifying entity-value pairs from text, the system comprising:
a storage device that stores a set of instructions; and
at least one processor that executes the set of instructions to:
receive an electronic text file including a text corpus comprising a plurality of words;
generate by parsing the text corpus, a corresponding parse tree structure in memory, including nodes of each of the plurality of words having edges based on the parts of speech of the plurality of words;
identify a plurality of entity-value pairs in the text corpus that correspond to a predetermined entity and a predetermined value related to the predetermined entity by a predetermined attribute, wherein each of the entity-value pairs comprise an entity and a value;
extract, based on the parse tree structure, a plurality of parse tree paths to traverse the tree structure from a node corresponding the entity to a node corresponding to the value of the plurality of entity-value pairs;
generate a data record including an indication of how accurately the extracted plurality of parse tree paths correspond to the predetermined attribute, based on at least one of the plurality of parse tree paths; and
validating an entity-value pair based on the data record.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A computer-implemented method for automatically identifying entity-value pairs from text, the method comprising the following operations performed by at least one processor:
receiving an electronic text file including a text corpus comprising a plurality of words;
generating, by parsing the text corpus, a corresponding parse tree structure in memory, including nodes of each of the plurality of words having edges based on the parts of speech of the plurality of words;
determining whether the parse tree structure includes a predetermined parse tree path corresponding to an attribute, the predetermined parse tree path traversing the parse tree structure from a node corresponding an entity to a node corresponding to a value of an entity-value pairs;
identifying a portion of the text corpus corresponding to the predetermined parse tree path when the parse tree structure includes the predetermined parse tree path;
extracting an entity-value pair from the portion of the text corpus, wherein the entity-value pair comprises an entity and a value corresponding to the entity, and wherein the entity-value pair corresponds to an entity-attribute-value relationship; and
validating the entity-value pair.
Dependent claims: 22, 23, 24, 25, 26, 27
28. A system for automatically identifying entity-value pairs from text, the system comprising:
a storage device that stores a set of instructions; and
at least one processor that executes the set of instructions to:
receive an electronic text file including a text corpus comprising a plurality of words;
generating, with a processor, by parsing the text corpus, a corresponding parse tree structure in memory, including nodes of each of the plurality of words having edges based on the parts of speech of the plurality of words;
determine whether the parse tree structure includes a predetermined parse tree path corresponding to an attribute, the predetermined parse tree path traversing the parse tree structure from a node corresponding an entity to a node corresponding to a value of an entity-value pairs;
identify a portion of the text corpus corresponding to the predetermined parse tree path when the parse tree structure includes the predetermined parse tree path;
extract an entity-value pair from the portion of the text corpus, wherein the entity-value pair comprises an entity and a value corresponding to the entity, and wherein the entity-value pair corresponds to an entity-attribute-value relationship; and
validating the entity-value pair.
Dependent claims: 29, 30, 31, 32, 33, 34, 35
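Claims 21 and 28 describe the reverse direction: starting from a predetermined parse tree path and using it to pull a new entity-value pair out of text. The following Python sketch shows one way this could look. It is only an illustration under stated assumptions: `CAPITAL_PATH` is a hypothetical path for the attribute "capital", `TREE` is a hand-written toy dependency parse rather than parser output, the function names are invented for this example, and the validation rule (both words must be proper nouns) is a placeholder for whatever validation a real system applies.

```python
from itertools import permutations

# Hypothetical predetermined path for the attribute "capital": three steps
# up from the entity (pobj, prep, attr), then one step down (nsubj) to the value.
CAPITAL_PATH = ("^pobj", "^prep", "^attr", "vnsubj")

# Toy parse of "Tokyo is the capital of Japan". Each row is
# (word, part of speech, head index, edge label); head -1 marks the root.
TREE = [
    ("Tokyo", "NNP", 1, "nsubj"),
    ("is", "VBZ", -1, "root"),
    ("the", "DT", 3, "det"),
    ("capital", "NN", 1, "attr"),
    ("of", "IN", 3, "prep"),
    ("Japan", "NNP", 4, "pobj"),
]


def chain_to_root(tree, i):
    """Return the node indices from node i up to the root, inclusive."""
    chain = []
    while i != -1:
        chain.append(i)
        i = tree[i][2]
    return chain


def path_between(tree, start, end):
    """Edge labels from start to end: "^" steps go up, "v" steps go down."""
    up, down = chain_to_root(tree, start), chain_to_root(tree, end)
    common = next(i for i in up if i in down)  # lowest common ancestor
    rise = ["^" + tree[i][3] for i in up[:up.index(common)]]
    fall = ["v" + tree[i][3] for i in reversed(down[:down.index(common)])]
    return tuple(rise + fall)


def extract_pairs(tree, path):
    """Return (entity, value) words for every node pair linked by the path."""
    return [
        (tree[e][0], tree[v][0])
        for e, v in permutations(range(len(tree)), 2)
        if path_between(tree, e, v) == path
    ]


def validate_pair(pair, tree):
    """Toy validation rule (an assumption): both words are proper nouns."""
    tags = {word: pos for word, pos, _, _ in tree}
    return all(tags[word] == "NNP" for word in pair)


pairs = extract_pairs(TREE, CAPITAL_PATH)
valid = [pair for pair in pairs if validate_pair(pair, TREE)]
print(valid)  # [('Japan', 'Tokyo')]
```

Checking every ordered node pair is quadratic in sentence length, which is acceptable for a sketch. A practical system would more likely start only from candidate entity nodes and walk the path directly.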
Specification