SYSTEM AND METHOD FOR STORING AND SEARCHING DATA EXTRACTED FROM TEXT DOCUMENTS
Abstract
Disclosed are a system and method for storing, searching and updating extracted data for natural language processing of text. An example method comprises: extracting at least one first information object from a text document; generating one or more subject-predicate-object triplets for the first information object; accessing a storage of extracted data that contains an RDF graph comprising a plurality of subject-predicate-object triplets for a plurality of different information objects; searching the storage of extracted data for a second information object related to the first information object, wherein two objects are related when they have at least one of a subject, a predicate and an object in common, and wherein searching includes selecting and searching at least one of three types of N-gram identifier tables containing double, triple or quad search indices associated with at least two of a subject, a predicate, an object and a document; and, when at least one such second information object is found, updating the storage of extracted data by adding the at least one subject-predicate-object triplet of the first information object to the master RDF graph and associating the first and second information objects with each other.
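The triplet generation and the relatedness test described in the abstract can be illustrated with a minimal sketch. It is not taken from the specification: the names `Triplet`, `make_triplets` and `are_related`, the identifier strings, and the choice to compare subjects with subjects, predicates with predicates and objects with objects are all assumptions made for illustration.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """One subject-predicate-object statement (hypothetical representation)."""
    subject: str
    predicate: str
    obj: str


def make_triplets(object_id: str, properties: dict[str, str]) -> list[Triplet]:
    """Turn one extracted information object into subject-predicate-object triplets."""
    return [Triplet(object_id, pred, value) for pred, value in properties.items()]


def are_related(a: list[Triplet], b: list[Triplet]) -> bool:
    """Assumed reading of the abstract's criterion: the two objects share
    at least one subject, one predicate or one object value."""
    for position in range(3):
        if {t[position] for t in a} & {t[position] for t in b}:
            return True
    return False


# Hypothetical extraction results from three different documents.
first = make_triplets("person:1", {"name": "Ada Lovelace", "occupation": "mathematician"})
second = make_triplets("person:1", {"born": "1815"})
third = make_triplets("org:7", {"founded": "1901"})

print(are_related(first, second))  # the two objects share the subject "person:1"
print(are_related(first, third))   # nothing in common
```

Note that the independent claims use a narrower criterion than the abstract: there, two information objects are related when they have at least the subject in common.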
26 Claims
1. A computer-implemented method for storing in a computer system, searching and updating data extracted from text documents, the method comprising:
extracting at least one first information object from a text document;
generating one or more subject-predicate-object triplets for the first information object;
accessing a storage of extracted data that contains an RDF graph comprising a plurality of subject-predicate-object triplets for a plurality of different information objects extracted from different text documents;
searching the storage of extracted data for a second information object related to the same object in the real world as the first information object, wherein two information objects are related when said two information objects have at least the subject parameter in common, and wherein searching includes selecting and searching at least one of three types of identifier tables containing one of double, triple and quad search indices, wherein each search index is based on at least two parameters selected from a subject, a predicate, an object and a document;
when at least one second information object related to the same object in the real world as the first information object is found, updating the storage of extracted data by adding the at least one subject-predicate-object triplet of the first information object to the RDF graph and updating at least one of the three types of index tables.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
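The three types of index tables recited in claim 1 can be pictured with the following sketch. It is only one possible reading under stated assumptions: the class `IndexTables`, the key layout chosen for each table type, and the rule of preferring the widest key whose parameters are all known are hypothetical and do not come from the claims.

```python
from collections import defaultdict

# A stored statement is a quad: (subject, predicate, object, document).
FIELDS = ("subject", "predicate", "object", "document")


class IndexTables:
    """Hypothetical double, triple and quad index tables over stored quads."""

    def __init__(self):
        # One illustrative key layout per table type (an assumption).
        self.layouts = {
            "double": ("subject", "predicate"),
            "triple": ("subject", "predicate", "object"),
            "quad": FIELDS,
        }
        self.tables = {name: defaultdict(list) for name in self.layouts}

    def _key(self, layout, quad):
        return tuple(quad[FIELDS.index(field)] for field in layout)

    def add(self, quad):
        """Register the quad in every table type."""
        for name, layout in self.layouts.items():
            self.tables[name][self._key(layout, quad)].append(quad)

    def select_table(self, known):
        """Pick the table whose key fields are all known, widest key first."""
        for name in ("quad", "triple", "double"):
            if all(field in known for field in self.layouts[name]):
                return name
        raise ValueError("this sketch needs at least a subject and a predicate")

    def search(self, **known):
        """Look up quads using the selected table."""
        name = self.select_table(known)
        key = tuple(known[field] for field in self.layouts[name])
        return self.tables[name].get(key, [])


idx = IndexTables()
idx.add(("person:1", "name", "Ada Lovelace", "doc-A"))
idx.add(("person:1", "name", "A. Lovelace", "doc-B"))

by_pair = idx.search(subject="person:1", predicate="name")
by_quad = idx.search(subject="person:1", predicate="name",
                     object="Ada Lovelace", document="doc-A")
print(len(by_pair), len(by_quad))
```

Every key in the sketch combines at least two of the four parameters, matching the claim's requirement that each search index be based on at least two parameters selected from a subject, a predicate, an object and a document.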
10. A system for storing, searching and updating extracted data, the system comprising:
a storage of extracted data containing an RDF graph comprising a plurality of subject-predicate-object triplets for a plurality of different information objects;
a hardware processor coupled to the storage, the processor being configured to:
extract at least one first information object from a text document;
generate one or more subject-predicate-object triplets for the first information object;
search the storage of extracted data for a second information object related to the same object in the real world as the first information object, wherein two information objects are related when said two information objects have at least the subject parameter in common, and wherein searching includes selecting and searching at least one of three types of N-gram identifier tables containing one of double, triple and quad search indices, wherein each search index is based on at least two parameters selected from a subject, a predicate, an object and a document;
when at least one second information object related to the same object in the real world as the first information object is found, update the storage of extracted data by adding the at least one subject-predicate-object triplet of the first information object to the RDF graph and updating at least one of the three types of index tables.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
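The search-then-update behaviour that claim 10 assigns to the processor can be sketched as follows. This is an illustration under assumptions, not the claimed system: `ExtractedDataStore` is a hypothetical class, a Python set stands in for the RDF graph, and a nested subject-then-predicate mapping stands in for a double index. The claim does not say what happens when no related object is found, so the sketch simply leaves the store unchanged in that case.

```python
from collections import defaultdict


class ExtractedDataStore:
    """Hypothetical in-memory stand-in for the storage of extracted data."""

    def __init__(self, seed=()):
        self.graph = set()  # stands in for the RDF graph
        # Double index: subject -> predicate -> set of objects (an assumption).
        self.sp_index = defaultdict(lambda: defaultdict(set))
        for triplet in seed:
            self._add(triplet)

    def _add(self, triplet):
        """Add one triplet to the graph and update the index table."""
        subject, predicate, obj = triplet
        self.graph.add(triplet)
        self.sp_index[subject][predicate].add(obj)

    def find_related(self, triplets):
        """Subjects of the new triplets that are already present in the store."""
        return {s for s, _, _ in triplets if s in self.sp_index}

    def update(self, triplets):
        """Add the new triplets only when a related stored object was found."""
        related = self.find_related(triplets)
        if related:
            for triplet in triplets:
                self._add(triplet)
        return related


store = ExtractedDataStore(seed=[("person:1", "name", "Ada Lovelace")])

hit = store.update([("person:1", "born", "1815")])     # shares the subject "person:1"
miss = store.update([("org:7", "founded", "1901")])    # no stored object with this subject

print(hit, miss, len(store.graph))
```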
19. A computer program product stored on a non-transitory computer-readable storage medium, the computer program product comprising computer-executable instructions for storing, searching and updating extracted data, comprising instructions for:
extracting at least one first information object from a text document;
generating one or more subject-predicate-object triplets for the first information object;
accessing a storage of extracted data that contains an RDF graph comprising a plurality of subject-predicate-object triplets for a plurality of different information objects;
searching the storage of extracted data for a second information object related to the same object in the real world as the first information object, wherein two information objects are related when said two information objects have at least the subject parameter in common, and wherein searching includes selecting and searching at least one of three types of identifier tables containing one of double, triple and quad search indices, wherein each search index is based on at least two parameters selected from a subject, a predicate, an object and a document;
when at least one second information object related to the same object in the real world as the first information object is found, updating the storage of extracted data by adding the at least one subject-predicate-object triplet of the first information object to the RDF graph and updating at least one of the three types of index tables.
Dependent claims: 20, 21, 22, 23, 24, 25, 26
Specification