COLLECTING, ORGANIZING, AND SEARCHING KNOWLEDGE ABOUT A DATASET
Abstract
Techniques for organizing knowledge about a dataset storing data from or about multiple sources may be provided. For example, the data can be accessed from the multiple sources and categorized based on the data type. For each data type, a triple extraction technique specific to that data type may be invoked. One set of techniques can allow the extraction of triples from the data based on natural language-based rules. Another set of techniques can allow a similar extraction based on logical or structural-based rules. A triple may store a relationship between elements of the data. The extracted triples can be stored with corresponding identifiers in a list. Further, dictionaries storing associations between elements of the data and the triples can be updated. The list and the dictionaries can be used to return triples in response to a query that specifies one or more elements.
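As an illustration of the list and dictionary described above, the following is a minimal Python sketch, not the patented implementation. It assumes triples are plain `(subject, predicate, object)` tuples, that the identifier is the position of the triple in the list, and that a single dictionary maps each subject or object to the identifiers of the triples containing it. The sample triples and the names `build_store`, `element_index`, and `query` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample triples; in practice these would come from an extractor.
TRIPLES = [
    ("orders table", "contains", "customer id"),
    ("customer id", "references", "customers table"),
    ("orders table", "updated by", "nightly job"),
]


def build_store(triples):
    """Return a list of (identifier, triple) pairs and an element -> identifiers dictionary."""
    triple_list = []
    element_index = defaultdict(set)
    for triple_id, triple in enumerate(triples):
        triple_list.append((triple_id, triple))
        subject, _predicate, obj = triple
        # Associate both the subject and the object with this triple's identifier.
        element_index[subject].add(triple_id)
        element_index[obj].add(triple_id)
    return triple_list, element_index


def query(triple_list, element_index, *elements):
    """Return the triples associated with every element named in the query."""
    if not elements:
        return []
    # Intersect the identifier sets so that all requested elements must match.
    ids = set.intersection(*(element_index.get(e, set()) for e in elements))
    return [triple for triple_id, triple in triple_list if triple_id in ids]


triple_list, element_index = build_store(TRIPLES)
print(query(triple_list, element_index, "orders table"))
print(query(triple_list, element_index, "orders table", "customer id"))
```

Requiring every element to match is a design choice made for this sketch; a query that returns triples matching any element would take the union of the identifier sets instead.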
20 Claims
1. A computer-implemented method for creating a dataset to facilitate natural language searching of data from a plurality of different source files, the computer-implemented method comprising:
identifying, by a processor, different triple extraction techniques corresponding to source files of different types;
extracting, by the processor, triples from each of the source files using a triple extraction technique corresponding to a type of the respective source file, each of the triples extracted by identifying, from the respective source file, a first natural language phrase as a subject, a second natural language phrase as an object, and an association between the first natural language phrase and the second natural language phrase as a predicate; and
storing, by the processor, the triples in the dataset.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
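The steps of claim 1 can be sketched in Python as a lookup from source type to extraction technique. This is a hedged illustration under stated assumptions, not the claimed method itself: the two source types (`txt` and `csv`), the regular-expression rule for text, the "first column is the subject" rule for tabular data, and all function names are hypothetical choices made for the example.

```python
import csv
import io
import re


def extract_from_text(content):
    """Illustrative natural language rule: '<subject> is|has|contains <object>' sentences."""
    triples = []
    for sentence in re.split(r"[.!?]\s*", content):
        match = re.fullmatch(r"\s*(.+?)\s+(is|has|contains)\s+(.+?)\s*", sentence)
        if match:
            triples.append((match.group(1), match.group(2), match.group(3)))
    return triples


def extract_from_csv(content):
    """Illustrative structural rule: first column is the subject, header is the predicate."""
    triples = []
    reader = csv.DictReader(io.StringIO(content))
    key = reader.fieldnames[0]  # assumes the content has a header row
    for row in reader:
        for column, value in row.items():
            if column != key and value:
                triples.append((row[key], column, value))
    return triples


# Identify a technique for each source type (hypothetical type labels).
EXTRACTORS = {"txt": extract_from_text, "csv": extract_from_csv}


def build_dataset(source_files):
    """Extract triples from (type, content) pairs and store them in one dataset."""
    dataset = []
    for file_type, content in source_files:
        extractor = EXTRACTORS[file_type]
        dataset.extend(extractor(content))
    return dataset


dataset = build_dataset([
    ("txt", "The orders table contains customer ids. It was large."),
    ("csv", "table,owner\norders,sales team\n"),
])
print(dataset)
```

Keeping the techniques in a dictionary means a new source type can be supported by registering one more function, without changing `build_dataset`.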
11. A system comprising:
a processor;
a memory communicatively coupled to the processor and bearing instructions that, upon execution by the processor, cause the system to at least perform operations comprising:
accessing words from a source file, the words to be organized based on associations between the words;
extracting a triple from a subset of the words based on a set of rules, the triple comprising a subject, a predicate, and an object corresponding to the subset of the words, the set of rules applicable to the subset of the words based on a pattern of word types;
generating a structure configured to store an association between an entity and a phrase associated with the subject or the object, the entity comprising one or more words from the subject or the object, the phrase comprising the subject or the object; and
generating another structure configured to store another association between the phrase and the triple.
Dependent claims: 12, 13, 14, 15, 16
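The operations of claim 11 can be illustrated with a small Python sketch. It rests on several assumptions that are not in the claim text: the words arrive already tagged with a word type, adjacent words of the same type are merged into one phrase, the only rule is a NOUN, VERB, NOUN pattern, and an entity is a single word of a subject or object phrase. The tags, sample words, and function names are hypothetical.

```python
from collections import defaultdict

# Hand-tagged (word, word type) pairs; a real system would obtain these from a tagger.
TAGGED = [("sales", "NOUN"), ("team", "NOUN"), ("owns", "VERB"),
          ("orders", "NOUN"), ("table", "NOUN")]


def group_phrases(tagged):
    """Merge adjacent words that share a word type into (phrase, type) pairs."""
    phrases = []
    for word, word_type in tagged:
        if phrases and phrases[-1][1] == word_type:
            phrases[-1] = (phrases[-1][0] + " " + word, word_type)
        else:
            phrases.append((word, word_type))
    return phrases


def extract_triples(phrases):
    """Rule: a NOUN, VERB, NOUN pattern yields (subject, predicate, object)."""
    triples = []
    for i in range(len(phrases) - 2):
        window = phrases[i:i + 3]
        if [word_type for _, word_type in window] == ["NOUN", "VERB", "NOUN"]:
            triples.append(tuple(phrase for phrase, _ in window))
    return triples


def build_structures(triples):
    """Build the entity -> phrase structure and the phrase -> triple structure."""
    entity_to_phrases = defaultdict(set)
    phrase_to_triples = defaultdict(list)
    for triple in triples:
        subject, _predicate, obj = triple
        for phrase in (subject, obj):
            phrase_to_triples[phrase].append(triple)
            for entity in phrase.split():
                entity_to_phrases[entity].add(phrase)
    return entity_to_phrases, phrase_to_triples


def lookup(entity, entity_to_phrases, phrase_to_triples):
    """Follow entity -> phrase -> triple to find triples mentioning the entity."""
    results = []
    for phrase in sorted(entity_to_phrases.get(entity, ())):
        results.extend(phrase_to_triples[phrase])
    return results


triples = extract_triples(group_phrases(TAGGED))
entity_to_phrases, phrase_to_triples = build_structures(triples)
print(lookup("table", entity_to_phrases, phrase_to_triples))
```

The two-level lookup lets a query for a single word such as "table" reach a triple whose object is the longer phrase "orders table".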
17. A computer-readable storage medium storing instructions that, when executed on a computing device, configure the computing device to perform operations comprising:
detecting words and corresponding word types from a source;
generating a triple based on applying a set of rules to the words, the set of rules applied based on a pattern of the word types and providing an association between at least two words, the triple comprising the at least two words;
generating a first structure configured to store a first association between a word from the triple and a phrase, the phrase formed based on a proximity of the word with another word; and
generating a second structure configured to store a second association between the phrase and the triple.
Dependent claims: 18, 19, 20
Specification