Automatic definition of entity collections
Abstract
A system for automatically generating entity collections comprises a data graph including entities connected by edges and instructions that cause the computer system to determine a set of entities from the data graph and to determine a set of constraints that has a quantity of constraints. A constraint in the set represents a path in the data graph shared by at least two of the entities in the set of entities. The instructions also cause the computer system to generate candidate collection definitions from combinations of the constraints, where each candidate collection definition identifies at least one constraint and no more than the quantity of constraints. The instructions also cause the computer system to determine an information gain for at least some of the candidate collection definitions, and store at least one candidate collection definition that has an information gain that meets a threshold as a candidate collection.
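The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the toy knowledge base `KB`, the `(path, target)` constraint encoding, and every function name are hypothetical. The abstract does not define information gain, so the sketch assumes the usual entropy reduction on "is a seed entity" when the knowledge base is split by membership in a candidate definition. Each definition is treated as a plain conjunction of constraints.

```python
from itertools import combinations
from math import log2

# Hypothetical toy knowledge base: entity -> set of (edge path, target) facts.
KB = {
    "Obama":    {("profession", "politician"), ("born_in", "USA"), ("office", "president")},
    "Lincoln":  {("profession", "politician"), ("born_in", "USA"), ("office", "president")},
    "Merkel":   {("profession", "politician"), ("born_in", "Germany")},
    "Einstein": {("profession", "physicist"), ("born_in", "Germany")},
    "Feynman":  {("profession", "physicist"), ("born_in", "USA")},
}

def entropy(pos, total):
    """Binary entropy (bits) of a group with `pos` positives out of `total`."""
    if total == 0 or pos in (0, total):
        return 0.0
    p = pos / total
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def shared_constraints(seeds):
    """Constraints (path, target) shared by at least two seed entities."""
    counts = {}
    for entity in seeds:
        for constraint in KB[entity]:
            counts[constraint] = counts.get(constraint, 0) + 1
    return sorted(c for c, n in counts.items() if n >= 2)

def members(definition):
    """Entities in the knowledge base that satisfy every constraint."""
    return {e for e, facts in KB.items() if all(c in facts for c in definition)}

def information_gain(definition, seeds):
    """Assumed measure: entropy of seed membership before vs. after the split."""
    n = len(KB)
    inside = members(definition)
    outside = set(KB) - inside
    h_after = sum(
        len(part) / n * entropy(len(part & seeds), len(part))
        for part in (inside, outside)
    )
    return entropy(len(seeds), n) - h_after

def candidate_collections(seeds, quantity, threshold):
    """Combine up to `quantity` constraints and keep definitions meeting the threshold."""
    constraints = shared_constraints(seeds)[:quantity]
    kept = []
    for k in range(1, len(constraints) + 1):
        for definition in combinations(constraints, k):
            gain = information_gain(definition, seeds)
            if gain >= threshold:
                kept.append((definition, gain))
    return kept
```

For example, `candidate_collections({"Obama", "Lincoln"}, quantity=3, threshold=0.9)` keeps definitions that isolate the two seed entities and discards a single broad constraint such as `("born_in", "USA")`, which also admits a non-seed entity.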
26 Claims
1. A computer system comprising:
  at least one processor; and
  one or more memories storing:
    a knowledge base including entities connected by edges, wherein the edges represent at least thousands of factual relationships that may link two of the entities, and
    instructions that, when executed by the at least one processor, cause the computer system to:
      determine a first set of entities from the knowledge base,
      determine a second set of constraints, the second set including a quantity of constraints, wherein a constraint in the second set identifies a constraint type and identifies a path of at least one edge in the knowledge base that is shared by at least two of the entities in the first set,
      generate candidate collection definitions from combinations of the constraints in the second set, where each candidate collection definition identifies one or more constraints from the second set in conjunctive normal form,
      prune the candidate collection definitions by discarding candidate collection definitions having an information gain that fails to meet a threshold, and
      store at least one of the candidate collection definitions as a candidate collection in the one or more memories, the candidate collection having an information gain that meets the threshold, wherein the candidate collection definition is used to determine entities in the knowledge base belonging to the candidate collection.
  Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A method comprising:
  determining, using at least one processor, a first set of entities from a knowledge base of entities connected by edges, wherein the edges represent at least thousands of factual relationships that may link two of the entities;
  determining a plurality of constraints, each constraint identifying a constraint type and identifying a target node and a path of at least one edge leading to the target node from at least two of the entities in the first set;
  generating, using the at least one processor, a correlation score for each of the plurality of constraints;
  using the correlation scores to select a quantity of constraints for a set of constraints;
  generating, using the at least one processor, candidate collection definitions from combinations of the set of constraints, where each candidate collection definition identifies one or more constraints from the set of constraints in conjunctive normal form;
  pruning the candidate collection definitions by discarding candidate collection definitions having an information gain that fails to meet a threshold; and
  storing at least one of the candidate collection definitions as a candidate collection in a memory, the candidate collection having an information gain that meets the threshold.
  Dependent claims: 17, 18, 19, 20, 21
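The constraint-selection step of claim 16 can be sketched as below. The claim does not say how the correlation score is computed, so this example assumes a simple stand-in: the share of seed entities that satisfy a constraint minus the share of all entities that satisfy it. The knowledge base, the `(path, target_node)` encoding, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy knowledge base: entity -> set of (path, target node) facts.
kb = {
    "Obama":    {("profession", "politician"), ("born_in", "USA"), ("office", "president")},
    "Lincoln":  {("profession", "politician"), ("born_in", "USA"), ("office", "president")},
    "Merkel":   {("profession", "politician"), ("born_in", "Germany")},
    "Einstein": {("profession", "physicist"), ("born_in", "Germany")},
    "Feynman":  {("profession", "physicist"), ("born_in", "USA")},
}

def correlation_score(constraint, seeds, kb):
    """Assumed score: how much more common the constraint is among the seeds
    than in the knowledge base as a whole."""
    in_seeds = sum(constraint in kb[e] for e in seeds) / len(seeds)
    in_kb = sum(constraint in facts for facts in kb.values()) / len(kb)
    return in_seeds - in_kb

def select_constraints(seeds, kb, quantity):
    """Keep the `quantity` best-scoring constraints shared by at least two seeds."""
    counts = Counter(c for e in seeds for c in kb[e])
    shared = [c for c, n in counts.items() if n >= 2]
    # Sort by descending score; ties are broken by the constraint itself
    # so the result is deterministic.
    ranked = sorted(shared, key=lambda c: (-correlation_score(c, seeds, kb), c))
    return ranked[:quantity]
```

Capping the set at a fixed quantity bounds the number of combinations the later generation step has to consider, since the number of subsets grows exponentially with the number of constraints.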
22. A computer system comprising:
  at least one processor; and
  one or more memories storing:
    a knowledge base including entities connected by edges, wherein the edges represent at least thousands of factual relationships that may link two of the entities,
    candidate collection definitions, each collection definition including one or more constraints in conjunctive normal form, a constraint identifying a constraint type and identifying a path in the knowledge base, and
    instructions that, when executed by the at least one processor, cause the computer system to:
      generate a name for a first candidate collection definition of the candidate collection definitions based on properties of the paths identified by the constraints of the candidate collection definition, and
      provide the name as a suggestion to a curator of the candidate collection definitions.
  Dependent claims: 23, 24, 25, 26
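One way to picture name generation from path properties is a template lookup keyed by path. This is purely an illustrative assumption: the claim does not specify which properties are used or how they are combined, and `PATH_PROPERTIES`, its roles, and its templates are hypothetical.

```python
# Hypothetical properties per path: a grammatical role and a phrase template.
PATH_PROPERTIES = {
    "profession": ("head", "{target}s"),
    "born_in": ("modifier", "born in {target}"),
    "office": ("modifier", "holding the office of {target}"),
}

def suggest_name(definition):
    """Build a suggested collection name from (path, target) constraints."""
    heads, modifiers = [], []
    for path, target in definition:
        # Unknown paths fall back to a generic modifier phrase.
        role, template = PATH_PROPERTIES.get(
            path, ("modifier", "with " + path + " {target}")
        )
        phrase = template.format(target=target)
        (heads if role == "head" else modifiers).append(phrase)
    head = " and ".join(heads) if heads else "entities"
    return " ".join([head] + modifiers)
```

Under these assumptions, `suggest_name([("profession", "politician"), ("born_in", "USA")])` yields a string a curator could accept or edit.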
Specification