System and Method for Mining Large, Diverse, Distributed, and Heterogeneous Datasets
First Claim
1. A method for directed mining of a heterogeneous dataset with a computer comprising the steps of:
populating a rule base with known rules, wherein each rule has a context and a situation;
populating a case base with known cases, wherein each case has a context and a situation, and wherein the case base is partitioned from the rule base;
ascribing a natural language semantics to predicates of the known cases and rules;
randomly transforming the known rules and the known cases to form new rules by extracting a maximum number of common predicates;
segmenting the rules and the cases on the basis of shared predicates without making distinction between context and situation predicates;
abducing new knowledge from the dataset by fuzzily matching the context of a new rule to a situation the new rule does not cover; and
issuing a query to a user to supply missing predicates of the fuzzy match.
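The common-predicate extraction, fuzzy matching, and user-query steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that predicates are plain strings, that a production is a pair of predicate sets, and that the fuzzy-match score is the fraction of context predicates already observed. The names `Production`, `max_common_predicates`, `fuzzy_match`, `query_for_missing`, and the `threshold` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    """A rule or case: a context and a situation, each a set of predicates."""
    context: frozenset
    situation: frozenset


def max_common_predicates(p, q):
    """Return every context predicate shared by two productions."""
    return p.context & q.context


def fuzzy_match(context, observed):
    """Score how well the observed predicates cover a context.

    Assumption: the score is the fraction of context predicates present in
    `observed`; the claim does not fix a particular measure.
    Returns (score, missing_predicates).
    """
    if not context:
        return 0.0, frozenset()
    matched = context & observed
    return len(matched) / len(context), context - observed


def query_for_missing(context, observed, threshold=0.5):
    """List the predicates a user would be asked to supply.

    A query is raised only for a partial match that reaches the
    (hypothetical) threshold; a complete match needs no query.
    """
    score, missing = fuzzy_match(context, observed)
    if missing and score >= threshold:
        return sorted(missing)
    return []


if __name__ == "__main__":
    r1 = Production(frozenset({"engine_hot", "coolant_low", "fan_off"}),
                    frozenset({"overheat"}))
    r2 = Production(frozenset({"engine_hot", "coolant_low", "belt_worn"}),
                    frozenset({"overheat"}))
    print(sorted(max_common_predicates(r1, r2)))
    print(query_for_missing(r1.context, frozenset({"engine_hot", "coolant_low"})))
```

In this sketch the user is queried only about predicates that the partial match leaves open, which mirrors the final step of the claim.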
Abstract
A method for directed mining of a heterogeneous dataset with a computer comprising: populating a rule base with known rules, wherein each rule has a context and a situation; populating a case base with known cases, wherein each case has a context and a situation, and wherein the case base is partitioned from the rule base; ascribing a natural language semantics to predicates of the known cases and rules; randomly transforming the known rules and the known cases to form new rules by extracting a maximum number of common predicates; segmenting the rules and the cases on the basis of shared predicates without making distinction between context and situation predicates; abducing new knowledge from the dataset by fuzzily matching the context of a new rule to a situation the new rule does not cover; and issuing a query to a user to supply missing predicates of the fuzzy match.
18 Claims
1. A method for directed mining of a heterogeneous dataset with a computer comprising the steps of:
populating a rule base with known rules, wherein each rule has a context and a situation;
populating a case base with known cases, wherein each case has a context and a situation, and wherein the case base is partitioned from the rule base;
ascribing a natural language semantics to predicates of the known cases and rules;
randomly transforming the known rules and the known cases to form new rules by extracting a maximum number of common predicates;
segmenting the rules and the cases on the basis of shared predicates without making distinction between context and situation predicates;
abducing new knowledge from the dataset by fuzzily matching the context of a new rule to a situation the new rule does not cover; and
issuing a query to a user to supply missing predicates of the fuzzy match.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
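The segmenting step of claim 1 groups productions by shared predicates and ignores whether a predicate sits in a context or a situation. The following Python sketch shows one way that could look. It assumes a greedy, single-pass grouping in which each production joins the existing segment it overlaps most; the claim does not prescribe this strategy. The names `predicates`, `segment`, and `min_shared` are hypothetical.

```python
def predicates(production):
    """Pool context and situation predicates into one set (no distinction)."""
    context, situation = production
    return set(context) | set(situation)


def segment(productions, min_shared=1):
    """Greedily group productions that share predicates.

    Assumption: a production joins the existing segment with which it shares
    the most predicates, provided at least `min_shared` are shared; otherwise
    it starts a new segment. Returns a list of segments (lists of productions).
    """
    segments = []  # each entry: {"preds": set of predicates, "members": list}
    for prod in productions:
        preds = predicates(prod)
        best, best_overlap = None, 0
        for seg in segments:
            overlap = len(seg["preds"] & preds)
            if overlap > best_overlap:
                best, best_overlap = seg, overlap
        if best is not None and best_overlap >= min_shared:
            best["members"].append(prod)
            best["preds"] |= preds
        else:
            segments.append({"preds": set(preds), "members": [prod]})
    return [seg["members"] for seg in segments]


if __name__ == "__main__":
    p1 = (("a", "b"), ("c",))   # (context, situation)
    p2 = (("c",), ("d",))       # shares "c" with p1's situation
    p3 = (("x",), ("y",))       # shares nothing
    print(segment([p1, p2, p3]))
```

Here `p2` lands in the same segment as `p1` even though the shared predicate is a situation predicate in one and a context predicate in the other.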
14. A method for directed mining of a heterogeneous dataset comprising the steps of:
dividing the dataset into domain-specific segments, each segment stored on a separate processor, wherein each segment comprises case productions and rule productions that are partitioned into rule and case bases, and wherein each production comprises a context predicated by a situation and vice versa;
creating a shared symbolic predicate interpretation lookup table in each segment that includes primitive symbols and their sequences, context, and situation interpretations;
searching a segment by selecting a given production and transforming it by replacing its context with predicate equivalents and then searching the group of segments for a situation covered by the transformed context;
defining random and symmetric quantums for each processor, wherein the random and symmetric quantums represent the exclusive time spent in performing random and symmetric searches respectively in a most-recently-successful rule discovery, per processor;
initializing both quantums in each processor to the same value so as to preserve fairness and prevent thrashing;
terminating search prior to quantum expiration if no situation is found to be applicable;
alternating search in each processor between symmetric and random search proportionately favoring the type of search having the shorter quantum, wherein ties are broken at uniform chance;
updating the quantums if the transformed context covers, or fuzzily matches upon interrupt, a given situation in one of the processors and if a likelihood of the combination of the transformed context and the given situation is within a possibility squelch, wherein the possibility of the combination of the transformed context and the given situation is the product of the possibility of each transform in the combination;
adding the transformed context and the covered situation as a new rule to the logical head of the rule base of those segment(s) having maximal cohesion subject to relation;
taking the cases and/or rules having the highest one-step possibilities, and issuing questions/queries to a user as to the status of their unmatched situational predicates if upon timer/quantum interrupt, a complete covering of a situation in a segment is not found;
expunging all cases and rules that are found to be in error;
expunging the least-recently-used (LRU) cases and rules in a segment to free storage space as necessary;
maintaining in each segment a local stacking mechanism for excluding cycles in transformation by checking for duplicate states whenever a transformed context and transform is to be stacked;
checking the final results of locally acyclic and successful transformations against the contents of every segment to insure that it is unknown;
moving to the tail of its containing segment any transform, which gave rise to a duplicate state;
terminating the stacking mechanism upon interrupt, or failure to find a randomization within cumulative likelihood within the possibility squelch; and
providing the acyclic contexts, transformations, and transformed contexts, on the segment stacks, as sequential metaphorical explanations.
Dependent claims: 15, 16, 17, 18
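Two quantitative pieces of claim 14 lend themselves to a short Python sketch: the possibility of a combination as the product of its transforms' possibilities, tested against a squelch, and the alternation between random and symmetric search that favors the shorter quantum. The sketch rests on two assumptions that the claim does not state: "within a possibility squelch" is read as "at or above a threshold", and "proportionately favoring" is read as choosing each search type with probability proportional to the other type's quantum. The function names and the `rng` parameter are hypothetical.

```python
import math
import random


def combined_possibility(transform_possibilities):
    """Possibility of a chain of transforms: the product of each one's possibility."""
    return math.prod(transform_possibilities)


def within_squelch(transform_possibilities, squelch):
    """Assumption: a combination passes if its possibility is at least `squelch`."""
    return combined_possibility(transform_possibilities) >= squelch


def choose_search(random_quantum, symmetric_quantum, rng=random):
    """Pick 'random' or 'symmetric' search, favoring the shorter quantum.

    Assumption: the chance of picking random search is
    symmetric_quantum / (random_quantum + symmetric_quantum), so a shorter
    random quantum makes random search more likely. Equal quantums give a
    50/50 choice, which breaks ties at uniform chance.
    """
    total = random_quantum + symmetric_quantum
    if total <= 0:
        raise ValueError("quantums must sum to a positive value")
    p_random = symmetric_quantum / total
    return "random" if rng.random() < p_random else "symmetric"


if __name__ == "__main__":
    print(within_squelch([0.9, 0.8], squelch=0.7))
    print(choose_search(random_quantum=1.0, symmetric_quantum=3.0))
```

Because the product of possibilities can only shrink as transforms are chained, a squelch read this way also bounds how long a chain of uncertain transforms can grow before it is rejected.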
Specification