System and Method for Mining Large, Diverse, Distributed, and Heterogeneous Datasets

US 20150178636A1
Filed: 06/26/2014
Published: 06/25/2015
Est. Priority Date: 04/06/2010
Status: Active Grant

First Claim

1. A method for directed mining of a heterogeneous dataset with a computer comprising the steps of:

populating a rule base with known rules, wherein each rule has a context and a situation;

populating a case base with known cases, wherein each case has a context and a situation, and wherein the case base is partitioned from the rule base;

ascribing a natural language semantics to predicates of the known cases and rules;

randomly transforming the known rules and the known cases to form new rules by extracting a maximum number of common predicates;

segmenting the rules and the cases on the basis of shared predicates without making distinction between context and situation predicates;

abducing new knowledge from the dataset by fuzzily matching the context of a new rule to a situation the new rule does not cover; and

issuing a query to a user to supply missing predicates of the fuzzy match.
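
The segmenting step compares productions by their shared predicates without distinguishing context from situation. A minimal sketch of that comparison follows, assuming predicates are plain strings held in sets. The `Production` class and the `shared_predicates` and `most_similar` helpers are hypothetical names chosen for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    """Hypothetical rule or case: a context paired with a situation."""
    context: frozenset
    situation: frozenset


def shared_predicates(a, b):
    """Predicates common to two productions, pooling context and situation."""
    return (a.context | a.situation) & (b.context | b.situation)


def most_similar(target, candidates):
    """Candidate sharing the largest number of predicates with the target."""
    return max(candidates, key=lambda p: len(shared_predicates(target, p)))
```

Under this reading, a segment could be grown by repeatedly grouping each production with its `most_similar` neighbour. That grouping policy is an assumption, not something the claim specifies.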

Abstract

A method for directed mining of a heterogeneous dataset with a computer comprising: populating a rule base with known rules, wherein each rule has a context and a situation; populating a case base with known cases, wherein each case has a context and a situation, and wherein the case base is partitioned from the rule base; ascribing a natural language semantics to predicates of the known cases and rules; randomly transforming the known rules and the known cases to form new rules by extracting a maximum number of common predicates; segmenting the rules and the cases on the basis of shared predicates without making distinction between context and situation predicates; abducing new knowledge from the dataset by fuzzily matching the context of a new rule to a situation the new rule does not cover; and issuing a query to a user to supply missing predicates of the fuzzy match.
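
The last two steps of the abstract, fuzzy matching followed by a user query for missing predicates, can be sketched as below. This is only an illustration: the score used here (the fraction of situation predicates found in the context) is an assumed stand-in, since the listing does not give the patent's own possibility formula. `fuzzy_match` and `questions_for` are hypothetical names.

```python
def fuzzy_match(context, situation):
    """Return (score, missing) for a context matched against a situation.

    score   -- assumed measure: share of situation predicates present in context
    missing -- situation predicates the context does not supply
    """
    if not situation:
        return 1.0, set()
    matched = context & situation
    missing = situation - context
    return len(matched) / len(situation), missing


def questions_for(missing):
    """Turn each missing predicate into a question for the user."""
    return [f"Is '{p}' true?" for p in sorted(missing)]
```

A partial match then yields both a score and the list of questions that would complete it, for example `questions_for(fuzzy_match(ctx, sit)[1])`.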

18 Claims

1. A method for directed mining of a heterogeneous dataset with a computer comprising the steps of:
- populating a rule base with known rules, wherein each rule has a context and a situation;
  
  populating a case base with known cases, wherein each case has a context and a situation, and wherein the case base is partitioned from the rule base;
  
  ascribing a natural language semantics to predicates of the known cases and rules;
  
  randomly transforming the known rules and the known cases to form new rules by extracting a maximum number of common predicates;
  
  segmenting the rules and the cases on the basis of shared predicates without making distinction between context and situation predicates;
  
  abducing new knowledge from the dataset by fuzzily matching the context of a new rule to a situation the new rule does not cover; and
  
  issuing a query to a user to supply missing predicates of the fuzzy match.
  - 2. The method of claim 1, wherein cases and rules are stored in segments so as to increase their domain-specificity across parallel processors.
  - 3. The method of claim 2, further comprising the step of calculating a relative possibility (i) for a given situation and a given context such that the
  - 4. The method of claim 3, further comprising the step of discarding new rules that have a possibility below a possibility squelch threshold.
  - 5. The method of claim 1, further comprising iteratively mapping the dataset with a learning, natural-language-processing system to yield generalized qualitative descriptors to replace natural language contexts and situations.
  - 6. The method of claim 1, further comprising the step of providing a metaphorical explanation for each new rule by providing a description of the sequence of transformations that led to the new rule.
  - 7. The method of claim 4, wherein for a given processor random and symmetric quantum values are dynamically defined, wherein the random quantum defines the amount of time that the given processor may spend in the performance of random search for new knowledge and the symmetric quantum defines the amount of time that the given processor may spend in the performance of symmetric search for new knowledge.
  - 8. The method of claim 7, further comprising the steps of:
    - initializing search using the random quantum;
      
      terminating the search prior to quantum expiration, if no production is found to be applicable;
      
      searching in each processor for alternates, proportionately favoring either random or symmetric search depending on which has the shorter quantum; and
      
      updating the quantum values if a new transformed context covers, or fuzzily matches upon interrupt, any given situation in any given processor, where the possibility of the combination of the new transformed context and the given situation is within the possibility squelch.
  - 9. The method of claim 8, further comprising the step of querying for the status of unmatched context predicates upon a quantum interrupt.
  - 10. The method of claim 1, further comprising the step of moving successfully fired cases and rules to a logical head of their respective bases.
  - 11. The method of claim 5, further comprising the step of transforming a user-supplied context into a transformed context by randomizing the qualitative descriptors in the user-supplied context based on common predicates in the case and rule bases.
  - 12. The method of claim 1, wherein cases and rules are ranked in order of most to least-recently-used (LRU), and wherein the LRU cases and rules are expunged as part of a policy to maximize coherency of each segment.
  - 13. The method of claim 2, wherein segments and processors are sub-divided into logical groups that are based on physical locality and migrated using relaxation techniques.
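
Claims 10 and 12 together describe a move-to-head policy with least-recently-used expunging. One way to picture it is the sketch below, which assumes a fixed capacity per base and uses Python's `OrderedDict` to keep the ordering. The `Base` class, its `capacity` parameter, and its method names are hypothetical and not taken from the patent.

```python
from collections import OrderedDict


class Base:
    """Hypothetical rule or case base ordered from most to least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # first key is the logical head

    def add(self, key, production):
        """Insert at the logical head, expunging from the tail when full."""
        self._items[key] = production
        self._items.move_to_end(key, last=False)
        while len(self._items) > self.capacity:
            self._items.popitem(last=True)  # drop the least recently used entry

    def fire(self, key):
        """Mark a production as successfully fired by moving it to the head."""
        self._items.move_to_end(key, last=False)
        return self._items[key]

    def order(self):
        """Keys from logical head to tail."""
        return list(self._items)
```

With a capacity of two, adding `r1` and `r2`, firing `r1`, and then adding `r3` leaves `r3` and `r1` in the base, and `r2` is expunged as the least recently used entry.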

14. A method for directed mining of a heterogeneous dataset comprising the steps of:
- dividing the dataset into domain-specific segments, each segment stored on a separate processor, wherein each segment comprises case productions and rule productions that are partitioned into rule and case bases, and wherein each production comprises a context predicated by a situation and vice versa;
  
  creating a shared symbolic predicate interpretation lookup table in each segment that includes primitive symbols and their sequences, context, and situation interpretations;
  
  searching a segment by selecting a given production and transforming it by replacing its context with predicate equivalents and then searching the group of segments for a situation covered by the transformed context;
  
  defining random and symmetric quantums for each processor, wherein the random and symmetric quantums represent the exclusive time spent in performing random and symmetric searches respectively in a most-recently-successful rule discovery, per processor;
  
  initializing both quantums in each processor to the same value so as to preserve fairness and prevent thrashing;
  
  terminating search prior to quantum expiration if no situation is found to be applicable;
  
  alternating search in each processor between symmetric and random search proportionately favoring the type of search having the shorter quantum, wherein ties are broken at uniform chance;
  
  updating the quantums if the transformed context covers, or fuzzily matches upon interrupt, a given situation in one of the processors and if a likelihood of the combination of the transformed context and the given situation is within a possibility squelch, wherein the possibility of the combination of the transformed context and the given situation is the product of the possibility of each transform in the combination;
  
  adding the transformed context and the covered situation as a new rule to the logical head of the rule base of those segment(s) having maximal cohesion subject to relation;
  
  taking the cases and/or rules having the highest one-step possibilities, and issuing questions/queries to a user as to the status of their unmatched situational predicates if upon timer/quantum interrupt, a complete covering of a situation in a segment is not found;
  
  expunging all cases and rules that are found to be in error;
  
  expunging the least-recently-used (LRU) cases and rules in a segment to free storage space as necessary;
  
  maintaining in each segment a local stacking mechanism for excluding cycles in transformation by checking for duplicate states whenever a transformed context and transform is to be stacked;
  
  checking the final results of locally acyclic and successful transformations against the contents of every segment to insure that it is unknown;
  
  moving to the tail of its containing segment any transform, which gave rise to a duplicate state;
  
  terminating the stacking mechanism upon interrupt, or failure to find a randomization within cumulative likelihood within the possibility squelch; and
  
  providing the acyclic contexts, transformations, and transformed contexts, on the segment stacks, as sequential metaphorical explanations.
  - 15. The method of claim 14, wherein the cases and rules are separately linked in each segment though they share a common LRU link to free space.
  - 16. The method of claim 15, wherein groups of segments are determined by collecting maximally similar segments, dynamically determined on the heuristic basis of physical locality.
  - 17. The method of claim 16, wherein the number of segments in a group is determined by the number of processors and the capability for concurrent search among them.
  - 18. The method of claim 17, wherein a transformed context and covered situation need not only be applicable, but must reduce the distance between the context and at least one situation in the case or rule base in symmetric transformation.
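
Two pieces of claim 14 are arithmetic in nature: the possibility of a combination is the product of the possibilities of its transforms, and the choice between random and symmetric search favors whichever has the shorter quantum. A rough sketch of both follows. It assumes that "within the squelch" means at or above a threshold (consistent with claim 4) and that "proportionately favoring" means choosing each search type with probability proportional to the other type's quantum. All function names are hypothetical.

```python
import math
import random


def combined_possibility(transform_possibilities):
    """Possibility of a chain of transforms: the product of each one's possibility."""
    return math.prod(transform_possibilities)


def within_squelch(transform_possibilities, squelch):
    """Assumed reading: keep a combination only if its possibility reaches the threshold."""
    return combined_possibility(transform_possibilities) >= squelch


def choose_search(random_quantum, symmetric_quantum, rng=random):
    """Pick 'random' or 'symmetric', favoring the type with the shorter quantum.

    The probability of choosing random search is proportional to the symmetric
    quantum, so equal quantums give an even chance (ties broken uniformly).
    """
    total = random_quantum + symmetric_quantum
    if total <= 0:
        return rng.choice(["random", "symmetric"])
    p_random = symmetric_quantum / total
    return "random" if rng.random() < p_random else "symmetric"
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which is convenient when checking the proportions empirically.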


Current Assignee: The United States of America as represented by the Secretary of the Navy
Original Assignee: The United States of America as represented by the Secretary of the Navy
Inventors: Rubin, Stuart Harvey

Granted Patent: US 9,449,280 B2
US Class Current: 1/1
CPC Class Codes

G06N 5/025   Extracting rules from data

G06N 5/048   Fuzzy inferencing

G06N 7/02   using fuzzy logic computing...
