Automatic rule coaching

US 9,754,208 B2
Filed: 09/02/2014
Issued: 09/05/2017
Est. Priority Date: 09/02/2014
Status: Active Grant

Abstract

A method of validating rules configured to be utilized in an information extraction application, including: receiving a plurality of labeled samples in a training database; for each of the rules in the rules database: (a) determining, for each of the data points of the plurality of labeled samples in the training database to which the rule applies, whether applying the rule to the data point has a positive or negative impact on matching an output for the data point based on the rule to the assured output of the labeled sample corresponding to the data point; (b) generating positive impact information for the rule based on the positive voters; (c) generating negative impact information for the rule based on the negative voters; and (d) determining a metric for the rule based on the quantity of the negative voters and the quantity of the positive voters; ranking the rules based on the metrics corresponding to the rules; and sending to a user for refinement one or more flagged rules of the rules that have a lowest ranking of the metric. Other embodiments are provided.

20 Claims

1. A method of validating rules configured to be utilized in an information extraction application, the rules being stored in a rules database, the method being implemented via execution of computer instructions configured to run at one or more processing modules and configured to be stored at one or more non-transitory memory storage modules, the method comprising:
- receiving a plurality of labeled samples in a training database, each of the plurality of labeled samples comprising a different data point and an assured output, the assured output corresponding to the different data point for the information extraction application;
  
  for each of the rules in the rules database:
  
  determining, for each data point of the different data points of the plurality of labeled samples in the training database to which the rule applies, whether applying the rule to the data point has a positive impact on matching an output for the data point based on the rule to the assured output of the labeled sample corresponding to the data point, such that the data point is a positive voter, or whether applying the rule to the data point has a negative impact on matching the output for the data point based on the rule to the assured output of the labeled sample corresponding to the data point, such that the data point is a negative voter;
  
  generating positive impact information for the rule based on the positive voters, wherein the positive impact information comprises a quantity of the positive voters;
  
  generating negative impact information for the rule based on the negative voters, wherein the negative impact information comprises a quantity of the negative voters; and
  
  determining a metric for the rule based on the quantity of the negative voters and the quantity of the positive voters;
  
  ranking the rules based on the metrics corresponding to the rules;
  
  sending to a user for refinement one or more flagged rules of the rules that have a lowest ranking of the metric;
  
  receiving from the user one or more refined rules;
  
  generating a first output for a first data point in an information database based on the rules in the rules database, the rules in the rules database comprising the one or more refined rules, the plurality of labeled samples in the training database being devoid of the first data point;
  
  receiving a request for information from a second user; and
  
  presenting the first output to the second user in response to the request.
  - 2. The method of claim 1 further comprising:
    - receiving from the user one of:
      
      (1) one or more updated rules that are a modification of the one or more flagged rules, or (2) a deletion of the one or more flagged rules.
  - 3. The method of claim 1 further comprising:
    - sending to the user one or more candidate outputs for each of the one or more flagged rules, wherein, for each of the one or more flagged rules, the one or more candidate outputs comprise one or more of the assured outputs of the plurality of labeled samples that most frequently correspond to the different data points of the plurality of labeled samples to which the flagged rule applies.
  - 4. The method of claim 1 further comprising:
    - iteratively sending to the user for refinement the one or more flagged rules of the rules that have the lowest ranking of the metric until the metric of a next lowest rule is within a predetermined threshold.
  - 5. The method of claim 1, wherein:
    - the metric for each of the rules is based on a ratio of the quantity of the negative voters to the quantity of the positive voters.
  - 6. The method of claim 1, wherein:
    - the rules in the rules database comprise whitelist rules and blacklist rules.
  - 7. The method of claim 1, wherein:
    - the information extraction application comprises product type classification.
  - 8. The method of claim 1, wherein:
    - the information extraction application comprises data normalization.
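
The voter counting, metric, and ranking steps of claims 1 and 5 can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes a hypothetical rule form (a regular expression plus the output it assigns), treats a data point as a positive voter when the rule's output equals the assured output, and reads "lowest ranking" as the highest negative-to-positive ratio. The names `Rule`, `tally_voters`, `metric`, `rank_rules`, and `flag_lowest` are invented for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    pattern: str  # assumption: a regex tested against the data point text
    output: str   # output the rule assigns when it applies


def tally_voters(rule, labeled_samples):
    """Count positive and negative voters among samples the rule applies to."""
    positive = negative = 0
    for data_point, assured_output in labeled_samples:
        if re.search(rule.pattern, data_point, re.IGNORECASE) is None:
            continue  # the rule does not apply, so this sample casts no vote
        if rule.output == assured_output:
            positive += 1
        else:
            negative += 1
    return positive, negative


def metric(positive, negative):
    """Ratio of negative to positive voters (claim 5); higher is worse."""
    if positive == 0:
        return float("inf") if negative else 0.0
    return negative / positive


def rank_rules(rules, labeled_samples):
    """Return (rule, metric) pairs ordered best first, worst last."""
    scored = [(rule, metric(*tally_voters(rule, labeled_samples))) for rule in rules]
    return sorted(scored, key=lambda pair: pair[1])


def flag_lowest(rules, labeled_samples, count=1):
    """Pick the lowest-ranked rules to send to a user for refinement."""
    return [rule for rule, _ in rank_rules(rules, labeled_samples)[-count:]]


# Made-up labeled samples: (data point, assured output).
samples = [
    ("diamond wedding ring", "ring"),
    ("gold ring size 7", "ring"),
    ("ring binder 3 inch", "binder"),
    ("apple iphone 6 case", "phone case"),
    ("apple juice 64 oz", "juice"),
]
rules = [
    Rule(r"\bring\b", "ring"),
    Rule(r"\bapple\b", "phone case"),
    Rule(r"\bjuice\b", "juice"),
]

print(tally_voters(rules[0], samples))  # (2, 1)
print(flag_lowest(rules, samples))      # the "apple" rule, ratio 1.0
```

With these samples the "ring" rule has two positive voters and one negative voter (the ring binder), so its ratio is 0.5, while the "apple" rule splits one to one and is the rule flagged for refinement.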

9. A method of validating rules configured to be utilized in an information extraction application, the method being implemented via execution of computer instructions configured to run at one or more processing modules and configured to be stored at one or more non-transitory memory storage modules, the method comprising:
- sending to a user a first data point for the information extraction application;
  
  receiving from the user a first assured output corresponding to the first data point for the information extraction application based on human knowledge of the user;
  
  storing the first data point and the first assured output as a first labeled sample in a training database, the training database comprising a plurality of labeled samples each for a different data point and an assured output, the assured output corresponding to the different data point for the information extraction application;
  
  generating a first output for the first data point based on a first set of rules in a rules database comprising the rules configured to be utilized in the information extraction application;
  
  sending to the user the first output for the first data point;
  
  receiving from the user one of:
  
  (1) a first new rule for the information extraction application of the first data point based on the human knowledge of the user, or (2) a first updated existing rule that is a modification of one of the first set of rules in the rules database, one of the first new rule or the first updated existing rule being a user-inputted rule;
  
  storing the user-inputted rule in the rules database;
  
  sending to the user an updated output for the first data point based on the rules in the rules database, the rules in the rules database comprising the user-inputted rule and the first set of rules;
  
  determining, for each data point of the different data points of the plurality of labeled samples in the training database to which the user-inputted rule applies, whether applying the user-inputted rule to the data point has a positive impact on matching an output for the data point based on the user-inputted rule to the assured output of the labeled sample corresponding to the data point, such that the data point is a positive voter, or whether applying the user-inputted rule to the data point has a negative impact on matching the output for the data point based on the user-inputted rule to the assured output of the labeled sample corresponding to the data point, such that the data point is a negative voter;
  
  generating positive impact information for the user-inputted rule based on the positive voters;
  
  generating negative impact information for the user-inputted rule based on the negative voters; and
  
  sending to the user the positive and negative impact information for the user-inputted rule.
  - 10. The method of claim 9 further comprising:
    - receiving from the user one of:
      
      (1) a second new rule for the information extraction application of the first data point based on the human knowledge of the user, or (2) a second updated existing rule that is a modification of one of the rules in the rules database.
  - 11. The method of claim 10, wherein:
    - the second updated existing rule is a modification of the user-inputted rule.
  - 12. The method of claim 9, wherein:
    - sending to the user the first output for the first data point further comprises:
      
      sending to the user a listing of the rules that apply to the first data point; and
      
      sending to the user the updated output for the first data point based on the rules in the rules database comprises:
      
      sending to the user an updated listing of the rules that apply to the first data point.
  - 13. The method of claim 9, wherein:
    - the positive impact information comprises a quantity of the positive voters; and
      
      the negative impact information comprises a quantity of the negative voters.
  - 14. The method of claim 9, wherein:
    - the positive impact information comprises a listing of the positive voters; and
      
      the negative impact information comprises a listing of the negative voters.
  - 15. The method of claim 9, wherein:
    - the rules in the rules database comprise whitelist rules and blacklist rules.
  - 16. The method of claim 9, wherein:
    - the information extraction application comprises one of product type classification or data normalization.
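
The impact feedback of claims 9, 14, and 15 can be illustrated with a short sketch that reports, for a single user-inputted rule, the listing of positive and negative voters. The claims do not define whitelist or blacklist semantics, so the interpretation here is an assumption: a whitelist rule assigns its output, and a blacklist rule forbids it. `Rule`, `ImpactReport`, and `impact_of` are hypothetical names, and the training data is made up.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    pattern: str             # assumption: regex tested against the data point
    output: str              # output the rule assigns (or forbids)
    kind: str = "whitelist"  # "whitelist" assigns output; "blacklist" forbids it


@dataclass
class ImpactReport:
    positive_voters: list = field(default_factory=list)
    negative_voters: list = field(default_factory=list)


def impact_of(rule, training_db):
    """List the data points that vote for and against a user-inputted rule."""
    report = ImpactReport()
    for data_point, assured_output in training_db:
        if not re.search(rule.pattern, data_point, re.IGNORECASE):
            continue  # rule does not apply to this labeled sample
        if rule.kind == "whitelist":
            agrees = rule.output == assured_output
        else:
            # Assumed blacklist reading: the rule is right when the
            # assured output is anything other than the forbidden one.
            agrees = rule.output != assured_output
        target = report.positive_voters if agrees else report.negative_voters
        target.append(data_point)
    return report


training_db = [
    ("ring binder 3 inch", "binder"),
    ("binder ring 2 pack", "ring"),
    ("gold ring size 7", "ring"),
]
new_rule = Rule(r"\bbinder\b", "ring", kind="blacklist")

report = impact_of(new_rule, training_db)
print(report.positive_voters)  # ['ring binder 3 inch']
print(report.negative_voters)  # ['binder ring 2 pack']
```

Returning the listings rather than only the counts lets the user see which labeled samples a new rule breaks; the quantities of claim 13 are simply the lengths of the two lists.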

17. A system for validating rules configured to be utilized in an information extraction application, the rules being stored in a rules database, the system comprising:
- one or more processing modules; and
  
  one or more non-transitory memory storage modules storing computing instructions configured to run on the one or more processing modules and perform:
  
  receiving a plurality of labeled samples in a training database, each of the plurality of labeled samples comprising a different data point and an assured output, the assured output corresponding to the different data point for the information extraction application;
  
  for each of the rules in the rules database:
  
  determining, for each data point of the different data points of the plurality of labeled samples in the training database to which the rule applies, whether applying the rule to the data point has a positive impact on matching an output for the data point based on the rule to the assured output of the labeled sample corresponding to the data point, such that the data point is a positive voter, or whether applying the rule to the data point has a negative impact on matching the output for the data point based on the rule to the assured output of the labeled sample corresponding to the data point, such that the data point is a negative voter;
  
  generating positive impact information for the rule based on the positive voters, wherein the positive impact information comprises a quantity of the positive voters;
  
  generating negative impact information for the rule based on the negative voters, wherein the negative impact information comprises a quantity of the negative voters; and
  
  determining a metric for the rule based on the quantity of the negative voters and the quantity of the positive voters;
  
  ranking the rules based on the metrics corresponding to the rules;
  
  sending to a user for refinement one or more flagged rules of the rules that have a lowest ranking of the metric;
  
  receiving from the user one or more refined rules;
  
  generating a first output for a first data point in an information database based on the rules in the rules database, the rules in the rules database comprising the one or more refined rules, the plurality of labeled samples in the training database being devoid of the first data point;
  
  receiving a request for information from a second user; and
  
  presenting the first output to the second user in response to the request.
  - 18. The system of claim 17, wherein the computing instructions are further configured to perform:
    - receiving from the user one of:
      
      (1) one or more updated rules that are a modification of the one or more flagged rules, or (2) a deletion of the one or more flagged rules.
  - 19. The system of claim 17, wherein the computing instructions are further configured to perform:
    - sending to the user one or more candidate outputs for each of the one or more flagged rules, wherein, for each of the one or more flagged rules, the one or more candidate outputs comprise one or more of the assured outputs of the plurality of labeled samples that most frequently correspond to the different data points of the plurality of labeled samples to which the flagged rule applies.
  - 20. The system of claim 17, wherein the computing instructions are further configured to perform:
    - iteratively sending to the user for refinement the one or more flagged rules of the rules that have the lowest ranking of the metric until the metric of a next lowest rule is within a predetermined threshold.
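
The candidate outputs of claims 3 and 19 and the iterative refinement of claims 4 and 20 might be combined as in the sketch below. It rests on several assumptions: rules are a mapping from a regex to an output, the metric is the negative-to-positive voter ratio, "within a predetermined threshold" means the worst remaining ratio is at or below `threshold`, and the `refine` callback stands in for the human user. `max_rounds` is an added safeguard that does not come from the claims.

```python
import re
from collections import Counter


def applies(pattern, data_point):
    return re.search(pattern, data_point, re.IGNORECASE) is not None


def ratio(pattern, output, samples):
    """Negative-to-positive voter ratio; higher means a worse rule."""
    hits = [assured for point, assured in samples if applies(pattern, point)]
    positive = sum(1 for assured in hits if assured == output)
    negative = len(hits) - positive
    if positive == 0:
        return float("inf") if negative else 0.0
    return negative / positive


def candidate_outputs(pattern, samples, top_n=2):
    """Assured outputs that most often go with the data points the rule hits."""
    counts = Counter(a for point, a in samples if applies(pattern, point))
    return [output for output, _ in counts.most_common(top_n)]


def coach(rules, samples, refine, threshold=0.5, max_rounds=10):
    """Send the worst rule for refinement until it is within the threshold."""
    rules = dict(rules)  # work on a copy: pattern -> output
    for _ in range(max_rounds):
        if not rules:
            break
        worst = max(rules, key=lambda p: ratio(p, rules[p], samples))
        if ratio(worst, rules[worst], samples) <= threshold:
            break
        # refine returns a (pattern, output) pair, or None to delete the rule.
        refined = refine(worst, rules[worst], candidate_outputs(worst, samples))
        del rules[worst]
        if refined is not None:
            new_pattern, new_output = refined
            rules[new_pattern] = new_output
    return rules


samples = [
    ("apple iphone 6 case", "phone case"),
    ("apple juice 64 oz", "juice"),
    ("apple juice box", "juice"),
    ("gold ring size 7", "ring"),
]
rules = {r"\bapple\b": "phone case", r"\bring\b": "ring"}


def refine(pattern, output, candidates):
    # Stand-in for the user: narrow the rule and take the top candidate.
    return (r"\bapple juice\b", candidates[0])


print(candidate_outputs(r"\bapple\b", samples))  # ['juice', 'phone case']
print(coach(rules, samples, refine))
```

In this run the "apple" rule starts with a ratio of 2.0, is replaced by the narrower "apple juice" rule, and the loop stops because every remaining rule then has a ratio of 0.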

Current Assignee: Walmart Apollo, LLC (WalMart Inc.)
Original Assignee: Wal-Mart Stores Texas LLC (WalMart Inc.)
Inventors: Xie, Jun; Sun, Chong; Yang, Fan; Rampalli, Narasimhan
Primary Examiner(s): Chaki, Kakali
Assistant Examiner(s): Seck, Ababacar
Application Number: US14/475,470
Publication Number: US 20160063386A1
Time in Patent Office: 1,099 Days
Field of Search: None
CPC Class Codes:

G06F 18/214   Generating training pattern...

G06F 18/2411   based on the proximity to a...

G06N 20/00   Machine learning

G06N 5/02   Knowledge representation; S...

G06N 5/025   Extracting rules from data

G06N 7/01   Probabilistic graphical mod...
