PROVIDING TRAINING INFORMATION FOR TRAINING A CATEGORIZER
Abstract
Abstract of the Disclosure
A method and system of providing training information for training a categorizer include receiving a query relating to at least one category and identifying at least one case within a data set that matches the query. The method and system receive one of a first indication that the identified at least one case belongs to the category, and a second indication that the identified at least one case does not belong to the category. Training information is modified based on receiving one of the first indication and second indication.
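The flow summarized in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the disclosed implementation: the names `TrainingInfo`, `search`, and `confirm`, the use of a regular expression as the query, and the sample cases are all assumptions introduced here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TrainingInfo:
    """Hypothetical container for the training information of one category."""
    positive: set = field(default_factory=set)  # ids of cases confirmed as belonging
    negative: set = field(default_factory=set)  # ids of cases confirmed as not belonging


def search(cases, pattern):
    """Return ids of cases whose text matches the query (assumed here to be a regex)."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [case_id for case_id, text in cases.items() if compiled.search(text)]


def confirm(info, case_id, belongs):
    """Modify the training information in response to a single indication."""
    if belongs:
        info.negative.discard(case_id)
        info.positive.add(case_id)
    else:
        info.positive.discard(case_id)
        info.negative.add(case_id)


# Made-up data set of cases, keyed by a hypothetical case id.
cases = {
    1: "Printer will not power on",
    2: "Screen flickers after update",
    3: "Power cord arrived damaged",
}
info = TrainingInfo()
matches = search(cases, r"power")  # query relating to an assumed "power failure" category
confirm(info, matches[0], True)    # first indication: the case belongs to the category
confirm(info, matches[1], False)   # second indication: the case does not belong
```

In this sketch a later indication for the same case overrides an earlier one, which is one possible design choice; the text does not say how conflicting indications are handled.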
51 Claims
-
1. A system, comprising:
a data set comprising a plurality of cases;
a search engine to receive a query relating to at least one category and to identify at least one case within the data set that matches the query;
a confirmation module to receive one of a first indication that the identified at least one case belongs to the category, and a second indication that the identified at least one case does not belong to the category; and
a storage to store training information for training a categorizer, the training information modified in response to the confirmation module receiving one of the first indication and second indication.
-
2. The system of claim 1, [...] wherein the confirmation module is adapted to receive one of:
first indications that respective identified cases belong to a category, and second indications that respective identified cases do not belong to the category.
-
3. The system of claim 2, further comprising a display monitor to display information associated with the identified cases, wherein the display monitor is adapted to display selection boxes for respective identified cases, and wherein the confirmation module is adapted to receive the first indications for respective identified cases in response to user selection of respective selection boxes, and the confirmation module is adapted to receive the second indications for respective identified cases in response to user deselection of respective selection boxes.
-
4. The system of claim 1, further comprising a training module to train the categorizer based on the training information, wherein the training module is adapted to train the categorizer concurrently with execution of the search engine and confirmation module.
-
5. The system of claim 4, wherein the training module is adapted to train the categorizer in a background during execution of the search engine and confirmation module.
-
6. The system of claim 1, the search engine to further determine that the categorizer has not labeled the case with any of a set of categories.
-
7. The system of claim 1, wherein the training information further comprises a negative training set of cases for the category, wherein the confirmation module is adapted to add the identified at least one case to the negative training set of cases in response to receiving the second indication, the system further comprising:
a training module to train the categorizer based on addition of the identified at least one case to the negative training set.
-
8. The system of claim 1, wherein the training information comprises a positive training set of cases for the category, wherein the confirmation module is adapted to add the identified at least one case to the positive training set of cases in response to receiving the first indication, the system further comprising:
a training module to train the categorizer based on addition of the identified at least one case to the positive training set.
-
9. The system of claim 1, wherein the query specifies a term and the search engine is adapted to match the term with information associated with the identified at least one case.
-
10. The system of claim 9, wherein the term comprises one of a string expression, a regular expression, a glob expression, a substring expression, and an expression containing non-text data.
-
11. The system of claim 9, wherein the term is associated with a collection of terms, and wherein the search engine is adapted to further match the collection of terms with information associated with the identified at least one case.
-
12. The system of claim 11, wherein the term specified by the query comprises a word, and wherein the collection of terms comprises synonyms of the word.
-
13. The system of claim 1, wherein the search engine identifies the at least one case by performing at least one of the following:
determining a membership status of the at least one case with respect to a set of cases associated with a second category;
determining whether or not the categorizer has labeled the at least one case with the second category; and
determining a value provided by the categorizer with respect to the at least one case and a second category.
-
14. The system of claim 1, further comprising a training module, wherein the training module is adapted to modify the categorizer based on the training information developed in response to the confirmation module,
wherein the categorizer is adapted to compute a confidence level with respect to a decision whether to label the identified at least one case as belonging to the category, and to indicate the identified at least one case as belonging to the category based on comparing the confidence level with a predefined threshold.
-
15. The system of claim 1, wherein the category is part of a hierarchy of categories, the system further comprising a category editor to update the hierarchy of categories based on user inputs by adding at least one more category to the hierarchy.
-
16. The system of claim 15, wherein the search engine and confirmation module are adapted to enable identification of cases in the data set with respective categories.
-
17. The system of claim 15, wherein the category editor is adapted to enable creation of subcategories of at least one of the categories.
-
18. The system of claim 1, wherein the category is part of an initial hierarchy of categories, and wherein the initial hierarchy of categories provides a starting point, the system further comprising a category editor to update the initial hierarchy of categories based on user inputs.
-
19. The system of claim 1, further comprising a hierarchy inference module to examine the cases in the data set and to construct a hierarchy of categories in response to examining the cases.
-
20. The system of claim 1, wherein the search engine is adapted to receive another query not relating to the category and to identify at least one case within the data set that matches the other query, the system further comprising a display monitor to display information associated with the identified cases.
-
-
21. A method, comprising:
receiving a query relating to at least a first category to search cases stored in a data set;
identifying a first group of cases in the data set matching the query;
receiving indications of which cases in the first group belong to the first category; and
modifying training information for training a categorizer in response to receiving the indications.
-
receiving a second query to search the cases stored in the data set; identifying a second group of cases in the data set matching the second query; receiving indications of which cases in the second group belong to a second category, the first and second categories being part of a hierarchy of categories; and training the categorizer in response to receiving the indications of which cases in the second group belong to the second category.
-
-
25. The method of claim 24, further comprising:
adding cases in the first group indicated as belonging to the first category to a first positive training set of cases; and
adding cases in the second group indicated as belonging to the second category to a second positive training set of cases, wherein training the categorizer is based on the first and second positive training sets.
-
-
27. The method of claim 21, further comprising:
-
adding cases in the first group indicated as not belonging to the first category to a negative training set of training cases, wherein training the categorizer is further based on the negative training set.
-
-
28. The method of claim 21, further comprising displaying indicators of which cases in the first group belong or do not belong to the first category.
-
29. The method of claim 28, wherein displaying the indicators is based on at least one of (1) whether a user has labeled cases as belonging to the first category, (2) whether a user has labeled cases as not belonging to the first category, (3) whether a categorizer has indicated cases as belonging to the first category, (4) whether a categorizer has indicated cases as not belonging to the category, and (5) a score provided by the categorizer with respect to cases and the first category.
-
30. The method of claim 21, further comprising:
receiving an indication that a second category is to be added to a hierarchy of categories, the hierarchy further including the first category.
-
31. The method of claim 30, wherein receiving the indication that the second category is to be added comprises receiving an indication that the second category is to be added as a child of the first category.
-
32. The method of claim 21, further comprising:
receiving an indication that a second category is to be added to an initial hierarchy of categories, the initial hierarchy further including the first category, wherein the initial hierarchy of categories provides a starting point.
-
33. The method of claim 21, wherein the first category is part of a hierarchy of categories, the method further comprising:
receiving an indication to delete a second one of the categories from the hierarchy; and
deleting the second category from the hierarchy in response to receiving the indication to delete the second category.
-
-
35. The method of claim 21, wherein the first category is part of a hierarchy of categories that further includes a second category and a third category, wherein the second category is a child of the first category, the method further comprising:
receiving an indication that the second category should be a child of the third category instead of the first category.
-
37. The method of claim 21, wherein the first hierarchy is part of a set of hierarchies, the method further comprising:
performing at least one of a plurality of tasks, the plurality of tasks including:
identifying training cases for the first category; and
identifying subcategories for the first category.
-
38. The method of claim 37, further comprising indicating a desirability of switching from one of the plurality of tasks to another of the plurality of tasks.
-
39. The method of claim 21, further comprising displaying information regarding a performance of the categorizer.
-
40. The method of claim 39, wherein displaying the information regarding the performance of the categorizer comprises displaying at least one of a false positive rate, a true positive rate, a true negative rate, an accuracy measure, a recall measure, a precision measure, a binormal separation measure, an information gain measure, a lift measure, a stability under cross-validation measure, a measure for an area under a receiver operating characteristic curve, a number of training cases, a percentage of a target training size, an f-measure, a total cost, and an average cost.
-
26. (Cancelled).
-
34. (Cancelled).
-
36. (Cancelled).
-
41. An article comprising at least one storage medium containing instructions that when executed cause a computer to:
store a data set comprising a plurality of cases not labeled with respect to a category;
receive a first query relating to at least the category;
identify at least one case within the data set that matches the first query;
receive one of a first indication that the identified at least one case belongs to the category, and a second indication that the identified at least one case does not belong to the category; and
modify training information for training a categorizer in response to receiving one of the first indication and second indication.
-
add the at least one case to a baseline set; receive a second query; identify at least another case within the data set that matches the second query; and add the at least another case to the baseline set.
-
-
44. The article of claim 41, wherein the instructions when executed cause the computer to:
identify additional cases matching the query;
display the cases matching the query, the query specifying a limit of the number of cases to display.
-
-
45. The article of claim 44, wherein the limit is based on an amount of information that is viewable without any further action from a user.
-
46. The article of claim 41, wherein the instructions when executed cause the computer to display a data item associated with the at least one case in at least one of a table and a graph.
-
47. A method comprising:
storing a data set comprising a plurality of cases not labeled with respect to a category;
receiving a first query relating to at least the category;
identifying at least one case within the data set that matches the first query;
receiving one of a first indication that the identified at least one case belongs to the category, and a second indication that the identified at least one case does not belong to the category; and
modifying training information for training a categorizer in response to receiving one of the first indication and second indication.
-
-
49. A system, comprising:
means for receiving a query relating to at least a first category to search cases stored in a data set;
means for identifying a first group of cases in the data set matching the query;
means for receiving indications of cases in the first group that belong to the first category; and
means for modifying training information for training a categorizer in response to receiving the indications.
-
-
51. A system comprising:
a storage to store a data set comprising a plurality of cases;
a display monitor to display a graphical user interface (GUI);
a search engine to receive a query through the GUI and to identify cases within the data set that match the query;
a confirmation module to receive first indications that some of the identified cases belong to a category, and second indications that others of the identified cases do not belong to the category, the first and second indications received through the GUI;
a categorizer to categorize cases in the data set in the category;
the storage to further store a positive training set of cases and negative training set of cases, the confirmation module to modify the positive training set of cases in response to receiving the first indications, and the confirmation module to modify the negative training set of cases in response to receiving the second indications; and
a training module to modify the categorizer based on the positive and negative training sets.
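As an illustration of how a training module might modify a categorizer from positive and negative training sets (claim 51), and how a confidence level might be compared with a predefined threshold (claim 14), a minimal sketch follows. The word-count scoring, the class name `WordScoreCategorizer`, the default threshold of 0.5, and the sample texts are assumptions made for illustration; the claims do not prescribe any particular learning technique.

```python
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (a deliberately simple assumption)."""
    return text.lower().split()


class WordScoreCategorizer:
    """Toy categorizer for a single category; hypothetical, not the claimed method."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # predefined threshold for labeling a case
        self.pos_counts = Counter()
        self.neg_counts = Counter()

    def train(self, positive_texts, negative_texts):
        """Rebuild word counts from the positive and negative training sets."""
        self.pos_counts = Counter(w for t in positive_texts for w in tokenize(t))
        self.neg_counts = Counter(w for t in negative_texts for w in tokenize(t))

    def confidence(self, text):
        """Share of the word evidence in the case that comes from positive cases."""
        words = tokenize(text)
        pos = sum(self.pos_counts[w] for w in words)
        neg = sum(self.neg_counts[w] for w in words)
        if pos + neg == 0:
            return 0.0  # no known words: no evidence for the category
        return pos / (pos + neg)

    def belongs(self, text):
        """Label the case as belonging when confidence reaches the threshold."""
        return self.confidence(text) >= self.threshold


categorizer = WordScoreCategorizer(threshold=0.5)
categorizer.train(
    positive_texts=["printer will not power on", "no power after storm"],
    negative_texts=["power cord arrived damaged"],
)
score = categorizer.confidence("laptop has no power")
labeled = categorizer.belongs("laptop has no power")
```

Retraining from scratch on every call to `train` keeps the sketch short; a system that trains in the background while the user keeps searching and confirming, as in claims 4 and 5, would more likely update incrementally.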
-
Specification