Method for automatic categorization of items
Abstract
A system and method for automatic categorization of items into categories. Machine learning establishes or updates a data structure including term weights for text fields and distributions for numeric fields, based on a sample of pre-categorized items. An automatic categorization engine processes items by referencing the data structure on a field-by-field basis, determining a ranking score for each alternative category to which an item may be assigned. A category assignment may be based on ranking scores and may be flagged for a user to review. A user interface facilitates review and confirmation of automatic category assignments, either comprehensively, as flagged by the automatic categorization engine, or according to user determined criteria.
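The flagging behavior mentioned in the abstract could be sketched as follows. The abstract does not say how an assignment is chosen for review, so the margin rule, the threshold `min_margin`, and the function name `assign` are all assumptions made for illustration.

```python
def assign(scores, min_margin=0.1):
    """Pick the top-scoring category and flag it for review when it is a close call.

    scores: dict mapping category name -> ranking score.
    Returns (category, needs_review). The margin rule is an assumed heuristic.
    """
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    if not ranked:
        # Nothing matched: no category, and a user should look at the item.
        return None, True
    best_category, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Flag the assignment when the winner barely beats the runner-up.
    return best_category, (best_score - runner_up) < min_margin
```

A clear winner is returned unflagged, while a narrow win or an empty score table is flagged for the user.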
68 Claims
-
1. A method of machine learning to automatically categorize items from a plurality of pre-categorized items having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, including the steps:
-
(a) counting a frequency of usage of terms by category for one or more text fields in a plurality of pre-categorized items, wherein a term is a single word or both single words and phrases;
(b) weighting the frequency of usage of a particular term in a particular category based on the frequency of usage of said particular term in other categories; and
(c) determining a distribution by category for values in one or more numeric fields of said pre-categorized items.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
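Steps (a) through (c) of claim 1 can be sketched roughly as below. This is only an illustration under stated assumptions: terms are single lowercase words split on whitespace, the weight in step (b) is the share of a term's total usage that falls in the category, and each distribution is summarized as a mean and a population standard deviation. The claim does not fix any of these choices, and the field names `title` and `price` are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

def learn(items, text_field="title", numeric_field="price"):
    """items: list of dicts with a 'category' key, a text field and a numeric field."""
    counts = defaultdict(Counter)   # category -> term -> count
    values = defaultdict(list)      # category -> observed numeric values
    for item in items:
        category = item["category"]
        # Step (a): count term usage by category (single words only here).
        counts[category].update(item[text_field].lower().split())
        values[category].append(item[numeric_field])

    # Step (b), assumed formula: a term's weight in a category is its count there
    # divided by its count across all categories, so widely shared terms weigh less.
    totals = Counter()
    for term_counts in counts.values():
        totals.update(term_counts)
    weights = {
        category: {term: n / totals[term] for term, n in term_counts.items()}
        for category, term_counts in counts.items()
    }

    # Step (c), assumed summary: (mean, population standard deviation) per category.
    distributions = {c: (mean(v), pstdev(v)) for c, v in values.items()}
    return weights, distributions
```

With this weighting, a term that appears in only one category gets weight 1.0 there, and a term spread across categories is split among them.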
wherein the determining step precedes the filtering step which precedes the counting step.
11. The method of claim 10, wherein the filtering step eliminates outliers that are more than a predetermined number of standard deviations from the mean value for a numeric field.
-
12. The method of claim 10, wherein the filtering step eliminates predetermined percentiles of highest and lowest outliers.
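The two outlier filters recited in claims 11 and 12 might look like the following sketch. The function names, default thresholds, and the use of a population standard deviation are assumptions; the claims only say "a predetermined number of standard deviations" and "predetermined percentiles".

```python
from statistics import mean, pstdev

def filter_by_std(values, max_devs=2.0):
    """Drop values more than max_devs standard deviations from the mean (claim 11 style)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)  # all values identical, nothing to drop
    return [v for v in values if abs(v - mu) <= max_devs * sigma]

def filter_by_percentile(values, lower_pct=5.0, upper_pct=5.0):
    """Drop the lowest lower_pct and highest upper_pct percent of values (claim 12 style)."""
    ordered = sorted(values)
    n = len(ordered)
    low = int(n * lower_pct / 100)        # number of values trimmed at the bottom
    high = n - int(n * upper_pct / 100)   # index just past the last value kept
    return ordered[low:high]
```

Note that the percentile filter returns the values in sorted order, while the standard-deviation filter preserves the input order.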
-
-
13. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, and category by category data is available for a frequency of term usage and a distribution of values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying categories associated with the terms;
(c) calculating ranking scores for the terms in the identified categories; and
(d) adjusting said ranking scores corresponding to a comparison of values in one or more numeric fields of the uncategorized item with corresponding distributions of numeric values;
further including the step of normalizing ranking scores based on the number of parsed terms.
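The ranking steps of claim 13 can be sketched as below, assuming the category data takes the form of a `weights` table (category to term to weight) and a `distributions` table (category to mean and standard deviation). Those structures, the halving penalty, and the two-standard-deviation cutoff are illustrative assumptions, not details given in the claim.

```python
def rank(item_text, item_value, weights, distributions):
    """Return {category: adjusted, normalized ranking score} for one uncategorized item."""
    terms = item_text.lower().split()                        # (a) parse terms
    categories = {
        c for c, term_weights in weights.items()
        if any(t in term_weights for t in terms)             # (b) categories using any term
    }
    scores = {}
    for category in categories:
        raw = sum(weights[category].get(t, 0.0) for t in terms)   # (c) score the terms
        score = raw / len(terms)                  # normalize by number of parsed terms
        mu, sigma = distributions[category]
        # (d) assumed adjustment: halve the score when the numeric value lies
        # more than two standard deviations from the category mean.
        if sigma > 0 and abs(item_value - mu) > 2 * sigma:
            score *= 0.5
        scores[category] = score
    return scores
```

Normalizing by the number of parsed terms keeps long descriptions from outscoring short ones simply because they contain more words.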
-
-
14. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, and category by category data is available for a frequency of term usage and a distribution of values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying categories associated with the terms;
(c) calculating ranking scores for the terms in the identified categories; and
(d) adjusting said ranking scores corresponding to a comparison of values in one or more numeric fields of the uncategorized item with corresponding distributions of numeric values;
further including the step of selecting one or more categories based on the adjusted ranking scores.
Dependent claims: 15
-
-
16. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, and category by category data is available for a frequency of term usage and a distribution of values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying categories associated with the terms;
(c) calculating ranking scores for the terms in the identified categories; and
(d) adjusting said ranking scores corresponding to a comparison of values in one or more numeric fields of the uncategorized item with corresponding distributions of numeric values;
further including the step of rank ordering categories based on the adjusted ranking scores.
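Selecting categories (claim 14) and rank ordering them (claim 16) are small operations once adjusted scores exist. A minimal sketch follows; the tie-breaking by category name and the `top_n` and `min_score` parameters are assumptions added for illustration.

```python
def rank_order(scores):
    """Return (category, score) pairs, highest score first; ties broken by name."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

def select(scores, top_n=1, min_score=0.0):
    """Return up to top_n category names whose score is at least min_score."""
    return [category for category, score in rank_order(scores)[:top_n]
            if score >= min_score]
```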
-
-
17. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, and category by category data is available for a frequency of term usage and a distribution of values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying categories associated with the terms;
(c) calculating ranking scores for the terms in the identified categories; and
(d) adjusting said ranking scores corresponding to a comparison of values in one or more numeric fields of the uncategorized item with corresponding distributions of numeric values;
wherein calculating ranking scores for an identified category includes summing the weighted frequencies for the parsed terms;
normalizing the sum of the weighted frequencies based on the number of parsed terms in the uncategorized item.
-
-
18. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, and category by category data is available for a frequency of term usage and a distribution of values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying categories associated with the terms;
(c) calculating ranking scores for the terms in the identified categories; and
(d) adjusting said ranking scores corresponding to a comparison of values in one or more numeric fields of the uncategorized item with corresponding distributions of numeric values;
wherein calculating ranking scores for an identified category includes summing by text field the weighted frequencies for the parsed terms;
combining the sums of weighted frequencies by text field according to a predetermined weighting formula;
normalizing the combined sum of weighted frequencies.
Dependent claims: 19
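The per-field scoring of claim 18 can be illustrated as follows for a single candidate category. The claim leaves the "predetermined weighting formula" and the normalization open; this sketch assumes a simple weighted sum over fields and division by the total number of parsed terms. The field names `short` and `long` are hypothetical.

```python
def combined_score(parsed, category_weights, field_weights):
    """Score one category from several text fields.

    parsed:           field name -> list of parsed terms in the uncategorized item
    category_weights: field name -> {term: weighted frequency} for ONE category
    field_weights:    field name -> multiplier (the assumed weighting formula)
    """
    # Sum the weighted frequencies separately for each text field.
    field_sums = {
        field: sum(category_weights.get(field, {}).get(t, 0.0) for t in terms)
        for field, terms in parsed.items()
    }
    # Combine the per-field sums with the field multipliers.
    combined = sum(field_weights[field] * s for field, s in field_sums.items())
    # Assumed normalization: divide by the total number of parsed terms.
    total_terms = sum(len(terms) for terms in parsed.values())
    return combined / total_terms if total_terms else 0.0
```

Giving the `short` field a larger multiplier than the `long` field lets a match in a concise title count for more than the same match buried in a lengthy description.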
-
-
20. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, and category by category data is available for a frequency of term usage and a distribution of values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying categories associated with the terms;
(c) calculating ranking scores for the terms in the identified categories; and
(d) adjusting said ranking scores corresponding to a comparison of values in one or more numeric fields of the uncategorized item with corresponding distributions of numeric values;
wherein the adjusting step includes applying a multiplicative factor to said ranking scores.
-
-
21. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, and category by category data is available for a frequency of term usage and a distribution of values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying categories associated with the terms;
(c) calculating ranking scores for the terms in the identified categories; and
(d) adjusting said ranking scores corresponding to a comparison of values in one or more numeric fields of the uncategorized item with corresponding distributions of numeric values;
wherein the adjusting step includes applying an additive factor to said ranking scores.
-
-
22. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, and category by category data is available for a frequency of term usage and a distribution of values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying categories associated with the terms;
(c) calculating ranking scores for the terms in the identified categories; and
(d) adjusting said ranking scores corresponding to a comparison of values in one or more numeric fields of the uncategorized item with corresponding distributions of numeric values;
wherein the adjusting step includes applying a decision rule to said ranking scores.
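Claims 20 through 22 name three styles of adjustment: a multiplicative factor, an additive factor, and a decision rule. One possible reading of each is sketched below. The `closeness` measure, the `bonus` size, and the three-standard-deviation cutoff are assumptions chosen for the example and do not come from the claims.

```python
def closeness(value, mu, sigma):
    """Assumed measure in [0, 1]: 1 at the mean, falling linearly to 0 at 3 sigma."""
    if sigma == 0:
        return 1.0 if value == mu else 0.0
    return max(0.0, 1.0 - abs(value - mu) / (3 * sigma))

def adjust_multiplicative(score, value, mu, sigma):
    """Scale the text-based score by how typical the numeric value is."""
    return score * closeness(value, mu, sigma)

def adjust_additive(score, value, mu, sigma, bonus=0.25):
    """Add a bounded bonus for numeric values near the category mean."""
    return score + bonus * closeness(value, mu, sigma)

def adjust_by_rule(score, value, mu, sigma, max_devs=3.0):
    """Decision rule: discard the category when the value falls outside the band."""
    return score if abs(value - mu) <= max_devs * sigma else 0.0
```

The multiplicative form can drive a score to zero, the additive form can only raise it, and the decision rule acts as a hard filter.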
-
-
23. A method of automatically categorizing an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, and category by category data is available for a frequency of term usage and a distribution of values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) determining for each text field the applicable ranking ideas;
(c) identifying categories associated with the terms for the applicable ranking ideas;
(d) calculating ranking scores for the terms for the applicable ranking ideas in the identified categories; and
(e) adjusting said ranking scores corresponding to comparison of values in one or more numeric fields of the uncategorized item with corresponding distributions of numeric values;
further including the step of selecting one or more categories based on the adjusted ranking scores.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31
summing weighted frequencies for the parsed terms; normalizing the sum of the weighted frequencies based on the number of parsed terms in the uncategorized item.
-
-
27. The method of claim 23, wherein calculating ranking scores for an identified category includes
summing weighted frequencies for the parsed terms;
combining the sums of weighted frequencies according to a predetermined weighting formula;
normalizing the combined sum of weighted frequencies.
-
-
28. The method of claim 27, wherein the predetermined weighting formula assigns a greater weight to a ranking field containing a short description of the uncategorized item than a text field containing a long description of the uncategorized item.
-
29. The method of claim 23, wherein the adjusting step includes applying a multiplicative factor to said ranking scores.
-
30. The method of claim 23, wherein the adjusting step includes applying an additive factor to said ranking scores.
-
31. The method of claim 23, wherein the adjusting step includes applying a decision rule to said ranking scores.
-
32. A method of automatically categorizing an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, and category by category data is available for a frequency of term usage and a distribution of values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) determining for each text field the applicable ranking ideas;
(c) identifying categories associated with the terms for the applicable ranking ideas;
(d) calculating ranking scores for the terms for the applicable ranking ideas in the identified categories; and
(e) adjusting said ranking scores corresponding to comparison of values in one or more numeric fields of the uncategorized item with corresponding distributions of numeric values;
further including the step of rank ordering categories based on the adjusted ranking scores.
-
-
33. A method of machine learning to automatically categorize items from a plurality of pre-categorized items having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, including the steps:
-
(a) counting a frequency of usage of terms in one or more text fields in a plurality of pre-categorized items wherein a term is a single word or both single words and phrases;
(b) weighting the frequency of usage of a particular term in a particular item based on the frequency of usage of said particular term in other items; and
(c) determining a distribution by category for values in one or more numeric fields of said pre-categorized items.
Dependent claims: 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44
wherein the determining step precedes the filtering step which precedes the counting step.
43. The method of claim 42, wherein the filtering step eliminates outliers that are more than a predetermined number of standard deviations from the mean value for a numeric field.
-
44. The method of claim 42, wherein the filtering step eliminates predetermined percentiles of highest and lowest outliers.
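Step (b) of claim 33 weights a term's frequency in one item by its usage in other items. TF-IDF is one familiar scheme that fits that description, and it is used here purely as a stand-in; the claim does not name it or any other formula. The function name and input layout are likewise assumptions.

```python
import math
from collections import Counter

def item_term_weights(items):
    """items: list of term lists, one per pre-categorized item.

    Returns one {term: weight} dict per item, using a TF-IDF style weight:
    term count in the item times log(number of items / items containing the term).
    """
    doc_freq = Counter()
    for terms in items:
        doc_freq.update(set(terms))   # count each term once per item
    n = len(items)
    weighted = []
    for terms in items:
        term_freq = Counter(terms)
        weighted.append({
            t: count * math.log(n / doc_freq[t]) for t, count in term_freq.items()
        })
    return weighted
```

Under this weighting a term that appears in every item gets weight zero, so only terms that help tell items apart contribute to later ranking.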
-
-
45. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying items associated with the terms;
(c) calculating ranking scores for the terms in the identified items; and
(d) adjusting said ranking scores based on values in one or more numeric fields of the uncategorized item wherein the adjusting step further includes comparing the values in one or more numeric fields of the uncategorized item with corresponding values in numeric fields of the identified items.
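Claim 45 ranks against individual pre-categorized items rather than category-level data. A rough sketch under assumptions: each stored item carries a term-weight table, one numeric value, and its category; the adjustment halves the score when the relative difference between numeric values exceeds a `tolerance`. None of these specifics appear in the claim.

```python
def rank_items(query_terms, query_value, items, tolerance=0.5):
    """items: list of dicts with 'terms' ({term: weight}), 'value', and 'category'.

    Returns (category, score) pairs for matching items, best first.
    """
    scored = []
    for item in items:
        # (b) an item is "associated" if it shares at least one parsed term.
        shared = [t for t in query_terms if t in item["terms"]]
        if not shared:
            continue
        # (c) score the shared terms using the item's stored weights.
        score = sum(item["terms"][t] for t in shared)
        # (d) assumed adjustment: compare numeric values and penalize a large gap.
        rel_diff = abs(query_value - item["value"]) / max(abs(item["value"]), 1e-9)
        if rel_diff > tolerance:
            score *= 0.5
        scored.append((item["category"], score))
    return sorted(scored, key=lambda pair: -pair[1])
```

The categories of the best-matching items can then be used as candidate assignments for the uncategorized item.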
-
-
46. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying items associated with the terms;
(c) calculating ranking scores for the terms in the identified items; and
(d) adjusting said ranking scores based on values in one or more numeric fields of the uncategorized item;
wherein category by category data is available for one or more distributions of values and the adjusting step compares the values in one or more numeric fields of the uncategorized item with the distributions of values for categories corresponding to the identified items.
-
-
47. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying items associated with the terms;
(c) calculating ranking scores for the terms in the identified items; and
(d) adjusting said ranking scores based on values in one or more numeric fields of the uncategorized item;
further including the step of normalizing ranking scores based on the number of parsed terms.
-
-
48. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying items associated with the terms;
(c) calculating ranking scores for the terms in the identified items; and
(d) adjusting said ranking scores based on values in one or more numeric fields of the uncategorized item;
further including the step of selecting one or more categories based on the adjusted ranking scores.
Dependent claims: 49
-
-
50. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying items associated with the terms;
(c) calculating ranking scores for the terms in the identified items; and
(d) adjusting said ranking scores based on values in one or more numeric fields of the uncategorized item;
further including the step of rank ordering categories based on the adjusted ranking scores.
-
-
51. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying items associated with the terms;
(c) calculating ranking scores for the terms in the identified items; and
(d) adjusting said ranking scores based on values in one or more numeric fields of the uncategorized item;
wherein calculating ranking scores for an identified category includes summing the weighted frequencies for the parsed terms;
normalizing the sum of the weighted frequencies based on the number of parsed terms in the uncategorized item.
-
-
52. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying items associated with the terms;
(c) calculating ranking scores for the terms in the identified items; and
(d) adjusting said ranking scores based on values in one or more numeric fields of the uncategorized item;
wherein calculating ranking scores for an identified category includes summing by text field the weighted frequencies for the parsed terms;
combining the sums of weighted frequencies by text field according to a predetermined weighting formula;
normalizing the combined sum of weighted frequencies.
Dependent claims: 53
-
-
54. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying items associated with the terms;
(c) calculating ranking scores for the terms in the identified items; and
(d) adjusting said ranking scores based on values in one or more numeric fields of the uncategorized item;
wherein the adjusting step includes applying a multiplicative factor to said ranking scores.
-
-
55. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying items associated with the terms;
(c) calculating ranking scores for the terms in the identified items; and
(d) adjusting said ranking scores based on values in one or more numeric fields of the uncategorized item;
wherein the adjusting step includes applying an additive factor to said ranking scores.
-
-
56. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) identifying items associated with the terms;
(c) calculating ranking scores for the terms in the identified items; and
(d) adjusting said ranking scores based on values in one or more numeric fields of the uncategorized item;
wherein the adjusting step includes applying a decision rule to said ranking scores.
-
-
57. A method of ranking for automatic categorization of an item having fields, wherein the fields include one or more text fields having terms and one or more numeric fields having values, including the steps:
-
(a) parsing terms in one or more text fields of an uncategorized item;
(b) determining for each text field the applicable categorization ideas;
(c) identifying items associated with the terms for the applicable categorization ideas;
(d) calculating ranking scores for the terms for the applicable categorization ideas in the identified items; and
(e) adjusting said ranking scores based on values in one or more numeric fields of the uncategorized item;
wherein the adjusting step further includes comparing the values in one or more numeric fields of the uncategorized item with corresponding values in numeric fields of the identified items.
Dependent claims: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68
summing weighted frequencies for the parsed terms; normalizing the sum of the weighted frequencies based on the number of parsed terms in the uncategorized item.
-
-
64. The method of claim 57, wherein calculating ranking scores for an identified category includes
summing weighted frequencies for the parsed terms;
combining the sums of weighted frequencies according to a predetermined weighting formula;
normalizing the combined sum of weighted frequencies.
-
-
65. The method of claim 64, wherein the predetermined weighting formula assigns a greater weight to a ranking field containing a short description of the uncategorized item than a text field containing a long description of the uncategorized item.
-
66. The method of claim 57, wherein the adjusting step includes applying a multiplicative factor to said ranking scores.
-
67. The method of claim 57, wherein the adjusting step includes applying an additive factor to said ranking scores.
-
68. The method of claim 57, wherein the adjusting step includes applying a decision rule to said ranking scores.
Specification