Method for automatic deduction of rules for matching content to categories
Abstract
The invention is a method for automatic deduction of rules for matching document content to a category within a strange taxonomy, which allows the document to be automatically classified into a proper category for storage in that strange taxonomy. The method includes the steps of spidering the taxonomy to determine its structure and contents, extracting keywords from documents within the strange taxonomy, formulating rules for determining the category from the extracted keywords, and applying the rules to classify a new document whose keywords have been extracted. The taxonomy is strange because the user has no knowledge of its internal structure and needs no such knowledge. The taxonomy may be flat or hierarchical; in the latter case, rules are formulated at each level for proceeding to the next level. Variations for creating new document management systems and refurbishing old ones are disclosed.
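The level-by-level behaviour described for a hierarchical taxonomy can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the nested-dictionary taxonomy, the helper names (keywords, collect_docs, level_rules, classify) and the scoring by raw keyword counts are all assumptions made for the example.

```python
import re
from collections import Counter

def keywords(text):
    """Lower-case word tokens of length >= 4 (a crude stand-in for keyword extraction)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 4]

def collect_docs(node):
    """All document texts stored at or below a taxonomy node."""
    docs = list(node.get("docs", []))
    for child in node.get("children", {}).values():
        docs.extend(collect_docs(child))
    return docs

def level_rules(node):
    """Rules for one level: child category name -> keyword counts from its subtree."""
    return {
        name: Counter(k for d in collect_docs(child) for k in keywords(d))
        for name, child in node.get("children", {}).items()
    }

def classify(node, text, path=()):
    """Descend one level at a time, following the best-matching child category."""
    rules = level_rules(node)
    if not rules:
        return path  # leaf category reached
    words = keywords(text)
    scores = {name: sum(counts[w] for w in words) for name, counts in rules.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return path  # no rule fires at this level; stop here
    return classify(node["children"][best], text, path + (best,))

# Hypothetical two-level taxonomy used only for illustration.
taxonomy = {
    "children": {
        "engineering": {
            "children": {
                "software": {"docs": ["compiler parser syntax tree", "python module import"]},
                "hardware": {"docs": ["circuit resistor voltage", "processor cache memory"]},
            }
        },
        "finance": {"docs": ["quarterly revenue invoice", "budget forecast revenue"]},
    }
}

print(classify(taxonomy, "the parser builds a syntax tree"))
```

Rules are recomputed at each level here for brevity; a practical system would presumably spider once and cache the per-level rules.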
28 Claims
1. A method of classifying document content within a taxonomy, the taxonomy comprising a plurality of first categories in a computer document storage organizational scheme and a plurality of first documents, each first document tagged with at least one first category according to the taxonomy, the method comprising the steps of:
spidering the taxonomy to generate at least one pairing of each first document with each first category with which the each first document is tagged;
creating a rule generation document representing each of the at least one pairings;
parsing a second document according to the rule generation document; and
classifying the parsed second document into a particular first category.
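The four steps of claim 1 can be illustrated with a small sketch. The claim does not fix a format for the rule generation document, so the JSON layout below is an assumption, as are the function names (spider, make_rule_document, parse, classify), the in-memory taxonomy, and the overlap-count scoring.

```python
import json
import re

def tokenize(text):
    """Set of lower-case words with at least four letters."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))

def spider(taxonomy):
    """Yield one (document id, category) pairing per tag."""
    for category, doc_ids in taxonomy.items():
        for doc_id in doc_ids:
            yield doc_id, category

def make_rule_document(taxonomy, documents):
    """Serialise every pairing, with the paired document's keywords, as JSON."""
    pairings = [
        {"document": doc_id, "category": category,
         "keywords": sorted(tokenize(documents[doc_id]))}
        for doc_id, category in spider(taxonomy)
    ]
    return json.dumps({"pairings": pairings}, indent=2)

def parse(text, rule_document):
    """Keep only the words of the new document that the rule document knows."""
    pairings = json.loads(rule_document)["pairings"]
    vocabulary = {k for p in pairings for k in p["keywords"]}
    return tokenize(text) & vocabulary

def classify(parsed, rule_document):
    """Pick the category whose pairings share the most keywords with the parsed document."""
    scores = {}
    for p in json.loads(rule_document)["pairings"]:
        overlap = len(parsed & set(p["keywords"]))
        scores[p["category"]] = scores.get(p["category"], 0) + overlap
    return max(scores, key=scores.get)

# Hypothetical tagged documents; d1 and d2 each carry two tags.
documents = {
    "d1": "Lease agreement between landlord and tenant",
    "d2": "Tenant invoice for monthly rent payment",
    "d3": "Invoice for server hardware purchase",
}
taxonomy = {"legal": ["d1"], "billing": ["d2", "d3"], "property": ["d1", "d2"]}

rule_document = make_rule_document(taxonomy, documents)
parsed = parse("Overdue invoice: payment for hardware", rule_document)
print(sorted(parsed), classify(parsed, rule_document))
```

A document tagged with several categories produces one pairing per tag, which is why the three documents above yield five pairings.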
21. A method for categorizing the content of a new document within a strange taxonomy, the strange taxonomy comprising a plurality of first categories and a plurality of first documents within at least one of the first categories, wherein a root node for the strange taxonomy has been provided, the method comprising the steps of:
automatically spidering the strange taxonomy to identify each first category and each document among the plurality of first documents classified within each respective first category;
automatically forming pairs for each of the first documents, each pair comprising one of the first documents and the category within which the one of the first documents is classified;
automatically extracting at least one of a keyword and a pattern of keywords from each of the first documents in each of the first categories;
automatically associating at least one of a keyword and a pattern of keywords extracted from each of the first documents within each of the first categories with the first category in which the first documents are classified;
automatically generating rules, each rule mapping at least one of a keyword and a pattern of keywords to the first category in which the first documents containing the at least one of a keyword and a pattern of keywords are classified;
automatically parsing an unclassified document to determine new keywords therein; and
automatically classifying the unclassified document into at least one of a new category and a first category having documents containing at least one of keywords and patterns of keywords similar to the new keywords.
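The extraction, association, rule generation and classification steps of claim 21 can be sketched as below. Several choices are assumptions made for illustration only: a "pattern of keywords" is modelled as two adjacent keywords, rules are kept only when a feature occurs in a single category, and the fallback label "uncategorized" stands in for the new category.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "of", "for", "and", "to", "in", "on", "with"}

def features(text):
    """Keywords plus two-word patterns (pairs of adjacent keywords)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    patterns = [" ".join(pair) for pair in zip(words, words[1:])]
    return set(words) | set(patterns)

def generate_rules(pairs):
    """Map each keyword or pattern to the category of the documents containing it."""
    seen = defaultdict(set)
    for text, category in pairs:
        for feature in features(text):
            seen[feature].add(category)
    # Keep only discriminating rules: features found in exactly one category.
    return {f: next(iter(cats)) for f, cats in seen.items() if len(cats) == 1}

def classify(text, rules, new_category="uncategorized"):
    """Vote with every rule that fires; fall back to a new category if none does."""
    votes = defaultdict(int)
    for feature in features(text):
        if feature in rules:
            votes[rules[feature]] += 1
    return max(votes, key=votes.get) if votes else new_category

# Hypothetical (document, category) pairs as produced by the spidering step.
pairs = [
    ("Patent claim for a data storage method", "legal"),
    ("Claim of infringement filed in court", "legal"),
    ("Insurance claim for water damage", "insurance"),
    ("Insurance policy renewal notice", "insurance"),
]
rules = generate_rules(pairs)

print(classify("New insurance claim for storm damage", rules))
print(classify("Quarterly sales report", rules))
```

The keyword "claim" appears in both categories and so yields no rule on its own, while the patterns "patent claim" and "insurance claim" remain discriminating. This is one reason patterns of keywords can be more useful than single keywords.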
22. A method of storing a new document according to a strange taxonomy, the method comprising the step of providing the new document and a starting point in the strange taxonomy to a rule-deducing document classification and storage computer program, the program automatically spidering the strange taxonomy and tagged documents classified therein, automatically deducing rules for classification of the new document, automatically classifying the new document according to the rules deduced, and automatically storing the new document according to the classification of the new document.
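A program of the kind recited in claim 22 might look like the sketch below. It assumes, purely for illustration, that the taxonomy is a directory tree, that each directory holding files is a category, and that the starting point is the root directory. The function names (spider, deduce_rules, store) and the sample files are hypothetical.

```python
import os
import re
import shutil
import tempfile
from collections import Counter
from pathlib import Path

def words(text):
    """Lower-case words with at least four letters."""
    return re.findall(r"[a-z]{4,}", text.lower())

def spider(root):
    """Walk the tree from the starting point; yield (category directory, document text)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield Path(dirpath), (Path(dirpath) / name).read_text(encoding="utf-8")

def deduce_rules(root):
    """Build one keyword profile per directory that contains documents."""
    rules = {}
    for category, text in spider(root):
        rules.setdefault(category, Counter()).update(words(text))
    return rules

def store(new_doc, root):
    """Classify new_doc against the deduced rules and copy it into the chosen directory."""
    rules = deduce_rules(root)
    tokens = words(Path(new_doc).read_text(encoding="utf-8"))
    best = max(rules, key=lambda category: sum(rules[category][t] for t in tokens))
    return Path(shutil.copy(new_doc, best))

with tempfile.TemporaryDirectory() as tmp:
    # Build a small throwaway taxonomy on disk.
    root = Path(tmp) / "archive"
    samples = {
        "recipes/soup.txt": "simmer onions carrots broth",
        "recipes/bread.txt": "knead flour yeast dough",
        "travel/rome.txt": "flight hotel museum itinerary",
    }
    for rel, text in samples.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")

    incoming = Path(tmp) / "pizza.txt"
    incoming.write_text("stretch the dough, add flour", encoding="utf-8")
    stored = store(incoming, root)
    print(stored.relative_to(root))
```

The caller supplies only the new document and the root; the directory layout is discovered by the walk, which matches the sense in which the taxonomy is "strange" to the user. When no keyword matches, this sketch simply picks the first directory found, so a real system would need an explicit fallback.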
24. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
computer-readable data storage media coupled to the at least one processor;
a plurality of documents tagged according to a taxonomy, the documents residing in the computer readable data storage media, the documents comprising content, the content comprising at least one of a keyword and a pattern of keywords; and
a rule-deducing content classification mechanism residing in memory.
26. A program product comprising:
a rule-deducing classification mechanism residing in memory, the rule-deducing classification mechanism operable to automatically spider a strange taxonomy and the tagged documents classified therein, to deduce rules for classifying documents within the strange taxonomy, and to classify a new document according to the strange taxonomy; and
computer-readable signal bearing media bearing the rule-deducing classification mechanism.
27. A method of classifying document content within at least one taxonomy, the at least one taxonomy comprising a plurality of first categories in a computer document storage taxonomy and at least one first document tagged according to the taxonomy, the at least one first document classified within at least one first category within the plurality of first categories of the at least one taxonomy, the method comprising the steps of:
at least one of spidering and crawling the at least one taxonomy and the at least one first document tagged according to the at least one taxonomy to generate at least one pairing of at least one first document with the at least one first category in which the at least one first document is classified within the at least one taxonomy;
creating a rule generation document representing the at least one pairing of at least one first document with the at least one first category;
parsing a second document according to the rule generation document; and
classifying the parsed second document into at least one first category in the at least one taxonomy.
28. A method of finding documents in a computerized document management system, wherein the lost documents are lost to search engines because of incorrect filing, the method comprising the steps of:
retrieving each document; and
saving each document under a new taxonomical root using a rule-deducing classification mechanism.
Specification