Method for automatic deduction of rules for matching content to categories
Abstract
Accordingly, the invention is a method for automatic deduction of rules for matching document content to a category within a strange taxonomy, which allows the document to be automatically classified into a proper category for storage in that strange taxonomy. The method includes the steps of spidering the taxonomy to determine its structure and contents, extracting keywords from documents within the strange taxonomy, formulating rules for determining the category from the extracted keywords, and applying the rules to classify a new document whose keywords have been extracted. The taxonomy is strange because the user has no knowledge of its internal structure and needs no such knowledge. The taxonomy may be flat or may be hierarchical, the latter having rules formulated at each level for proceeding to the next level. Variations for creating new and refurbishing old document management systems are disclosed.
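
The four steps named in the abstract (spider, extract keywords, formulate rules, classify) can be illustrated with a minimal Python sketch. It is not the patented implementation: the in-memory `TAXONOMY` dictionary, the four-letter keyword heuristic, and the vote-counting classifier are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for a spidered taxonomy: category -> documents filed there.
TAXONOMY = {
    "networking": ["router firmware upgrade guide", "configuring the router firewall"],
    "billing": ["invoice payment schedule", "late payment invoice reminder"],
}

def spider(taxonomy):
    """Yield one (document, category) pairing per tagged document."""
    for category, documents in taxonomy.items():
        for document in documents:
            yield document, category

def extract_keywords(text):
    """Crude keyword extractor (an assumption): lowercase words of four or more letters."""
    return re.findall(r"[a-z]{4,}", text.lower())

def formulate_rules(pairings):
    """Map each keyword to per-category counts of documents containing it."""
    rules = defaultdict(Counter)
    for document, category in pairings:
        for keyword in set(extract_keywords(document)):
            rules[keyword][category] += 1
    return rules

def classify(text, rules):
    """Return the category with the most keyword votes, or None if no rule matches."""
    votes = Counter()
    for keyword in set(extract_keywords(text)):
        votes.update(rules.get(keyword, {}))
    return votes.most_common(1)[0][0] if votes else None

rules = formulate_rules(spider(TAXONOMY))
print(classify("How do I reset my router?", rules))         # networking
print(classify("Question about an unpaid invoice", rules))  # billing
```

The caller never inspects how `TAXONOMY` is organized; only `spider` touches it, which is the sense in which the taxonomy stays "strange" to the user.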
20 Claims
1. A method of classifying document content within a strange taxonomy, the strange taxonomy comprising a plurality of first categories in a computer document storage organizational scheme and a plurality of first documents, each first document tagged with at least one first category according to the strange taxonomy, the method comprising the steps of:
spidering the strange taxonomy to generate at least one pairing of each first document with each first category with which the each first document is tagged, said strange taxonomy having an internal organizational structure that cannot be viewed by a user who is interacting with the strange taxonomy;
creating a rule generation document representing each of the at least one pairings;
parsing a second document according to the rule generation document; and
classifying the parsed second document into a particular first category, said classifying comprising submitting the parsed second document to a classification engine.
Dependent claims: 2, 3, 4, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16.
5. A method of classifying document content within a taxonomy, the taxonomy comprising a plurality of first categories in a computer document storage organizational scheme and a plurality of first documents, each first document tagged with at least one first category according to the taxonomy, the method comprising the steps of:
spidering the taxonomy to generate at least one pairing of each first document with each first category with which the each first document is tagged;
creating a rule generation document representing each of the at least one pairings;
parsing a second document according to the rule generation document; and
classifying the parsed second document into a particular first category, said classifying comprising submitting the parsed second document to a classification engine,
wherein the taxonomy comprises a strange taxonomy and wherein the step of spidering the plurality of first documents tagged with at least one first category according to the taxonomy comprises the steps of:
spidering the strange taxonomy with a first spider, the first spider adapted to the strange taxonomy being spidered;
creating a third document using the first spider, the third document describing the strange taxonomy, the third document comprising a link to each of the first documents; and
spidering the strange taxonomy with a second spider by spidering the third document created by the first spider, the second spider operable to access each of the first documents through the links in the third document,
wherein the steps of spidering the strange taxonomy with the first spider and creating a third document comprise steps taken after the second document is classified into the taxonomy, the second document thereby becoming a first document within the plurality of first documents.
6. A method of classifying document content within a taxonomy, the taxonomy comprising a plurality of first categories in a computer document storage organizational scheme and a plurality of first documents, each first document tagged with at least one first category according to the taxonomy, the method comprising the steps of:
spidering the taxonomy to generate at least one pairing of each first document with each first category with which the each first document is tagged;
creating a rule generation document representing each of the at least one pairings;
parsing a second document according to the rule generation document; and
classifying the parsed second document into a particular first category, said classifying comprising submitting the parsed second document to a classification engine,
wherein the taxonomy comprises a strange taxonomy and wherein the step of spidering the plurality of first documents tagged with at least one first category according to the taxonomy comprises the steps of:
spidering the strange taxonomy with a first spider, the first spider adapted to the strange taxonomy being spidered;
creating a third document using the first spider, the third document describing the strange taxonomy, the third document comprising a link to each of the first documents; and
spidering the strange taxonomy with a second spider by spidering the third document created by the first spider, the second spider operable to access each of the first documents through the links in the third document,
wherein the step of spidering the strange taxonomy with a second spider comprises the step of spidering the strange taxonomy with a second spider after the second document is presented for classification within the taxonomy.
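
The two-spider arrangement recited in claims 5 and 6 can be sketched as follows. This is an illustrative reading only: the `STORE` layout, the JSON shape of the "third document", and the `store://` link scheme are hypothetical and do not come from the patent.

```python
import json

# Hypothetical strange taxonomy: a nested store whose layout only the first spider understands.
STORE = {
    "nodes": [
        {"label": "hardware", "items": [{"id": "d1", "body": "replace the fan"}]},
        {"label": "software", "items": [{"id": "d2", "body": "install the patch"},
                                         {"id": "d3", "body": "update the driver"}]},
    ]
}
LINK_PREFIX = "store://"  # assumed link scheme, for illustration only

def first_spider(store):
    """Taxonomy-specific spider: emit the 'third document', an index with one link per first document."""
    entries = [
        {"category": node["label"], "link": LINK_PREFIX + item["id"]}
        for node in store["nodes"]
        for item in node["items"]
    ]
    return json.dumps({"entries": entries})

def fetch(link, store):
    """Resolve a link from the third document to the body of the document it points at."""
    doc_id = link[len(LINK_PREFIX):]
    for node in store["nodes"]:
        for item in node["items"]:
            if item["id"] == doc_id:
                return item["body"]
    raise KeyError(link)

def second_spider(third_document, store):
    """Generic spider: read only the third document and follow its links to build pairings."""
    index = json.loads(third_document)
    return [(fetch(entry["link"], store), entry["category"]) for entry in index["entries"]]

third_document = first_spider(STORE)
pairings = second_spider(third_document, STORE)
print(pairings)
```

Splitting the work this way confines knowledge of the store's layout to `first_spider`; `second_spider` depends only on the format of the third document, so it could be reused across differently organized taxonomies.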
17. A method of classifying document content within a taxonomy, the taxonomy comprising a plurality of first categories in a computer document storage organizational scheme and a plurality of first documents, each first document tagged with at least one first category according to the taxonomy, the method comprising the steps of:
spidering the taxonomy to generate at least one pairing of each first document with each first category with which the each first document is tagged;
creating a rule generation document representing each of the at least one pairings;
parsing a second document according to the rule generation document; and
classifying the parsed second document into a particular first category, said classifying the parsed second document into the particular first category comprising submitting the parsed second document to a classification engine,
wherein the taxonomy comprises a plurality of strange taxonomies, and further wherein:
the step of creating a rule generation document comprises generating a single rule generation document for the plurality of strange taxonomies; and
the step of classifying the parsed second document into at least one first category comprises the steps of:
classifying the parsed second document into one strange taxonomy within the plurality of strange taxonomies; and
classifying the parsed second document into one category within the plurality of categories within the strange taxonomy;
the method operable to select one strange taxonomy among the plurality of strange taxonomies within which to classify the second document.
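
One way to picture the two-stage classification of claim 17 is a single rule document whose entries name both a taxonomy and a category. The sketch below assumes that shape; the `RULES` table and the majority-vote selection are illustrative choices, not details taken from the claims.

```python
from collections import Counter

# Hypothetical single rule generation document covering two taxonomies:
# each keyword maps to the (taxonomy, category) pairs it was observed in.
RULES = {
    "invoice": [("finance", "billing")],
    "refund": [("finance", "billing"), ("support", "complaints")],
    "salary": [("finance", "payroll")],
    "outage": [("support", "incidents")],
    "complaint": [("support", "complaints")],
}

def classify(keywords, rules):
    """Select one taxonomy first, then one category inside that taxonomy."""
    hits = [pair for keyword in keywords for pair in rules.get(keyword, [])]
    if not hits:
        return None
    # Stage 1: the taxonomy with the most matching rules wins.
    taxonomy = Counter(t for t, _ in hits).most_common(1)[0][0]
    # Stage 2: only categories belonging to the selected taxonomy are considered.
    category = Counter(c for t, c in hits if t == taxonomy).most_common(1)[0][0]
    return taxonomy, category

print(classify(["refund", "invoice", "salary"], RULES))    # ('finance', 'billing')
print(classify(["outage", "complaint", "refund"], RULES))  # ('support', 'complaints')
```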
18. A method of classifying document content within a taxonomy, the taxonomy comprising a plurality of first categories in a computer document storage organizational scheme and a plurality of first documents, each first document tagged with at least one first category according to the taxonomy, the method comprising the steps of:
spidering the taxonomy to generate at least one pairing of each first document with each first category with which the each first document is tagged;
creating a rule generation document representing each of the at least one pairings;
parsing a second document according to the rule generation document; and
classifying the parsed second document into a particular first category, said classifying the parsed second document into the particular first category comprising submitting the parsed second document to a classification engine,
wherein the taxonomy comprises a hierarchy of strange taxonomies, and further wherein:
the step of creating a rule generation document comprises at least one of:
generating at least one rule within the rule generation document for each strange taxonomy within the hierarchy of strange taxonomies; and
creating a rule generation document for each level of the hierarchy of strange taxonomies; and
the step of classifying the parsed second document into at least one first category comprises the steps of:
classifying the parsed second document into at least one strange taxonomy within the hierarchy of strange taxonomies; and
classifying the parsed second document into at least one first category within the at least one strange taxonomy within the hierarchy of strange taxonomies.
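
The per-level option in claim 18 (a rule generation document for each level of the hierarchy) can be sketched as a recursive descent. The nested `HIERARCHY` dictionary and the simple vote count are assumptions for illustration; the claim does not prescribe a data structure.

```python
from collections import Counter

# Hypothetical hierarchy: each node holds its own rules (keyword -> child name).
# An empty dict is a leaf with no further rules.
HIERARCHY = {
    "rules": {"engine": "vehicles", "wheel": "vehicles", "recipe": "cooking"},
    "children": {
        "vehicles": {
            "rules": {"engine": "cars", "pedal": "bicycles", "wheel": "bicycles"},
            "children": {"cars": {}, "bicycles": {}},
        },
        "cooking": {},
    },
}

def descend(keywords, node, path=()):
    """Apply the rules at the current level, then recurse into the chosen child."""
    rules = node.get("rules", {})
    votes = Counter(rules[keyword] for keyword in keywords if keyword in rules)
    if not votes:
        return path  # no rule fires at this level, so classification stops here
    child = votes.most_common(1)[0][0]
    return descend(keywords, node["children"][child], path + (child,))

print(descend(["pedal", "wheel"], HIERARCHY))  # ('vehicles', 'bicycles')
print(descend(["recipe"], HIERARCHY))          # ('cooking',)
```

Note that the same keyword ("wheel") selects a different target at each level, which is why the rules are kept per level rather than in one flat table.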
19. A method of classifying document content within a taxonomy, the taxonomy comprising a plurality of first categories in a computer document storage organizational scheme and a plurality of first documents, each first document tagged with at least one first category according to the taxonomy, the method comprising the steps of:
spidering the taxonomy to generate at least one pairing of each first document with each first category with which the each first document is tagged;
creating a rule generation document representing each of the at least one pairings;
parsing a second document according to the rule generation document; and
classifying the parsed second document into a particular first category, said classifying the parsed second document into the particular first category comprising submitting the parsed second document to a classification engine,
wherein the rule generation document comprises rules for mapping from at least one of a keyword and a pattern of keywords to one or more first categories, and wherein the step of parsing the second document according to the rule generation document comprises the steps of:
finding no keywords in the parsed second document similar to keywords in the rule generation document;
creating a new category within the taxonomy; and
classifying the second document in the new category.
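
The fallback in claim 19 (no similar keywords, so a new category is created) might look like the sketch below. The claim does not say how similarity is measured or how the new category is named; here similarity is approximated with `difflib.get_close_matches` from the Python standard library, and the category name is a caller-supplied, hypothetical parameter.

```python
import difflib
from collections import Counter

def classify_or_create(keywords, rules, taxonomy, new_category):
    """Classify by similar keywords; if none are similar, create `new_category` and file the document there."""
    votes = Counter()
    for keyword in keywords:
        # Assumed similarity test: closest rule keyword with a ratio of at least 0.8.
        for known in difflib.get_close_matches(keyword, list(rules), n=1, cutoff=0.8):
            votes[rules[known]] += 1
    category = votes.most_common(1)[0][0] if votes else new_category
    # setdefault creates the category only when it does not already exist.
    taxonomy.setdefault(category, []).append(" ".join(keywords))
    return category

rules = {"invoice": "billing", "router": "networking"}  # keyword -> category
taxonomy = {"billing": [], "networking": []}            # category -> filed documents

print(classify_or_create(["invoices", "overdue"], rules, taxonomy, "wildlife"))  # billing
print(classify_or_create(["giraffe", "savanna"], rules, taxonomy, "wildlife"))   # wildlife
print(sorted(taxonomy))  # ['billing', 'networking', 'wildlife']
```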
20. A method for categorizing the content of a new document within a strange taxonomy, the strange taxonomy comprising a plurality of first categories and a plurality of first documents within at least one of the first categories, wherein a root node for the strange taxonomy has been provided, the plurality of first documents being stored on a computer-readable storage device, the method being implemented through execution of computer readable program code by a processor of a computer system, said computer readable program code being stored on a computer usable medium, the method comprising the steps of:
automatically spidering the strange taxonomy to identify each first category and each document among the plurality of first documents classified within each respective first category;
automatically forming pairs for each of the first documents, each pair comprising one of the first documents and the category within which the one of the first documents is classified;
automatically extracting at least one of a keyword and a pattern of keywords from each of the first documents in each of the first categories;
automatically associating at least one of a keyword and a pattern of keywords extracted from each of the first documents within each of the first categories with the first category in which the first documents are classified;
automatically generating rules, each rule mapping at least one of a keyword and patterns of keywords to the first category in which the first documents containing the at least one of a keyword and a pattern of keywords are classified;
automatically parsing an unclassified document to determine new keywords therein; and
automatically classifying the unclassified document into at least one of a new category and a first category having documents containing at least one of keywords and patterns of keywords similar to the new keywords.
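
The extraction and rule-generation steps of claim 20 distinguish single keywords from patterns of keywords. A minimal sketch, assuming that a "pattern" is simply a pair of adjacent words and that a rule is kept only when its term occurs in exactly one category (both are illustrative assumptions, as is the `PAIRS` sample data):

```python
import re
from collections import Counter, defaultdict

# Hypothetical (document, category) pairs, as produced by the pair-forming step.
PAIRS = [
    ("reset the wireless router", "networking"),
    ("wireless router keeps dropping", "networking"),
    ("reset the account password", "accounts"),
    ("account password expired", "accounts"),
]

def terms(text):
    """Return single keywords plus adjacent-word patterns (bigrams)."""
    words = re.findall(r"[a-z]+", text.lower())
    return set(words) | {" ".join(pair) for pair in zip(words, words[1:])}

def generate_rules(pairs):
    """Keep only terms seen in exactly one category; each becomes a rule term -> category."""
    seen = defaultdict(set)
    for document, category in pairs:
        for term in terms(document):
            seen[term].add(category)
    return {term: next(iter(cats)) for term, cats in seen.items() if len(cats) == 1}

def classify(text, rules):
    """Vote over the rules fired by the unclassified document's keywords and patterns."""
    votes = Counter(rules[term] for term in terms(text) if term in rules)
    return votes.most_common(1)[0][0] if votes else None

rules = generate_rules(PAIRS)
print(rules["wireless router"])                       # networking
print("reset" in rules)                               # False: occurs in both categories
print(classify("my wireless router is slow", rules))  # networking
```

Dropping terms shared across categories is one simple way to keep ambiguous words such as "reset" from voting; a weighting scheme would be an alternative.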
Specification