Disambiguation of themes in a document classification system
First Claim
1. A computer-implemented method for validating a category classified for a theme in a document classification system, said method comprising the steps of:
processing at least one document by generating a thematic context output that indicates applicability of a plurality of thematic constructions, and by generating a plurality of themes from said thematic context output, wherein said themes define the overall content of said document;
storing a classification hierarchy that includes a plurality of categories;
receiving a plurality of categories that preliminarily classify said themes for said document;
selecting a theme from a document to validate a category preliminarily classified for said theme; and
determining whether said category preliminarily classified for said theme selected is valid by analyzing relationships among said category preliminarily classified for said theme and other categories classified for different themes in said document.
Abstract
A document classification system includes disambiguation processing to validate categories that have been preliminarily classified for themes of a document. The themes of a document are preliminarily classified through use of a classification hierarchy that contains a plurality of categories. The disambiguation processing determines, for a theme selected for disambiguation, whether the category preliminarily classified for the theme selected is valid by analyzing the relationships among the category preliminarily classified for the theme and other categories classified for different themes in the document. The disambiguation processing also utilizes a category cross reference database, which comprises a list of category cross reference pairs, to disambiguate categories assigned to themes by pairing a category classified for a theme and other categories classified for other themes in the document and by comparing these category pairs with category cross reference database pairs. If a match occurs, then the categories of a document category pair are validated.
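The cross-reference matching described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the theme names, the category names, the `CROSS_REFERENCE_PAIRS` set, and the `validate_category` function are all hypothetical, and treating pairs as unordered is an assumption the abstract does not state.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical cross-reference database: each entry is a pair of categories
# assumed to be associated. Pairs are stored as unordered sets (an assumption).
CROSS_REFERENCE_PAIRS: Set[FrozenSet[str]] = {
    frozenset({"banking", "finance"}),
    frozenset({"rivers", "geography"}),
}


def validate_category(
    theme: str,
    preliminary: Dict[str, str],
    cross_refs: Set[FrozenSet[str]],
) -> bool:
    """Return True if the category preliminarily classified for `theme`
    forms a cross-referenced pair with the category of any other theme."""
    selected = preliminary[theme]
    for other_theme, other_category in preliminary.items():
        if other_theme == theme:
            continue  # only pair against categories of *other* themes
        # Build the document category pair and look it up in the database.
        if frozenset({selected, other_category}) in cross_refs:
            return True
    return False


# Preliminary classification: theme -> category (illustrative data).
financial_doc = {"bank": "banking", "loan": "finance", "interest": "finance"}
misclassified_doc = {"bank": "rivers", "loan": "finance", "interest": "finance"}

bank_ok = validate_category("bank", financial_doc, CROSS_REFERENCE_PAIRS)
bank_bad = validate_category("bank", misclassified_doc, CROSS_REFERENCE_PAIRS)
```

In this sketch, "bank" classified under "banking" is validated because the pair (banking, finance) appears in the database, while "bank" classified under "rivers" finds no matching pair among the other themes' categories.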
17 Claims
1. A computer-implemented method for validating a category classified for a theme in a document classification system, said method comprising the steps of:
processing at least one document by generating a thematic context output that indicates applicability of a plurality of thematic constructions, and by generating a plurality of themes from said thematic context output, wherein said themes define the overall content of said document;
storing a classification hierarchy that includes a plurality of categories;
receiving a plurality of categories that preliminarily classify said themes for said document;
selecting a theme from a document to validate a category preliminarily classified for said theme; and
determining whether said category preliminarily classified for said theme selected is valid by analyzing relationships among said category preliminarily classified for said theme and other categories classified for different themes in said document.
Dependent claims: 2, 3, 4
5. A computer-implemented method for validating a category preliminarily classified for a theme in a document classification system, said method comprising the steps of:
processing at least one document by generating a thematic context output that indicates applicability of a plurality of thematic constructions, and by generating a plurality of themes from said thematic context output, wherein said themes define the overall content of said document;
receiving a preliminary classification of categories for said themes in said document;
storing a category cross reference database that comprises a list of category cross reference pairs, wherein categories of a category pair have a semantic, linguistic, or use association;
selecting a theme from said document to validate a category preliminarily classified for said theme;
generating document category pairs by combining a category classified for said theme and each category classified for other themes in said document;
comparing said document category pairs with category cross reference database pairs to determine if a match occurs; and
validating categories of a document category pair if a match between said document category pair and said category cross reference database pairs occurs.
Dependent claims: 6, 7
8. A computer-implemented method for classifying themes in a document, said method comprising the steps of:
determining whether themes in said document are completely ambiguous;
assigning a category for each theme to preliminarily classify each theme under said category if a theme is not completely ambiguous; and
determining, for themes assigned a category, whether a category classified for a theme is valid by analyzing other categories classified for other themes of said document.
9. A computer readable medium comprising a plurality of instructions, which when executed by a computer, causes the computer to perform the steps of:
processing at least one document by generating a thematic context output that indicates applicability of a plurality of thematic constructions, and by generating a plurality of themes from said thematic context output, wherein said themes define the overall content of said document;
storing a classification hierarchy that includes a plurality of categories;
receiving a plurality of categories that preliminarily classify said themes for said document;
selecting a theme from a document to validate a category preliminarily classified for said theme; and
determining whether said category preliminarily classified for said theme selected is valid by analyzing relationships among said category preliminarily classified for said theme and other categories classified for different themes in said document.
Dependent claims: 10, 11, 12
13. A computer readable medium comprising a plurality of instructions, which when executed by a computer, causes the computer to perform the steps of:
processing at least one document by generating a thematic context output that indicates applicability of a plurality of thematic constructions, and by generating a plurality of themes from said thematic context output, wherein said themes define the overall content of said document;
receiving a preliminary classification of categories for said themes in said document;
storing a category cross reference database that comprises a list of category cross reference pairs, wherein categories of a category pair have a semantic, linguistic, or use association;
selecting a theme from said document to validate a category preliminarily classified for said theme;
generating document category pairs by combining a category classified for said theme and each category classified for other themes in said document;
comparing said document category pairs with category cross reference database pairs to determine if a match occurs; and
validating categories of a document category pair if a match between said document category pair and said category cross reference database pairs occurs.
Dependent claims: 14, 15
16. A computer readable medium comprising a plurality of instructions, which when executed by a computer, causes the computer to perform the steps of:
receiving a document comprising a plurality of themes;
determining whether said themes in a document are completely ambiguous;
assigning a category for each theme to preliminarily classify each theme under said category if a theme is not completely ambiguous; and
determining, for themes assigned a category, whether a category classified for a theme is valid by analyzing other categories classified for other themes of said document.
17. A computer system comprising:
memory for storing a plurality of themes for a document;
processor unit for determining whether said themes in said document are completely ambiguous, for assigning a category for each theme to preliminarily classify each theme under said category if a theme is not completely ambiguous, and for determining, for themes assigned a category, whether a category classified for a theme is valid by analyzing other categories classified for other themes of said document.
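The three-stage flow recited in claims 8, 16, and 17 (ambiguity check, preliminary assignment, validation against the other themes' categories) might be sketched as below. Several points are assumptions for illustration only: the claims do not define "completely ambiguous", so the sketch models it as a theme with no candidate category; taking the first candidate as the preliminary category is likewise assumed; and `classify_themes`, `related_to_any`, and the `RELATED` pairs are hypothetical names and data.

```python
from typing import Callable, Dict, List

# Hypothetical association data used only by the example validity check.
RELATED = {("banking", "finance"), ("finance", "banking")}


def related_to_any(category: str, others: List[str]) -> bool:
    """Example validity check: is `category` related to any other category?"""
    return any((category, other) in RELATED for other in others)


def classify_themes(
    themes: List[str],
    candidates: Dict[str, List[str]],
    is_valid: Callable[[str, List[str]], bool],
) -> Dict[str, bool]:
    """Preliminarily classify each theme, then validate each assigned category."""
    preliminary: Dict[str, str] = {}
    for theme in themes:
        options = candidates.get(theme, [])
        if not options:
            # Assumption: no candidate category means "completely ambiguous",
            # so the theme receives no preliminary category.
            continue
        preliminary[theme] = options[0]  # assumption: first candidate wins

    validated: Dict[str, bool] = {}
    for theme, category in preliminary.items():
        # Validate against the categories classified for the *other* themes.
        others = [c for t, c in preliminary.items() if t != theme]
        validated[theme] = is_valid(category, others)
    return validated


result = classify_themes(
    ["bank", "loan", "xyzzy"],
    {"bank": ["banking"], "loan": ["finance"]},
    related_to_any,
)
```

Here "xyzzy" has no candidate category and is skipped, while "bank" and "loan" each receive a preliminary category that is then checked against the other's.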
Specification