SYSTEM AND METHOD FOR CLASSIFYING TAGS OF CONTENT USING A HYPERLINKED CORPUS OF CLASSIFIED WEB PAGES
Abstract
An improved system and method for classifying tags of content using a hyperlinked corpus of classified web pages is provided. An anchor text index may be searched to find anchor texts that may match text of the tag, documents referenced by the matching anchor texts may be found, and the documents referenced by the matching anchor texts may be grouped to disambiguate multiple classifications that result from matching the anchor texts with the categories of the reference documents. To resolve ambiguity between multiple classifications, weighted classifications may be used where each document may be assigned a positive weight for a mapping to a category to indicate the confidence of the classification of the document to the category. The classification for the grouping of the documents referenced by the matching anchor texts with greatest frequency may be selected and output as the classification for the tag.
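The weighted disambiguation described in the abstract can be illustrated with a minimal Python sketch. The function name `select_weighted_category`, the data layout (one category-to-weight mapping per referenced document), and the sample values are assumptions made for illustration; the abstract does not specify them. The sketch also assumes that "greatest frequency" extends to weighted classifications as the greatest summed weight, which is one plausible reading rather than the stated method.

```python
from collections import defaultdict


def select_weighted_category(doc_category_weights):
    """Return the category with the largest total weight, or None if there is none.

    doc_category_weights: one dict per document referenced by a matching
    anchor text, mapping category -> positive confidence weight
    (hypothetical layout, not taken from the patent).
    """
    totals = defaultdict(float)
    for weights in doc_category_weights:
        for category, weight in weights.items():
            if weight <= 0:
                raise ValueError("each weight must be positive")
            totals[category] += weight
    if not totals:
        return None
    # Highest total wins; ties are broken alphabetically (an assumption)
    # so that the result is deterministic.
    return min(totals, key=lambda c: (-totals[c], c))


# Hypothetical documents reached through anchor texts matching one tag.
referenced_docs = [
    {"animal": 0.9},
    {"car": 0.6, "animal": 0.2},
    {"car": 0.7},
]
print(select_weighted_category(referenced_docs))  # prints "car"
```

Summing weights lets a document that maps weakly to several categories contribute to each of them in proportion to its confidence, instead of casting one full vote per category.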
20 Claims
1. A computer system for classifying a tag associated with content, comprising:
- a tag classification engine for classifying a tag associated with content of a web document with a category associated with one or more documents in a classified corpus of hyperlinked web documents referred by one or more anchor texts matching the text of the tag; and
- a storage operably coupled to the tag classification engine for storing a plurality of categories of tags classified with the category associated with one or more documents in the classified corpus of hyperlinked web documents referred by the one or more anchor texts matching the text of the tag.
Dependent claims: 2, 3, 4, 5
6. A computer-implemented method for classifying a tag associated with content, comprising:
- matching text of a tag associated with content of a web document with one or more anchor texts in a classified corpus of hyperlinked web documents;
- finding one or more documents referenced by the one or more anchor texts in the classified corpus of hyperlinked web documents;
- grouping the one or more documents by one or more classifications;
- selecting a classification associated with the grouping of the one or more documents referenced by the one or more anchor texts; and
- outputting the classification for the tag associated with the content of the web document.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
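The five steps recited in claim 6 (matching, finding, grouping, selecting, outputting) can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the names `build_anchor_index` and `classify_tag`, the exact-match rule after case and whitespace normalization, the choice of the largest group as the selected classification, and the `example.org` URLs are all hypothetical.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace (an assumed matching rule)."""
    return " ".join(text.lower().split())


def build_anchor_index(links):
    """Map each normalized anchor text to the set of URLs it points to.

    links: iterable of (anchor_text, target_url) pairs from the corpus.
    """
    index = defaultdict(set)
    for anchor_text, target_url in links:
        index[normalize(anchor_text)].add(target_url)
    return index


def classify_tag(tag, anchor_index, doc_categories):
    # Steps 1 and 2: match the tag text against anchor texts and find
    # the documents those anchor texts reference.
    targets = anchor_index.get(normalize(tag), set())
    # Step 3: group the referenced documents by their classification.
    groups = defaultdict(list)
    for url in sorted(targets):
        category = doc_categories.get(url)
        if category is not None:
            groups[category].append(url)
    if not groups:
        return None
    # Step 4: select the classification of the largest group
    # (ties broken alphabetically; both choices are assumptions).
    best = min(groups, key=lambda c: (-len(groups[c]), c))
    # Step 5: output the classification for the tag.
    return best


# Hypothetical classified corpus: hyperlinks plus one category per page.
links = [
    ("Jaguar", "example.org/cats/jaguar"),
    ("jaguar", "example.org/autos/jaguar-xk"),
    ("Jaguar", "example.org/autos/jaguar-xj"),
    ("puma", "example.org/cats/puma"),
]
doc_categories = {
    "example.org/cats/jaguar": "animals",
    "example.org/cats/puma": "animals",
    "example.org/autos/jaguar-xk": "autos",
    "example.org/autos/jaguar-xj": "autos",
}
anchor_index = build_anchor_index(links)
print(classify_tag("  JAGUAR ", anchor_index, doc_categories))  # prints "autos"
```

Building the index once and looking tags up by normalized anchor text keeps each classification to a dictionary lookup plus a pass over the matched documents.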
17. A computer system for classifying a tag associated with content, comprising:
- means for matching text of a tag associated with content of a web document with one or more anchor texts in a classified corpus of hyperlinked web documents;
- means for matching the one or more anchor texts with one or more categories of one or more documents in the classified corpus of hyperlinked web documents; and
- means for outputting at least one classification for the tag associated with the content of the web document.
Dependent claims: 18, 19, 20
Specification