CONTEXT-BASED METADATA GENERATION AND AUTOMATIC ANNOTATION OF ELECTRONIC MEDIA IN A COMPUTER NETWORK
Abstract
Computerized systems are described for automating content annotation (e.g., tag creation and/or expansion) for low-content items within a computer network. The systems leverage the intelligence of other data sources within the network to generate secondary content (e.g., a “context”) for items (e.g., documents) for use in a tagging process. For example, based on user-assigned tags for an item, secondary content information can be generated and used to determine a new list of candidate tags for the item. Additionally, the context of an input item may be compared against the respective contexts of a plurality of other items to determine respective levels of similarity between the input item and each of the plurality of other items in order to annotate the input item. Techniques involving web-distance-based clustering and leveraging crowd-sourced information sources to remove noisy data from annotated results are also described.
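The abstract mentions web-distance based clustering but does not say which distance is meant. As an illustration only, the sketch below computes one widely known measure of this kind, the Normalized Google Distance, from search hit counts. The function name, parameter names, and hit counts are hypothetical and are not taken from the patent.

```python
import math


def normalized_web_distance(hits_x: int, hits_y: int, hits_xy: int, total_pages: int) -> float:
    """Normalized Google Distance from search hit counts.

    hits_x, hits_y: pages containing each term alone (hypothetical inputs)
    hits_xy:        pages containing both terms
    total_pages:    assumed size of the search index, larger than any hit count
    """
    if hits_x <= 0 or hits_y <= 0 or hits_xy <= 0:
        # Terms that never occur, or never co-occur, are treated as maximally distant.
        return float("inf")
    log_x, log_y, log_xy = math.log(hits_x), math.log(hits_y), math.log(hits_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(total_pages) - min(log_x, log_y))


# Two terms that always appear together are at distance 0.
print(normalized_web_distance(1000, 1000, 1000, 10**6))  # prints 0.0
```

Smaller values mean the two terms co-occur more often relative to how often each appears alone, so a clustering step could group terms whose pairwise distance falls below a chosen cutoff.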
31 Claims
1. A method comprising:
generating, by a computing device and based at least in part on information about an input item, a context of the input item;
comparing, by the computing device, the context of the input item to respective contexts of a plurality of other items to determine respective levels of similarity between the input item and each of the plurality of other items; and
annotating the input item with information derived from at least one of the plurality of other items based at least in part on the respective levels of similarity between the input item and each of the plurality of other items.
Dependent claims: 2-22
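The three steps of claim 1 (generate a context, compare contexts, annotate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the context is assumed to be a bag of words, the similarity measure is assumed to be cosine similarity, and the names `build_context`, `cosine_similarity`, `annotate`, and the `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def build_context(info: str) -> Counter:
    """Turn the available information about an item into a bag-of-words context."""
    return Counter(re.findall(r"[a-z0-9]+", info.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def annotate(input_info: str, other_items: dict, threshold: float = 0.3) -> list:
    """Return labels of other items whose context is similar enough, best match first."""
    input_context = build_context(input_info)
    scored = [
        (cosine_similarity(input_context, build_context(text)), label)
        for label, text in other_items.items()
    ]
    return [label for score, label in sorted(scored, reverse=True) if score >= threshold]


# Hypothetical "other items": label -> text describing that item.
others = {
    "machine learning": "neural network training data model learning",
    "cooking": "recipe oven flour sugar baking",
}
print(annotate("training a neural network model", others))  # prints ['machine learning']
```

Here the annotation is simply the label of each sufficiently similar item. The claim is broader and allows any information derived from the similar items.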
23. A method comprising:
identifying, by a computing device, one or more external data sources;
retrieving, by the computing device, topics from the identified one or more external data sources;
aggregating, by the computing device, the topics into a topic database;
generalizing, by the computing device, the topic database by performing a normalization process;
conducting a query, by the computing device in a publicly available search engine, for each topic in the generalized topic database;
constructing, by the computing device and based at least in part on the results of the query, a topic context database;
identifying, by the computing device, information about one or more input items, wherein the information about the input item comprises a title;
conducting a query, by the computing device in a publicly available search engine, for content in the title for the one or more input items;
constructing, by the computing device and based at least in part on the results of the query, a respective title context for the one or more input items;
comparing, by the computing device, the title context and the one or more topics in the topic context database using a text similarity computation model;
determining, by the computing device and based at least in part on the results of the comparison, one or more topics from the topic context database with which to annotate the one or more input items; and
annotating, by the computing device, the one or more input items based on the determined topics.
Dependent claims: 24
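The pipeline of claim 23 can be sketched end to end as follows. Everything named here is an assumption made for illustration: the search engine is replaced by a hypothetical `search` callable backed by canned results, normalization is assumed to mean lowercasing and de-duplication, and Jaccard similarity stands in for the unspecified text similarity computation model.

```python
import re
from typing import Callable, Dict, List, Set

SearchFn = Callable[[str], List[str]]  # hypothetical: query -> result snippets


def normalize_topics(raw_topics: List[str]) -> List[str]:
    """Generalize the topic list: lowercase, collapse whitespace, drop duplicates."""
    seen, result = set(), []
    for topic in raw_topics:
        cleaned = " ".join(topic.lower().split())
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


def context_from_search(query: str, search: SearchFn) -> Set[str]:
    """Build a context as the set of words found in the search results for a query."""
    words: Set[str] = set()
    for snippet in search(query):
        words.update(re.findall(r"[a-z0-9]+", snippet.lower()))
    return words


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity, standing in for the unspecified text similarity model."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def annotate_title(title: str, raw_topics: List[str], search: SearchFn, top_k: int = 1) -> List[str]:
    """Return the top_k topics whose search context best matches the title's context."""
    topics = normalize_topics(raw_topics)
    topic_contexts: Dict[str, Set[str]] = {t: context_from_search(t, search) for t in topics}
    title_context = context_from_search(title, search)
    ranked = sorted(topics, key=lambda t: jaccard(title_context, topic_contexts[t]), reverse=True)
    return ranked[:top_k]


# Canned results standing in for a real, publicly available search engine.
FAKE_RESULTS = {
    "astronomy": ["Astronomy studies stars planets and galaxies"],
    "gardening": ["Gardening tips for soil seeds and plants"],
    "hubble deep field": ["Hubble image of distant galaxies and stars"],
}


def fake_search(query: str) -> List[str]:
    return FAKE_RESULTS.get(query.lower(), [])


print(annotate_title("Hubble Deep Field", ["Astronomy", " astronomy ", "Gardening"], fake_search))
# prints ['astronomy']
```

Passing the search function in as a parameter keeps the sketch runnable offline. A real system would substitute a client for whichever search engine it queries.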
25. A method comprising:
identifying, by a computing device, tags that have been previously assigned to one or more input items;
generating, by a computing device, a database of secondary content based on the identified tags;
generating, by a computing device and based on the contents of the secondary content database, a list of candidate new tags;
removing, by a computing device, noisy tags from the list; and
determining, by a computing device, a final list of tags with which to annotate the one or more input items.
Dependent claims: 26
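The tag-expansion steps of claim 25 can be sketched as follows. This is a minimal illustration under stated assumptions: `secondary_source` is a hypothetical mapping from a tag to related terms, and noise removal is assumed to mean dropping candidates that are rarely supported or missing from a hypothetical reference vocabulary (`known_vocabulary`), which stands in for the crowd-sourced information sources mentioned in the abstract.

```python
from collections import Counter
from typing import Dict, List, Set


def expand_tags(
    existing_tags: List[str],
    secondary_source: Dict[str, List[str]],
    known_vocabulary: Set[str],
    min_support: int = 2,
) -> List[str]:
    """Suggest new tags for an item from content related to its existing tags."""
    # Gather secondary content for every previously assigned tag.
    secondary_content: List[str] = []
    for tag in existing_tags:
        secondary_content.extend(secondary_source.get(tag, []))

    # Candidate tags are terms in the secondary content that are not already assigned.
    counts = Counter(term for term in secondary_content if term not in existing_tags)

    # Drop noisy candidates: too rarely supported, or absent from the reference vocabulary.
    kept = [t for t, n in counts.items() if n >= min_support and t in known_vocabulary]

    # Final list: most frequently supported first, ties broken alphabetically.
    return sorted(kept, key=lambda t: (-counts[t], t))


# Hypothetical secondary content keyed by existing tag.
secondary_source = {
    "python": ["programming", "snake", "scripting", "programming"],
    "django": ["web", "programming", "scripting", "asdf"],
}
vocabulary = {"programming", "scripting", "web", "snake"}
print(expand_tags(["python", "django"], secondary_source, vocabulary))
# prints ['programming', 'scripting']
```

In this example "snake" and "web" are dropped for low support and "asdf" would also fail the vocabulary check, leaving only the terms that several existing tags agree on.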
27. A computing device having a processor configured to:
generate, based at least in part on information about an input item, a context of the input item;
compare the context of the input item to respective contexts of a plurality of other items to determine respective levels of similarity between the input item and each of the plurality of other items; and
annotate the input item with information derived from at least one of the plurality of other items based at least in part on the respective levels of similarity between the input item and each of the plurality of other items.
Dependent claims: 28, 29, 30
31. A computer-readable storage medium encoded with instructions that, when executed, cause at least one processor to:
generate, based at least in part on information about an input item, a context of the input item;
compare the context of the input item to respective contexts of a plurality of other items to determine respective levels of similarity between the input item and each of the plurality of other items; and
annotate the input item with information derived from at least one of the plurality of other items based at least in part on the respective levels of similarity between the input item and each of the plurality of other items.
Specification