MODEL-BASED IDENTIFICATION OF RELEVANT CONTENT
First Claim
1. A method, comprising:
- obtaining validated training data comprising a first set of content items and a first set of relevance tags, wherein the first set of relevance tags is used by one or more domain experts to identify the first set of content items as relevant to one or more topics;
using the validated training data to produce, by one or more computer systems, a statistical model for classifying a relevance of content to the one or more topics;
using the statistical model to generate, by the one or more computer systems, a second set of relevance tags for a second set of content items; and
outputting, by the one or more computer systems, one or more groupings of the second set of content items by the second set of relevance tags to improve understanding of content related to the one or more topics without requiring a user to manually analyze the second set of content items.
2 Assignments
0 Petitions
Accused Products
Abstract
The disclosed embodiments provide a system for processing data. During operation, the system obtains validated training data containing a first set of content items and a first set of relevance tags, wherein the first set of relevance tags is used by one or more domain experts to identify the first set of content items as relevant to one or more topics. Next, the system uses the validated training data to produce a statistical model for classifying a relevance of content to the one or more topics. The system then uses the statistical model to generate a second set of relevance tags for a second set of content items. Finally, the system outputs one or more groupings of the second set of content items by the second set of relevance tags to improve understanding of content related to the one or more topics without requiring a user to manually analyze the second set of content items.
-
Citations
20 Claims
-
1. A method, comprising:
-
obtaining validated training data comprising a first set of content items and a first set of relevance tags, wherein the first set of relevance tags is used by one or more domain experts to identify the first set of content items as relevant to one or more topics; using the validated training data to produce, by one or more computer systems, a statistical model for classifying a relevance of content to the one or more topics; using the statistical model to generate, by the one or more computer systems, a second set of relevance tags for a second set of content items; and outputting, by the one or more computer systems, one or more groupings of the second set of content items by the second set of relevance tags to improve understanding of content related to the one or more topics without requiring a user to manually analyze the second set of content items. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. An apparatus, comprising:
-
one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the apparatus to; obtain validated training data comprising a first set of content items and a first set of relevance tags, wherein the first set of relevance tags is used by one or more domain experts to identify the first set of content items as relevant to one or more topics; use the validated training data to produce a statistical model for classifying a relevance of content to the one or more topics; use the statistical model to generate a second set of relevance tags for a second set of content items; and output one or more groupings of the second set of content items by the second set of relevance tags to improve understanding of content related to the one or more topics without requiring a user to manually analyze the second set of content items. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18)
-
-
19. A system, comprising:
-
an analysis non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the system to; obtain validated training data comprising a first set of content items and a first set of relevance tags, wherein the first set of relevance tags is used by one or more domain experts to identify the first set of content items as relevant to one or more topics; use the validated training data to produce a statistical model for classifying a relevance of content to the one or more topics; and use the statistical model to generate a second set of relevance tags for a second set of content items; and a management non-transitory computer-readable medium comprising instructions that, when executed by the one or more processors, cause the system to output one or more groupings of the second set of content items by the second set of relevance tags to improve understanding of content related to the one or more topics without requiring a user to manually analyze the second set of content items. - View Dependent Claims (20)
-
Specification