Models for classifying documents
Abstract
Some embodiments provide a method for defining a content relevance model for determining whether a content segment is relevant to a particular category. The method receives a first set of content segments that contain content relevant to the particular category and a second set of content segments that contain content not relevant to the particular category. The method identifies a set of key word sets more likely to appear in the first set of content segments than the second set of content segments. The method defines a content relevance model that comprises a set of groups of word sets and a score for each group, each of the groups of word sets comprising a key word set from the set of key word sets and at least one word set found in a context of the key word set in at least one of the received content segments.
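The key word identification step described in the abstract can be illustrated with a minimal sketch. It assumes single-word key word sets, a naive whitespace tokenizer, add-one smoothing, and a hypothetical `min_ratio` threshold; none of these details come from the claims, which only require that key word sets be more likely to appear in the relevant set than in the non-relevant set.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and trim surrounding punctuation.
    (A simplifying assumption; the source does not specify tokenization.)"""
    words = (w.strip(".,;:!?()\"'") for w in text.lower().split())
    return [w for w in words if w]


def identify_key_words(relevant, not_relevant, min_ratio=2.0):
    """Return words whose smoothed relative frequency in `relevant` is at
    least `min_ratio` times their frequency in `not_relevant`.
    `min_ratio` is a hypothetical tuning parameter."""
    rel_counts = Counter(w for seg in relevant for w in tokenize(seg))
    neg_counts = Counter(w for seg in not_relevant for w in tokenize(seg))
    rel_total = sum(rel_counts.values())
    neg_total = sum(neg_counts.values())
    vocab = len(set(rel_counts) | set(neg_counts))

    key_words = set()
    for word, count in rel_counts.items():
        # Add-one smoothing keeps the ratio finite for words that never
        # appear in the non-relevant set.
        rel_freq = (count + 1) / (rel_total + vocab)
        neg_freq = (neg_counts[word] + 1) / (neg_total + vocab)
        if rel_freq / neg_freq >= min_ratio:
            key_words.add(word)
    return key_words


relevant = [
    "The merger closed after the acquisition bid",
    "Shareholders approved the merger",
]
not_relevant = [
    "The team won the game",
    "The weather was mild after the game",
]
keys = identify_key_words(relevant, not_relevant)
```

In this toy input, a word such as "merger" that appears only in the relevant segments clears the threshold, while a word such as "the" that is common to both sets does not.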
96 Citations
23 Claims
1. A method for defining a content relevance model for a particular category, the content relevance model for determining whether a content segment is relevant to the particular category, the method comprising:
- receiving a first set of content segments that contain content previously determined to be relevant to the particular category and a second set of content segments that contain content previously determined to be not relevant to the particular category;
- identifying a set of key word sets that appear more frequently in the first set of content segments than the second set of content segments; and
- defining a content relevance model that comprises a set of groups of word sets and a score for each group, each of the groups of word sets comprising a key word set from the identified set of key word sets and at least one word set found in a context of the key word set in at least one of the received content segments, the content relevance model for scoring new content segments that are different from the content segments of the first and second sets of content segments, in order to determine relevance of the new content segments to the particular category.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
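How a model of this shape might score a new content segment can be sketched as follows. The sketch assumes each group is a (key word, context word) pair, that "context" means a fixed window of neighboring words, that each group counts once per segment, and that the segment score is the sum of matched group scores compared against a threshold. The names `score_segment`, `is_relevant`, `window`, and `threshold`, and the example scores, are hypothetical and not taken from the claim.

```python
def words_in_context(tokens, index, window):
    """Return the tokens within `window` positions of tokens[index],
    excluding the token itself."""
    start = max(0, index - window)
    return tokens[start:index] + tokens[index + 1:index + 1 + window]


def score_segment(text, model, window=3):
    """Sum the scores of every (key word, context word) group from `model`
    that occurs in `text`. Each group is counted once per segment
    (an assumption of this sketch)."""
    tokens = text.lower().split()
    found = set()
    for i, token in enumerate(tokens):
        for neighbor in words_in_context(tokens, i, window):
            if (token, neighbor) in model:
                found.add((token, neighbor))
    return sum(model[group] for group in found)


def is_relevant(text, model, threshold=0.0, window=3):
    """Treat a segment as relevant when its score exceeds `threshold`."""
    return score_segment(text, model, window) > threshold


# Illustrative model with made-up scores: positive scores suggest relevance
# to the category, negative scores suggest the opposite.
model = {
    ("merger", "shareholders"): 1.5,
    ("merger", "approved"): 1.0,
    ("merger", "game"): -2.0,
}
```

With this example model, `score_segment("shareholders approved the merger today", model)` matches two positive groups, while a segment that pairs "merger" with "game" scores below zero and is treated as not relevant.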
10. A method for defining a content relevance model for a particular category, the method comprising:
- identifying a set of key word sets for the particular category based on an analysis of (i) a first set of content segments previously defined as relevant to the particular category and (ii) a second set of content segments previously defined as not relevant to the particular category;
- identifying (i) a set of pairs of word sets that each comprise a key word set and a word set that appears in a defined context of the key word set and (ii) a score for each of the word set pairs, the score for a particular word set pair quantifying a likelihood that a content segment containing the particular word set pair is relevant to the particular category, wherein appearances of the particular word set pair in the first set of content segments increase the score for the particular word set pair and appearances of the particular word set pair in the second set of content segments decrease the score for the particular word set pair; and
- defining a content relevance model for the particular category, the content relevance model comprising (i) a context definition that indicates when a second word set appears within a context of a key word set and (ii) the set of word set pairs and corresponding scores.
Dependent claims: 11, 12, 13, 14, 15, 16
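One way to obtain pair scores with the property recited above (appearances in the relevant set raise the score, appearances in the non-relevant set lower it) is a smoothed log ratio of per-segment occurrence rates. This is only one possible scoring function chosen for illustration; the claim does not prescribe it. The fixed-size `window` used as the context definition, and the function names, are likewise assumptions.

```python
import math
from collections import Counter


def pairs_in_segment(text, key_words, window=3):
    """Return the set of (key word, context word) pairs in one segment,
    where the context is the `window` words on either side of a key word."""
    tokens = text.lower().split()
    pairs = set()
    for i, token in enumerate(tokens):
        if token not in key_words:
            continue
        start = max(0, i - window)
        for neighbor in tokens[start:i] + tokens[i + 1:i + 1 + window]:
            pairs.add((token, neighbor))
    return pairs


def define_model(relevant, not_relevant, key_words, window=3):
    """Build a model holding (i) the context definition and (ii) pair scores.
    The score is log(rate in relevant / rate in non-relevant), so it rises
    with relevant appearances and falls with non-relevant ones."""
    pos = Counter(p for seg in relevant
                  for p in pairs_in_segment(seg, key_words, window))
    neg = Counter(p for seg in not_relevant
                  for p in pairs_in_segment(seg, key_words, window))
    scores = {}
    for pair in set(pos) | set(neg):
        # Add-one smoothing avoids division by zero and log(0).
        pos_rate = (pos[pair] + 1) / (len(relevant) + 2)
        neg_rate = (neg[pair] + 1) / (len(not_relevant) + 2)
        scores[pair] = math.log(pos_rate / neg_rate)
    return {"window": window, "scores": scores}


relevant = ["shareholders approved the merger", "the merger was approved"]
not_relevant = ["the merger of two game studios", "fans approved of the game"]
pair_model = define_model(relevant, not_relevant, key_words={"merger"})
```

Storing the window size alongside the scores mirrors the claim's requirement that the model carry its own context definition, so a scorer applied to new segments can use the same notion of context that was used during training.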
17. A non-transitory computer readable medium storing an application which when executed by at least one processor defines a content relevance model for a particular category, the application comprising:
- a keyword generator for identifying a set of key word sets for the particular category based on an analysis of (i) a first set of content segments previously defined as relevant to the particular category and (ii) a second set of content segments previously defined as relevant to a set of categories related to the particular category, but not relevant to the particular category, the set of key word sets comprising word sets that appear with greater frequency in the first set of content segments than in the second set of content segments; and
- a word pair generator for:
  - identifying a set of pairs of word sets, each word set pair comprising a key word set identified by the keyword generator and a word set that appears within a defined context of the key word set in at least one content segment from the first and second sets of content segments; and
  - determining scores for each of the word set pairs by comparing a first number of occurrences of the word set pair in the first set of content segments with a second number of occurrences of the word set pair in the second set of content segments, wherein the score for a particular word set pair is for use in determining the relevancy to the particular category of a new content segment that contains the particular word set pair.
Dependent claims: 18, 19, 20, 21, 22, 23
Specification