URL-based content categorization
Abstract
Content may be categorized by accessing a URL associated with the content, determining a set of n-grams contained in the URL, and determining a category of the content based on the set of n-grams.
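The n-gram step described in the abstract can be illustrated with a minimal sketch. It assumes character-level n-grams, which matches the claims' reference to "n-grams of different character lengths". The function name `url_ngrams`, the default n values (3, 4, 5), and the lowercasing step are illustrative assumptions and do not come from the patent text.

```python
def url_ngrams(url, n_values=(3, 4, 5)):
    """Return the set of character n-grams found in `url`.

    Assumptions (not from the patent text): the URL is lowercased first,
    and the default n values are 3, 4 and 5.
    """
    text = url.lower()
    grams = set()
    for n in n_values:
        # Slide a window of width n across the URL string.
        for start in range(len(text) - n + 1):
            grams.add(text[start:start + n])
    return grams


if __name__ == "__main__":
    # Using two different n values yields n-grams of two different lengths.
    for gram in sorted(url_ngrams("a.io/x", n_values=(2, 3))):
        print(gram)
```

The resulting set would then be handed to a classifier. The weighting applied to each n-gram is not reproduced in this listing, so the sketch stops at n-gram generation.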
31 Claims
1. A computer implemented method of categorizing content, the method comprising:
accessing a uniform resource locator associated with content requested by a user;
processing the uniform resource locator to generate a set of multiple n-grams contained in the uniform resource locator, wherein the processing includes using a plurality of different n values to generate the set of multiple n-grams, the set including n-grams of different character lengths;
providing the set of multiple n-grams to a classifier for determination of a classification score based on the set of multiple n-grams, wherein each of the n-grams is weighted according to the formula;
Dependent claims: 2

3. A computer implemented method of categorizing content, the method comprising:
accessing, from computer storage, a uniform resource locator associated with content;
determining a set of n-grams contained in the uniform resource locator, wherein a plurality of different n values are used to determine the set of n-grams, the set of n-grams including n-grams of different character lengths;
determining a category of the content based on the set of n-grams, wherein each of the n-grams is weighted according to the formula;
Dependent claims: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14

15. A system for categorizing content, the system comprising:
at least one processing device;
a storage storing instructions for causing the at least one processing device to implement:
an n-gram generator configured to access a uniform resource locator (URL) associated with content and determine a set of n-grams contained in the uniform resource locator;
a classifier configured to determine a URL score based on the set of n-grams, wherein each of the n-grams is weighted according to the formula;
Dependent claims: 16, 17, 18, 19, 20

21. A system for categorizing content, the system comprising:
at least one processing device;
a storage storing instructions for causing the at least one processing device to implement:
a URL-based classifier configured to produce a first score based on a set of n-grams contained in a uniform resource locator associated with content, wherein the set of n-grams is based on multiple different n-values and includes n-grams of different character lengths, wherein each of the n-grams is weighted according to the formula;
Dependent claims: 23, 24

22. The system of claim 21 wherein, to determine a category of the content based on both the first score and the second score, the integration engine is configured to generate a sum of the first score and the second score and compare the sum to a threshold.
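
The sum-and-threshold comparison in claim 22 can be sketched as follows. This is only an illustration under stated assumptions: the function name `in_category` is hypothetical, the claim does not say which direction of the comparison indicates membership (the sketch assumes a sum strictly greater than the threshold does), and the source of the second score is not stated in this listing.

```python
def in_category(first_score, second_score, threshold):
    """Combine two classifier scores and compare the sum to a threshold.

    Assumption (not from the claim text): a sum strictly greater than
    the threshold means the content belongs to the category.
    """
    combined = first_score + second_score
    return combined > threshold


if __name__ == "__main__":
    # Hypothetical scores and threshold, chosen only for illustration.
    print(in_category(0.4, 0.3, threshold=0.5))
    print(in_category(0.1, 0.2, threshold=0.5))
```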

27. A computer implemented method of categorizing content, the method comprising:
accessing a uniform resource locator (URL) associated with the content;
processing the uniform resource locator to generate a set of multiple n-grams contained in the uniform resource locator, wherein a plurality of different n values are used to process the uniform resource locator, and the set of multiple n-grams includes n-grams of different character lengths;
providing the set of multiple n-grams to a URL classifier, wherein the URL classifier is configured to determine a first classification score based on the set of multiple n-grams, wherein each of the n-grams is weighted according to the formula;
Dependent claims: 28, 29, 30, 31

Specification