CONTEXTUAL ANALYSIS ENGINE
2 Assignments
0 Petitions
Abstract
A contextual analysis engine systematically extracts, analyzes and organizes digital content stored in an electronic file such as a webpage. Content can be extracted using a text extraction module which is capable of separating the content which is to be analyzed from less meaningful content such as format specifications and programming scripts. The resulting unstructured corpus of plain text can then be passed to a text analytics module capable of generating a structured categorization of topics included within the content. This structured categorization can be organized based on a content topic ontology which may have been previously defined or which may be developed in real-time. The systems disclosed herein optionally include an input/output interface capable of managing workflows of the text extraction module and the text analytics module, administering a cache of previously generated results, and interfacing with other applications that leverage the disclosed contextual analysis services.
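The abstract does not say how the text extraction module separates meaningful content from format specifications and programming scripts. Here is a minimal sketch, assuming HTML input and using only Python's standard-library `html.parser`. It discards everything inside `<script>` and `<style>` elements and keeps the remaining text nodes. The names `PlainTextExtractor` and `extract_text` are hypothetical and are not part of the disclosure.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Hypothetical extractor: keeps visible text, drops <script>/<style> bodies."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are outside scripts and styles.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the plain-text corpus for one HTML document (assumed input)."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = ("<html><head><style>p {color: red}</style><script>var x = 1;</script>"
        "</head><body><h1>Cycling</h1><p>Mountain bikes and road bikes.</p></body></html>")
print(extract_text(page))  # Cycling Mountain bikes and road bikes.
```

This sketch removes only markup, scripts, and styles. Navigation menus and other low-value page text would still pass through.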
87 Citations
24 Claims
1. A method of analyzing digital content, the method comprising:
receiving a corpus of text;
extracting a plurality of n-grams from the corpus of text;
constructing a multi-dimensional document feature vector based on the plurality of n-grams;
categorizing a plurality of topics derived from the multi-dimensional document feature vector based on a topic ontology; and
generating a hierarchical listing of the plurality of categorized topics that includes a relevancy ranking for at least a portion of the topics included within the hierarchical listing.
Dependent claims: 2-11.
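Claim 1 recites its steps functionally. It does not fix a tokenizer, an n-gram length, a weighting scheme, or an ontology format. The sketch below is one possible reading and rests on these assumptions:

* it uses unigrams and bigrams;
* a sparse count vector serves as the multi-dimensional document feature vector;
* a hypothetical toy ontology maps n-grams to root-to-leaf topic paths;
* relevancy is each topic's score divided by the total root-level score.

All names (`ONTOLOGY`, `extract_ngrams`, `build_feature_vector`, `categorize`, `hierarchical_listing`) are illustrative.

```python
import re
from collections import Counter

# Hypothetical toy ontology: n-gram -> path from root category to leaf topic.
ONTOLOGY = {
    "mountain bike": ("Sports", "Cycling", "Mountain Biking"),
    "trail": ("Sports", "Cycling", "Mountain Biking"),
    "road bike": ("Sports", "Cycling", "Road Cycling"),
    "recipe": ("Food", "Cooking"),
}


def extract_ngrams(text, max_n=2):
    """Return all 1..max_n word n-grams from lowercased alphanumeric tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def build_feature_vector(ngrams):
    """Sparse vector: one dimension per distinct n-gram, weighted by count."""
    return Counter(ngrams)


def categorize(vector, ontology):
    """Credit every node on the ontology path of each matched n-gram."""
    scores = Counter()
    for gram, weight in vector.items():
        path = ontology.get(gram)
        if path:
            for depth in range(1, len(path) + 1):
                scores[path[:depth]] += weight
    return scores


def hierarchical_listing(scores):
    """Depth-first listing; siblings sorted by descending relevancy."""
    total = sum(s for p, s in scores.items() if len(p) == 1) or 1
    lines = []

    def visit(prefix):
        children = [p for p in scores
                    if len(p) == len(prefix) + 1 and p[:len(prefix)] == prefix]
        for child in sorted(children, key=lambda p: (-scores[p], p)):
            indent = "  " * len(prefix)
            lines.append(f"{indent}{child[-1]} ({scores[child] / total:.2f})")
            visit(child)

    visit(())
    return lines


text = ("Our mountain bike reviews cover trail riding. "
        "A road bike is lighter. Mountain bike trail tips.")
for line in hierarchical_listing(categorize(build_feature_vector(extract_ngrams(text)), ONTOLOGY)):
    print(line)
# Sports (1.00)
#   Cycling (1.00)
#     Mountain Biking (0.80)
#     Road Cycling (0.20)
```

`Mountain Biking` ranks above `Road Cycling` because its n-grams occur more often. `Sports` and `Cycling` receive credit through the ontology even though neither word appears in the text.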
12. A method of delivering targeted content to a consumer, the method comprising:
generating contextual analysis data for a plurality of webpages that are made available to a consumer via a website, wherein the contextual analysis data includes a hierarchical listing of topics with corresponding relevancy rankings for each of the plurality of webpages;
compiling website visitor log information indicating which of the plurality of webpages have been visited by the consumer;
analyzing the contextual analysis data corresponding to the webpages visited by the consumer to identify at least one topic as being relevant to the consumer; and
identifying targeted content to be delivered to the consumer, wherein the targeted content is related to the at least one relevant topic.
Dependent claims: 13-15.
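Claim 12 leaves open how the visitor log is combined with the per-page contextual analysis data. The sketch below shows one simple approach under stated assumptions:

* It sums each topic's relevancy over every page the consumer visited, so repeat visits count again.
* It returns content for the highest-scoring topic that has deliverable content.

`CONTEXT`, `CATALOG`, `LOG`, the `(visitor_id, url)` log format, and `targeted_content` are all hypothetical.

```python
from collections import Counter

# Hypothetical contextual analysis output: page URL -> {topic path: relevancy}.
CONTEXT = {
    "/reviews/trail-bikes": {"Sports/Cycling": 1.0, "Sports/Cycling/Mountain Biking": 0.8},
    "/guides/road-racing": {"Sports/Cycling": 1.0, "Sports/Cycling/Road Cycling": 0.9},
    "/recipes/pasta": {"Food/Cooking": 1.0},
}

# Hypothetical catalog of deliverable content, keyed by topic path.
CATALOG = {
    "Sports/Cycling/Mountain Biking": ["Ad: full-suspension bike sale"],
    "Sports/Cycling/Road Cycling": ["Ad: carbon road frames"],
    "Food/Cooking": ["Ad: cookware"],
}

# Hypothetical visitor log: (visitor_id, url) pairs in visit order.
LOG = [
    ("alice", "/reviews/trail-bikes"),
    ("bob", "/recipes/pasta"),
    ("alice", "/reviews/trail-bikes"),
    ("alice", "/guides/road-racing"),
]


def targeted_content(log, consumer_id, context, catalog):
    """Return (topic, content list) for the consumer's top servable topic."""
    totals = Counter()
    for visitor, url in log:
        if visitor != consumer_id:
            continue
        for topic, score in context.get(url, {}).items():
            totals[topic] += score
    # Walk topics from most to least relevant; skip ones with nothing to serve.
    for topic, _ in totals.most_common():
        if topic in catalog:
            return topic, catalog[topic]
    return None, []


print(targeted_content(LOG, "alice", CONTEXT, CATALOG))
# ('Sports/Cycling/Mountain Biking', ['Ad: full-suspension bike sale'])
```

Skipping topics without catalog entries changes which topic is served, not how topics are scored. A broad parent such as `Sports/Cycling` can top the scores, while the delivered content still comes from the most relevant topic that can actually be served.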
16. A system for analyzing digital content, the system comprising:
an n-gram extractor configured to extract a plurality of n-grams from an unstructured corpus of text;
a topic model generator configured to construct a multi-dimensional feature vector based on the plurality of n-grams;
a topic categorizer configured to categorize a plurality of topics derived from the multi-dimensional document feature vector based on a topic ontology, wherein at least one of the plurality of categorized topics is not included within the plurality of n-grams extracted from the unstructured corpus of text.
Dependent claims: 17-20.
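Claim 16 recites three components. It also requires that at least one categorized topic not appear among the extracted n-grams. A minimal sketch, assuming each component is a small Python class and that a hypothetical ontology maps n-grams to broader topic labels, shows how that can happen. The class names mirror the claim's component names, but their methods (`extract`, `build`, `categorize`) are assumptions.

```python
import re
from collections import Counter


class NGramExtractor:
    """Extracts 1..max_n word n-grams from lowercased alphanumeric tokens."""

    def __init__(self, max_n=2):
        self.max_n = max_n

    def extract(self, text):
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        return [" ".join(tokens[i:i + n])
                for n in range(1, self.max_n + 1)
                for i in range(len(tokens) - n + 1)]


class TopicModelGenerator:
    """Builds a sparse count vector with one dimension per distinct n-gram."""

    def build(self, ngrams):
        return Counter(ngrams)


class TopicCategorizer:
    """Maps n-grams to topic labels through a (hypothetical) ontology."""

    def __init__(self, ontology):
        self.ontology = ontology

    def categorize(self, vector):
        topics = Counter()
        for gram, weight in vector.items():
            for topic in self.ontology.get(gram, ()):
                topics[topic] += weight
        return topics


# Hypothetical ontology: n-gram -> topic labels it supports (broad to narrow).
ONTOLOGY = {
    "espresso": ("Beverages", "Coffee"),
    "latte": ("Beverages", "Coffee"),
    "green tea": ("Beverages", "Tea"),
}

ngrams = NGramExtractor().extract("An espresso and a latte, then green tea.")
topics = TopicCategorizer(ONTOLOGY).categorize(TopicModelGenerator().build(ngrams))
# Topics whose label (case-insensitively) never occurs as an extracted n-gram.
absent = sorted(t for t in topics if t.lower() not in ngrams)
print(dict(topics))  # {'Beverages': 3, 'Coffee': 2, 'Tea': 1}
print(absent)        # ['Beverages', 'Coffee']
```

`Tea` is not reported as absent because the unigram `tea` is itself extracted. `Beverages` and `Coffee` exist only through the ontology.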
21. A non-transient computer readable medium having instructions encoded thereon that, when executed by one or more processors, cause a digital content analysis process to be carried out, the process comprising:
receiving an unstructured corpus of text;
extracting a plurality of n-grams from the unstructured corpus of text;
constructing a multi-dimensional document feature vector based on the plurality of n-grams;
categorizing a plurality of topics derived from the multi-dimensional document feature vector based on a topic ontology; and
generating a hierarchical listing of the plurality of categorized topics that includes a relevancy ranking for at least a portion of the topics included within the hierarchical listing.
Dependent claims: 22-24.
Specification