Input/output interface for contextual analysis engine
First Claim
1. A method of analyzing digital content to generate contextual analysis data based on the digital content, the method comprising:
receiving a request to analyze the digital content;
invoking a text extraction service configured to extract a corpus of plain text from the digital content;
receiving, from a first text analytics service, a plurality of topics extracted from a topic ontology, each of the topics having associated therewith a relevancy score, wherein at least one of the topics is not included in the corpus of plain text;
receiving, from a second text analytics service, tag data derived from the corpus of plain text, the tag data including a listing of n-grams extracted from the corpus of plain text and n-gram frequency data; and
generating a hierarchical output schema that includes a schema resource node at an upper hierarchical level, the schema resource node including, at a lower hierarchical level within the schema resource node, a first sub-node that identifies the first text analytics service and a corresponding first graph index parameter, a second sub-node that identifies the second text analytics service and a corresponding second graph index parameter, and a third sub-node that identifies the text extraction service and a corresponding third graph index parameter;
wherein the hierarchical output schema further includes an analyzer/enhancer node at the upper hierarchical level, the analyzer/enhancer node including, at a lower hierarchical level within the analyzer/enhancer node, a first sub-node that is identified by the first graph index parameter and that includes the plurality of topics and the corresponding relevancy scores, a second sub-node that is identified by the second graph index parameter and that includes the tag data, and a third sub-node that is identified by the third graph index parameter and that includes the corpus of plain text.
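The output schema recited in claim 1 can be illustrated with a minimal Python sketch. It assumes a JSON-like dictionary layout. The key names (`schemaResources`, `analyzerEnhancer`, `graphIndex`), the service names, the `g0`/`g1`/`g2` index values, and the simple word-level n-gram counter are hypothetical choices made for illustration, not details taken from the claim. What the sketch does follow is the claimed structure: each sub-node of the schema resource node names a service and a graph index parameter, and the analyzer/enhancer node stores that service's results under the same parameter.

```python
import json
import re
from collections import Counter


def ngram_tag_data(plain_text: str, max_n: int = 2) -> dict:
    """Lists every 1..max_n word n-gram in the text with its frequency.

    Hypothetical stand-in for the second text analytics service.
    """
    tokens = re.findall(r"\w+", plain_text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {"ngrams": sorted(counts), "frequencies": dict(counts)}


def build_output_schema(plain_text: str, topics: list, tag_data: dict) -> dict:
    """Assembles the two upper-level nodes: schema resources and analyzer/enhancer.

    Each service gets a (hypothetical) graph index parameter "g0", "g1", ...;
    the same parameter keys that service's results under the analyzer/enhancer
    node, so a consumer can join the two nodes.
    """
    services = [
        ("topic-analytics-service", {"topics": topics}),  # first text analytics service
        ("tag-analytics-service", {"tags": tag_data}),    # second text analytics service
        ("text-extraction-service", {"plainText": plain_text}),
    ]
    resources, enhancers = [], {}
    for position, (service_name, payload) in enumerate(services):
        graph_index = f"g{position}"
        resources.append({"service": service_name, "graphIndex": graph_index})
        enhancers[graph_index] = payload
    return {"schemaResources": resources, "analyzerEnhancer": enhancers}


if __name__ == "__main__":
    text = "Web content analysis of web content"
    # Hypothetical topic-service output; "Information Technology" does not
    # occur in the text, which the claim permits.
    topics = [{"topic": "Information Technology", "relevancy": 0.91},
              {"topic": "Web Content", "relevancy": 0.84}]
    schema = build_output_schema(text, topics, ngram_tag_data(text))
    print(json.dumps(schema, indent=2))
```

Because the graph index parameter is the only link between the two upper-level nodes in this sketch, a consumer can locate any service's results by reading its `graphIndex` from the resource node, without relying on the order in which the services ran.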
2 Assignments
0 Petitions
Abstract
A contextual analysis engine systematically extracts, analyzes and organizes digital content stored in an electronic file such as a webpage. Content can be extracted using a text extraction module that is capable of separating the content to be analyzed from less meaningful content such as format specifications and programming scripts. The resulting unstructured corpus of plain text can then be passed to a text analytics module capable of generating a structured categorization of topics included within the content. This structured categorization can be organized based on a content topic ontology that may have been previously defined or may be developed in real time. The systems disclosed herein optionally include an input/output interface capable of managing workflows of the text extraction module and the text analytics module, administering a cache of previously generated results, and interfacing with other applications that leverage the disclosed contextual analysis services.
45 Citations
12 Claims
1. A method of analyzing digital content to generate contextual analysis data based on the digital content, the method comprising:
receiving a request to analyze the digital content; invoking a text extraction service configured to extract a corpus of plain text from the digital content; receiving, from a first text analytics service, a plurality of topics extracted from a topic ontology, each of the topics having associated therewith a relevancy score, wherein at least one of the topics is not included in the corpus of plain text; receiving, from a second text analytics service, tag data derived from the corpus of plain text, the tag data including a listing of n-grams extracted from the corpus of plain text and n-gram frequency data; and generating a hierarchical output schema that includes a schema resource node at an upper hierarchical level, the schema resource node including, at a lower hierarchical level within the schema resource node, a first sub-node that identifies the first text analytics service and a corresponding first graph index parameter, a second sub-node that identifies the second text analytics service and a corresponding second graph index parameter, and a third sub-node that identifies the text extraction service and a corresponding third graph index parameter; wherein the hierarchical output schema further includes an analyzer/enhancer node at the upper hierarchical level, the analyzer/enhancer node including, at a lower hierarchical level within the analyzer/enhancer node, a first sub-node that is identified by the first graph index parameter and that includes the plurality of topics and the corresponding relevancy scores, a second sub-node that is identified by the second graph index parameter and that includes the tag data, and a third sub-node that is identified by the third graph index parameter and that includes the corpus of plain text. (Dependent claims: 2, 3, 4, 5, 6, 7)
8. A non-transient computer readable medium having instructions encoded thereon that, when executed by one or more processors, cause a digital content analysis process to be carried out, the process comprising:
receiving a request to analyze identified digital content; invoking a text extraction service configured to extract a corpus of plain text from the digital content; receiving, from a first text analytics service, a plurality of topics extracted from a topic ontology, each of the topics having associated therewith a relevancy score, wherein at least one of the topics is not included in the corpus of plain text; receiving, from a second text analytics service, tag data derived from the corpus of plain text, the tag data including a listing of n-grams extracted from the corpus of plain text and n-gram frequency data; and generating a hierarchical output schema that includes a schema resource node at an upper hierarchical level, the schema resource node including, at a lower hierarchical level within the schema resource node, a first sub-node that identifies the first text analytics service and a corresponding first graph index parameter, a second sub-node that identifies the second text analytics service and a corresponding second graph index parameter, and a third sub-node that identifies the text extraction service and a corresponding third graph index parameter; wherein the hierarchical output schema further includes an analyzer/enhancer node at the upper hierarchical level, the analyzer/enhancer node including, at a lower hierarchical level within the analyzer/enhancer node, a first sub-node that is identified by the first graph index parameter and that includes a hierarchical listing of the plurality of topics and the corresponding relevancy scores, such that topics extracted from a relatively higher ontology level are listed higher in the hierarchical listing, and topics extracted from a relatively lower ontology level are listed lower in the hierarchical listing, wherein the hierarchical listing includes at least one of the topics not included in the corpus of plain text, a second sub-node that is identified by the second graph index parameter and that includes the tag data, and a third sub-node that is identified by the third graph index parameter and that includes the corpus of plain text. (Dependent claims: 9, 10, 11, 12)
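Claim 8 adds that the topics are presented as a hierarchical listing ordered by ontology level. One way to produce such a listing is sketched below in Python. The toy `ONTOLOGY` parent map, the depth-based notion of "level", and the tie-break by descending relevancy score are all assumptions, since the claim does not specify how levels are represented or how topics on the same level are ordered. A top-level topic such as the hypothetical "Technology" is the kind of entry that may appear in the listing without occurring in the corpus of plain text.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ontology: each topic maps to its parent (None = top level).
ONTOLOGY: Dict[str, Optional[str]] = {
    "Technology": None,
    "Software": "Technology",
    "Text Analytics": "Software",
    "Sports": None,
}


def ontology_level(topic: str, ontology: Dict[str, Optional[str]]) -> int:
    """Returns 0 for a top-level topic, 1 for its children, and so on."""
    level = 0
    parent = ontology[topic]
    while parent is not None:
        level += 1
        parent = ontology[parent]
    return level


def hierarchical_listing(
    scored_topics: Dict[str, float], ontology: Dict[str, Optional[str]]
) -> List[Tuple[int, str, float]]:
    """Lists higher ontology levels first; within a level, higher relevancy first."""
    rows = [(ontology_level(t, ontology), t, s) for t, s in scored_topics.items()]
    return sorted(rows, key=lambda row: (row[0], -row[2]))


def render(listing: List[Tuple[int, str, float]]) -> str:
    """Indents each topic by its ontology level for display."""
    return "\n".join(
        "  " * level + f"{topic} ({score:.2f})" for level, topic, score in listing
    )


if __name__ == "__main__":
    scores = {"Text Analytics": 0.90, "Technology": 0.70, "Software": 0.80}
    print(render(hierarchical_listing(scores, ONTOLOGY)))
    # Technology (0.70)
    #   Software (0.80)
    #     Text Analytics (0.90)
```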
Specification