INPUT/OUTPUT INTERFACE FOR CONTEXTUAL ANALYSIS ENGINE
Abstract
A contextual analysis engine systematically extracts, analyzes and organizes digital content stored in an electronic file such as a webpage. Content can be extracted using a text extraction module which is capable of separating the content which is to be analyzed from less meaningful content such as format specifications and programming scripts. The resulting unstructured corpus of plain text can then be passed to a text analytics module capable of generating a structured categorization of topics included within the content. This structured categorization can be organized based on a content topic ontology which may have been previously defined or which may be developed in real-time. The systems disclosed herein optionally include an input/output interface capable of managing workflows of the text extraction module and the text analytics module, administering a cache of previously generated results, and interfacing with other applications that leverage the disclosed contextual analysis services.
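The extraction step described in the abstract, separating analyzable text from format specifications and programming scripts, can be illustrated with a minimal sketch. It uses Python's standard-library HTML parser rather than the headless browser recited in the claims, so it will not see content that is rendered by scripts. The names `PlainTextExtractor` and `extract_plain_text` are hypothetical and do not come from the patent.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_plain_text(html):
    """Return an unstructured corpus of plain text for an HTML document."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting string plays the role of the unstructured corpus of plain text that the abstract says is passed on to the text analytics module.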
22 Claims
1. A method of analyzing digital content to generate contextual analysis data based on the digital content, the method comprising:
receiving a request to analyze the digital content;
invoking a text extraction service configured to extract a corpus of plain text from the digital content;
receiving, from a text analytics service, topic data derived from the corpus of plain text, the topic data including a listing of topics and corresponding relevancy scores for the listed topics; and
generating a hierarchical output schema that includes a plurality of resource nodes, wherein a first resource node contains at least a portion of the corpus of plain text, and wherein a second resource node contains topics included within the listing received from the text analytics service.
Dependent claims: 2, 3, 4, 5, 6, 7
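As an illustration only, the hierarchical output schema of claim 1 might be modeled as a nested structure with one resource node for the extracted text and one for the topics. The field names (`source`, `resources`, `type`, `content`, `items`, `topic`, `relevancy`) and the function `build_output_schema` are assumptions made for this sketch; the claim does not specify a serialization format or node names.

```python
import json


def build_output_schema(source, corpus, topics):
    """Assemble a hierarchical result with two resource nodes.

    source -- identifier of the analyzed content (hypothetical field)
    corpus -- plain text returned by the text extraction service
    topics -- iterable of (topic, relevancy_score) pairs from the
              text analytics service
    """
    return {
        "source": source,
        "resources": [
            # First resource node: at least a portion of the plain-text corpus.
            {"type": "text", "content": corpus},
            # Second resource node: the listed topics with their scores.
            {
                "type": "topics",
                "items": [
                    {"topic": name, "relevancy": score}
                    for name, score in topics
                ],
            },
        ],
    }


def to_json(schema):
    """Serialize the schema; JSON is an assumed output format."""
    return json.dumps(schema, indent=2)
```

A caller could pass the corpus from the extraction step and the topic list from the analytics step, then hand the serialized result to the requesting application.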
8. A method of managing requests to obtain contextual analysis data, the method comprising:
receiving, from a content administration client, a request for contextual analysis data based on digital content that is identified in the request;
determining whether the requested contextual analysis data is stored in a results cache;
where it is determined that the requested contextual analysis data is stored in the results cache, providing the requested contextual analysis data to the content administration client; and
where a determination is made that the requested contextual analysis data is not stored in the results cache, using a headless browser to extract a corpus of plain text from the digital content, generating new contextual analysis data that includes at least a portion of the corpus of plain text and a hierarchal listing of topics extracted from the corpus of plain text, persisting the new contextual analysis data in the results cache, and providing the new contextual analysis data to the content administration client.
Dependent claims: 9, 10, 11, 12
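The cache-first workflow of claim 8 can be sketched as follows. This is a minimal illustration under stated assumptions: the results cache is a plain in-memory dictionary keyed by a content identifier, and `extract_text` and `analyze_topics` are hypothetical callables standing in for the headless-browser extraction and the topic analysis, neither of which is implemented here.

```python
def get_contextual_analysis(content_id, cache, extract_text, analyze_topics):
    """Return contextual analysis data for content_id, using the cache first.

    cache          -- dict acting as the results cache (assumed in-memory)
    extract_text   -- callable: content_id -> plain-text corpus (hypothetical)
    analyze_topics -- callable: corpus -> hierarchical topic listing (hypothetical)
    """
    # Cache hit: provide the stored result without re-running the analysis.
    cached = cache.get(content_id)
    if cached is not None:
        return cached

    # Cache miss: extract text, generate new analysis data, persist, return.
    corpus = extract_text(content_id)
    result = {"text": corpus, "topics": analyze_topics(corpus)}
    cache[content_id] = result
    return result
```

Passing the two services in as callables keeps the caching logic independent of how extraction and analysis are actually performed, which also makes the hit and miss paths easy to exercise with stubs.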
13. A system for obtaining contextual analysis data corresponding to a webpage, the system comprising:
a results cache having persisted therein contextual analysis data;
an input interface configured to receive, from a content administration tool, a request to obtain contextual analysis data corresponding to a webpage;
a cache manager configured to determine whether the results cache has persisted therein the contextual analysis data corresponding to the webpage;
an orchestration manager configured to invoke a text extraction service so as to extract a corpus of plain text from the webpage using a headless browser; and
an output interface configured to provide, to the content administration tool, the contextual analysis data corresponding to the webpage, wherein the contextual analysis data provided to the content administration tool is either (a) obtained from the cache manager or (b) derived from the corpus of plain text that is extracted from the webpage.
Dependent claims: 14, 15, 16, 17
18. A non-transient computer readable medium having instructions encoded thereon that, when executed by one or more processors, causes a digital content analysis process to be carried out, the process comprising:
receiving a request to analyze the digital content;
invoking a text extraction service configured to extract a corpus of plain text from the digital content;
receiving, from a text analytics service, topic data derived from the corpus of plain text, the topic data including a listing of topics and corresponding relevancy scores for the listed topics; and
generating a hierarchical output schema that includes a plurality of nodes, wherein a first node contains at least a portion of the corpus of plain text, and wherein a second node contains topics included within the listing received from the text analytics service.
Dependent claims: 19, 20, 21, 22
Specification