Input/output interface for contextual analysis engine

US 10,430,806 B2
Filed: 10/15/2013
Issued: 10/01/2019
Est. Priority Date: 10/15/2013
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A contextual analysis engine systematically extracts, analyzes and organizes digital content stored in an electronic file such as a webpage. Content can be extracted using a text extraction module which is capable of separating the content which is to be analyzed from less meaningful content such as format specifications and programming scripts. The resulting unstructured corpus of plain text can then be passed to a text analytics module capable of generating a structured categorization of topics included within the content. This structured categorization can be organized based on a content topic ontology which may have been previously defined or which may be developed in real-time. The systems disclosed herein optionally include an input/output interface capable of managing workflows of the text extraction module and the text analytics module, administering a cache of previously generated results, and interfacing with other applications that leverage the disclosed contextual analysis services.
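The abstract does not say how the text extraction module separates meaningful content from format specifications and programming scripts. As a rough illustration only, the sketch below uses Python's standard-library `html.parser` to collect the text of a static HTML page while skipping `<script>` and `<style>` bodies. The names `PlainTextExtractor` and `extract_plain_text` are hypothetical. Because the sketch does not execute JavaScript, it would miss text that scripts generate at load time, which appears to be the case claims 5, 10, and 11 address with a headless browser.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collects text nodes from HTML, skipping <script> and <style> bodies."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside a skipped element
        self._chunks: list[str] = []  # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Script and style bodies are also delivered here, so filter them out.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def corpus(self) -> str:
        return " ".join(self._chunks)


def extract_plain_text(html: str) -> str:
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.corpus()


page = (
    "<html><head><style>p { color: red }</style>"
    "<script>var x = 1;</script></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b>.</p></body></html>"
)
print(extract_plain_text(page))  # Title Hello world .
```

Joining fragments with single spaces is a simplification: it leaves a stray space before the final period and ignores the difference between block and inline elements.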

45 Citations

12 Claims

1. A method of analyzing digital content to generate contextual analysis data based on the digital content, the method comprising:
- receiving a request to analyze the digital content;
- invoking a text extraction service configured to extract a corpus of plain text from the digital content;
- receiving, from a first text analytics service, a plurality of topics extracted from a topic ontology, each of the topics having associated therewith a relevancy score, wherein at least one of the topics is not included in the corpus of plain text;
- receiving, from a second text analytics service, tag data derived from the corpus of plain text, the tag data including a listing of n-grams extracted from the corpus of plain text and n-gram frequency data; and
- generating a hierarchical output schema that includes a schema resource node at an upper hierarchical level, the schema resource node including, at a lower hierarchical level within the schema resource node,
  - a first sub-node that identifies the first text analytics service and a corresponding first graph index parameter,
  - a second sub-node that identifies the second text analytics service and a corresponding second graph index parameter, and
  - a third sub-node that identifies the text extraction service and a corresponding third graph index parameter;
- wherein the hierarchical output schema further includes an analyzer/enhancer node at the upper hierarchical level, the analyzer/enhancer node including, at a lower hierarchical level within the analyzer/enhancer node,
  - a first sub-node that is identified by the first graph index parameter and that includes the plurality of topics and the corresponding relevancy scores,
  - a second sub-node that is identified by the second graph index parameter and that includes the tag data, and
  - a third sub-node that is identified by the third graph index parameter and that includes the corpus of plain text.

2. The method of claim 1, wherein the digital content comprises a webpage containing JavaScript elements.

3. The method of claim 1, wherein the request to analyze the digital content complies with a representational state transfer (REST) architecture.

4. The method of claim 1, wherein the request to analyze the digital content complies with an application programming interface mechanism.

5. The method of claim 1, wherein:
- the digital content comprises a webpage containing JavaScript elements; and
- the text extraction service uses a headless browser to extract the corpus of plain text.

6. The method of claim 1, wherein:
- the request to analyze the digital content includes a text extraction parameter; and
- invoking the text extraction service comprises passing the text extraction parameter to a text extraction module.

7. The method of claim 1, wherein each of the topics extracted from the topic ontology further has associated therewith a frequency count.
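Claim 1 links two upper-level nodes through graph index parameters: the schema resource node records which service produced each result, and the analyzer/enhancer node stores each result under the same index. The claims do not specify a serialization, key names, or how n-grams are counted, so the following Python sketch is a hypothetical illustration. The keys `resource`, `analyzerEnhancer`, `service`, and `graphIndex`, the `_:gN` index format, the service names, and the topic relevancy values are all invented for the example, and the tag data is simply every unigram and bigram with its frequency.

```python
import json
import re
from collections import Counter
from typing import Any


def ngram_tag_data(corpus: str, max_n: int = 2) -> dict[str, int]:
    """Frequency of every 1..max_n word n-gram in the corpus."""
    tokens = re.findall(r"\w+", corpus.lower())
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return dict(counts)


def build_output_schema(
    topic_service: str,
    topics: list[dict[str, Any]],
    tag_service: str,
    tag_data: dict[str, int],
    extraction_service: str,
    plain_text: str,
) -> dict[str, Any]:
    """Assemble a schema resource node and an analyzer/enhancer node.

    All key names are hypothetical; the claim does not fix a serialization.
    """
    # Same order as claim 1: first and second text analytics services,
    # then the text extraction service.
    results = [
        (topic_service, topics),
        (tag_service, tag_data),
        (extraction_service, plain_text),
    ]
    resource_node: list[dict[str, str]] = []
    analyzer_enhancer_node: dict[str, Any] = {}
    for position, (service, payload) in enumerate(results):
        graph_index = f"_:g{position}"  # graph index parameter shared by both nodes
        resource_node.append({"service": service, "graphIndex": graph_index})
        analyzer_enhancer_node[graph_index] = payload
    return {"resource": resource_node, "analyzerEnhancer": analyzer_enhancer_node}


# Illustrative inputs only; the service names and scores are invented.
text = "The team won the basketball game. The team celebrated."
topics = [{"topic": "Sports", "relevancy": 0.92}, {"topic": "Basketball", "relevancy": 0.85}]
tags = ngram_tag_data(text)
schema = build_output_schema(
    "topic-service", topics, "tag-service", tags, "extraction-service", text
)
print(json.dumps(schema, indent=2))
```

Keying the analyzer/enhancer entries by the same index used in the resource node lets a consumer look up any service's output without depending on the order in which the services ran. The topic "Sports" does not occur in the sample text, mirroring the claim's condition that at least one topic comes from the ontology rather than from the corpus.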

8. A non-transient computer readable medium having instructions encoded thereon that, when executed by one or more processors, causes a digital content analysis process to be carried out, the process comprising:
- receiving a request to analyze identified digital content;
- invoking a text extraction service configured to extract a corpus of plain text from the digital content;
- receiving, from a first text analytics service, a plurality of topics extracted from a topic ontology, each of the topics having associated therewith a relevancy score, wherein at least one of the topics is not included in the corpus of plain text;
- receiving, from a second text analytics service, tag data derived from the corpus of plain text, the tag data including a listing of n-grams extracted from the corpus of plain text and n-gram frequency data; and
- generating a hierarchical output schema that includes a schema resource node at an upper hierarchical level, the schema resource node including, at a lower hierarchical level within the schema resource node,
  - a first sub-node that identifies the first text analytics service and a corresponding first graph index parameter,
  - a second sub-node that identifies the second text analytics service and a corresponding second graph index parameter, and
  - a third sub-node that identifies the text extraction service and a corresponding third graph index parameter;
- wherein the hierarchical output schema further includes an analyzer/enhancer node at the upper hierarchical level, the analyzer/enhancer node including, at a lower hierarchical level within the analyzer/enhancer node,
  - a first sub-node that is identified by the first graph index parameter and that includes a hierarchical listing of the plurality of topics and the corresponding relevancy scores, such that topics extracted from a relatively higher ontology level are listed higher in the hierarchical listing, and topics extracted from a relatively lower ontology level are listed lower in the hierarchical listing, wherein the hierarchical listing includes at least one of the topics not included in the corpus of plain text,
  - a second sub-node that is identified by the second graph index parameter and that includes the tag data, and
  - a third sub-node that is identified by the third graph index parameter and that includes the corpus of plain text.

9. The non-transient computer readable medium of claim 8, wherein the hierarchical output schema further comprises an @context node at the upper hierarchical level that specifies the topic ontology from which the topics are extracted.

10. The non-transient computer readable medium of claim 8, wherein the text extraction service uses a headless browser to extract the corpus of plain text without displaying the identified digital content.

11. The non-transient computer readable medium of claim 8, wherein:
- the text extraction service uses a headless browser to extract the corpus of plain text; and
- the headless browser is provided by a plugin that is scriptable with a JavaScript application programming interface (API).

12. The non-transient computer readable medium of claim 8, wherein the digital content is identified in the request using a uniform resource locator (URL).
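Claim 8 narrows the topic sub-node to a hierarchical listing in which topics from higher ontology levels appear before topics from lower levels, and claim 9 adds an `@context` node naming the ontology. The sketch below shows one way to produce such an ordering, under stated assumptions: each topic carries a hypothetical integer `level` (0 for the ontology root, increasing with depth), ties within a level are broken by descending relevancy, and the ontology URI and the `topics` key are placeholders. The `@context` name resembles JSON-LD, but the claims shown here do not say the schema is JSON-LD.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Topic:
    label: str
    level: int  # hypothetical depth in the topic ontology; 0 = root
    relevancy: float


def hierarchical_listing(topics: list[Topic]) -> list[dict[str, Any]]:
    """Order topics so higher (shallower) ontology levels come first."""
    # Sort by depth ascending, then by relevancy descending within a level.
    ordered = sorted(topics, key=lambda t: (t.level, -t.relevancy))
    return [
        {"topic": t.label, "level": t.level, "relevancy": t.relevancy}
        for t in ordered
    ]


def with_context(ontology_uri: str, listing: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap the listing with an @context node naming the source ontology."""
    return {"@context": {"ontology": ontology_uri}, "topics": listing}


# Illustrative input: none of these labels or scores come from the patent.
sample = [
    Topic("Basketball", level=2, relevancy=0.81),
    Topic("Sports", level=0, relevancy=0.92),
    Topic("Team Sports", level=1, relevancy=0.88),
]
document = with_context("https://example.org/topic-ontology", hierarchical_listing(sample))
print([entry["topic"] for entry in document["topics"]])  # ['Sports', 'Team Sports', 'Basketball']
```

Sorting on a `(level, -relevancy)` key keeps the ordering rule from claim 8 explicit while still giving a deterministic order among sibling topics.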

Current Assignee: Adobe Inc.
Original Assignee: Adobe Inc.
Inventors: Chang, Walter; Sadler, Shone; Jared, David; Chen, Chris
Primary Examiner(s): Coupe, Anita
Assistant Examiner(s): Prasad, Nancy
Application Number: US14/054,291
Publication Number: US 20150106156A1
Time in Patent Office: 2,177 Days
Field of Search: None
CPC Class Codes: G06Q 30/0201 Market modelling; Market an...