Systems and methods for facilitating open source intelligence gathering

  • US 8,620,849 B2
  • Filed: 03/10/2011
  • Issued: 12/31/2013
  • Est. Priority Date: 03/10/2010
  • Status: Active Grant
First Claim

1. A website content extraction system, comprising:

    a processor; and

    a memory logically connected to the processor and comprising a set of computer readable instructions executable by the processor to:

    obtain source code used to generate the website on a display, wherein the source code includes a plurality of elements and each element includes at least one tag comprising at least one tag type;

    parse the source code to obtain a node tree including a plurality of nodes arranged in a hierarchical structure, wherein each node comprises one of the elements, and wherein one of the plurality of nodes comprises a root node;

    determine a tag type of a node under the root node;

    assign a heuristic score to the node based at least in part on the tag type of the node;

    continue to determine and assign for one or more additional nodes of the node tree, wherein the node under the root node comprises a parent node, and wherein the computer readable instructions that continue to determine and assign include instructions executable by the processor to:

        determine, for a child node of the parent node, a tag type of the at least one tag of the child node; and

        assign a heuristic score to the child node based at least in part on the tag type of the child node, wherein the computer readable instructions that assign the heuristic score to the child node include instructions executable by the processor to:

            assign a first heuristic score to the child node without regard to the heuristic scores of other nodes in the node tree; and

            add the first heuristic score to a heuristic score of the parent node to obtain a child node heuristic score; and

    generate an object that includes content associated with nodes of the node tree having heuristic scores indicating that such content is of interest.
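
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The tag scores in TAG_SCORES, the threshold value, and the names Node, TreeBuilder, score_tree and extract are all assumptions made for illustration; the claim does not specify any of them. The sketch parses markup into a node tree, gives each node a first score based only on its tag type, adds the parent's score to obtain the node's final score, and collects the text of nodes whose score meets the threshold.

```python
from html.parser import HTMLParser

# Hypothetical per-tag scores; the claim does not give any values.
TAG_SCORES = {"article": 3, "p": 2, "nav": -5, "footer": -5, "script": -10}
# Tags that never contain children, so the parser must not descend into them.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []
        self.score = 0


class TreeBuilder(HTMLParser):
    """Builds a simple node tree under a synthetic root node."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, then close it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def score_tree(node, parent_score=0):
    """Assign each child its own tag-based score plus the parent's score."""
    for child in node.children:
        first_score = TAG_SCORES.get(child.tag, 0)  # ignores all other nodes
        child.score = parent_score + first_score
        score_tree(child, child.score)


def extract(source, threshold=2):
    """Return an object holding the text of nodes scoring at or above threshold."""
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    score_tree(builder.root)
    content = []

    def collect(node):
        if node is not builder.root and node.score >= threshold and node.text:
            content.append(" ".join(node.text))
        for child in node.children:
            collect(child)

    collect(builder.root)
    return {"content": content}
```

Under these assumed scores, a paragraph inside an article element accumulates a positive score and is kept, while a paragraph inside a nav element inherits the negative score of its parent and is dropped. Adding the parent's score is what lets the surrounding context outweigh the child's own tag type.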
