SYSTEMS AND METHODS FOR FACILITATING OPEN SOURCE INTELLIGENCE GATHERING
Abstract
Systems and methods (e.g., utilities) for use in providing automated, lightweight collection of online, open source data which may be content-based to reduce website source bias. In one aspect, a utility is disclosed for use in extracting content of interest from at least one website or other online data source (e.g., where the extracted content can be used in a subsequent search query). In other aspects, utilities are disclosed that are operable to perform various types of analyses on such extracted content and present graphical representations of such analyses on a display of a client device.
88 Claims
1-13. (canceled)
14. A website content extraction system, comprising:
a processing module; and
a memory module logically connected to the processing module and comprising a set of computer readable instructions executable by the processing module to:
obtain source code used to generate the website on a display, wherein the source code includes a plurality of elements and each element includes at least one tag comprising at least one tag type;
parse the source code to obtain a node tree including a plurality of nodes arranged in a hierarchical structure, wherein each node comprises one of the elements, and wherein one of the plurality of nodes comprises a root node;
determine a tag type of a node under the root node;
assign a heuristic score to the node based at least in part on the tag type of the node;
continue to determine and assign for one or more additional nodes of the node tree; and
generate an object that includes content associated with nodes of the node tree having heuristic scores indicating that such content is of interest.
Dependent claims: 15, 18, 20, 22.
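Illustrative sketch (not part of the claims): the steps recited in claim 14 can be approximated with Python's standard-library HTML parser. The tag weights in TAG_SCORES, the rule that a node inherits its parent's score, the threshold, and all function and class names are assumptions made for this example; the claim itself only requires that the score depend at least in part on tag type.

```python
from html.parser import HTMLParser

# Hypothetical per-tag weights (assumption; the claim does not specify values).
TAG_SCORES = {"article": 3, "p": 2, "h1": 2, "h2": 1,
              "nav": -3, "footer": -3, "script": -5, "style": -5}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    """One element of the parsed source code."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []
        self.score = 0


class TreeBuilder(HTMLParser):
    """Builds a hierarchical node tree under a synthetic root node."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:      # void elements never contain children
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag (tolerates sloppy markup).
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def score_tree(node, inherited=0):
    """Assign each node its parent's score plus the weight of its own tag type."""
    node.score = inherited + TAG_SCORES.get(node.tag, 0)
    for child in node.children:
        score_tree(child, node.score)


def extract_content(html, threshold=1):
    """Return an object holding the text of nodes whose score meets the threshold."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    score_tree(builder.root)
    kept = []

    def walk(node):
        if node.score >= threshold and node.text:
            kept.append(" ".join(node.text))
        for child in node.children:
            walk(child)

    walk(builder.root)
    return {"content": kept}
```

With these assumed weights, text inside navigation or script elements scores below the threshold and is left out, while headings and paragraphs inside an article element are kept. Text that is interleaved with inline child elements is grouped per node rather than in strict reading order, which a fuller implementation would need to handle.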
16-17. (canceled)
19. (canceled)
21. (canceled)
23-36. (canceled)
37. A system for use in determining a sentiment of a term among a plurality of data sets, the system comprising:
a processing module; and
a memory module logically connected to the processing module and comprising a set of computer readable instructions executable by the processing module to:
receive the x most frequently disclosed terms among a plurality of data sets during a time period, wherein x is a positive integer;
for each of the x most frequently disclosed terms during the time period:
determine a volume of the plurality of data sites disclosing the term; and
obtain a sentiment of the term among the plurality of data sites; and
present, on a display, a first graphical representation illustrating the sentiment and volume of each of the x most frequently disclosed terms during the time period.
Dependent claims: 38, 39, 41, 43.
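Illustrative sketch (not part of the claims): one way to compute the per-term volume and sentiment recited in claim 37. The tiny POSITIVE and NEGATIVE word lists, the per-document sentiment formula, and the function names are assumptions for this example; the claim does not say how sentiment is obtained. Plotting is omitted, and the returned rows are the data a chart of sentiment against volume would draw on.

```python
import re

# Hypothetical sentiment lexicon (assumption; a real system would use a richer model).
POSITIVE = {"good", "great", "gain"}
NEGATIVE = {"bad", "poor", "loss"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_stats(terms, docs):
    """Compute volume and sentiment for each received term.

    terms: the x most frequently disclosed terms (received as input).
    docs:  iterable of (site, text) pairs collected during the time period.
    """
    rows = []
    for term in terms:
        sites = set()    # distinct data sites disclosing the term
        scores = []      # one sentiment score per document mentioning the term
        for site, text in docs:
            tokens = tokenize(text)
            if term not in tokens:
                continue
            sites.add(site)
            pos = sum(t in POSITIVE for t in tokens)
            neg = sum(t in NEGATIVE for t in tokens)
            if pos + neg:
                scores.append((pos - neg) / (pos + neg))   # in [-1, 1]
        sentiment = sum(scores) / len(scores) if scores else 0.0
        rows.append({"term": term, "volume": len(sites), "sentiment": sentiment})
    return rows
```

Here volume is read as the number of distinct sites mentioning the term, and sentiment as the mean of per-document scores; both readings are design choices of the sketch.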
40. (canceled)
42. (canceled)
44-60. (canceled)
61. A system for use in creating a hierarchical signature for a website, the system comprising:
a processing module; and
a memory module logically connected to the processing module and comprising a set of computer readable instructions executable by the processing module to:
identify at least one textual hierarchy including at least first and second levels, wherein the first level comprises at least one textual category and the second level comprises at least one term that describes the at least one textual category;
determine a number of occurrences of the at least one term from a number of pages of at least one website during a time period;
first obtain, using a processing engine, a hierarchical signature of the at least one term that represents a prevalence of the at least one term on the at least one website;
second obtain, from the first obtaining step, a hierarchical signature of the at least one textual category that represents a prevalence of the at least one textual category on the at least one website;
establish a hierarchical signature of the at least one website utilizing the hierarchical signature of one or more of the at least one term and the at least one textual category; and
present, on a display, a graphical representation of the hierarchical signature of the at least one website, wherein the graphical representation illustrates the prevalence of one or more of the at least one term and the at least one textual category.
Dependent claims: 62, 67, 70, 71.
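Illustrative sketch (not part of the claims): a two-level signature of the kind recited in claim 61. Defining a term's prevalence as occurrences per page, and a category's prevalence as the sum of its terms' prevalences, are assumptions made for this example, as is the function name; the claim leaves both measures open.

```python
import re


def hierarchical_signature(hierarchy, pages):
    """Build a category -> term prevalence signature for one website.

    hierarchy: dict mapping each textual category to its descriptive terms.
    pages:     list of page texts collected from the website in the time period.
    """
    n_pages = len(pages)
    tokens_per_page = [re.findall(r"[a-z]+", page.lower()) for page in pages]
    signature = {}
    for category, terms in hierarchy.items():
        term_sig = {}
        for term in terms:
            # Number of occurrences of the term across all pages.
            occurrences = sum(tokens.count(term) for tokens in tokens_per_page)
            # Assumed prevalence measure: average occurrences per page.
            term_sig[term] = occurrences / n_pages if n_pages else 0.0
        # Category signature is derived from the term signatures.
        signature[category] = {
            "prevalence": sum(term_sig.values()),
            "terms": term_sig,
        }
    return signature
```

The nested dictionary that results could feed a treemap or a sunburst chart; the rendering step is left out of the sketch.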
63-66. (canceled)
68-69. (canceled)
72-82. (canceled)
83. A system for use in inferring an online information flow network, the system comprising:
a processing module; and
a memory module logically connected to the processing module and comprising a set of computer readable instructions executable by the processing module to:
receive information related to a plurality of portions of source code used to generate a plurality of online data sources, wherein the information allows a uniform resource locator (URL) to be obtained for at least one of the data sources;
determine, from the information using a processor, whether any of the plurality of online data sources refers to another online data source during a first of a plurality of time periods, wherein any online data source that refers to another online data source comprises a “secondary data source”, and wherein any online data source that is referred to by another online data source comprises a “primary data source”;
in response to at least some of the plurality of online data sources referring to other online data sources, obtain, from the information, a unique URL for each of the primary and secondary data sources;
continue to determine and obtain for additional time periods; and
present, on a display, a graphical representation of an online information flow network that illustrates one or more information flow links connecting and representing information flows from primary data sources to secondary data sources over the plurality of time periods.
Dependent claims: 84, 85, 86, 87.
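Illustrative sketch (not part of the claims): inferring primary-to-secondary links of the kind recited in claim 83 from hyperlinks in page source. Treating an anchor href as a "reference", reducing each URL to scheme plus host as its unique URL, and only counting links between sources collected in the same period are all assumptions of this example, as are the class and function names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every anchor tag in a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def site_of(url):
    """Reduce a URL to scheme and host, used here as the source's unique URL."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc.lower()}"


def infer_flow_network(periods):
    """Return information flow edges from primary to secondary data sources.

    periods: list of {page_url: html} dicts, one dict per time period.
    """
    edges = []
    for index, pages in enumerate(periods):
        known = {site_of(url) for url in pages}   # sources collected this period
        seen = set()
        for page_url, html in pages.items():
            collector = LinkCollector()
            collector.feed(html)
            secondary = site_of(page_url)         # the source doing the referring
            for href in collector.hrefs:
                primary = site_of(urljoin(page_url, href))   # the source referred to
                edge = (primary, secondary)
                if primary != secondary and primary in known and edge not in seen:
                    seen.add(edge)
                    edges.append({"period": index,
                                  "primary": primary,
                                  "secondary": secondary})
    return edges
```

Each returned edge carries its time period, so a graph view could step through the periods to show how links from primary to secondary sources accumulate.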
88-91. (canceled)
Specification