SYSTEMS AND METHODS FOR FACILITATING THE GATHERING OF OPEN SOURCE INTELLIGENCE
Abstract
Systems and methods (e.g., utilities) for use in providing automated, lightweight collection of online, open source data which may be content-based to reduce website source bias. In one aspect, a utility is disclosed for use in extracting content of interest from at least one website or other online data source (e.g., where the extracted content can be used in a subsequent search query). In other aspects, utilities are disclosed that are operable to perform various types of analyses on such extracted content and present graphical representations of such analyses on a display of a client device.
23 Citations
25 Claims
1-6. (canceled)
7. A system for use in extracting content of interest from a collection of webpages, the system comprising:
a processing module; and
a memory module logically connected to the processing module and comprising a set of computer readable instructions executable by the processing module to:
    acquire source code used to generate each webpage of a collection of webpages on a display;
    obtain a similarity matrix for the collection of webpages from the acquired source code using a text-based similarity measure;
    use the similarity matrix to determine a maximum dissimilarity score for each webpage in relation to the collection of webpages; and
    present, on a display, one or more lists of those webpages associated with maximum dissimilarity scores above a threshold dissimilarity score.
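The steps recited in claim 7 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not name a specific text-based similarity measure, so Jaccard similarity over source-code tokens is assumed here, and "maximum dissimilarity score" is read as the largest value of (1 - similarity) between a page and any other page in the collection. All function names are hypothetical.

```python
import re
from itertools import combinations


def tokens(source: str) -> set[str]:
    """Split page source into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", source.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Assumed text-based similarity measure: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(sources: list[str]) -> list[list[float]]:
    """Build a symmetric pairwise similarity matrix (diagonal = 1.0)."""
    token_sets = [tokens(s) for s in sources]
    n = len(sources)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = jaccard(token_sets[i], token_sets[j])
    return sim


def max_dissimilarity(sim: list[list[float]]) -> list[float]:
    """For each page, take the largest (1 - similarity) to any other page."""
    n = len(sim)
    return [
        max((1.0 - sim[i][j] for j in range(n) if j != i), default=0.0)
        for i in range(n)
    ]


def pages_above_threshold(urls, sources, threshold):
    """Return (url, score) pairs whose score exceeds the threshold."""
    scores = max_dissimilarity(similarity_matrix(sources))
    return [(u, s) for u, s in zip(urls, scores) if s > threshold]
```

In use, `pages_above_threshold` would be called with page identifiers, the page sources already acquired, and a threshold chosen by the operator; the returned list corresponds to what the claim describes presenting on a display.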
13-18. (canceled)
19. A system for use in extracting a textual hierarchy from one or more webpages of a website for use in creating a hierarchical signature for the website, the system comprising:
a processing module; and
a memory module logically connected to the processing module and comprising a set of computer readable instructions executable by the processing module to:
    receive the x most frequently disclosed terms among one or more pages of a website, wherein x is a positive integer;
    first cluster the x most frequently disclosed terms into two or more sets of terms as a function of semantic similarity between the terms in each subset;
    second cluster the two or more sets of terms into two or more subsets of terms utilizing contextual information located proximate each of the terms, wherein each of the terms in the subsets comprises a lower level term; and
    ascertain, for each of the two or more subsets, an upper level term that semantically encompasses each of the lower level terms in the subset.
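The first two steps recited in claim 19 (obtaining the x most frequent terms and clustering them by similarity) can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the claim does not specify how semantic similarity is computed, so the sketch takes a caller-supplied `similarity` function and uses a simple greedy grouping. The context-based second clustering and the selection of an upper level term are not attempted here. All names are hypothetical.

```python
import re
from collections import Counter
from typing import Callable


def top_terms(pages: list[str], x: int) -> list[str]:
    """Return the x most frequent lowercase word tokens across all pages."""
    counts: Counter = Counter()
    for text in pages:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return [term for term, _ in counts.most_common(x)]


def greedy_cluster(
    terms: list[str],
    similarity: Callable[[str, str], float],
    min_similarity: float,
) -> list[list[str]]:
    """Assign each term to the first cluster whose seed term is similar
    enough; otherwise start a new cluster (assumed grouping strategy)."""
    clusters: list[list[str]] = []
    for term in terms:
        for cluster in clusters:
            if similarity(term, cluster[0]) >= min_similarity:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters
```

A real system would presumably plug in a semantic measure (for example, one based on a lexical database or word embeddings) in place of the `similarity` argument; the choice is left open here because the claim does not fix it.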
25-38. (canceled)
Specification