Systems and methods for facilitating the gathering of open source intelligence
Abstract
Systems and methods (e.g., utilities) for use in providing automated, lightweight collection of online, open source data which may be content-based to reduce website source bias. In one aspect, a utility is disclosed for use in extracting content of interest from at least one website or other online data source (e.g., where the extracted content can be used in a subsequent search query). In other aspects, utilities are disclosed that are operable to perform various types of analyses on such extracted content and present graphical representations of such analyses on a display of a client device.
14 Claims
1. A system for use in extracting a textual hierarchy from one or more webpages of a website for use in creating a hierarchical signature for the website, the system comprising:
a processor; and
a memory device logically connected to the processor and comprising a set of computer readable instructions executable by the processor to:
receive the x most frequently disclosed terms among one or more pages of a website, wherein x is a positive integer;
first cluster the x most frequently disclosed terms into two or more sets of terms as a function of semantic similarity between the terms in each set;
second cluster the two or more sets of terms into two or more subsets of terms by removing at least one term from a first of the two or more sets of terms based on contextual information located proximate the terms in the first set, wherein each of the terms in the subsets comprises a lower level term;
ascertain, for each of the two or more subsets, an upper level term that semantically encompasses each of the lower level terms in the subset, wherein the set of computer readable instructions executable by the processor to ascertain include one of:
utilizing a centroid of the subset as the upper level term for the subset, where the centroid is the keyword in the center of the subset; or
utilizing a deepest common root in a general purpose ontology including the lower level terms of the subset as the upper level term for the subset;
determine a prevalence of each of the lower level terms on the one or more pages of the website to obtain hierarchical signatures of the lower level terms;
use the hierarchical signatures of the lower level terms to establish hierarchical signatures for each of the upper level terms, wherein a hierarchical signature of the website comprises the hierarchical signatures of the upper level terms; and
present, on a display, a graphical representation of the hierarchical signature of the website.
Dependent claims: 2, 3, 4, 5, 6.
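The receiving, prevalence, and signature steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not taken from the specification. It assumes that "terms" are lowercase word tokens, that prevalence is a term's share of all occurrences of the subset terms, and that an upper level term's signature is the sum of its lower level terms' shares. The function names `top_terms` and `hierarchical_signature` and the sample data are hypothetical.

```python
import re
from collections import Counter


def top_terms(pages, x):
    """Return the x most frequent word tokens across the pages, plus all counts.

    Assumption: a "term" is a run of ASCII letters, lowercased.
    """
    counts = Counter()
    for text in pages:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return [term for term, _ in counts.most_common(x)], counts


def hierarchical_signature(subsets, counts):
    """Build a signature keyed by upper level term.

    subsets maps each upper level term to its lower level terms.
    Assumption: prevalence is a term's share of all subset-term occurrences,
    and an upper level weight is the sum of its lower level prevalences.
    """
    total = sum(counts[t] for terms in subsets.values() for t in terms)
    signature = {}
    for upper, terms in subsets.items():
        lower = {t: (counts[t] / total if total else 0.0) for t in terms}
        signature[upper] = {"weight": sum(lower.values()), "lower": lower}
    return signature


# Hypothetical input: two short "pages" and a hand-made grouping of terms.
pages = ["cat dog cat", "dog bird cat"]
terms, counts = top_terms(pages, 2)
signature = hierarchical_signature(
    {"mammal": ["cat", "dog"], "bird": ["bird"]}, counts
)
```

In this sketch the grouping of terms into subsets is supplied by hand; in the claimed system it would come from the two clustering steps.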
7. A computer-implemented method for use in automatically extracting a textual hierarchy from one or more webpages of a website for use in creating a hierarchical signature for the website, the method comprising:
receiving the x most frequently disclosed terms among one or more pages of a website, wherein x is a positive integer;
first clustering, using a processor, the x most frequently disclosed terms into two or more sets of terms as a function of semantic similarity between the terms in each set;
second clustering, using the processor, the two or more sets of terms into two or more subsets of terms by removing at least one term from a first of the two or more sets of terms based on contextual information located proximate the terms in the first set, wherein each of the terms in the subsets comprises a lower level term;
ascertaining, for each of the two or more subsets, an upper level term that semantically encompasses each of the lower level terms in the subset, wherein the ascertaining includes:
utilizing a centroid of the subset as the upper level term for the subset, wherein the centroid is the keyword in the center of the subset; or
utilizing a deepest common root in a general purpose ontology including the lower level terms of the subset as the upper level term for the subset;
determining a prevalence of each of the lower level terms on the one or more pages of the website to obtain hierarchical signatures of the lower level terms;
using the hierarchical signatures of the lower level terms to establish hierarchical signatures for each of the upper level terms, wherein a hierarchical signature of the website comprises the hierarchical signatures of the upper level terms; and
presenting, on a display, a graphical representation of the hierarchical signature of the website.
Dependent claims: 8, 9, 10, 11, 12, 13.
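The centroid alternative in claim 7 ("the keyword in the center of the subset") can be read in more than one way. The sketch below takes one plausible reading: the centroid is the term with the highest total similarity to the other terms in the subset, i.e. a medoid. The claims do not define the similarity measure, so `centroid_term` accepts any pairwise similarity function, and the `SCORES` table and `toy_similarity` are made-up stand-ins for a real semantic similarity measure.

```python
def centroid_term(terms, similarity):
    """Return the term most similar, in total, to the other terms (a medoid).

    Assumption: "center of the subset" means maximal summed pairwise
    similarity. Ties are resolved in favor of the earliest term.
    """
    if not terms:
        raise ValueError("subset must contain at least one term")
    return max(
        terms,
        key=lambda t: sum(similarity(t, other) for other in terms if other != t),
    )


# Hypothetical similarity scores, standing in for a semantic similarity measure.
SCORES = {
    frozenset(("car", "truck")): 0.8,
    frozenset(("car", "vehicle")): 0.9,
    frozenset(("truck", "vehicle")): 0.9,
}


def toy_similarity(a, b):
    """Look up a symmetric score; unknown pairs count as unrelated (0.0)."""
    return SCORES.get(frozenset((a, b)), 0.0)


upper = centroid_term(["car", "truck", "vehicle"], toy_similarity)
```

Here "vehicle" has the largest summed similarity (1.8 against 1.7 for the other two terms), so it is selected as the upper level term for the subset.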
14. A computer-implemented method for use in automatically extracting a textual hierarchy from one or more webpages of a website for use in creating a hierarchical signature for the website, the method comprising:
receiving, at a processor, terms found in one or more pages of a website;
clustering, using a processor, the received terms into two or more sets of terms as a function of semantic similarity between the terms in each set;
refining, using the processor, the two or more sets of terms into two or more subsets of terms utilizing contextual information located proximate each of the terms, wherein the refining includes removing at least one term from a first of the two or more sets of terms based on the contextual information located proximate the terms in the first set, wherein the terms remaining in the first set after the refining comprise one of the two or more subsets of terms, and wherein each of the terms in the subsets comprises a lower level term;
ascertaining, for each of the two or more subsets, an upper level term that semantically encompasses each of the lower level terms in the subset, wherein the ascertaining includes one of:
utilizing a centroid of the subset as the upper level term for the subset, where the centroid is the keyword in the center of the subset; or
utilizing a deepest common root in a general purpose ontology including the lower level terms of the subset as the upper level term for the subset;
determining a prevalence of each of the lower level terms on the one or more pages of the website to obtain hierarchical signatures of the lower level terms;
using the hierarchical signatures of the lower level terms to establish hierarchical signatures for each of the upper level terms; and
presenting, on a display, a graphical representation of a hierarchical signature of the website, wherein the hierarchical signature of the website comprises the hierarchical signatures of the upper level terms.
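The "deepest common root" alternative recited in the independent claims can be sketched as a lowest common ancestor lookup. The example assumes the ontology is a tree in which every node has at most one parent, stored as a child-to-parent mapping. The `PARENT` table is a tiny made-up ontology, not a real general purpose one, and the function names are hypothetical.

```python
def ancestors(term, parent):
    """Return [term, parent, grandparent, ...] up to the ontology root.

    Assumption: the parent mapping is acyclic, so the walk terminates.
    """
    chain = [term]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def deepest_common_root(terms, parent):
    """Return the deepest node that is an ancestor of (or equal to) every term.

    Returns None if the terms share no node, or if terms is empty.
    """
    if not terms:
        return None
    chains = [ancestors(t, parent) for t in terms]
    common = set(chains[0]).intersection(*chains[1:])
    # Chains run from leaf to root, so the first shared node is the deepest.
    for node in chains[0]:
        if node in common:
            return node
    return None


# Hypothetical toy ontology (child -> parent).
PARENT = {
    "cat": "feline",
    "lion": "feline",
    "feline": "mammal",
    "dog": "canine",
    "canine": "mammal",
    "mammal": "animal",
}

root_small = deepest_common_root(["cat", "lion"], PARENT)
root_wide = deepest_common_root(["cat", "lion", "dog"], PARENT)
```

With this toy data, the subset {cat, lion} yields "feline", while adding "dog" pushes the shared root up to "mammal". A real ontology may give a term several parents, which this single-parent sketch does not handle.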
Specification