SYSTEMS AND METHODS FOR FACILITATING THE GATHERING OF OPEN SOURCE INTELLIGENCE

US 20140181125A1
Filed: 01/16/2014
Published: 06/26/2014
Est. Priority Date: 08/15/2011
Status: Active Grant

Abstract

Systems and methods (e.g., utilities) for use in providing automated, lightweight collection of online, open source data which may be content-based to reduce website source bias. In one aspect, a utility is disclosed for use in extracting content of interest from at least one website or other online data source (e.g., where the extracted content can be used in a subsequent search query). In other aspects, utilities are disclosed that are operable to perform various types of analyses on such extracted content and present graphical representations of such analyses on a display of a client device.

25 Claims

1-6. (canceled)

7. A system for use in extracting content of interest from a collection of webpages, the system comprising:
- a processing module; and
  
  a memory module logically connected to the processing module and comprising a set of computer readable instructions executable by the processing module to:
  
  acquire source code used to generate each webpage of a collection of webpages on a display;
  
  obtain a similarity matrix for the collection of webpages from the acquired source code using a text-based similarity measure;
  
  use the similarity matrix to determine a maximum dissimilarity score for each webpage in relation to the collection of webpages; and
  
  present, on a display, one or more lists of those webpages associated with maximum dissimilarity scores above a threshold dissimilarity score.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The system of claim 7, wherein the webpages are arranged in the one or more lists according to date of publication of the webpage associated with the object.
  - 9. The system of claim 7, wherein the computer readable instructions are further executable by the processing module to:
    - first extract a first portion of content from the source code of a first webpage of the collection of webpages using a first set of inquiries;
      
      first determine, for the extracted first portion of the first webpage, a maximum similarity score in relation to the similarity matrix;
      
      generate an object with the extracted first portion responsive to the maximum similarity score being greater than a first threshold similarity score;
      
      otherwise, second extract from the source code of the first webpage a second portion of content using a second set of inquiries different than the first set of inquiries; and
      
      second determine, for the extracted second portion of the first webpage, a maximum similarity score in relation to the similarity matrix, wherein the computer readable instructions are operable to be repeatedly executed by the processing module to obtain a first group of objects corresponding to a plurality of extracted portions, and wherein the maximum dissimilarity scores are determined using the first group of objects.
  - 10. The system of claim 7, wherein the computer readable instructions are further executable by the processing module, after the processing module has executed the computer readable instructions to second determine the maximum similarity score in relation to the similarity matrix, to:
    - generate an object with the extracted second portion responsive to the maximum similarity score being greater than the first threshold similarity score;
      
      otherwise, delete the extracted second portion.
  - 11. The system of claim 7, wherein the computer readable instructions are further executable by the processing module to:
    - obtain, from the first group of objects, a second group of objects associated with extracted portions having maximum similarity scores above a second threshold similarity score that is greater than the first threshold similarity score, wherein the maximum dissimilarity scores are determined using the second group of objects.
  - 12. The system of claim 7, wherein the computer readable instructions executable by the processing module to first extract a first portion of content from the source code of a first webpage of the collection of webpages using the first set of inquiries comprise instructions executable by the processing module to:
    - determine a tag type of a tag of one or more elements of the source code, wherein the first portion of the first webpage is extracted responsive to the one or more determined tag types.
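
The scoring steps recited in claim 7 can be illustrated with a minimal sketch. The claims do not name a specific text-based similarity measure or define how a dissimilarity score is derived from the matrix, so this example assumes cosine similarity over word counts, treats dissimilarity as one minus similarity, and takes each page's maximum over the other pages in the collection. The function names (`tokenize`, `similarity_matrix`, `max_dissimilarity`, `pages_above`) are hypothetical and do not come from the patent.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(source: str) -> Counter:
    """Crudely strip markup tags and count lowercase word tokens."""
    text = re.sub(r"<[^>]+>", " ", source)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bags of words (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(sources):
    """Pairwise similarity of every page source against every other."""
    bags = [tokenize(s) for s in sources]
    return [[cosine(x, y) for y in bags] for x in bags]


def max_dissimilarity(matrix):
    """Assumption: dissimilarity = 1 - similarity, maximized over the other pages."""
    scores = []
    for i, row in enumerate(matrix):
        others = [1.0 - s for j, s in enumerate(row) if j != i]
        scores.append(max(others) if others else 0.0)
    return scores


def pages_above(sources, threshold):
    """Indices of pages whose maximum dissimilarity exceeds the threshold."""
    scores = max_dissimilarity(similarity_matrix(sources))
    return [i for i, score in enumerate(scores) if score > threshold]
```

Under these assumptions, two pages with identical text score close to zero against each other, while adding a page with no shared vocabulary raises every page's maximum dissimilarity to one, so the choice of threshold and of aggregation (maximum versus, say, mean) strongly affects which pages are listed.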

13-18. (canceled)

19. A system for use in extracting a textual hierarchy from one or more webpages of a website for use in creating a hierarchical signature for the website, the system comprising:
- a processing module; and
  
  a memory module logically connected to the processing module and comprising a set of computer readable instructions executable by the processing module to:
  
  receive the x most frequently disclosed terms among one or more pages of a website, wherein x is a positive integer;
  
  first cluster the x most frequently disclosed terms into two or more sets of terms as a function of semantic similarity between the terms in each subset;
  
  second cluster the two or more sets of terms into two or more subsets of terms utilizing contextual information located proximate each of the terms, wherein each of the terms in the subsets comprises a lower level term; and
  
  ascertain, for each of the two or more subsets, an upper level term that semantically encompasses each of the lower level terms in the subset.
- Dependent claims (20, 21, 22, 23, 24):
  - 20. The system of claim 19, wherein the computer readable instructions are further executable by the processing module to:
    - determine a prevalence of each of the lower level terms on the one or more pages of the website to obtain hierarchical signatures of the lower level terms;
      
      use the hierarchical signatures of the lower level terms to establish hierarchical signatures for each of the upper level terms; and
      
      present, on a display, a graphical representation of a hierarchical signature of the website, wherein the hierarchical signature of the website comprises the hierarchical signatures of the upper level terms.
  - 21. The system of claim 19, wherein the computer readable instructions executable by the processing module to first cluster comprise instructions executable by the processing module to agglomeratively cluster the x most frequently disclosed terms into two or more sets of terms as a function of semantic similarity between the terms in each subset.
  - 22. The system of claim 19, wherein the contextual information for each term comprises a window of adjacent terms around the term.
  - 23. The system of claim 19, wherein the computer readable instructions executable by the processing module to second cluster utilize at least one of a bag-of-words model and Dice's coefficient.
  - 24. The system of claim 19, wherein computer readable instructions executable by the processing module to ascertain comprise instructions executable by the processing module to find a deepest common root in a general purpose ontology including the lower level terms of the subset.
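
Claims 22 and 23 describe contextual information as a window of adjacent terms and mention a bag-of-words model and Dice's coefficient. The sketch below shows one way those pieces could fit together: it collects the terms surrounding each occurrence of a term into a set and compares two such sets with Dice's coefficient, 2|A ∩ B| / (|A| + |B|). The window size of 2 and the helper names (`context_window`, `term_context`, `dice`) are assumptions made for illustration, not details taken from the patent.

```python
def context_window(tokens, index, size=2):
    """Set of terms within `size` positions on either side of tokens[index]."""
    lo = max(0, index - size)
    return set(tokens[lo:index] + tokens[index + 1:index + 1 + size])


def term_context(tokens, term, size=2):
    """Union of the context windows around every occurrence of `term`."""
    bag = set()
    for i, tok in enumerate(tokens):
        if tok == term:
            bag |= context_window(tokens, i, size)
    return bag


def dice(a, b):
    """Dice's coefficient of two sets; defined as 0.0 here when both are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Usage: compare the contexts of two terms in a short token sequence.
tokens = "the solar panel powers the grid while the wind turbine powers the grid".split()
panel_ctx = term_context(tokens, "panel")      # {"the", "solar", "powers"}
turbine_ctx = term_context(tokens, "turbine")  # {"the", "wind", "powers"}
score = dice(panel_ctx, turbine_ctx)           # 2 * 2 / (3 + 3)
```

A clustering step could then group terms whose context sets score above some cutoff; how the claimed system actually combines these scores is not specified in the claims.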

25-38. (canceled)

Current Assignee: Leidos Innovations Technology, Inc.; Lockheed Martin Corporation (Martin Marietta Corporation)
Original Assignee: Lockheed Martin Corporation (Martin Marietta Corporation)
Inventors: Moitra, Abha; Bracewell, David Brian; Gustafson, Steven Matt; Baylor, T. Michael; Chau, Tina H.

Granted Patent

US 10,235,421 B2
US Class Current

707/749
CPC Class Codes

G06F 16/24578   using ranking

G06F 16/35   Clustering; Classification

G06F 16/36   Creation of semantic tools,...

G06F 16/951   Indexing; Web crawling tech...

G06Q 30/0201   Market modelling; Market an...

G06Q 50/26   Government or public servic...
