Systems and methods for enterprise data search and analysis

US 10,372,718 B2
Filed: 11/03/2015
Issued: 08/06/2019
Est. Priority Date: 11/03/2014
Status: Active Grant

Abstract

A system and method for enterprise searching of documents. The system comprises a computing system configured to receive one or more search terms, and responsively analyze a group of documents to return analysis results. A method for enterprise searching includes indexing the group of documents, determining relevant terms and measuring the context between terms. Relevant portions of documents, also called passages of interest, are determined as part of the analysis process. The analysis includes analyzing the passages of interest for words, repeating term sequences, non-consecutive repeating root term sequences, and non-word terms. The terms/sequences are scored and sorted, resulting in a set of high-importance items, allowing a user to quickly subselect search results without reading through the results.
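The four analyses named in the abstract (words, repeating term sequences, non-consecutive root term sequences, and non-word terms) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the patented implementation: the function names, the whitespace tokenizer, the toy suffix-stripping `root` function standing in for a real stemmer, the rule that a "non-word term" is any token containing a non-letter character, and the default context window of 5 terms.

```python
from collections import Counter

PUNCT = ".,;:!?()\"'"  # assumed set of punctuation trimmed from token edges


def tokenize(passage):
    """Lowercase, split on whitespace, and trim edge punctuation (assumed tokenizer)."""
    stripped = (t.strip(PUNCT) for t in passage.lower().split())
    return [t for t in stripped if t]


def root(term):
    """Toy stemmer (hypothetical): strip a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def is_word(term):
    """Assumption: a word is a purely alphabetic token; anything else is a non-word term."""
    return term.isalpha()


def root_counts(passages):
    """Count how often terms sharing each root occur in a group of passages."""
    counts = Counter()
    for passage in passages:
        counts.update(root(t) for t in tokenize(passage) if is_word(t))
    return counts


def repeating_sequences(passages, max_len=3):
    """Count contiguous sequences of 2..max_len terms that occur at least twice."""
    counts = Counter()
    for passage in passages:
        terms = tokenize(passage)
        for n in range(2, max_len + 1):
            for i in range(len(terms) - n + 1):
                counts[tuple(terms[i:i + n])] += 1
    return Counter({seq: c for seq, c in counts.items() if c >= 2})


def concepts_of_interest(passages, window=5):
    """Count ordered root pairs separated by at least one term and fewer than `window` terms."""
    counts = Counter()
    for passage in passages:
        terms = tokenize(passage)
        for i, first in enumerate(terms):
            if not is_word(first):
                continue
            # j - i - 1 is the number of terms between the pair: 1 <= gap < window
            for j in range(i + 2, min(i + window + 1, len(terms))):
                second = terms[j]
                if is_word(second) and root(second) != root(first):
                    counts[(root(first), root(second))] += 1
    return counts


def general_identifiers(passages):
    """Count non-word terms such as part numbers."""
    counts = Counter()
    for passage in passages:
        counts.update(t for t in tokenize(passage) if not is_word(t))
    return counts
```

Each function returns a `Counter`, so calling `most_common()` on the result gives the items ranked by number of occurrences within the group.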

15 Claims

1. A method of analyzing a search of a plurality of text documents by a search system comprising a plurality of computing nodes comprising at least a processor coupled to a non-transitory memory, at least one network-attached storage device coupled to the plurality of computing nodes, a system management module comprising at least a processor coupled to a non-transitory memory, the system management module coupled to the plurality of computing nodes and configured to run at least one system management software, and a network management module coupled to the system management module and configured to communicate with a network, the search resulting in a set of expanded search terms, a search document set, and a plurality of passages of interest, wherein the plurality of passages of interest are portions of the plurality of text documents generated by the search and are divided into a plurality of groups, comprising the steps of:
- obtaining all passages of interest generated by the search;
- determining all unique roots of interest included in each group, wherein each root of interest corresponds to terms wherein the term is the same as the root of interest and terms wherein the root of interest is the root of the term;
- listing of all unique roots of interest for each group and a number of times terms corresponding to the root of interest occur in the group for each unique root of interest;
- ranking of roots of interest in each group in order of occurrence in the group;
- determining all unique repeating term sequences in each group, wherein each repeating term sequence comprises two or more contiguous terms;
- listing of all unique repeating term sequences for each group and a number of times each unique repeating term sequence occurs in the group;
- ranking of all repeating term sequences in each group in order of occurrence in the group;
- determining all concepts of interest in each group, wherein each concept of interest corresponds to a first root term associated with a second different root term, wherein each concept of interest is an occurrence, in one passage of interest, of one term of a first term group occurring in the passage of interest prior to the occurrence of one term of a second term group, the first term group consisting of the first root term and stems of the first root term and the second term group consisting of the second root term and stems of the second root term, wherein the one of the first term group is separated from the one of the second term group by at least one other term and by fewer than a predetermined context window of terms;
- identifying of all unique concepts of interest for each group;
- listing of all unique concepts of interest for each group and a number of times each unique concept of interest occurs in the group;
- ranking of all concepts of interest in each group in order of occurrence in the group;
- determining all unique general identifiers in each group, wherein each general identifier comprises a non-word term in the group;
- listing of all unique general identifiers and a number of times each unique general identifier occurs in the group; and
- ranking of all general identifiers in each group in order of occurrence in the group.
- - 2. The method of analyzing the search of the plurality of text documents by the search system of claim 1, further comprising the steps of:
    - scoring each of the roots of interest;
    - scoring each of the repeating term sequences;
    - scoring each of the concepts of interest; and
    - scoring each of the general identifiers.
  - 3. The method of analyzing the search of the plurality of text documents by the search system of claim 2, wherein the score for each root of interest is a frequency of occurrence of the root of interest in the group divided by an average frequency of occurrence of the root of interest in the plurality of text documents.
  - 4. The method of analyzing the search of the plurality of text documents by the search system of claim 2, wherein the score for each repeating term sequence is a frequency of occurrence of the repeating term sequence in the group divided by an average frequency of occurrence of the repeating term sequence in the plurality of text documents.
  - 5. The method of analyzing the search of the plurality of text documents by the search system of claim 4, wherein the score for each repeating term sequence is reduced for each high-frequency term located in at least one of the group consisting of a first term in the repeating term sequence and a last term in the repeating term sequence.
  - 6. The method of analyzing the search of the plurality of text documents by the search system of claim 5, wherein the score for each repeating term sequence is multiplied by a number of occurrences of the repeating term sequence in the group.
  - 7. The method of analyzing the search of the plurality of text documents by the search system of claim 2, wherein the score for each concept of interest is a frequency of occurrence of the concept of interest in the group divided by an average frequency of occurrence of the concept of interest in the plurality of text documents.
  - 8. The method of analyzing the search of the plurality of text documents by the search system of claim 7, wherein the score for each concept of interest is reduced for each high-frequency term located in at least one of the group consisting of a first term in the concept of interest and a last term in the concept of interest.
  - 9. The method of analyzing the search of the plurality of text documents by the search system of claim 8, wherein the score for each concept of interest is multiplied by a number of occurrences of the concept of interest in the group.
  - 10. The method of analyzing the search of the plurality of text documents by the search system of claim 2, wherein the score for each general identifier is a frequency of occurrence of the general identifier in the group divided by an average frequency of occurrence of the general identifier in the plurality of text documents.
  - 11. The method of analyzing the search of the plurality of text documents by the search system of claim 2, further comprising the step of:
    - creating a partial file for each group, each partial file comprising a plurality of highest-scored roots of interest of the group, a plurality of highest-scored repeating term sequences of the group, a plurality of highest-scored concepts of interest of the group, and a plurality of highest-scored general identifiers of the group.
  - 12. The method of analyzing the search of the plurality of text documents by the search system of claim 11, further comprising the step of combining the partial files into a grand score file.
  - 13. The method of analyzing the search of the plurality of text documents by the search system of claim 12, further comprising the step of sorting each of the plurality of roots of interest, the plurality of repeating term sequences, the plurality of concepts of interest, and the plurality of general identifiers of the grand score file by score.
  - 14. The method of analyzing the search of the plurality of text documents by the search system of claim 13, wherein the grand score file is displayed for a user of a computing device in communication with the search system.
  - 15. The method of analyzing the search of the plurality of text documents by the search system of claim 14, wherein the grand score file is displayed in a tree format.
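The scoring and merging described in claims 3 through 13 can be sketched as follows. This is a hedged illustration only: the claims do not state how much a score is "reduced" for a high-frequency edge term, so the halving factor `EDGE_PENALTY`, the stop list `HIGH_FREQUENCY_TERMS`, the in-memory dictionaries standing in for partial files, and all function names are assumptions. The sketch also keeps duplicate items from different groups as separate entries, since the claims do not say how they should be combined.

```python
HIGH_FREQUENCY_TERMS = {"the", "of", "a", "and", "to"}  # assumed stop list
EDGE_PENALTY = 0.5  # assumed reduction factor; the claims give no value


def ratio_score(group_freq, corpus_avg_freq):
    """Claims 3, 4, 7, 10: frequency in the group divided by the average
    frequency in the full document set."""
    return group_freq / corpus_avg_freq


def sequence_score(seq, occurrences, group_freq, corpus_avg_freq):
    """Claims 4-6 (and, by analogy, 7-9): ratio score, reduced once for each
    high-frequency term in the first or last position, then multiplied by
    the number of occurrences in the group."""
    score = ratio_score(group_freq, corpus_avg_freq)
    for edge_term in (seq[0], seq[-1]):
        if edge_term in HIGH_FREQUENCY_TERMS:
            score *= EDGE_PENALTY
    return score * occurrences


def partial_file(scored, top_n=2):
    """Claim 11: keep the highest-scored items of each category for one group.
    `scored` maps a category name to an {item: score} dictionary."""
    return {
        category: sorted(items.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        for category, items in scored.items()
    }


def grand_score_file(partials):
    """Claims 12-13: combine the partial files and sort each category by score."""
    merged = {}
    for partial in partials:
        for category, entries in partial.items():
            merged.setdefault(category, []).extend(entries)
    return {
        category: sorted(entries, key=lambda kv: kv[1], reverse=True)
        for category, entries in merged.items()
    }
```

The result of `grand_score_file` is a dictionary keyed by category, which maps naturally onto the tree-format display mentioned in claim 15: one branch per category, with the scored items as leaves.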

Current Assignee: SavantX
Original Assignee: SavantX
Inventors: Ostby, David Linus; Heinbockel, Edmond Audrey
Primary Examiner(s): Reyes, Mariela
Assistant Examiner(s): Harmon, Courtney
Application Number: US14/931,709
Publication Number: US 20160125038A1
Time in Patent Office: 1,372 Days

CPC Class Codes
G06F 16/24575   using context
G06F 16/2458   Special types of queries, e...
G06F 16/248   Presentation of query results
G06F 16/285   Clustering or classification
