SYSTEMS AND METHODS FOR ENTERPRISE DATA SEARCH AND ANALYSIS
First Claim
1. A method of analyzing a search of a plurality of text documents by a search system comprising a plurality of computing nodes comprising at least a processor coupled to a non-transitory memory, at least one network-attached storage device coupled to the plurality of computing nodes, a system management module comprising at least a processor coupled to a non-transitory memory, the system management module coupled to the plurality of computing nodes and configured to run at least one system management software, and a network management module coupled to the system management module and configured to communicate with a network, the search resulting in a set of expanded search terms, a search document set, and a plurality of POI, wherein the plurality of POI are divided into a plurality of groups, comprising the steps of:
obtaining all POI generated by the search;
ranking of all of a plurality of ROI in each group in order of occurrence in the group, wherein each ROI is one of a term in the group and a root of the term in the group;
ranking of all of a plurality of RTS in each group in order of occurrence in the group, wherein each RTS includes more than one contiguous term;
ranking of all of a plurality of COI in order of occurrence in the group, wherein each COI comprises two non-contiguous term roots found within a context window, wherein the context window is a consecutive number of terms in a document; and
ranking of all of a plurality of GID in order of occurrence in the group, wherein the GID are non-word terms in the group.
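The four ranking steps recited above can be illustrated with a minimal Python sketch for a single group of POI. This is not the patented implementation, and it rests on several assumptions that are not in the claim. "Order of occurrence" is read here as descending frequency. The `root` helper is a toy suffix stripper standing in for a real stemmer. An RTS is kept only if it occurs more than once, following the abstract's "repeating term sequences". A "non-word term" is taken to be any token that is not purely alphabetic. The names `tokenize`, `root`, `rank`, and `analyze_group` and the parameters `window` and `max_seq` are hypothetical.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"^[a-z]+$")  # assumption: a "word" is purely alphabetic


def tokenize(passage):
    # Lowercase tokens; hyphens/slashes stay inside a token so IDs like "ab-123" survive.
    return re.findall(r"[a-z0-9]+(?:[-/][a-z0-9]+)*", passage.lower())


def root(term):
    # Toy stand-in for a stemmer (assumption): strip a few common suffixes.
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def rank(counter):
    # Items in descending order of occurrence count.
    return [item for item, _ in counter.most_common()]


def analyze_group(passages, window=5, max_seq=3):
    """Rank ROI, RTS, COI and GID for one group of passages of interest."""
    roi, rts, coi, gid = Counter(), Counter(), Counter(), Counter()
    for passage in passages:
        tokens = tokenize(passage)
        # Root of each word token; None marks a non-word token.
        roots = [root(t) if WORD_RE.match(t) else None for t in tokens]
        roi.update(r for r in roots if r is not None)
        gid.update(t for t, r in zip(tokens, roots) if r is None)
        # RTS candidates: contiguous sequences of 2..max_seq terms.
        for n in range(2, max_seq + 1):
            rts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        # COI: two different roots, not adjacent, within a window of `window` terms.
        for i, a in enumerate(roots):
            if a is None:
                continue
            for j in range(i + 2, min(i + window, len(roots))):
                b = roots[j]
                if b is not None and b != a:
                    coi[tuple(sorted((a, b)))] += 1
    # Keep only sequences that actually repeat within the group.
    rts = Counter({seq: c for seq, c in rts.items() if c > 1})
    return {"ROI": rank(roi), "RTS": rank(rts), "COI": rank(coi), "GID": rank(gid)}
```

Calling `analyze_group` once per group yields the per-group rankings. A production system would likely replace `root` with a proper stemmer or lemmatizer and filter stop words before ranking.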
Abstract
A system and method for enterprise searching of documents. The system comprises a computing system configured to receive one or more search terms, and responsively analyze a group of documents to return analysis results. A method for enterprise searching includes indexing the group of documents, determining relevant terms and measuring the context between terms. Relevant portions of documents, also called passages of interest, are determined as part of the analysis process. The analysis includes analyzing the passages of interest for words, repeating term sequences, non-consecutive repeating root term sequences, and non-word terms. The terms/sequences are scored and sorted, resulting in a set of high-importance items, allowing a user to quickly subselect search results without reading through the results.
22 Citations
19 Claims
Claim 1 is recited in full under First Claim. Dependent claims: 2 through 19.
Specification