Systems and methods for enterprise data search and analysis
First Claim
1. A method of generating search results substantially in real time by a search system comprising a plurality of computing nodes each comprising a node system management module, a non-transitory memory, and at least one process kernel coupled to the memory and system management module, the search system further comprising at least one network-attached storage device coupled to the plurality of computing nodes, a system management module comprising at least a processor coupled to a non-transitory memory, the system management module coupled to the plurality of computing nodes and configured to run at least one system management software, and a network management module coupled to the system management module and configured to communicate with a network, comprising the steps of:
indexing a plurality of text files stored on the at least one network-attached storage device, each text file comprised of a plurality of terms, wherein the indexing includes relating each term to text files including that term;
receiving over the network of at least two search terms;
identifying at least one stem of at least one search term, wherein the search terms and the at least one search term stem comprise a set of expanded search terms;
identifying of a plurality of search term documents wherein the search term documents comprise text files including at least one occurrence of at least one search term;
identifying of search term stem documents wherein the search term stem documents comprise text files including at least one occurrence of at least one search term stem, wherein the search term documents and the search term stem documents comprise a search document set;
identifying each location of each expanded search term in each text file;
identifying each location in each text file where two expanded search terms are in context, wherein two expanded search terms are in context when the two expanded search terms are separated by fewer terms than a predetermined context window number of terms;
identifying at least one extract of interest, wherein for each location where two expanded search terms are in context the extract of interest comprises at least the two expanded search terms and the separating terms, and wherein for each location where the expanded search term is not in context with another expanded search term the extract of interest comprises a portion of the text file centering on the expanded search term location, wherein the length of the extract of interest is the context window number of terms; and
determining passages of interest, wherein each passage of interest is a portion of the text file that includes at least one extract of interest.
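The indexing, stem expansion, and in-context steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the toy suffix-stripping stemmer, the in-memory file dictionary, and the window value of 3 are all assumptions made for illustration, and the claim itself only requires a "predetermined" context window. The sketch also assumes that "in context" is tested between consecutive hits in a file.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase terms (assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(files):
    """Relate each term to the files that include it, keeping term positions."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in files.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][name].append(pos)
    return index

def naive_stem(term):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term

def expand_terms(search_terms):
    """Expanded search terms = the search terms plus their stems."""
    terms = {t.lower() for t in search_terms}
    return terms | {naive_stem(t) for t in terms}

def search_document_set(index, expanded):
    """Files with at least one occurrence of a search term or a stem."""
    return {name for t in expanded for name in index.get(t, {})}

def term_locations(index, expanded, name):
    """Sorted (position, term) pairs for every expanded term in one file."""
    return sorted(
        (pos, t) for t in expanded for pos in index.get(t, {}).get(name, [])
    )

def in_context_pairs(locations, window):
    """Consecutive hits separated by fewer than `window` terms."""
    pairs = []
    for (p1, _), (p2, _) in zip(locations, locations[1:]):
        if p2 - p1 - 1 < window:  # number of separating terms
            pairs.append((p1, p2))
    return pairs

# Hypothetical two-file corpus and query.
files = {
    "a.txt": "The search system indexes each file and keeps searching text files quickly",
    "b.txt": "Nothing relevant here",
}
index = build_index(files)
expanded = expand_terms(["searching", "files"])
locations = term_locations(index, expanded, "a.txt")
print(in_context_pairs(locations, window=3))  # [(5, 8), (8, 10)]
```

Storing positions in the index alongside file names lets the location and context steps run without rereading the text files, which is one plausible way to approach the "substantially in real time" requirement.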
Abstract
A system and method for enterprise searching of documents. The system comprises a computing system configured to receive one or more search terms and, in response, analyze a group of documents and return analysis results. A method for enterprise searching includes indexing the group of documents, determining relevant terms, and measuring the context between terms. The analysis determines relevant portions of documents, also called passages of interest, and uses a calculated importance value of terms.
19 Claims
1. Independent claim, set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A system for generating search results substantially in real time, comprising:
a plurality of computing nodes comprising at least a processor coupled to a non-transitory memory;
at least one network-attached storage device coupled to the plurality of computing nodes;
a system management module comprising at least a processor coupled to a non-transitory memory, the system management module coupled to the plurality of computing nodes and configured to run at least one system management software;
a network management module coupled to the system management module and configured to communicate with a network, wherein the system is configured to perform the steps of;
indexing a plurality of text files, each text file stored on the at least one network-attached storage device, each text file comprised of a plurality of terms, wherein the indexing includes relating each term to text files including that term;
receiving over the network at least two search terms;
identifying at least one stem of at least one search term, wherein the search terms and the at least one search term stem comprise the expanded search terms;
identifying of a plurality of search term documents, wherein the search term documents comprise text files, including at least one occurrence of at least one search term;
identifying of search term stem documents, wherein the search term stem documents comprise text files, including at least one occurrence of at least one search term stem, wherein the search term stem documents and the search term stem documents comprise a search document set;
identifying each location of each expanded search term in each text file;
identifying each location in each text file where two expanded search terms are in context, wherein two expanded search terms are in context when the two expanded search terms are separated by fewer terms than a predetermined context window number of terms;
identifying at least one extract of interest, wherein for each location where two expanded search terms are in context the extract of interest comprises at least the two expanded search terms and the separating terms, and wherein for each location where the expanded search term is not in context with another expanded search term the extract of interest comprises a portion of the text file centering on the expanded search term location, wherein the length of the extract of interest is the context window number of terms; and
determining passages of interest, wherein each passage of interest is a portion of the text file that includes at least one extract of interest.
Dependent claims: 14, 15, 16, 17, 18, 19
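The extract-of-interest and passage-of-interest steps of the system claim can be illustrated with a short Python sketch. It is a reading of the claim language under stated assumptions, not the patented implementation: the function names are hypothetical, "in context" is tested between consecutive hit positions, and passages are formed by merging overlapping or touching extracts, which is only one of many ways to obtain "a portion of the text file that includes at least one extract of interest".

```python
def extracts_of_interest(hits, n_terms, window):
    """Return inclusive (start, end) term spans for one text file.

    hits    : sorted positions of expanded search terms in the file
    n_terms : total number of terms in the file
    window  : the context window number of terms
    """
    extracts = []
    in_context = set()
    # Two hits in context: the extract spans both hits and the separating terms.
    for p1, p2 in zip(hits, hits[1:]):
        if p2 - p1 - 1 < window:
            extracts.append((p1, p2))
            in_context.update((p1, p2))
    # Isolated hit: a span of `window` terms centered on the hit,
    # shifted inward when it would run past either end of the file.
    for p in hits:
        if p not in in_context:
            start = max(0, p - window // 2)
            end = min(n_terms - 1, start + window - 1)
            start = max(0, end - window + 1)
            extracts.append((start, end))
    return sorted(extracts)

def passages_of_interest(extracts):
    """Merge overlapping or touching extracts into passages (assumed rule)."""
    passages = []
    for start, end in sorted(extracts):
        if passages and start <= passages[-1][1] + 1:
            last_start, last_end = passages[-1]
            passages[-1] = (last_start, max(last_end, end))
        else:
            passages.append((start, end))
    return passages

# Hypothetical hit positions in a 40-term file with a 5-term context window.
extracts = extracts_of_interest([5, 8, 10, 30], n_terms=40, window=5)
print(extracts)                        # [(5, 8), (8, 10), (28, 32)]
print(passages_of_interest(extracts))  # [(5, 10), (28, 32)]
```

In this example the hits at positions 5, 8, and 10 chain into a single passage, while the isolated hit at position 30 yields a five-term extract centered on it.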
Specification