DISTRIBUTED REVERSE SEMANTIC INDEX
First Claim
1. A method comprising:
receiving a plurality of documents, each document having at least one defined rule/semantic;
distributing the plurality of documents among a plurality of nodes of a system;
processing the documents in a generally parallel fashion, processing the documents comprising:
processing text data of each document, and breaking each document into fields to index the text data to create index data by deferring on how to categorize the text data based upon the at least one defined rule/semantic;
combining the indexed data back together to create an indexer-agnostic semantic index including a plurality of semantic index shards; and
semantically classifying the documents based on the index shards into groups based on document type to create the distributed reverse semantic index.
Abstract
Embodiments of the invention relate to building a distributed reverse semantic index. In one general embodiment, a plurality of documents are received, with each document having at least one defined rule and/or semantic. The documents are distributed among a plurality of nodes of a system. The documents are processed in a generally parallel fashion. Processing the documents includes processing text data of each document and breaking each document into fields to index the text data to create index data by deferring how to categorize the text data based upon the defined rule and/or semantic. The indexed data is combined back together to create an indexer-agnostic semantic index including a plurality of semantic index shards, and the documents are semantically classified based on the index shards into groups based on document type to create the distributed reverse semantic index.
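The flow described in the abstract can be illustrated with a small sketch. It is not the patented implementation, only one possible reading under stated assumptions: each document is a dictionary carrying its own rule (a hypothetical mapping from field name to a regular expression), worker threads stand in for the nodes of the system, a shard is a mapping from (field, term) to document ids, and the field that holds the document type is assumed to be named "type". The names index_shard, distribute, build_index, lookup, and classify are illustrative and do not come from the source.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical documents: every document supplies its own rule, so the
# indexer never hard-codes how the text is split into fields.
DOCS = [
    {"id": 1, "text": "type: invoice\nbody: payment due in march",
     "rule": {"type": r"type: (.*)", "body": r"body: (.*)"}},
    {"id": 2, "text": "type: memo\nbody: payment schedule moved",
     "rule": {"type": r"type: (.*)", "body": r"body: (.*)"}},
    {"id": 3, "text": "kind=invoice;note=march totals",
     "rule": {"type": r"kind=([^;]*)", "body": r"note=([^;]*)"}},
]

def index_shard(docs):
    """Index one node's documents into a shard: (field, term) -> doc ids."""
    shard = defaultdict(set)
    for doc in docs:
        # Categorization is deferred to the document's own rule.
        for field, pattern in doc["rule"].items():
            match = re.search(pattern, doc["text"])
            if match is None:
                continue
            for term in re.findall(r"\w+", match.group(1).lower()):
                shard[(field, term)].add(doc["id"])
    return shard

def distribute(docs, n_nodes):
    """Round-robin split of the documents among the (simulated) nodes."""
    return [docs[i::n_nodes] for i in range(n_nodes)]

def build_index(docs, n_nodes=2):
    """Process each node's share in parallel and keep the resulting shards."""
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        return list(pool.map(index_shard, distribute(docs, n_nodes)))

def lookup(shards, field, term):
    """Reverse lookup: union the matching doc ids across all shards."""
    hits = set()
    for shard in shards:
        hits |= shard.get((field, term), set())
    return hits

def classify(shards, type_field="type"):
    """Group document ids by document type, reading only the shards."""
    groups = defaultdict(set)
    for shard in shards:
        for (field, term), ids in shard.items():
            if field == type_field:
                groups[term] |= ids
    return dict(groups)

shards = build_index(DOCS, n_nodes=2)
print(sorted(lookup(shards, "body", "march")))  # [1, 3]
print(sorted(classify(shards)["invoice"]))      # [1, 3]
```

Document 3 uses a different layout from documents 1 and 2 but still lands in the same "invoice" group, because the indexer relies only on each document's rule. Threads are used here purely for brevity; a deployment along the lines the abstract describes would run the indexing step on separate nodes.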
9 Claims
1. A method comprising the steps set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
Specification