Vector throttling to control resource use in computer systems
First Claim
1. A method comprising:
receiving a plurality of documents, each document having at least one defined rule/semantic;
distributing the plurality of documents among a plurality of nodes of a system;
processing the documents in a generally parallel fashion, processing the documents comprising:
processing text data of each document, and breaking each document into fields to index the text data to create index data by deferring on how to categorize the text data based upon the at least one defined rules/semantic;
combining the indexed data back together to create an indexer-agnostic semantic index including a plurality of semantic index shards; and
semantically classifying the documents based on the index shards into groups based on document type to create the distributed reverse semantic index.
Abstract
Embodiments of the invention relate to building a distributed reverse semantic index. In one general embodiment, a plurality of documents is received, with each document having at least one defined rule and/or semantic. The documents are distributed among a plurality of nodes of a system and processed in a generally parallel fashion. Processing the documents includes processing the text data of each document and breaking each document into fields to index the text data, creating index data by deferring how to categorize the text data based upon the defined rule and/or semantics. The indexed data is combined back together to create an indexer-agnostic semantic index that includes a plurality of semantic index shards, and the documents are semantically classified, based on the index shards, into groups by document type to create the distributed reverse semantic index.
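The steps in the abstract (distribute, index in parallel, recombine, group by document type) can be illustrated with a small sketch. This is not the patented implementation. It is a minimal single-process reading of the abstract under several assumptions: threads stand in for the nodes, the "defined rule/semantic" is modeled as a hypothetical per-document list of fields to index, "deferring how to categorize" is interpreted as postponing the grouping by type until after the partial indexes are merged, and every name (DOCS, distribute, index_partition, build_index) is invented for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample documents. "rule" stands in for the defined rule/semantic
# and here simply names the fields that should be indexed.
DOCS = [
    {"id": "d1", "type": "email", "rule": {"fields": ["subject", "body"]},
     "subject": "Budget review", "body": "Please review the budget"},
    {"id": "d2", "type": "report", "rule": {"fields": ["title", "body"]},
     "title": "Quarterly budget", "body": "Revenue grew"},
    {"id": "d3", "type": "email", "rule": {"fields": ["subject", "body"]},
     "subject": "Lunch", "body": "Budget lunch today"},
]


def distribute(docs, n_nodes):
    """Assign documents to n_nodes partitions in round-robin order."""
    partitions = [[] for _ in range(n_nodes)]
    for i, doc in enumerate(docs):
        partitions[i % n_nodes].append(doc)
    return partitions


def index_partition(docs):
    """Node-local step: break each document into fields and build postings.

    Postings map (field, term) -> set of document ids. No grouping by
    document type happens here; that decision is deferred to the merge step.
    """
    postings = defaultdict(set)
    doc_types = {}
    for doc in docs:
        for field in doc["rule"]["fields"]:
            for term in doc.get(field, "").lower().split():
                postings[(field, term)].add(doc["id"])
        doc_types[doc["id"]] = doc["type"]
    return dict(postings), doc_types


def build_index(docs, n_nodes=2):
    """Index partitions in parallel, merge them, then shard by document type."""
    partitions = distribute(docs, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(index_partition, partitions))

    # Combine the partial indexes back together.
    merged = defaultdict(set)
    doc_types = {}
    for postings, types in partials:
        for key, ids in postings.items():
            merged[key] |= ids
        doc_types.update(types)

    # Group postings into one shard per document type.
    shards = defaultdict(lambda: defaultdict(set))
    for key, ids in merged.items():
        for doc_id in ids:
            shards[doc_types[doc_id]][key].add(doc_id)
    return {doc_type: dict(index) for doc_type, index in shards.items()}
```

Calling build_index(DOCS) returns one posting map per document type, so a lookup such as build_index(DOCS)["email"][("body", "budget")] touches only the shard that holds emails. A real deployment would replace the thread pool with separate machines and a real indexer; the sketch only shows the order of the steps.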
48 Citations
20 Claims
1. A method comprising:
receiving a plurality of documents, each document having at least one defined rule/semantic;
distributing the plurality of documents among a plurality of nodes of a system;
processing the documents in a generally parallel fashion, processing the documents comprising:
processing text data of each document, and breaking each document into fields to index the text data to create index data by deferring on how to categorize the text data based upon the at least one defined rules/semantic;
combining the indexed data back together to create an indexer-agnostic semantic index including a plurality of semantic index shards; and
semantically classifying the documents based on the index shards into groups based on document type to create the distributed reverse semantic index.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system, comprising:
means for building a distributed reverse semantic index including semantic index shards, each semantic index shard including documents of a similar document type, comprising:
means for receiving a plurality of documents, each document having at least one defined rule/semantic;
means for distributing the plurality of documents among a plurality of nodes of the system;
means for processing the documents in a generally parallel fashion, processing the documents including processing text data of each document of the plurality of documents and breaking each document into fields to index the text data to create index data by deferring on how to categorize the text data based upon the at least one defined rule/semantic;
means for recombining the indexed data to create an indexer-agnostic semantic index including a plurality of the semantic index shards; and
means for semantically classifying the documents based on the index shards into groups based on document type to create the distributed reverse semantic index including the indexer-agnostic index and the groups organized as the index shards.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19
20. A computer program product, comprising:
a computer readable medium having computer usable program code embodied therewith, the computer usable program code comprising:
computer usable program code configured to receive a plurality of documents, each document having at least one defined rule/semantic;
computer usable program code configured to distribute the plurality of documents among a plurality of nodes of the system;
computer usable program code configured to process the plurality of documents by the plurality of nodes in a generally parallel fashion, including process text data of each document of the plurality of documents and break each document into fields to index the text data to create index data by deferring on how to categorize the text data based upon the defined rules/semantics;
computer usable program code configured to recombine the indexed data to create an indexer-agnostic semantic index including a plurality of the semantic index shards; and
computer usable program code configured to semantically classify the documents based on the index shards into groups based on document type to create the distributed reverse semantic index.
Specification