Vector throttling to control resource use in computer systems

US 20120254089A1
Filed: 03/31/2011
Published: 10/04/2012
Est. Priority Date: 03/31/2011
Status: Abandoned Application

Abstract

Embodiments of the invention relate to building a distributed reverse semantic index. In one general embodiment, a plurality of documents is received, each document having at least one defined rule and/or semantic. The documents are distributed among a plurality of nodes of a system and processed in a generally parallel fashion. Processing the documents includes processing the text data of each document and breaking each document into fields to index the text data, creating index data by deferring how to categorize the text data based upon the defined rule and/or semantic. The indexed data is combined back together to create an indexer-agnostic semantic index including a plurality of semantic index shards, and the documents are semantically classified, based on the index shards, into groups by document type to create the distributed reverse semantic index.
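
The flow described in the abstract (distribute, process in parallel, break into fields, recombine into shards grouped by document type) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the application: the document layout (`id`, `type`, `fields`), the function names, the round-robin distribution, and the use of threads as stand-ins for nodes. The sketch also does not model the rule/semantic-driven deferral of categorization; it simply carries each document's type label through to the recombination step.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def index_document(doc):
    """Break one document into fields and emit (term, field, doc_id) postings."""
    postings = []
    for field, text in doc["fields"].items():
        for term in text.lower().split():
            postings.append((term, field, doc["id"]))
    # The type label travels with the postings; grouping happens later.
    return doc["type"], postings


def build_sharded_index(docs, num_nodes=2):
    """Index documents on several workers, then recombine into per-type shards."""
    # Hypothetical distribution step: round-robin over the worker "nodes".
    partitions = [docs[i::num_nodes] for i in range(num_nodes)]

    def run_node(partition):
        return [index_document(doc) for doc in partition]

    # Threads stand in for the nodes of a distributed system.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        node_outputs = list(pool.map(run_node, partitions))

    # Recombine: one shard per document type, mapping (field, term) -> doc ids.
    shards = defaultdict(lambda: defaultdict(set))
    for output in node_outputs:
        for doc_type, postings in output:
            for term, field, doc_id in postings:
                shards[doc_type][(field, term)].add(doc_id)
    return shards
```

Looking up `shards["news"][("body", "storm")]` in the result would return the set of ids of news documents whose body field contains "storm", which is the reverse (term-to-document) direction the abstract refers to.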

20 Claims

1. A method comprising:
- receiving a plurality of documents, each document having at least one defined rule/semantic;
  
  distributing the plurality of documents among a plurality of nodes of a system;
  
  processing the documents in a generally parallel fashion, processing the documents comprising:
  
  processing text data of each document, and breaking each document into fields to index the text data to create index data by deferring on how to categorize the text data based upon the at least one defined rule/semantic;
  
  combining the indexed data back together to create an indexer-agnostic semantic index including a plurality of semantic index shards; and
  
  semantically classifying the documents based on the index shards into groups based on document type to create the distributed reverse semantic index.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1 further comprising:
    - distributing the plurality of documents among the plurality of nodes for causing each node of the plurality of nodes to have a generally balanced load.
  - 3. The method of claim 1, wherein each document of the plurality of documents has at least one defined rule/semantic that may be at least one of a topic, a country of origin and a metadata interrelationship.
  - 4. The method of claim 1 further comprising:
    - receiving the plurality of the documents further comprises receiving the plurality of documents by a distributed file system.
  - 5. The method of claim 4 further comprising:
    - receiving the plurality of documents by a fault tolerant version of the distributed file system.
  - 6. The method of claim 1 further comprising:
    - processing the documents in a generally parallel fashion further comprises:
      
      generally parallel processing the documents by at least two nodes of the plurality of nodes.
  - 7. The method of claim 6 further comprising:
    - generally parallel processing the documents by at least one processor included in each of the at least two nodes.
  - 8. The method of claim 1 further comprising:
    - using an index builder for combining the index data.
  - 9. The method of claim 8 further comprising:
    - the index builder including at least one processor for combining by the index data.
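
Claim 2 calls for distributing documents so that each node has a generally balanced load, without stating how load is measured or balanced. One possible reading is sketched below in Python, assuming (hypothetically) that load is approximated by document size and that a greedy largest-first, least-loaded-node assignment is acceptable. The function name `distribute_balanced` and its parameters are illustrative only.

```python
import heapq


def distribute_balanced(doc_sizes, num_nodes):
    """Assign documents to nodes so the total size per node stays roughly even.

    doc_sizes maps a document id to its size (an assumed proxy for load).
    Returns a dict mapping node number to the list of assigned document ids.
    """
    # Min-heap of (current_load, node) so the least-loaded node pops first.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}

    # Placing the largest documents first tends to give a tighter balance.
    by_size = sorted(doc_sizes.items(), key=lambda item: item[1], reverse=True)
    for doc_id, size in by_size:
        load, node = heapq.heappop(heap)
        assignment[node].append(doc_id)
        heapq.heappush(heap, (load + size, node))
    return assignment
```

This greedy heuristic is not optimal in general, but it keeps the gap between the most and least loaded nodes no larger than the biggest single document, which is in the spirit of "generally balanced".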

10. A system, comprising:
- means for building a distributed reverse semantic index including semantic index shards, each semantic index shard including documents of a similar document type, comprising:
  
  means for receiving a plurality of documents, each document having at least one defined rule/semantic;
  
  means for distributing the plurality of documents among a plurality of nodes of the system;
  
  means for processing the documents in a generally parallel fashion, processing the documents including processing text data of each document of the plurality of documents and breaking each document into fields to index the text data to create index data by deferring on how to categorize the text data based upon the at least one defined rule/semantic;
  
  means for recombining the indexed data to create an indexer-agnostic semantic index including a plurality of the semantic index shards; and
  
  means for semantically classifying the documents based on the index shards into groups based on document type to create the distributed reverse semantic index including the indexer-agnostic index and the groups organized as the index shards.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18, 19):
  - 11. The system of claim 10, wherein the means for distributing the plurality of documents further comprises:
    - means for distributing the plurality of documents to more than one node of the plurality of nodes for causing the more than one nodes to have a generally balanced load.
  - 12. The system of claim 10, wherein each document has at least one defined rule/semantic that may be at least one of a topic, a country of origin, and a metadata interrelationship.
  - 13. The system of claim 10, further comprising:
    - a processor operative to execute computer usable program code;
      
      at least one of a network interface and a peripheral device interface for receiving user input; and
      
      a computer usable medium having computer usable program code embodied therewith, the computer usable program code comprising:
      
      computer usable program code configured to receive a plurality of documents, each document having at least one defined rule/semantic;
      
      computer usable program code configured to distribute the plurality of documents among a plurality of nodes of the system;
      
      computer usable program code configured to process the plurality of documents by the plurality of nodes in a generally parallel fashion, processing the plurality of documents including processing text data of each document and breaking each document into fields to index the text data to create index data by deferring on how to categorize the text data based upon the defined rules/semantics;
      
      computer usable program code configured to recombine the indexed data to create an indexer-agnostic semantic index including a plurality of the semantic index shards; and
      
      computer usable program code configured to semantically classify the documents based on the index shards into groups based on document type to create the distributed reverse semantic index.
  - 14. The system of claim 10, wherein the means for receiving the plurality of the documents further comprises the means for receiving by a distributed file system the plurality of the documents.
  - 15. The system of claim 14, wherein the means for receiving by the distributed file system the plurality of the documents further comprises receiving by a fault tolerant version of the distributed file system the plurality of the documents.
  - 16. The system of claim 10, wherein the means for processing the documents in a generally parallel fashion further comprises generally parallel processing the documents by at least two nodes of the plurality of nodes.
  - 17. The system of claim 16, wherein the means for generally parallel processing the documents by the at least two nodes of the plurality of nodes further comprises:
    - means for generally parallel processing the documents by at least one processor included in each of the at least two nodes of the plurality of nodes.
  - 18. The system of claim 10, wherein the step combining the index data further comprises:
    - means for using an index builder to combine the index data.
  - 19. The system of claim 18, wherein the index builder includes at least one processor for combining by the index data.

20. A computer program product, comprising:
- a computer readable medium having computer usable program code embodied therewith, the computer usable program code comprising:
  
  computer usable program code configured to receive a plurality of documents, each document having at least one defined rule/semantic;
  
  computer usable program code configured to distribute the plurality of documents among a plurality of nodes of the system;
  
  computer usable program code configured to process the plurality of documents by the plurality of nodes in a generally parallel fashion, including process text data of each document of the plurality of documents and break each document into fields to index the text data to create index data by deferring on how to categorize the text data based upon the defined rules/semantics;
  
  computer usable program code configured to recombine the indexed data to create an indexer-agnostic semantic index including a plurality of the semantic index shards; and
  
  computer usable program code configured to semantically classify the documents based on the index shards into groups based on document type to create the distributed reverse semantic index.

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Alba, Alfredo; DeLuca, Chad E.; Ercegovac, Vuk; Griffin, Thomas D.; Rao, Jun; Shekita, Eugene J.; Singh, Asim V.; Tian, Yuanyuan; Wang, Kevin B.

Application Number

US13/077,586
Publication Number

US 20120254089A1
US Class Current

706/47
CPC Class Codes

G06F 16/35 Clustering; Classification
