DISTRIBUTED REVERSE SEMANTIC INDEX

US 20120323919A1
Filed: 08/27/2012
Published: 12/20/2012
Est. Priority Date: 03/31/2011
Status: Abandoned Application

First Claim

1. A method comprising:

receiving a plurality of documents, each document having at least one defined rule/semantic;

distributing the plurality of documents among a plurality of nodes of a system;

processing the documents in a generally parallel fashion, processing the documents comprising;

processing text data of each document, and breaking each document into fields to index the text data to create index data by deferring on how to categorize the text data based upon the at least one defined rules/semantic;

combining the indexed data back together to create an indexer-agnostic semantic index including a plurality of semantic index shards; and

semantically classifying the documents based on the index shards into groups based on document type to create the distributed reverse semantic index.

Abstract

Embodiments of the invention relate to building a distributed reverse semantic index. In one general embodiment, a plurality of documents is received, each document having at least one defined rule and/or semantic. The documents are distributed among a plurality of nodes of a system and processed in a generally parallel fashion. Processing the documents includes processing the text data of each document and breaking each document into fields to index the text data, creating index data by deferring how to categorize the text data based upon the defined rule and/or semantic. The index data is combined back together to create an indexer-agnostic semantic index including a plurality of semantic index shards, and the documents are semantically classified, based on the index shards, into groups by document type to create the distributed reverse semantic index.
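The flow described in the abstract can be illustrated with a small single-machine sketch. It is not the applicants' implementation: the document shape (`id`, `type`, `rule`, `fields`), the round-robin distribution, the use of a thread pool to stand in for nodes, and every function name below are assumptions made for illustration. "Deferring how to categorize" is read here as storing each document's rule/semantic next to the postings and only grouping by document type after the shards are combined, which is one possible interpretation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def distribute(docs, num_nodes):
    """Round-robin the documents across num_nodes partitions (assumed policy)."""
    partitions = [[] for _ in range(num_nodes)]
    for i, doc in enumerate(docs):
        partitions[i % num_nodes].append(doc)
    return partitions


def index_partition(partition):
    """Build one shard mapping (field, term) -> set of document ids.

    The rule/semantic is recorded but not used to categorize text here.
    """
    postings = defaultdict(set)
    semantics = {}
    for doc in partition:
        for field, text in doc["fields"].items():
            for term in text.lower().split():
                postings[(field, term)].add(doc["id"])
        semantics[doc["id"]] = {"type": doc["type"], "rule": doc["rule"]}
    return {"postings": dict(postings), "semantics": semantics}


def build_index(docs, num_nodes=2):
    """Index partitions concurrently, then group document ids by type."""
    partitions = distribute(docs, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        shards = list(pool.map(index_partition, partitions))
    groups = defaultdict(set)
    for shard in shards:
        for doc_id, sem in shard["semantics"].items():
            groups[sem["type"]].add(doc_id)
    return shards, dict(groups)


def search(shards, field, term):
    """Look a term up in one field across all shards."""
    hits = set()
    for shard in shards:
        hits |= shard["postings"].get((field, term.lower()), set())
    return hits


# Hypothetical sample documents.
docs = [
    {"id": 1, "type": "news", "rule": "topic:finance",
     "fields": {"title": "Market rally", "body": "Stocks rose today"}},
    {"id": 2, "type": "blog", "rule": "topic:travel",
     "fields": {"title": "Travel notes", "body": "Market stalls in Lisbon"}},
    {"id": 3, "type": "news", "rule": "topic:finance",
     "fields": {"title": "Rate decision", "body": "Central bank holds rates"}},
]
shards, groups = build_index(docs, num_nodes=2)
```

With these sample documents, `groups` maps each document type to the ids that carry it, and `search(shards, "title", "market")` consults every shard, so a query does not need to know which node indexed a document.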

14 Citations

9 Claims

1. A method comprising:
- receiving a plurality of documents, each document having at least one defined rule/semantic;
- distributing the plurality of documents among a plurality of nodes of a system;
- processing the documents in a generally parallel fashion, processing the documents comprising;
  - processing text data of each document, and
  - breaking each document into fields to index the text data to create index data by deferring on how to categorize the text data based upon the at least one defined rules/semantic;
- combining the indexed data back together to create an indexer-agnostic semantic index including a plurality of semantic index shards; and
- semantically classifying the documents based on the index shards into groups based on document type to create the distributed reverse semantic index.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1 further comprising:
    - distributing the plurality of documents among the plurality of nodes for causing each node of the plurality of nodes to have a generally balanced load.
  - 3. The method of claim 1, wherein each document of the plurality of documents has at least one defined rule/semantic that may be at least one of a topic, a country of origin and a metadata interrelationship.
  - 4. The method of claim 1 further comprising:
    - receiving the plurality of the documents further comprises receiving the plurality of documents by a distributed file system.
  - 5. The method of claim 4 further comprising:
    - receiving the plurality of documents by a fault tolerant version of the distributed file system.
  - 6. The method of claim 1 further comprising:
    - processing the documents in a generally parallel fashion further comprises;
      - generally parallel processing the documents by at least two nodes of the plurality of nodes.
  - 7. The method of claim 6 further comprising:
    - generally parallel processing the documents by at least one processor included in each of the at least two nodes.
  - 8. The method of claim 1 further comprising:
    - using an index builder for combining the index data.
  - 9. The method of claim 8 further comprising:
    - the index builder including at least one processor for combining by the index data.
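Claim 2 calls for a "generally balanced load" across nodes but does not say how the balance is reached. As one hedged possibility, the sketch below uses a greedy largest-first heuristic: documents are sorted by size and each is handed to the node with the smallest load so far. The size measure, the function name `balance`, and the sample sizes are all hypothetical.

```python
import heapq


def balance(doc_sizes, num_nodes):
    """Greedily assign documents to the currently least-loaded node.

    doc_sizes: dict of document id -> size (assumed cost measure, e.g. bytes).
    Returns (assignment, loads): node -> list of ids, node -> total size.
    """
    heap = [(0, node) for node in range(num_nodes)]  # (load, node)
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    # Place the largest documents first so small ones can even out the totals.
    by_size = sorted(doc_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for doc_id, size in by_size:
        load, node = heapq.heappop(heap)
        assignment[node].append(doc_id)
        heapq.heappush(heap, (load + size, node))
    loads = {node: load for load, node in heap}
    return assignment, loads


# Hypothetical document sizes.
sizes = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
assignment, loads = balance(sizes, num_nodes=2)
```

The heuristic does not guarantee an optimal split, but the gap between the heaviest and lightest node never exceeds the size of the largest document, which fits the claim's "generally balanced" wording.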

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Alba, Alfredo; DeLuca, Chad E.; Ercegovac, Vuk; Griffin, Thomas D.; Rao, Jun; Shekita, Eugene J.; Singh, Asim V.; Tian, Yuanyuan; Wang, Kevin B.

Application Number: US13/595,761
Publication Number: US 20120323919A1
US Class Current: 707/738
CPC Class Codes: G06F 16/35 Clustering; Classification
