SYSTEM AND METHOD FOR MULTITHREADED TEXT INDEXING FOR NEXT GENERATION MULTI-CORE ARCHITECTURES
First Claim
1. A method for indexing documents, comprising:
generating a single document hash table in storage memory for a single document using an index construction in a multithreaded and scalable configuration wherein multiple threads are each assigned work to reduce synchronization between threads, wherein generating a single document hash table includes:
partitioning the single document into a plurality of subparts and indexing strings of partitioned subparts of the single document to create a minor hash table for each subpart;
generating a document level hash table from the minor hash tables;
updating a stream level hash table for the strings which maps every string to a global identifier; and
generating a term reordered array from the document level hash table.
Abstract
A system and method for indexing documents in a data storage system includes generating a single document hash table in storage memory for a single document using an index construction in a multithreaded and scalable configuration, wherein multiple threads are each assigned work to reduce synchronization between threads. Generating the single document hash table includes: partitioning the single document into sub-parts and indexing strings of the partitioned sub-parts to create a minor hash table for each sub-part; generating a document level hash table from the minor hash tables; updating a stream level hash table for the strings, which maps every string to a global identifier; and generating a term reordered array from the document level hash table.
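The steps named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The names `StreamTable`, `split_into_subparts`, `build_minor_table`, and `index_document` are hypothetical. So are three assumptions the text above does not settle: strings are whitespace-separated tokens, the global identifier is a dense integer, and the "term reordered array" is a list of (global identifier, count) pairs sorted by identifier.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import threading


class StreamTable:
    """Stream-level table mapping every string to a global identifier.

    Assumption: identifiers are dense integers handed out in first-seen order.
    """

    def __init__(self):
        self._ids = {}
        self._lock = threading.Lock()

    def update(self, terms):
        # One lock acquisition per document, not per token, to keep
        # synchronization between indexing threads low.
        with self._lock:
            for term in terms:
                self._ids.setdefault(term, len(self._ids))
            return {term: self._ids[term] for term in terms}


def split_into_subparts(tokens, n_parts):
    """Partition the token list into at most n_parts contiguous subparts."""
    size = max(1, -(-len(tokens) // n_parts))  # ceiling division
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def build_minor_table(subpart):
    """Minor hash table for one subpart; each worker owns its own table."""
    return Counter(subpart)


def index_document(text, stream_table, n_threads=4):
    tokens = text.lower().split()
    subparts = split_into_subparts(tokens, n_threads)

    # Each thread indexes its own subpart without sharing state.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        minor_tables = list(pool.map(build_minor_table, subparts))

    # Document-level hash table generated from the minor hash tables.
    doc_table = Counter()
    for minor in minor_tables:
        doc_table.update(minor)

    # Update the stream-level table (sorted only to make ids deterministic).
    ids = stream_table.update(sorted(doc_table))

    # Assumed form of the term reordered array: pairs sorted by global id.
    reordered = sorted((ids[term], count) for term, count in doc_table.items())
    return doc_table, reordered


if __name__ == "__main__":
    stream = StreamTable()
    table, array = index_document("b a b c a b", stream, n_threads=2)
    print(dict(table), array)
```

The per-subpart tables are what keep synchronization low in this sketch: the only shared structure is the stream-level table, and it is locked once per document. In standard CPython the global interpreter lock prevents these threads from running CPU-bound work in parallel, so the example shows the structure of the work assignment rather than a speedup.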
25 Claims
1. A method for indexing documents, comprising:
generating a single document hash table in storage memory for a single document using an index construction in a multithreaded and scalable configuration wherein multiple threads are each assigned work to reduce synchronization between threads, wherein generating a single document hash table includes:
partitioning the single document into a plurality of subparts and indexing strings of partitioned subparts of the single document to create a minor hash table for each subpart;
generating a document level hash table from the minor hash tables;
updating a stream level hash table for the strings which maps every string to a global identifier; and
generating a term reordered array from the document level hash table.
Dependent claims: 2-13.
14. A computer readable storage medium comprising a computer readable program for indexing documents, wherein the computer readable program when executed on a computer causes the computer to perform the steps of:
generating a single document hash table in storage memory for a single document using an index construction in a multithreaded and scalable configuration wherein multiple threads are each assigned work to reduce synchronization between threads, wherein generating a single document hash table includes:
partitioning the single document into a plurality of subparts and indexing strings of partitioned subparts of the single document to create a minor hash table for each subpart;
generating a document level hash table from the minor hash tables;
updating a stream level hash table for the strings which maps every string to a global identifier; and
generating a term reordered array from the document level hash table.
Dependent claims: 15-23.
24. A system for indexing documents in a data storage system, comprising:
a plurality of processing cores configured to process threads in accordance with an indexing construction program;
a hierarchical memory storage architecture configured to store hash tables and processing results; and
the indexing construction program configured to assign an index construction to the threads, the index construction providing a multithreaded and scalable configuration configured to generate a single document hash table for a single document wherein the threads are each assigned work to be performed by the plurality of processing cores to reduce synchronization between the threads wherein the single document is partitioned into subparts and strings of partitioned subparts of the single document are indexed to create a minor hash table for each subpart, the single document hash table including:
a document level hash table generated from the minor hash tables;
a stream level hash table updated for the strings which maps every string to a global identifier; and
a term reordered array from the document level hash table.
Dependent claims: 25.
Specification