System and method for generation of computer index files
First Claim
1. A computer-implemented method for generating an ordered data set, comprising:
- providing a data set of tuples, each tuple comprising a key and a corresponding reference;
allocating an inversion buffer in a memory;
allocating a key space in the inversion buffer for each key in the data set of tuples to be included in an ordered data set of tuples and associating that key with the key space;
allocating a reference space in each key space for each reference that corresponds to the key associated with the key space;
associating the corresponding reference with the reference space allocated for that corresponding reference; and
generating the ordered data set of the tuples based upon the positional relationship of the key spaces and the reference spaces.
8 Assignments
0 Petitions
Accused Products
Abstract
Methods and systems for the generation of computer readable indexes or other ordered lists are provided. A corpus of electronic documents or other electronic information is parsed into postings that include key and reference pairs. An inversion buffer in memory is explicitly or implicitly formatted to receive the postings in a predetermined order by key. Each key is assigned a space in the inversion buffer that is subsequently filled with references associated with the key during an inversion method. In an embodiment, an index file is generated directly from the inversion buffer, or in the case of large inversions, from a plurality of inversion buffer segments.
-
Citations
23 Claims
-
1. A computer-implemented method for generating an ordered data set, comprising:
-
providing a data set of tuples, each tuple comprising a key and a corresponding reference; allocating an inversion buffer in a memory; allocating a key space in the inversion buffer for each key in the data set of tuples to be included in an ordered data set of tuples and associating that key with the key space; allocating a reference space in each key space for each reference that corresponds to the key associated with the key space; associating the corresponding reference with the reference space allocated for that corresponding reference; and generating the ordered data set of the tuples based upon the positional relationship of the key spaces and the reference spaces. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A computer system including for generating an inverted index, the system comprising:
-
(a) a posting component that provides a plurality of postings parsed from a document corpus, each posting comprising a term reference and an associated document reference; (b) an inversion component, comprising; (i) an inversion buffer allocation component that maps an inversion buffer in a memory to receive the plurality of postings in a pre-defined sort order; (ii) a posting sort component that associates each posting with the mapped location in the inversion buffer for that posting; and (c) an index generation component that generates the inverted index from the mapping of the inversion buffer. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A computer-implemented method for inverting an index, the computer comprising a processor and a memory, the method comprising:
-
providing a plurality of postings to be indexed, each posting comprising a term identification and a document identification; determining a set of unique term identifications present in the plurality of postings; for each unique term identification in the set of unique term identifications, determining from the plurality of postings a set of document identifications that are associated with that unique term identification; allocating an inversion buffer in memory; allocating, in a predefined order, a term space in memory associated with the inversion buffer for each unique term identification; allocating a reference space in memory associated with the term space for each document identification that is associated with the unique term identification that is associated with the term space; for each posting in the set of postings, associating the document identification from that posting with one of the document spaces that is associated with the term space associated with the term identification of that posting; and generating the index from the spatial relationships of the term spaces associated with the inversion buffer. - View Dependent Claims (16, 17, 18, 19)
-
-
20. A computer system for generating a computer-readable index, comprising:
-
(a) a document corpus parsing component, comprising; (ii) a posting component that adds a posting to a set of postings when a document in a corpus of documents contains a term, each posting comprising a term identifier and a document identifier associated with the document containing the term; (ii) a document frequency component that determines the number of postings in the set of postings that include the term identifier; (iii) a term lexicon component that administers a set of term identifiers; (b) an inversion buffer component, comprising; (i) an inversion buffer allocating component that allocates an inversion area in memory for inversion of a subset of postings from the set of postings; (ii) a term space allocating component that allocates a term space for each term identifier in a subset of terms identifiers from the term lexicon such that the term spaces are arranged by a pre-defined sort order; (iii) a reference space allocating component that allocates a reference space in the term space for the number of postings associated with the term identifier as determined by the document frequency component; (c) an inversion component, comprising; (i) a posting processing component that retrieves each posting and assigns the document identifier associated with that posting to one of the reference spaces associated with the term space that is associated with the term identifier associated with that posting; (d) an index generating component, comprising; (i) an inversion buffer conversion component that sequentially returns each term space in order and appends a term record to the index, each record associating the term identifier that is associated with the term space with the reference identifiers that have been assigned to the reference spaces associated with that term space. - View Dependent Claims (21, 22, 23)
-
Specification