System and method for generation of computer index files
First Claim
1. A computer-implemented method for generating an ordered data set, comprising:
- providing a data set of tuples, each tuple comprising a key and a corresponding reference;
determining the number of corresponding references in the data set of tuples that include each key;
allocating an inversion buffer in a memory;
allocating a key space in the inversion buffer for each key in the data set of tuples to be included in an ordered data set of tuples and associating that key with the key space;
allocating a variable number of reference spaces in the key spaces, wherein one of the reference spaces in each key space is allocated for each reference that corresponds to the key associated with the key space from the determined number of corresponding references that include each key;
associating the corresponding reference with the reference space allocated for that corresponding reference; and
generating the ordered data set of the tuples based upon the positional relationship of the key spaces and the reference spaces.
8 Assignments
0 Petitions
Accused Products
Abstract
Methods and systems for the generation of computer readable indexes or other ordered lists are provided. A corpus of electronic documents or other electronic information is parsed into postings that include key and reference pairs. An inversion buffer in memory is explicitly or implicitly formatted to receive the postings in a predetermined order by key. Each key is assigned a space in the inversion buffer that is subsequently filled with references associated with the key during an inversion method. In an embodiment, an index file is generated directly from the inversion buffer, or in the case of large inversions, from a plurality of inversion buffer segments.
-
Citations
23 Claims
-
1. A computer-implemented method for generating an ordered data set, comprising:
-
providing a data set of tuples, each tuple comprising a key and a corresponding reference; determining the number of corresponding references in the data set of tuples that include each key; allocating an inversion buffer in a memory; allocating a key space in the inversion buffer for each key in the data set of tuples to be included in an ordered data set of tuples and associating that key with the key space; allocating a variable number of reference spaces in the key spaces, wherein one of the reference spaces in each key space is allocated for each reference that corresponds to the key associated with the key space from the determined number of corresponding references that include each key; associating the corresponding reference with the reference space allocated for that corresponding reference; and generating the ordered data set of the tuples based upon the positional relationship of the key spaces and the reference spaces. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A computer system including for generating an inverted index, the system comprising:
-
a posting component that provides a plurality of postings parsed from a document corpus, each posting comprising a term reference and an associated document reference; a document reference frequency component that determines the number of postings in the plurality of postings that include the term reference; an inversion component, comprising; an inversion buffer allocation component that maps an inversion buffer in a memory to receive the plurality of postings in a pre-defined sort order; a posting sort component that associates each posting with the mapped location in the inversion buffer for that posting; and a document reference space allocation component that allocates a variable number of reference spaces, wherein one of the reference spaces in the mapped location is allocated for each of the number of postings associated with the term reference as determined by the document reference frequency component; and an index generation component that generates the inverted index from the mapping of the inversion buffer. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A computer-implemented method for inverting an index, the computer comprising a processor and a memory, the method comprising:
-
providing a plurality of postings to be indexed, each posting comprising a term identification and a document identification; determining a set of unique term identifications present in the plurality of postings; for each unique term identification in the set of unique term identifications, determining from the plurality of postings the number of document identifications that are associated with that unique term identification; allocating an inversion buffer in memory; allocating, in a predefined order, a term space in memory associated with the inversion buffer for each unique term identification; allocating a variable number reference spaces in memory, wherein each reference space is associated with the term space for each document identification that is associated with the unique term identification that is associated with the term space based on the determined number of document identifications that are associated with the unique term identification; for each posting in the set of postings, associating the document identification from that posting with one of the document spaces that is associated with the term space associated with the term identification of that posting; and generating the index from the spatial relationships of the term spaces associated with the inversion buffer. - View Dependent Claims (16, 17, 18, 19)
-
-
20. A computer system for generating a computer-readable index, comprising:
-
a document corpus parsing component, comprising; a posting component that adds a posting to a set of postings when a document in a corpus of documents contains a term, each posting comprising a term identifier and a document identifier associated with the document containing the term; a document frequency component that determines the number of postings in the set of postings that include the term identifier; a term lexicon component that administers a set of term identifiers; an inversion buffer component, comprising; an inversion buffer allocating component that allocates an inversion area in memory for inversion of a subset of postings from the set of postings; a term space allocating component that allocates a term space for each term identifier in a subset of terms identifiers from the term lexicon such that the term spaces are arranged by a pre-defined sort order; a reference space allocating component that allocates a variable number of reference spaces in the term spaces, wherein the reference spaces in each term space is allocated for the number of postings associated with the term identifier as determined by the document frequency component; an inversion component, comprising; a posting processing component that retrieves each posting and assigns the document identifier associated with that posting to one of the reference spaces associated with the term space that is associated with the term identifier associated with that posting; an index generating component, comprising; an inversion buffer conversion component that sequentially returns each term space in order and appends a term record to the index, each record associating the term identifier that is associated with the term space with the reference identifiers that have been assigned to the reference spaces associated with that term space. - View Dependent Claims (21, 22, 23)
-
Specification