Method for representing and storing hierarchical data in a columnar format

  • US 9,087,138 B2
  • Filed: 01/15/2013
  • Issued: 07/21/2015
  • Est. Priority Date: 01/15/2013
  • Status: Active Grant
First Claim

1. A method of organizing hierarchical data in a columnar form to be stored in a computer database or transmitted over networks, said method comprising:

  • using a computer processor to create a fully expanded tree-like metadata schema for a type of hierarchical data;

    wherein said schema is a fully expanded tree-like structure comprising record nodes and field nodes with one root node, wherein:

    a. a record node or a field node is defined and used only once in said tree structure;

    b. each node is defined by at least a name, a unique ID, and a type identifier;

    c. said unique ID is unique among all the nodes in said fully expanded tree-like structure, and said unique ID uniquely identifies the full path from the said root node to the said node in said fully expanded tree-like structure;

    d. said type identifier for a record node defines the content model of said record as either “sequence” or “choice”, wherein the default value is “sequence”, and said type identifier for a field node specifies the data type of the scalar value of said field node;

    e. each node has two optional attributes: maxOccurs and minOccurs, wherein the default value for both attributes is “1”, wherein a value greater than “1” for maxOccurs indicates that the node is repeatable, and a value of “0” for minOccurs indicates that the node is optional;

    f. each field node has optional attributes to specify the value constraints; and

    g. each node further has an optional attribute pseudo to indicate said node is for content grouping or content presentation purposes;

    using a computer processor to receive a plurality of instance data of said type of hierarchical data;

    said instance data comprising a plurality of data entries organized in a hierarchical relationship;

    matching each instance data against said schema and producing a plurality of columns of data, wherein there exists two types of columns, value columns and occurrence columns;

    wherein each said instance data is matched against said schema in a chosen tree traversal order;

    for each field node in said schema, allocating a value column in said columnar form and storing the scalar values for all matched data entries in said value column, and identifying said value column by the unique ID of said field node;

    for each repeatable, optional or choosable field and record node in said schema, additionally allocating an occurrence column in said columnar form, storing occurrences in said occurrence column, and identifying said occurrence column by the unique ID of said respective field and record node;

    discarding other matched data entries; and

    each value column comprises an array of scalar values of same data type for a field in said schema;

    each occurrence column comprises an array of occurrence numbers for a node in said schema, wherein said node is repeatable, optional or choosable;

    wherein each occurrence number in an occurrence column indicates the total number of occurrences of the node under a single occurrence of the node's parent node; and

    said hierarchical relationship among said data entries is jointly preserved by said schema and said occurrence numbers in said occurrence columns;

    storing said columns of data in a computer database;

    or serializing said columns of data into a stream of bytes of data;

    or performing at least one of a query, update, insertion, or deletion on said hierarchical data as stored in said columnar form.
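
To make the column layout in the claim concrete, the following is a minimal Python sketch, not the patented implementation. It assumes instance data arrive as nested dicts and lists, uses the root-to-node path as the unique ID, handles only the "sequence" content model, and ignores value constraints and the pseudo attribute. The names `Node`, `shred`, and the `order` example schema are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    uid: str                # assumption: the root-to-node path serves as the unique ID
    type: str = "sequence"  # "sequence"/"choice" for records, a scalar type name for fields
    min_occurs: int = 1
    max_occurs: int = 1
    children: list = field(default_factory=list)

    def is_field(self):
        return self.type not in ("sequence", "choice")

    def has_occurrence_column(self):
        # repeatable (maxOccurs > 1) or optional (minOccurs == 0)
        return self.max_occurs > 1 or self.min_occurs == 0


def shred(node, entries, values, occurrences):
    """Match `entries` (all occurrences of `node` under ONE parent occurrence)
    against the schema in pre-order and append to the columns."""
    if node.has_occurrence_column():
        occurrences[node.uid].append(len(entries))
    for entry in entries:
        if node.is_field():
            values[node.uid].append(entry)  # scalar goes to the value column
            continue
        # Record entries themselves are not stored; only their children are visited.
        for child in node.children:
            raw = entry.get(child.name)
            if raw is None:
                matched = []
            elif isinstance(raw, list):
                matched = raw
            else:
                matched = [raw]
            shred(child, matched, values, occurrences)


# Hypothetical schema: an order with repeatable items, each with an optional note.
schema = Node("order", "/order", children=[
    Node("id", "/order/id", type="int"),
    Node("item", "/order/item", max_occurs=10, children=[
        Node("sku", "/order/item/sku", type="string"),
        Node("note", "/order/item/note", type="string", min_occurs=0),
    ]),
])

instances = [
    {"id": 1, "item": [{"sku": "A", "note": "gift"}, {"sku": "B"}]},
    {"id": 2, "item": [{"sku": "C"}]},
]

values, occurrences = defaultdict(list), defaultdict(list)
for instance in instances:
    shred(schema, [instance], values, occurrences)

print(dict(values))
print(dict(occurrences))
```

In this sketch the occurrence column for `/order/item` holds one count per order, and the occurrence column for `/order/item/note` holds one count per item, so the nesting can be rebuilt from the schema plus these counts without storing the record entries themselves.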
