Web-scale data processing system and method
Abstract
A system and method for obtaining and processing web-scale data are provided herein. More particularly, a web-scale data processing system and method for crawling, storing, processing, encoding, and/or serving web-scale data are disclosed.
21 Claims
1. A method executing on at least one computing system for merging records from a plurality of conflicting namespaces into a globally-unified namespace, the method comprising:
- performing steps a-d for an initial iteration level; and
performing at least steps a-c for a next iteration level;
a. obtaining a first plurality of source columns associated with a current iteration level, each source column including a plurality of records with source-column-unique, but not first-plurality-unique, identifiers;
b. grouping said first plurality of source columns into a current merge-group set comprising at least two merge groups for said initial iteration level and at least one merge group for said next iteration level, each merge group including at least two source columns;
c. for each merge group in said current merge-group set;
i. sorting and merging records from said at least two source columns in the current merge group into a merged column having a plurality of records with merged-column-unique identifiers;
ii. storing the current merged column; and
iii. for each of said at least two source columns in the current merge group, creating a translation table associated with the current iteration-level identifier, said translation table comprising a source entry for each source-column-unique record identifier in the current source column and a destination entry for each corresponding merged-column-unique record identifier in the current merged column;
d. associating the at least two current merged columns with said next iteration level.
Dependent claims: 2, 3, 4, 5, 6, 7
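As an informal illustration only, and not the patented implementation, steps a-d of claim 1 can be sketched in Python. The sketch rests on several assumptions that are not in the claim text: each source column is an in-memory list of comparable record keys, a record's identifier is its position in its column, duplicate keys collapse to one merged record, and the names `merge_group`, `make_groups`, `merge_all`, and `to_global` are hypothetical.

```python
from bisect import bisect_left


def merge_group(columns):
    """Step c: sort-merge one merge group and build its translation tables.

    Assumption: a record ID is the record's position in its column.
    """
    merged = sorted(set().union(*columns))  # deduplicated, sorted records
    tables = [
        # source-column ID -> merged-column ID, one table per source column
        {src_id: bisect_left(merged, rec) for src_id, rec in enumerate(col)}
        for col in columns
    ]
    return merged, tables


def make_groups(columns, group_size=2):
    """Step b: split columns into merge groups of at least two columns each."""
    groups = [columns[i:i + group_size] for i in range(0, len(columns), group_size)]
    if len(groups) > 1 and len(groups[-1]) < 2:
        leftover = groups.pop()       # a lone trailing column joins the
        groups[-1].extend(leftover)   # previous group instead of standing alone
    return groups


def merge_all(columns, group_size=2):
    """Repeat steps a-d, one iteration level at a time, until one column remains."""
    levels = []  # levels[k][c] = (column index at level k+1, translation table)
    while len(columns) > 1:
        next_columns, level = [], []
        for group in make_groups(columns, group_size):
            merged, tables = merge_group(group)
            target = len(next_columns)  # index of the merged column at the next level
            level.extend((target, table) for table in tables)
            next_columns.append(merged)
        levels.append(level)
        columns = next_columns  # step d: merged columns feed the next level
    return columns[0], levels


def to_global(levels, column_index, record_id):
    """Follow the per-level translation tables from a source ID to a global ID."""
    for level in levels:
        column_index, table = level[column_index]
        record_id = table[record_id]
    return record_id


source_columns = [["b", "a"], ["c", "a"], ["d", "b"], ["e", "d"]]
unified, levels = merge_all(source_columns)
print(unified)                   # the globally unified column
print(to_global(levels, 3, 0))   # global ID of record 0 in source column 3
```

With four source columns and a group size of two, the initial iteration level has two merge groups and the next level has one, matching the minimum counts recited in step b. Chaining the per-level translation tables is what lets an identifier that was unique only within its source column be rewritten into the unified namespace.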
8. A computing apparatus comprising a processor and a memory, the memory storing instructions that, when executed by the processor, perform a method for merging records from a plurality of conflicting namespaces into a globally-unified namespace, the method comprising:
- performing steps a-d for an initial iteration level; and
performing at least steps a-c for a next iteration level;
a. obtaining a first plurality of source columns associated with a current iteration level, each source column including a plurality of records with source-column-unique, but not first-plurality-unique, identifiers;
b. grouping said first plurality of source columns into a current merge-group set comprising at least two merge groups for said initial iteration level and at least one merge group for said next iteration level, each merge group including at least two source columns;
c. for each merge group in said current merge-group set;
i. sorting and merging records from said at least two source columns in the current merge group into a merged column having a plurality of records with merged-column-unique identifiers;
ii. storing the current merged column; and
iii. for each of said at least two source columns in the current merge group, creating a translation table associated with the current iteration-level identifier, said translation table comprising a source entry for each source-column-unique record identifier in the current source column and a destination entry for each corresponding merged-column-unique record identifier in the current merged column;
d. associating the at least two current merged columns with said next iteration level.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer-readable storage medium having stored thereon instructions that, when executed by at least one processor, perform a method for merging records from a plurality of conflicting namespaces into a globally-unified namespace, the method comprising:
- performing steps a-d for an initial iteration level; and
performing at least steps a-c for a next iteration level;
a. obtaining a first plurality of source columns associated with a current iteration level, each source column including a plurality of records with source-column-unique, but not first-plurality-unique, identifiers;
b. grouping said first plurality of source columns into a current merge-group set comprising at least two merge groups for said initial iteration level and at least one merge group for said next iteration level, each merge group including at least two source columns;
c. for each merge group in said current merge-group set;
i. sorting and merging records from said at least two source columns in the current merge group into a merged column having a plurality of records with merged-column-unique identifiers;
ii. storing the current merged column; and
iii. for each of said at least two source columns in the current merge group, creating a translation table associated with the current iteration-level identifier, said translation table comprising a source entry for each source-column-unique record identifier in the current source column and a destination entry for each corresponding merged-column-unique record identifier in the current merged column;
d. associating the at least two current merged columns with said next iteration level.
Dependent claims: 16, 17, 18, 19, 20, 21
Specification