Grouping and duplicate removal method in a database

US 6,381,601 B1
Filed: 12/22/1999
Issued: 04/30/2002
Est. Priority Date: 12/22/1998
Status: Expired due to Fees

First Claim

Patent Images

1. A grouping method comprising the steps of:

determining whether there is possibility or not for existence of at least another record having the identical value of a column for grouping of the relevant record for each record by executing a predetermined first process to a value of at least a part of one or more predetermined columns for the grouping of each record of a plurality of records of a record list included in the database;

determining respectively a part of a plurality of records which is determined, by said determination, not to have possibility for existence of other records having a value of the same grouping column as a records belonging to one group; and

classifying, by a second process, a plurality of the other records other than a part of said records among a plurality of records of said record list into a plurality of groups so that a plurality of records having the identical value of said columns for grouping belong to the same group.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

In order to realize high speed process for grouping the records having the identical values of one or more columns of the input list, the input list is canned, a hash value is generated using a hash function in which a value of the column as the non-vacant partial aggregation of the columns for the grouping is used as the argument, and it is determined whether two or more records having the hash values exist or not. The input list is scanned again and the calculating process of the aggregation columns is immediately executed for the records having the hash value for which it is determined by the first scanning that there is only one record having the identical hash value, the result of such calculating process is output and the records are defined as the input of the ordinary grouping process for the records which are determined to exist as the two or more records.

Citations

18 Claims

1. A grouping method comprising the steps of:
- determining whether there is possibility or not for existence of at least another record having the identical value of a column for grouping of the relevant record for each record by executing a predetermined first process to a value of at least a part of one or more predetermined columns for the grouping of each record of a plurality of records of a record list included in the database;
  
  determining respectively a part of a plurality of records which is determined, by said determination, not to have possibility for existence of other records having a value of the same grouping column as a records belonging to one group; and
  
  classifying, by a second process, a plurality of the other records other than a part of said records among a plurality of records of said record list into a plurality of groups so that a plurality of records having the identical value of said columns for grouping belong to the same group.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
- - 2. A grouping method according to claim 1, wherein said step of determining possibility comprises the steps of:
3. A grouping method according to claim 2, wherein said determining step comprises the steps of:
- updating, in relation to the generated hash value, counted data for identifying, each time when said hash value is generated for each record, whether the number of records which has generated a plurality of hash value to be generated by said hash function is 0 or 1 or 2 or larger;
  
  executing again said hash value generating step to a plurality of said records after said update is executed to a plurality of said records of said record list; and
  
  determining, based on said counted data, whether the number of records having generated the hash value generated again is 1 or not.
4. A grouping method according to claim 3, wherein said counted data includes two bits for counting whether the number of records having generated such hash value is 0 or 1 or 2 or larger corresponding to each value among a plurality of hash values which may be generated by said hash function.
5. A grouping method according to claim 3, wherein said counted data expresses the number of records 0, 1, 2 or larger having generated the hash value with the natural numbers such as 0×
- (3ⁱ), 1×
  
  (3ⁱ), 2×
  
  (3ⁱ) when the hash value is the i-th hash value corresponding to each value of a plurality of hash values which may be generated by said hash function and a binary number expressing the total sum of such natural numbers corresponding to a plurality of said hash values; and
  
  3ⁿis smaller than 2^2kand n>
  
  k when the total value of a plurality of said hash values generated by said hash function is n and the number of bits of said binary number is 2 k bits.
6. A grouping method according to claim 1, further comprising a step of determining whether an average group duplicate degree of said record list is near to 1 or not, and wherein the steps after said determining step are executed when the average group duplicate degree of said record list is determined to be near to 1.
7. A grouping method according to claim 2, further comprising a step of executing an aggregating process for a group to which records belong without depending on the execution result of said classifying step on the basis of each record among a plurality of records of a part respectively determined by said determining step that there is no other records having the same hash value;
- andwherein the classifying step includes a step of executing the aggregating process to each group among a plurality of groups classified by the classifying step.
8. A recording medium having recorded a computer program for executing said method of claim 1 by using a computer.

9. A duplicate removal method comprising the steps of:
- determining whether there is possibility or not for existence of at least another record having a value identical to that of a record by executing a predetermined first process to one or more column values of a plurality of records of a record list included in a database;
  
  executing a second process for deleting records other than one of the duplicate records having the identical value to a plurality of records other than a plurality of partial records determined to have no possibility for existence of the other records having the same value by said determining step among a plurality of records of said record list; and
  
  a plurality of said partial records and a plurality of said records after said deletion are output as the records after duplicate removal.
- View Dependent Claims (10, 11, 12, 13, 14, 15)
- - 10. A duplicate removal method according to claim 9, wherein said determining step for possibility comprises the steps of:
11. A duplicate removal method according to claim 10, wherein said determining step comprising the steps of:
- updating, in relation to the generated hash value, counted data for identifying, each time when said hash value is generated for each record, whether the number of records which has generated a plurality of hash value to be generated by said hash function is 0 or 1 or 2 or larger;
  
  executing again said hash value generating step to a plurality of said records after said update is executed to a plurality of said records of said record list; and
  
  determining, based on said counted data, whether the number of records having generated the hash value generated again is 1 or not.
12. A duplicate removal method according to claim 11, wherein said counted data includes two bits for counting whether the number of records having generated such hash value is 0 or 1 or 2 or larger corresponding to each value among a plurality of hash values which may be generated by said hash function.
13. A duplicate removal method according to claim 11, whereinsaid counted data expresses the number of records 0, 1, 2 or larger having generated the hash value with the natural numbers such as 0×
- (3ⁱ), 1×
  
  (3ⁱ), 2×
  
  (3ⁱ) when the hash value is the i-th hash value corresponding to each value of a plurality of hash values which may be generated by said hash function and a binary number expressing the total sum of such natural numbers corresponding to a plurality of said hash values; and
  
  3ⁿis smaller than 2^2kand n>
  
  k when the total value of a plurality of said hash values generated by said hash function is n and the number of bits of said binary number is 2 k bits.
14. A duplicate removal method claimed in claim 9, further comprising a step of determining whether the average group duplicate degree of said record list is near to 1 or not is further comprised, and wherein the steps after said step of determining the possibility are executed when the average group duplicate degree of said record list is determined to be near to 1.
15. A recording medium having recorded a computer program to execute the method of claim 9 by using a computer.

16. A grouping method comprising the steps of:
- generating hash value to each record among a plurality of records of record list included in a data base with a hash function using, as an argument, a value of at least a part of columns of one or more predetermined grouping columns of such records;
  
  determining whether a value of an effective grouping column and aggregation data are stored or not in one entry corresponding to said hash value among a plurality of entries in a prepared aggregating area;
  
  calculating the aggregation data when a record is only one record belonging to one group based on data of predetermined aggregating column of said record when said one entry does not store a value of the effective grouping column and the aggregation data;
  
  storing a value of the grouping column of said record and said calculated aggregation data in said one entry;
  
  determining whether the values of the stored grouping column is identical to the value of grouping column of each record having generated said hash value when said one entry has already stored the value of the grouping column and aggregation data;
  
  updating, when the value of said grouping column stored in one entry is identical to the value of grouping column of each record having generated said hash value, said aggregation data stored in said entry to the data obtained when each record having generated said hash value is added to the group corresponding to said one entry based on the data of said aggregation column of each record having generated said hash value;
  
  storing the data of grouping column of each record and data of aggregation column as the records not grouped when the value of the grouping column stored in said entry is different from the value of the grouping column of each record having generated said hash value;
  
  outputting, to a plurality of records of said record list, data stored in a plurality of entries of said aggregating area as data in regard to one group after execution of the steps from said generating step to said registering step; and
  
  changing a hash function used in said generating step during the repetition of the steps by repeating the steps from said generating step to said output step to a plurality of records stored as the records not grouped in said storing step until the grouping is made for all records.
- View Dependent Claims (17, 18)
- - 17. A grouping method according to claim 16, wherein said storing step includes a step for determining one of a plurality of divided lists to which said each record belongs depending on the value of said grouping column of said each record and storing said each record as the record belonging to one divided list and said repeating step includes a step for sequentially selecting a plurality of said divided lists and executing the steps from said generating step to said output step to a plurality of records stored in the selected divided list.
  - 18. A recording medium having recorded a computer program to execute the method of claim 16 by using a computer.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Hitachi, Ltd.
Original Assignee
Hitachi, Ltd.
Inventors
Fujiwara, Shinji, Ushijima, Kazutomo, Nishizawa, Itaru
Primary Examiner(s)
Von Buhr, Maria N.

Application Number

US09/469,615
Time in Patent Office

860 Days
Field of Search

707/7, 711/216
US Class Current

707/692
CPC Class Codes

G06F 16/24556   Aggregation; Duplicate elim...

Y10S 707/954   Relational

Y10S 707/99937   Sorting

Grouping and duplicate removal method in a database

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

Citations

18 Claims

Specification

Solutions

Use Cases

Quick Links

Grouping and duplicate removal method in a database

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

18 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links