Partition-based high dimensional similarity join method

  • US 7,167,868 B2
  • Filed: 08/13/2003
  • Issued: 01/23/2007
  • Est. Priority Date: 09/11/2002
  • Status: Expired due to Fees
First Claim

1. A partition-based high dimensional similarity join method, comprising:

  • determining a total number of dimensions dp for use in partitioning a high dimensional data space and a total number of partitioning dimensions;

    partitioning the high dimensional data space in accordance with the determined dimensions and the total number of partitioning dimensions;

    performing joins between data sets according to the partitioned dimensions; and

    counting a number of join computations which occur in the joins between the respective cells of the data sets, wherein the total number of dimensions dp for use in partitioning the high dimensional data space is determined based on the number of join computations, and wherein the total number of dimensions dp used in partitioning the high dimensional data space is obtained by comparing a size of the data sets and a size of disk blocks in which the data sets are stored, according to the following equation:

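    $$d_p = \frac{\log\left(\dfrac{\min\left(|R|_{\mathrm{block}},\, |S|_{\mathrm{block}}\right)}{\mathrm{BlockSize}}\right)}{\log \lceil 1/\varepsilon \rceil},$$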
    where |R|_block and |S|_block are the total numbers of disk blocks in which the data sets R and S are stored, respectively, BlockSize is the size of a disk block, and ⌈1/ε⌉ is the number of cells.
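The d_p equation can be evaluated directly. The following Python sketch assumes one reading of the claim's equation: the smaller of the two data-set sizes divided by BlockSize inside the numerator's logarithm, and ⌈1/ε⌉ taken as the ceiling of 1/ε. The function name `partitioning_dimensions` is hypothetical, and because the claim does not say how a fractional d_p is rounded, the raw value is returned.

```python
import math


def partitioning_dimensions(r_block: float, s_block: float,
                            block_size: float, epsilon: float) -> float:
    """Evaluate d_p = log(min(|R|_block, |S|_block) / BlockSize) / log(ceil(1/eps)).

    Hypothetical helper following one reading of the claim's equation.
    No rounding is applied, since the claim does not specify any.
    """
    if not 0 < epsilon < 1:
        # ceil(1/eps) must be at least 2, otherwise log(1) = 0 in the denominator.
        raise ValueError("epsilon must lie in (0, 1)")
    cells = math.ceil(1 / epsilon)                 # the number of cells, ceil(1/eps)
    ratio = min(r_block, s_block) / block_size     # compare data-set size with block size
    return math.log(ratio) / math.log(cells)


if __name__ == "__main__":
    dp = partitioning_dimensions(r_block=1_000_000, s_block=4_000_000,
                                 block_size=100, epsilon=0.1)
    print(round(dp, 6))  # 4.0
```

With ε = 0.1 there are 10 cells, and 10^4 = 1,000,000 / 100, so d_p comes out at 4 for these illustrative inputs.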

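The claimed steps (partition the space, join the data sets cell against cell, count the join computations) can be illustrated with a small in-memory sketch. Everything here is an assumption for illustration, not the patented implementation: points are tuples of floats, the first d_p coordinates are the partitioning dimensions, each is cut into cells of width ε (so data in [0, 1) spans ⌈1/ε⌉ cells per dimension), "similar" means Euclidean distance at most ε, and one join computation is counted as one point-pair distance test. The names `cell_of`, `partition` and `similarity_join` are hypothetical.

```python
import itertools
import math
import random
from collections import defaultdict

Point = tuple[float, ...]


def cell_of(p: Point, eps: float, dp: int) -> tuple[int, ...]:
    # Assumption: the first dp coordinates are the partitioning dimensions,
    # each split into cells of width eps.
    return tuple(math.floor(p[i] / eps) for i in range(dp))


def partition(points: list[Point], eps: float, dp: int) -> dict[tuple[int, ...], list[int]]:
    # Map each cell key to the indices of the points that fall into it.
    cells: dict[tuple[int, ...], list[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        cells[cell_of(p, eps, dp)].append(idx)
    return cells


def similarity_join(r: list[Point], s: list[Point], eps: float, dp: int):
    """Return (matching (i, j) index pairs, number of join computations).

    A join computation is counted here as one point-pair distance test;
    this is an illustrative choice, not the claim's definition.
    """
    r_cells = partition(r, eps, dp)
    s_cells = partition(s, eps, dp)
    pairs: set[tuple[int, int]] = set()
    computations = 0
    for key, r_ids in r_cells.items():
        # Points within eps differ by at most eps on every coordinate, so their
        # cell indices differ by at most 1 on each partitioning dimension.
        for offset in itertools.product((-1, 0, 1), repeat=dp):
            neighbor = tuple(k + o for k, o in zip(key, offset))
            for i in r_ids:
                for j in s_cells.get(neighbor, ()):
                    computations += 1
                    if math.dist(r[i], s[j]) <= eps:
                        pairs.add((i, j))
    return pairs, computations


if __name__ == "__main__":
    random.seed(0)
    DIMS, EPS = 4, 0.2
    R = [tuple(random.random() for _ in range(DIMS)) for _ in range(300)]
    S = [tuple(random.random() for _ in range(DIMS)) for _ in range(300)]
    # One way to base dp on the counted join computations: try each candidate.
    counts = {dp: similarity_join(R, S, EPS, dp)[1] for dp in range(DIMS + 1)}
    best_dp = min(counts, key=counts.get)
    print(counts, best_dp)
```

The result set is the same for every d_p; only the work changes. More partitioning dimensions give smaller cells but multiply the neighboring cells visited (3^d_p per cell), which is why comparing the counted join computations across candidate values is one plausible way to settle on d_p.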
  • 2 Assignments