Computer program, method, and apparatus for detecting duplicate data
Abstract
A computer program, method, and apparatus that narrow down data so that duplicate data can be detected in a short time. A computer functions as a syntax tree constructor, which creates a syntax tree by extracting a plurality of letters existing at prescribed discrete positions from the character string of each of the data, and as a duplicate data detector, which detects data as possible duplicate data if the data have reached the same leaf node of the syntax tree.
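The idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the sampling positions in `POSITIONS`, the padding letter `PAD` for short strings, and the names `Node`, `build_tree`, and `possible_duplicates` are all assumptions made for illustration. Each string is routed down the tree by the letters found at the sampled positions, and any leaf reached by more than one string yields a group of possible duplicates.

```python
from dataclasses import dataclass, field

# Hypothetical sampling positions (0-based). The source only says
# "prescribed discrete positions" and does not give concrete values.
POSITIONS = (0, 2, 5)
# Assumed stand-in letter for strings shorter than a sampled position.
PAD = "\0"


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # letter -> child Node
    records: list = field(default_factory=list)   # strings that reached this leaf


def build_tree(strings, positions=POSITIONS):
    """Route every string down the tree, one level per sampled position."""
    root = Node()
    for s in strings:
        node = root
        for p in positions:
            letter = s[p] if p < len(s) else PAD
            node = node.children.setdefault(letter, Node())
        node.records.append(s)
    return root


def possible_duplicates(root):
    """Visit every leaf and return the groups holding more than one string."""
    groups = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children.values())
        elif len(node.records) > 1:
            groups.append(node.records)
    return groups


if __name__ == "__main__":
    data = ["tokyo-01", "tokyo-01", "tokxo-01", "osaka-77"]
    for group in possible_duplicates(build_tree(data)):
        print(group)
```

In this sketch "tokxo-01" lands in the same leaf as "tokyo-01" because the differing letter is not at a sampled position. That is consistent with the abstract calling the result "possible" duplicate data: the tree narrows the candidates, and a full comparison within each group would still be needed to confirm exact duplicates.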
27 Citations
5 Claims
1. A computer-readable recording medium containing a duplicate data detection program for detecting duplicate data out of a plurality of data each including a character string, the duplicate data detection program causing a computer to perform as:
- syntax tree construction means for creating a syntax tree by extracting a plurality of letters existing at prescribed discrete positions from the character string of each of the plurality of data; and
- duplicate data detection means for searching each leaf node of the syntax tree to find some of the plurality of data that have reached the leaf node, and detecting the some of the plurality of data as possible duplicate data.
4. A method for detecting duplicate data out of a plurality of data each having a character string, comprising the steps of:
- creating a syntax tree by extracting a plurality of letters existing at prescribed discrete positions from the character string of each of the plurality of data;
- searching each leaf node of the syntax tree to find some of the plurality of data that have reached the leaf node of the syntax tree; and
- detecting the some of the plurality of data as possible duplicate data.
5. An apparatus for detecting duplicate data out of a plurality of data each having a character string, comprising:
- syntax tree construction means for creating a syntax tree by extracting a plurality of letters existing at prescribed discrete positions from the character string of each of the plurality of data; and
- duplicate data detection means for searching each leaf node of the syntax tree to find some of the plurality of data that have reached the leaf node of the syntax tree and detecting the some of the plurality of data as possible duplicate data.
Specification