DATA CLUSTERING BASED ON VARIANT TOKEN NETWORKS
First Claim
1. A method, including:
receiving data records, the received data records each including one or more values in one or more fields; and
processing the received data records to identify one or more data clusters, the processing including:
identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields;
generating a network representing the identified tokens, with nodes of the network representing tokens and edges of the network each representing a variant relationship between tokens; and
generating a graphical representation of the network with different subsets of nodes distinguished based at least in part on values associated with nodes, where a value associated with a particular node quantifies a count of a number of instances of the token represented by that particular node appearing within the received data records.
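The token-identification and network-generation steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation. The claim does not define "variant relationship", so the sketch assumes, purely for illustration, that two tokens are variants when they differ by a single character insertion, deletion, or substitution. The names `tokenize`, `within_one_edit`, and `build_variant_network`, and the sample records, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokenize(records, fields):
    """Yield lowercase word tokens (fragments of field values) from the chosen fields."""
    for record in records:
        for field in fields:
            for word in str(record.get(field, "")).lower().split():
                yield word


def within_one_edit(a, b):
    """Assumed variant test: True if a and b differ by exactly one edit."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]  # one extra character in b at position i


def build_variant_network(records, fields):
    """Return (counts, edges): per-token instance counts and variant pairs."""
    counts = Counter(tokenize(records, fields))
    edges = {
        (a, b)
        for a, b in combinations(sorted(counts), 2)
        if within_one_edit(a, b)
    }
    return counts, edges


# Hypothetical sample records with a single "name" field.
records = [
    {"name": "Acme Corp"},
    {"name": "Acme Corp"},
    {"name": "Acmee Corp"},
    {"name": "Apex Corpp"},
]
counts, edges = build_variant_network(records, ["name"])
```

Here `counts` holds the value associated with each node (the number of instances of the token in the records) and `edges` holds the variant pairs. The pairwise comparison is quadratic in the number of distinct tokens, which is acceptable for a sketch but would need indexing or blocking on large data sets.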
Abstract
Received data records, each including one or more values in one or more fields, are processed to identify one or more data clusters. The processing includes: identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields; generating a network representing the identified tokens, with nodes of the network representing tokens and edges of the network each representing a variant relationship between tokens; and generating a graphical representation of the network with different subsets of nodes distinguished based at least in part on values associated with nodes, where a value associated with a particular node quantifies a count of a number of instances of the token represented by that particular node appearing within the received data records.
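One way to picture the graphical representation described in the abstract is to emit the network as Graphviz DOT text, distinguishing nodes by their instance counts. The sketch below is an assumption-laden illustration, not the method of the patent: the function name `to_dot`, the `threshold` parameter, the fill colors, and the sample counts and edges are all hypothetical, and tokens are assumed to contain no double-quote characters.

```python
def to_dot(counts, edges, threshold=2):
    """Render a token network as DOT text.

    counts maps each token to its number of instances in the records;
    edges is an iterable of (token, token) variant pairs. Nodes whose
    count meets the assumed threshold are filled in a different color,
    so frequent tokens and rare variants form visually distinct subsets.
    """
    lines = ["graph variant_tokens {"]
    for token in sorted(counts):
        n = counts[token]
        color = "lightblue" if n >= threshold else "white"
        lines.append(
            f'  "{token}" [label="{token} ({n})", style=filled, fillcolor={color}];'
        )
    for a, b in sorted(edges):
        lines.append(f'  "{a}" -- "{b}";')  # undirected variant relationship
    lines.append("}")
    return "\n".join(lines)


# Hypothetical counts and variant pairs for illustration.
sample_counts = {"acme": 2, "acmee": 1, "corp": 3, "corpp": 1}
sample_edges = [("acme", "acmee"), ("corp", "corpp")]
dot_text = to_dot(sample_counts, sample_edges)
print(dot_text)
```

The resulting text can be passed to any DOT renderer. A single threshold is the simplest way to split nodes into subsets; scaling node size or color continuously with the count would serve the same purpose.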
13 Claims
1. A method, including:
receiving data records, the received data records each including one or more values in one or more fields; and
processing the received data records to identify one or more data clusters, the processing including:
identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields;
generating a network representing the identified tokens, with nodes of the network representing tokens and edges of the network each representing a variant relationship between tokens; and
generating a graphical representation of the network with different subsets of nodes distinguished based at least in part on values associated with nodes, where a value associated with a particular node quantifies a count of a number of instances of the token represented by that particular node appearing within the received data records.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer program stored on a computer-readable storage medium, the computer program including instructions for causing a computing system to:
receive data records, the received data records each including one or more values in one or more fields; and
process the received data records to identify one or more data clusters, the processing including:
identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields;
generating a network representing the identified tokens, with nodes of the network representing tokens and edges of the network each representing a variant relationship between tokens; and
generating a graphical representation of the network with different subsets of nodes distinguished based at least in part on values associated with nodes, where a value associated with a particular node quantifies a count of a number of instances of the token represented by that particular node appearing within the received data records.
12. A computing system, including:
an input device or port configured to receive data records, the received data records each including one or more values in one or more fields; and
at least one processor configured to process the received data records to identify one or more data clusters, the processing including:
identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields;
generating a network representing the identified tokens, with nodes of the network representing tokens and edges of the network each representing a variant relationship between tokens; and
generating a graphical representation of the network with different subsets of nodes distinguished based at least in part on values associated with nodes, where a value associated with a particular node quantifies a count of a number of instances of the token represented by that particular node appearing within the received data records.
13. A computing system, including:
means for receiving data records, the received data records each including one or more values in one or more fields; and
means for processing the received data records to identify one or more data clusters, the processing including:
identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields;
generating a network representing the identified tokens, with nodes of the network representing tokens and edges of the network each representing a variant relationship between tokens; and
generating a graphical representation of the network with different subsets of nodes distinguished based at least in part on values associated with nodes, where a value associated with a particular node quantifies a count of a number of instances of the token represented by that particular node appearing within the received data records.
Specification