Fuzzy data operations
First Claim
1. A method for clustering data elements stored in a data storage system, the method including:
reading data elements from the data storage system;
forming clusters of data elements with each data element being a member of at least one cluster and with assignment of a given data element as a member of a given cluster being based at least in part on contents of one or more objects in the given data element;
associating at least one data element with two or more clusters, with memberships of the data element belonging to respective ones of the two or more clusters being represented by a measure of ambiguity that quantifies partial membership of a given data element in a given cluster based at least in part on a comparison between contents of the given data element and contents of at least one other data element in the given cluster, the contents of the given data element including at least one of:
one or more of words in one or more objects in the given data element, and variants of the one or more words;
storing information in the data storage system to represent the formed clusters; and
performing a data operation that uses values of the measure of ambiguity representing the memberships;
wherein the data operation includes a rollup that calculates a weighted subtotal of a quantity of the one or more clusters, the quantity being associated with the data element, and the subtotal being calculated by summing the products of the value of the quantity associated with each of the data elements in the first cluster and the respective value of the measure of ambiguity representing the membership of the data elements in the first cluster.
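The rollup recited in the final clause can be illustrated with a minimal Python sketch. The element identifiers, quantities, cluster names, and membership values below are hypothetical and chosen only for illustration; the claim text itself does not fix any data layout. The sketch also assumes that each element's membership values sum to 1, which the claim does not state.

```python
from collections import defaultdict

# Hypothetical data: (element id, quantity, {cluster name: measure of ambiguity}).
# None of these names or numbers come from the patent text.
elements = [
    ("e1", 100.0, {"A": 1.0}),            # unambiguous member of cluster A
    ("e2", 40.0, {"A": 0.5, "B": 0.5}),   # partial member of both A and B
    ("e3", 10.0, {"B": 1.0}),             # unambiguous member of cluster B
]


def weighted_rollup(elements):
    """Sum quantity * measure of ambiguity over the members of each cluster."""
    subtotals = defaultdict(float)
    for _element_id, quantity, memberships in elements:
        for cluster, ambiguity in memberships.items():
            subtotals[cluster] += quantity * ambiguity
    return dict(subtotals)


totals = weighted_rollup(elements)
print(totals)
```

Under the stated assumption that memberships sum to 1 per element, the cluster subtotals add up to the grand total of the quantities, so an ambiguous element is not counted twice.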
Abstract
A method for clustering data elements stored in a data storage system includes reading data elements from the data storage system. Clusters of data elements are formed with each data element being a member of at least one cluster. At least one data element is associated with two or more clusters. Membership of the data element belonging to respective ones of the two or more clusters is represented by a measure of ambiguity. Information is stored in the data storage system to represent the formed clusters.
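The abstract does not say how a measure of ambiguity is computed; the first claim only requires that it rest at least in part on comparing the words of a data element with those of other members of a cluster. As one possible reading, and not the method described in the patent, the sketch below scores each cluster by Jaccard word overlap with its closest member and normalizes the scores so they sum to 1. The function names, the threshold, and the sample records are all assumptions made for illustration.

```python
def words(text):
    """Split a record into a set of lowercase words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def memberships(element_text, clusters, threshold=0.2):
    """Return {cluster name: partial membership} for one data element.

    clusters maps a cluster name to the texts of its existing members.
    The threshold is a hypothetical cutoff below which a cluster is ignored.
    """
    target = words(element_text)
    scores = {}
    for name, members in clusters.items():
        # Compare against every member and keep the best match.
        best = max((jaccard(target, words(m)) for m in members), default=0.0)
        if best >= threshold:
            scores[name] = best
    total = sum(scores.values())
    # Normalize so the memberships of the element sum to 1.
    return {name: s / total for name, s in scores.items()} if total else {}


# Hypothetical clusters of company-name records.
clusters = {
    "acme_uk": ["acme services uk"],
    "acme_us": ["acme services us"],
    "globex": ["globex corporation"],
}
result = memberships("acme services", clusters)
print(result)
```

Here the record "acme services" matches both Acme clusters equally well, so it is associated with both and each membership is one half, while the unrelated cluster is dropped.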
61 Claims
16. A system for clustering data elements stored in a data storage system, the system including:
means for reading data elements from the data storage system;
means for forming clusters of data elements with each data element being a member of at least one cluster;
means for associating at least one data element with two or more clusters, with memberships of the data element belonging to respective ones of the two or more clusters being represented by a measure of ambiguity;
means for storing information in the data storage system to represent the formed clusters; and
means for performing a data operation that uses values of the measure of ambiguity representing the memberships;
wherein the data operation includes a rollup that calculates a weighted subtotal of a quantity of the one or more clusters, the quantity being associated with the data element, and the subtotal being calculated by summing the products of the value of the quantity associated with each of the data elements in the first cluster and the respective value of the measure of ambiguity representing the membership of the data elements in the first cluster.
31. A computer-readable storage device storing a computer program for clustering data elements stored in a data storage system, the computer program including instructions for causing a computer to:
read data elements from the data storage system;
form clusters of data elements with each data element being a member of at least one cluster and with assignment of a given data element as a member of a given cluster being based at least in part on contents of one or more objects in the given data element;
associate at least one data element with two or more clusters, with memberships of the data element belonging to respective ones of the two or more clusters being represented by a measure of ambiguity that quantifies partial membership of a given data element in a given cluster based at least in part on a comparison between contents of the given data element and contents of at least one other data element in the given cluster, the contents of the given data element including at least one of:
one or more of words in one or more objects in the given data element, and variants of the one or more words; and
store information in the data storage system to represent the formed clusters; and
perform a data operation that uses values of the measure of ambiguity representing the memberships wherein the data operation includes a rollup that calculates a weighted subtotal of a quantity of the one or more clusters, the quantity being associated with the data element, and the subtotal being calculated by summing the products of the value of the quantity associated with each of the data elements in the first cluster and the respective value of the measure of ambiguity representing the membership of the data elements in the first cluster.
46. A system for clustering data elements stored in a data storage system, the system including:
an input device configured to read data elements from the data storage system;
at least one process configured to cluster data elements stored in a data storage system, the processing including forming clusters of data elements with each data element being a member of at least one cluster and with assignment of a given data element as a member of a given cluster being based at least in part on contents of one or more objects in the given data element;
associating at least one data element with two or more clusters, with memberships of the data element belonging to respective ones of the two or more clusters being represented by a measure of ambiguity that quantifies partial membership of a given data element in a given cluster based at least in part on a comparison between contents of the given data element and contents of at least one other data element in the given cluster, the contents of the given data element including at least one of:
one or more of words in one or more objects in the given data element, and variants of the one or more words; and
storing information in the data storage system to represent the formed clusters; and
performing a data operation that uses values of the measure of ambiguity representing the memberships wherein the data operation includes a rollup that calculates a weighted subtotal of a quantity of the one or more clusters, the quantity being associated with the data element, and the subtotal being calculated by summing the products of the value of the quantity associated with each of the data elements in the first cluster and the respective value of the measure of ambiguity representing the membership of the data elements in the first cluster.
Specification