METHOD AND APPARATUS FOR ORGANIZING DATA SOURCES
First Claim
1. A computer-implemented method of organizing data sources, comprising:
grouping a plurality of items including input attributes, output attributes, or keywords, or a combination thereof, from one or more sources into a plurality of cliques of highly correlated items;
clustering the plurality of cliques into one or more signatures; and
for each of the one or more signatures, selecting one or more sources that are associated with the signature and forming the selected sources into a community.
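The clustering and community-forming steps of claim 1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation. Each clique starts as its own cluster. The two clusters whose attribute sets have the highest Jaccard similarity are merged repeatedly, with a merged cluster represented by the union of its attributes (a basic agglomerative scheme rather than any particular standard linkage), until no pair reaches `min_sim`. A source is treated as associated with a signature when it exposes at least `min_overlap` of the signature's attributes. The cliques, sources, thresholds, and function names are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two attribute sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_signatures(cliques, min_sim=0.3):
    """Agglomeratively merge cliques into signatures (hypothetical helper).

    Each cluster is represented by the union of its attributes; the most
    similar pair is merged until the best similarity drops below min_sim.
    """
    clusters = [frozenset(c) for c in cliques]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = jaccard(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < min_sim:
            break
        i, j = pair
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def form_communities(signatures, sources, min_overlap=0.5):
    """Map each signature to the sorted names of sources associated with it.

    Assumption: a source is associated with a signature if it covers at
    least min_overlap of the signature's attributes.
    """
    return {
        sig: sorted(name for name, attrs in sources.items()
                    if len(sig & attrs) / len(sig) >= min_overlap)
        for sig in signatures
    }


# Hypothetical cliques, e.g. as produced by a correlation-mining step.
CLIQUES = [{"title", "author"}, {"title", "author", "isbn"},
           {"make", "model"}, {"make", "model", "year"}]

# Hypothetical sources, each described by the attributes it exposes.
SOURCES = {
    "s1": {"title", "author", "isbn", "price"},
    "s2": {"title", "author", "isbn"},
    "s3": {"make", "model", "year", "price"},
    "s4": {"make", "model", "year"},
    "s5": {"title", "author", "price"},
}

if __name__ == "__main__":
    signatures = cluster_signatures(CLIQUES)
    for sig, members in form_communities(signatures, SOURCES).items():
        print(sorted(sig), members)
```

With these sample cliques, the book-related and car-related cliques merge into two separate signatures, and each signature collects the sources that expose most of its attributes.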
Abstract
A method and apparatus for organizing deep Web services are provided. In one aspect, the method and apparatus obtain a collection of sources and their associated attributes and/or input modes, for instance by using a crawling algorithm, and use this information to organize the sources into communities. A mining algorithm, such as the hyperclique mining algorithm, is used to obtain cliques of highly correlated attributes. A clustering algorithm, such as the hierarchical agglomerative clustering algorithm, then further clusters the cliques of attributes into larger cliques, which the present disclosure refers to as signatures. The sources associated with each signature form a community, and a graph representation of the communities is constructed in which the vertices are communities and the edges are the attributes they share.
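The abstract names hyperclique mining as one way to obtain cliques of highly correlated attributes. Below is a minimal Python sketch, assuming each source is represented simply as the set of attribute names it exposes. It uses the h-confidence measure, h-conf(P) = supp(P) / max over items i in P of supp({i}), which is anti-monotone and therefore allows Apriori-style level-wise pruning. The thresholds `min_hconf` and `min_supp`, the choice to report only maximal patterns, the function names, and the sample sources are illustrative assumptions, not details taken from the disclosure.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (here: sources) containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def h_confidence(itemset, transactions, item_support):
    """supp(P) divided by the largest single-item support among items of P."""
    return support(itemset, transactions) / max(item_support[i] for i in itemset)


def mine_hypercliques(transactions, min_hconf=0.6, min_supp=0.1):
    """Level-wise search for hyperclique patterns of size >= 2.

    Support and h-confidence are both anti-monotone, so a k-itemset is only
    kept if all of its (k-1)-subsets survived the previous level.
    """
    items = sorted(set().union(*transactions))
    item_support = {i: support(frozenset([i]), transactions) for i in items}
    level = [frozenset([i]) for i in items if item_support[i] >= min_supp]
    found = []
    k = 2
    while level:
        prev = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
            and support(c, transactions) >= min_supp
            and h_confidence(c, transactions, item_support) >= min_hconf
        ]
        found.extend(level)
        k += 1
    # Assumption: report only maximal patterns (not contained in a larger one).
    maximal = [p for p in found if not any(p < q for q in found)]
    return sorted(maximal, key=lambda p: (len(p), sorted(p)))


# Hypothetical sources, each given as the set of attributes it exposes.
SOURCES = [
    frozenset({"title", "author", "isbn", "price"}),
    frozenset({"title", "author", "isbn"}),
    frozenset({"make", "model", "year", "price"}),
    frozenset({"make", "model", "year"}),
    frozenset({"title", "author", "price"}),
]

if __name__ == "__main__":
    for clique in mine_hypercliques(SOURCES, min_hconf=0.7, min_supp=0.3):
        print(sorted(clique))
```

On this sample data with `min_hconf=0.7`, "title" and "author" always co-occur and so do "make", "model", and "year", while "isbn" and "price" appear too unevenly alongside other attributes to join a clique.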
19 Citations
7 Claims
7. An apparatus for organizing data sources, comprising:
a means for grouping a plurality of items including input attributes, output attributes, or keywords, or a combination thereof, from one or more sources into a plurality of cliques of highly correlated items;
a means for clustering the plurality of cliques into one or more signatures;
a means for selecting, for each of the one or more signatures, one or more sources that are associated with the signature and forming the selected sources into a community; and
a means for constructing a graph representation of a plurality of communities, the graph representation including at least a plurality of vertices representing the plurality of communities respectively and one or more edges connecting the plurality of vertices, the one or more edges representing one or more input attributes, output attributes, or keywords, or a combination thereof, that are shared between the communities represented by the connected vertices.
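The graph representation recited in claim 7 can be illustrated with a small sketch. It assumes that a community's attribute set is the union of its member sources' attributes, that vertices are community identifiers, and that an edge joins two communities whenever those attribute sets intersect, labeled with the shared attributes. The function name, community labels, and sample data are hypothetical.

```python
from itertools import combinations


def build_community_graph(communities, sources):
    """Build vertices and attribute-labeled edges (hypothetical helper).

    communities: {community_id: [source names]}
    sources:     {source name: set of attributes}
    Returns (vertices, edges), where edges maps a vertex pair to the
    attributes shared by the two communities.
    """
    # Assumption: a community's attributes are the union over its members.
    attrs = {cid: frozenset().union(*(sources[s] for s in members))
             for cid, members in communities.items()}
    vertices = sorted(attrs)
    edges = {}
    for u, v in combinations(vertices, 2):
        shared = attrs[u] & attrs[v]
        if shared:  # connect only communities that share something
            edges[(u, v)] = shared
    return vertices, edges


SOURCES = {
    "s1": {"title", "author", "isbn", "price"},
    "s2": {"title", "author", "isbn"},
    "s3": {"make", "model", "year", "price"},
    "s4": {"make", "model", "year"},
    "s5": {"title", "author", "price"},
}
COMMUNITIES = {"books": ["s1", "s2", "s5"], "autos": ["s3", "s4"]}

if __name__ == "__main__":
    vertices, edges = build_community_graph(COMMUNITIES, SOURCES)
    print(vertices)
    for (u, v), shared in edges.items():
        print(u, "--", v, sorted(shared))
```

Keying the edge dictionary by vertex pair keeps the shared attributes attached to each edge; a dedicated graph library could be substituted where traversal or layout is needed.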
Specification