Finding data in connected corpuses using examples
First Claim
1. A data processing system, comprising:
a processor; and
a memory coupled to the processor, the memory configured to store program instructions executable by the processor to cause the data processing system to:
receive a collection of values from a user;
identify a data type for each of the values;
identify distinct datasets that correspond to the data types, each of the distinct datasets having one or more of the data types;
identify relationships among the distinct datasets, the relationships corresponding to links between similar data types in the distinct datasets;
provide, to the user, a list of proposed groups of datasets, wherein the datasets within each proposed group are linked to each other through one or more relationships;
receive an example value set from the user, the example value set corresponding to a known relationship between two or more data types, and the example value set including at least two values;
re-interpret the one or more relationships based upon the example value set; and
provide, to the user, a second list of a second proposed group of datasets based upon the re-interpretation, wherein the datasets within the second proposed group include the example value set.
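The first half of the claim (typing the values, finding matching datasets, linking them, and proposing groups) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `CATALOG` contents, the function names, treating a column name as the "data type", and treating a shared column name as a "link between similar data types". The claim does not prescribe any of these choices.

```python
from itertools import combinations

# Hypothetical catalog: dataset name -> {data type (column name): known values}.
CATALOG = {
    "employees": {"employee_name": {"Ann Lee", "Bob Ray"}, "city": {"Seattle", "Austin"}},
    "offices": {"city": {"Seattle", "Austin"}, "state": {"WA", "TX"}},
    "products": {"product": {"Widget", "Gadget"}, "price": {"9.99"}},
}

def identify_types(values):
    """Map each user value to the data types whose known values contain it."""
    return {
        value: {dtype for cols in CATALOG.values()
                for dtype, known in cols.items() if value in known}
        for value in values
    }

def matching_datasets(dtypes):
    """Datasets that have at least one of the identified data types."""
    return {name for name, cols in CATALOG.items() if cols.keys() & dtypes}

def find_links(datasets):
    """Link two datasets when they share a data type (assumed notion of 'similar')."""
    links = {}
    for a, b in combinations(sorted(datasets), 2):
        shared = CATALOG[a].keys() & CATALOG[b].keys()
        if shared:
            links[(a, b)] = set(shared)
    return links

def propose_groups(datasets, links):
    """Group datasets into connected components of the link graph."""
    adjacency = {name: set() for name in datasets}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    groups, seen = [], set()
    for start in sorted(datasets):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups

values = ["Ann Lee", "WA", "Widget"]
types_by_value = identify_types(values)
datasets = matching_datasets(set().union(*types_by_value.values()))
links = find_links(datasets)
print(propose_groups(datasets, links))
```

With this toy catalog, "employees" and "offices" end up in one proposed group because they share the "city" type, while "products" stands alone.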
Abstract
In one embodiment, datasets are stored in a catalog. The datasets are enriched by establishing relationships among the domains in different datasets. A user searches for relevant datasets by providing examples of the domains of interest. The system identifies datasets corresponding to the user-provided examples. The system then identifies connected subsets of the datasets that are directly linked or indirectly linked through other domains. The user provides known relationship examples to filter the connected subsets and to identify the connected subsets that are most relevant to the user's query. The selected connected subsets may be further analyzed by business intelligence/analytics tools to create pivot tables or to process the data.
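The filtering step, in which a known relationship example narrows the connected subsets, can be sketched as follows. This is one possible reading under stated assumptions: datasets are lists of row dictionaries, a link between domains is modeled as a natural join on shared column names, and a connected subset "supports" an example when some joined row contains every example value. The dataset contents and the names `join`, `supports_example`, and `filter_groups` are hypothetical.

```python
def join(left_rows, right_rows):
    """Natural join: combine rows that agree on every shared column name."""
    joined = []
    for left in left_rows:
        for right in right_rows:
            shared = left.keys() & right.keys()
            if shared and all(left[k] == right[k] for k in shared):
                joined.append({**left, **right})
    return joined

def supports_example(group, example):
    """True if joining the group's datasets yields a row holding all example values."""
    rows = group[0]
    for dataset in group[1:]:
        rows = join(rows, dataset)
    return any(set(example) <= set(row.values()) for row in rows)

def filter_groups(groups, example):
    """Keep only the connected subsets consistent with the known relationship example."""
    return [name for name, group in groups.items() if supports_example(group, example)]

# Two candidate interpretations of "employee is related to state" (hypothetical data).
work = [{"employee": "Ann Lee", "city": "Seattle"}, {"employee": "Bob Ray", "city": "Austin"}]
home = [{"employee": "Ann Lee", "city": "Austin"}, {"employee": "Bob Ray", "city": "Austin"}]
city_state = [{"city": "Seattle", "state": "WA"}, {"city": "Austin", "state": "TX"}]

groups = {
    "work_location": [work, city_state],
    "home_location": [home, city_state],
}

print(filter_groups(groups, ("Ann Lee", "WA")))
print(filter_groups(groups, ("Bob Ray", "TX")))
```

In this toy data, the example ("Ann Lee", "WA") is consistent only with the work-location subset, so it disambiguates the two interpretations, whereas ("Bob Ray", "TX") is consistent with both and filters nothing.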
18 Claims
1. A data processing system, comprising:
a processor; and
a memory coupled to the processor, the memory configured to store program instructions executable by the processor to cause the data processing system to:
receive a collection of values from a user;
identify a data type for each of the values;
identify distinct datasets that correspond to the data types, each of the distinct datasets having one or more of the data types;
identify relationships among the distinct datasets, the relationships corresponding to links between similar data types in the distinct datasets;
provide, to the user, a list of proposed groups of datasets, wherein the datasets within each proposed group are linked to each other through one or more relationships;
receive an example value set from the user, the example value set corresponding to a known relationship between two or more data types, and the example value set including at least two values;
re-interpret the one or more relationships based upon the example value set; and
provide, to the user, a second list of a second proposed group of datasets based upon the re-interpretation, wherein the datasets within the second proposed group include the example value set.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method, comprising:
performing, by a processor in a computer system:
identifying a collection of domains corresponding to a collection of values;
identifying distinct datasets corresponding to at least one of the domains;
identifying relationships among the distinct datasets, the relationships corresponding to links between similar domains in the distinct datasets;
identifying groups of datasets among the distinct datasets, wherein the datasets within each proposed group are linked to each other through one or more relationships;
receiving an example value set, the example value set corresponding to a known relationship between two or more domains, the example value set including at least two values;
re-interpreting the one or more relationships based upon the example value set; and
identifying one or more proposed groups of datasets based upon the re-interpretation, wherein the at least two values of the example value set are found within the datasets of the proposed groups.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16
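The claims do not say how "similar domains" are detected. One common heuristic, shown here purely as an assumption and not as the claimed method, is to link two columns when their value sets overlap strongly, measured by the Jaccard index. The threshold, the dataset contents, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two value collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_domain_links(datasets, threshold=0.5):
    """Return (dataset, column, dataset, column, score) for column pairs above the threshold."""
    links = []
    names = sorted(datasets)
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            for left_col, left_vals in datasets[left].items():
                for right_col, right_vals in datasets[right].items():
                    score = jaccard(left_vals, right_vals)
                    if score >= threshold:
                        links.append((left, left_col, right, right_col, score))
    return links

# Hypothetical datasets: dataset name -> {column name: list of values}.
datasets = {
    "orders": {"cust_id": ["c1", "c2", "c3"]},
    "customers": {"id": ["c1", "c2", "c3", "c4"], "name": ["Ann", "Bob"]},
}

print(similar_domain_links(datasets))
```

Here "customers.id" and "orders.cust_id" share three of four distinct values, so they are linked even though the column names differ, while "customers.name" has no overlap and is not.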
17. An article of manufacture having computer-executable instructions stored thereon that, upon execution by at least one processor of a computer system, cause the computer system to:
identify a collection of data types corresponding to a collection of values;
identify distinct datasets corresponding to at least one of the data types;
identify relationships among the distinct datasets, the relationships corresponding to links between similar data types in the distinct datasets;
identify groups of datasets among the distinct datasets, wherein the datasets within each proposed group are linked to each other through one or more relationships;
receive an example value set from the user, the example value set corresponding to a known relationship between two or more data types, the example value set including at least two values;
re-interpret the one or more relationships based upon the example value set; and
identify one or more proposed groups of datasets based upon the re-interpretation, wherein the at least two values of the example value set are found within the datasets of the proposed groups.
Dependent claims: 18
Specification