Visually interactive identification of a cohort of data objects similar to a query based on domain knowledge
First Claim
1. A computer system comprising:
a memory storing a database; and
a computer processor communicatively coupled to the memory and configured to:
access a plurality of data objects in the database, each data object comprising a plurality of numerical components, wherein each component represents a data feature of a plurality of data features;
identify, for each data feature, a feature distribution of the numerical components associated with the data feature;
select a sub-plurality of data features of a query object, wherein a given data feature is selected if the component representing the given data feature is a peak for the feature distribution of the given data feature;
determine, for the query object and a data object, a similarity measure based on the sub-plurality of the data features, the similarity measure indicative of data features common to the query object and the data object;
provide, via an interactive graphical user interface, an interactive visual representation of a distance histogram representing the feature distributions of the plurality of data features;
iteratively process, based on the interactive distance histogram, selection of a sub-plurality of the data features, the selection based on domain knowledge; and
identify, based on the similarity measures, a cohort of data objects similar to the query object.
Abstract
Visually interactive identification of a cohort of similar data objects is disclosed. One example is a system including a data processor to access a plurality of data objects, each data object comprising a plurality of numerical components, where each component represents a data feature of a plurality of data features, and to identify, for each data feature, a feature distribution of the numerical components. A selector selects a sub-plurality of the data features of a query object, where a given data feature is selected if the component representing the given data feature is a peak for the feature distribution. An evaluator determines a similarity measure based on the sub-plurality of the data features. An interaction processor iteratively processes selection of a sub-plurality of the data features based on domain knowledge, and identifies, based on the similarity measures, a cohort of data objects similar to the query object.
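The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several choices here are assumptions made only for illustration: a feature distribution is reduced to a mean and standard deviation, a component counts as a "peak" when its absolute z-score meets a hypothetical `threshold`, and the similarity measure is the fraction of selected features on which the data object is also a peak on the same side as the query. All function names are hypothetical.

```python
from statistics import mean, pstdev


def feature_distributions(objects):
    """Summarize each feature column as (mean, std); an assumed stand-in
    for the 'feature distribution' named in the abstract."""
    columns = list(zip(*objects))
    return [(mean(col), pstdev(col)) for col in columns]


def z_score(value, dist):
    """Standardized deviation of a component from its feature's mean."""
    mu, sigma = dist
    return 0.0 if sigma == 0 else (value - mu) / sigma


def select_peak_features(query, dists, threshold=1.5):
    """Indices of features where the query component is a 'peak'
    (assumption: |z| >= threshold)."""
    return [i for i, v in enumerate(query)
            if abs(z_score(v, dists[i])) >= threshold]


def similarity(query, obj, dists, selected, threshold=1.5):
    """Fraction of selected features on which obj is also a peak,
    on the same side of the mean as the query (assumed measure)."""
    if not selected:
        return 0.0
    shared = 0
    for i in selected:
        zq = z_score(query[i], dists[i])
        zo = z_score(obj[i], dists[i])
        if abs(zo) >= threshold and zq * zo > 0:
            shared += 1
    return shared / len(selected)


def identify_cohort(query, objects, dists, selected, min_similarity=0.5):
    """Indices of data objects whose similarity to the query is high enough."""
    return [idx for idx, obj in enumerate(objects)
            if similarity(query, obj, dists, selected) >= min_similarity]
```

In this sketch the domain expert's role would be to edit the list returned by `select_peak_features` before calling `identify_cohort` again; the cut-offs `threshold` and `min_similarity` are illustrative parameters, not values taken from the patent.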
15 Claims
1. A computer system comprising:
a memory storing a database; and
a computer processor communicatively coupled to the memory and configured to:
access a plurality of data objects in the database, each data object comprising a plurality of numerical components, wherein each component represents a data feature of a plurality of data features;
identify, for each data feature, a feature distribution of the numerical components associated with the data feature;
select a sub-plurality of data features of a query object, wherein a given data feature is selected if the component representing the given data feature is a peak for the feature distribution of the given data feature;
determine, for the query object and a data object, a similarity measure based on the sub-plurality of the data features, the similarity measure indicative of data features common to the query object and the data object;
provide, via an interactive graphical user interface, an interactive visual representation of a distance histogram representing the feature distributions of the plurality of data features;
iteratively process, based on the interactive distance histogram, selection of a sub-plurality of the data features, the selection based on domain knowledge; and
identify, based on the similarity measures, a cohort of data objects similar to the query object.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method to determine a similarity measure, the method comprising:
accessing, from a database, a plurality of data objects, each data object comprising a plurality of numerical components, wherein each component represents a data feature of a plurality of data features;
identifying, for each data feature, a feature distribution of the numerical components associated with the data feature;
selecting a sub-plurality of data features of a query object, wherein a given data feature is selected if the component representing the given data feature is a peak for a feature distribution of the given data feature;
determining, for the query object and a data object of the plurality of data objects, a similarity measure based on the sub-plurality of the data features, the similarity measure indicative of data features common to the query object and the data object;
providing, via an interactive graphical user interface, an interactive visual representation of a distance histogram representing the feature distributions of the plurality of data features; and
iteratively processing selection of the sub-plurality of the data features, the iterative selection based on at least one of adding a first data feature to the selected sub-plurality of data features and deleting a second data feature from the selected sub-plurality of data features.
Dependent claims: 14
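The add/delete iteration recited in claim 13 can be sketched as follows. This is only an illustration under stated assumptions: the distance is taken to be the mean absolute difference over the selected features, the histogram simply counts data objects per distance bin, and the names `apply_edit`, `distance`, `distance_histogram`, and `bin_width` are hypothetical rather than drawn from the patent.

```python
from collections import Counter


def apply_edit(selected, add=None, delete=None):
    """Return a new feature selection after one interaction step:
    optionally add one feature index and/or delete another."""
    updated = set(selected)
    if add is not None:
        updated.add(add)
    if delete is not None:
        updated.discard(delete)
    return sorted(updated)


def distance(query, obj, selected):
    """Mean absolute difference over the selected features
    (an assumed distance, chosen for simplicity)."""
    if not selected:
        return 0.0
    return sum(abs(query[i] - obj[i]) for i in selected) / len(selected)


def distance_histogram(query, objects, selected, bin_width=1.0):
    """Count data objects per distance bin; the counts are what a
    graphical interface could draw as a histogram."""
    bins = Counter(int(distance(query, o, selected) // bin_width)
                   for o in objects)
    return dict(sorted(bins.items()))
```

Each user action would map to one `apply_edit` call followed by a fresh `distance_histogram`, so the display reflects the current selection. For instance, with a query of `[0.0, 0.0, 0.0]` and the objects `[0.0, 5.0, 0.0]`, `[0.5, 0.0, 3.0]`, and `[2.0, 2.0, 2.0]`, selecting only feature 0 gives bin counts `{0: 2, 2: 1}`, and adding feature 1 shifts them to `{0: 1, 2: 2}`.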
15. A non-transitory computer readable medium comprising executable instructions to:
access, via a processor, a plurality of data objects, each data object comprising a plurality of numerical components, wherein each component represents a data feature of a plurality of data features;
identify, for each data feature, a feature distribution of the components associated with the data feature;
select a sub-plurality of data features of a query object, wherein a given data feature is selected if the component representing the given data feature is a peak for the feature distribution of the given data feature;
determine, for the query object and a data object of the plurality of data objects, a similarity measure based on the sub-plurality of the data features, the similarity measure indicative of data features common to the query object and the data object;
provide, to a computing device, an interactive visual representation of a distance histogram representing the feature distributions of the plurality of data features;
iteratively process, based on the interactive distance histogram, selection of a sub-plurality of the data features, the selection based on domain knowledge; and
identify, based on the similarity measures, a cohort of data objects similar to the query object.
Specification