Visually interactive identification of a cohort of data objects similar to a query based on domain knowledge

US 10,509,800 B2
Filed: 01/23/2015
Issued: 12/17/2019
Est. Priority Date: 01/23/2015
Status: Active Grant

Abstract

Visually interactive identification of a cohort of similar data objects is disclosed. One example is a system including a data processor to access a plurality of data objects, each data object comprising a plurality of numerical components, where each component represents a data feature of a plurality of data features, and to identify, for each data feature, a feature distribution of the numerical components. A selector selects a sub-plurality of the data features of a query object, where a given data feature is selected if the component representing the given data feature is a peak for the feature distribution. An evaluator determines a similarity measure based on the sub-plurality of the data features. An interaction processor iteratively processes selection of a sub-plurality of the data features based on domain knowledge, and identifies, based on the similarity measures, a cohort of data objects similar to the query object.
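The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The equal-width binning, the reading of "peak" as the most populated bin, the shared-bin count used as the similarity measure, the `min_shared` parameter, and every function name below are assumptions made for illustration only.

```python
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value to an equal-width bin index in [0, n_bins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def feature_distributions(objects, n_bins=4):
    """For each feature, return (lo, hi, Counter of bin index -> count)."""
    dists = []
    for column in zip(*objects):
        lo, hi = min(column), max(column)
        counts = Counter(bin_index(v, lo, hi, n_bins) for v in column)
        dists.append((lo, hi, counts))
    return dists


def select_peak_features(query, dists, n_bins=4):
    """Keep features whose query component lies in the most populated bin
    (assumed meaning of "peak" for this sketch)."""
    selected = []
    for f, (lo, hi, counts) in enumerate(dists):
        peak_bin = max(counts, key=counts.get)
        if bin_index(query[f], lo, hi, n_bins) == peak_bin:
            selected.append(f)
    return selected


def similarity(query, obj, selected, dists, n_bins=4):
    """Count the selected features on which obj falls in the query's bin."""
    shared = 0
    for f in selected:
        lo, hi, _ = dists[f]
        if bin_index(query[f], lo, hi, n_bins) == bin_index(obj[f], lo, hi, n_bins):
            shared += 1
    return shared


def cohort(query, objects, selected, dists, min_shared, n_bins=4):
    """Indices of objects sharing at least min_shared selected features."""
    return [
        i for i, obj in enumerate(objects)
        if similarity(query, obj, selected, dists, n_bins) >= min_shared
    ]
```

In this sketch a data object is a sequence of numbers, one per feature, and the query object is simply one such sequence. The cohort is the list of object indices whose shared-feature count reaches `min_shared`.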

15 Claims

1. A computer system comprising:
- a memory storing a database; and
  
  a computer processor communicatively coupled to the memory and configured to:
  
  access a plurality of data objects in the database, each data object comprising a plurality of numerical components, wherein each component represents a data feature of a plurality of data features;
  
  identify, for each data feature, a feature distribution of the numerical components associated with the data feature;
  
  select a sub-plurality of data features of a query object, wherein a given data feature is selected if the component representing the given data feature is a peak for the feature distribution of the given data feature;
  
  determine, for the query object and a data object, a similarity measure based on the sub-plurality of the data features, the similarity measure indicative of data features common to the query object and the data object;
  
  provide, via an interactive graphical user interface, an interactive visual representation of a distance histogram representing the feature distributions of the plurality of data features;
  
  iteratively process, based on the interactive distance histogram, selection of a sub-plurality of the data features, the selection based on domain knowledge; and
  
  identify, based on the similarity measures, a cohort of data objects similar to the query object.
  - 2. The computer system of claim 1, wherein the interactive distance histogram is a distance histogram with equal depth.
  - 3. The computer system of claim 2, wherein the computer processor is further configured to detect a peak in the distance histogram by identifying a bin with a small width.
  - 4. The computer system of claim 3, wherein the computer processor is further configured to determine a component score for each data feature of the sub-plurality of the data features, the component score based on a position of a detected peak in the distance histogram.
  - 5. The computer system of claim 4, wherein the computer processor is further configured to rank the data features of the sub-plurality of data features based on the component scores.
  - 6. The computer system of claim 1, wherein the similarity measure is a dimension interestingness measure for higher dimensional data objects.
  - 7. The computer system of claim 1, wherein the similarity measure is a Euclidean distance distribution between the query object and the plurality of data objects, the Euclidean distance distribution indicative of data objects similar to the query object for the given data feature.
  - 8. The computer system of claim 7, wherein the computer processor is further configured to provide, via the interactive graphical user interface, a visual representation of the Euclidean distance distribution.
  - 9. The computer system of claim 8, wherein the computer processor is further configured to provide, via the graphical user interface, an adjustable slider to adjust a threshold for the Euclidean distance distribution, and further identify the cohort of data objects based on the adjusted threshold.
  - 10. The computer system of claim 7, wherein the computer processor is further configured to determine, based on the Euclidean distance distribution, a distance distribution attribute for the given data feature of the query object.
  - 11. The computer system of claim 10, wherein the computer processor is further configured to provide for display a graphical representation of the distance distribution attribute, wherein the horizontal axis represents a normalized distance distribution attribute, and the vertical axis represents a number of the data objects similar to the query object.
  - 12. The computer system of claim 1, wherein the iterative selection includes at least one of adding a first data feature to the sub-plurality of data features and deleting a second data feature from the sub-plurality of data features.
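Claims 2 to 5 describe an equal-depth distance histogram, peak detection by a narrow bin, a component score tied to the peak position, and a ranking of features by that score. A minimal Python sketch of one possible reading follows. The per-feature absolute distance to the query, the choice of the single narrowest bin as the peak, and the scoring formula (a peak closer to distance zero scores higher) are assumptions, as are all function names. The claims do not specify any of these details.

```python
def equal_depth_bins(values, n_bins):
    """Split the sorted values into n_bins bins holding (nearly) equal counts.
    Returns one (low, high) pair per non-empty bin."""
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            bins.append((chunk[0], chunk[-1]))
    return bins


def narrowest_bin(bins):
    """Index of the bin with the smallest width. In an equal-depth histogram
    a narrow bin marks a dense region, treated here as a peak."""
    return min(range(len(bins)), key=lambda i: bins[i][1] - bins[i][0])


def component_score(query_value, column, n_bins=4):
    """Hypothetical score in [0, 1]: higher when the detected peak of the
    distance histogram lies closer to the query (distance zero)."""
    distances = [abs(v - query_value) for v in column]
    max_distance = max(distances)
    if max_distance == 0:
        return 1.0
    bins = equal_depth_bins(distances, n_bins)
    low, high = bins[narrowest_bin(bins)]
    return 1.0 - ((low + high) / 2) / max_distance


def rank_features(query, objects, n_bins=4):
    """Feature indices ordered from highest to lowest component score."""
    columns = list(zip(*objects))
    scores = {
        f: component_score(query[f], columns[f], n_bins)
        for f in range(len(columns))
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Because every bin holds about the same number of objects, bin width stands in for density, which is why the sketch looks for the narrowest bin rather than the tallest one.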

13. A method to determine a similarity measure, the method comprising:
- accessing, from a database, a plurality of data objects, each data object comprising a plurality of numerical components, wherein each component represents a data feature of a plurality of data features;
  
  identifying, for each data feature, a feature distribution of the numerical components associated with the data feature;
  
  selecting a sub-plurality of data features of a query object, wherein a given data feature is selected if the component representing the given data feature is a peak for a feature distribution of the given data feature;
  
  determining, for the query object and a data object of the plurality of data objects, a similarity measure based on the sub-plurality of the data features, the similarity measure indicative of data features common to the query object and the data object;
  
  providing, via an interactive graphical user interface, an interactive visual representation of a distance histogram representing the feature distributions of the plurality of data features; and
  
  iteratively processing selection of the sub-plurality of the data features, the iterative selection based on at least one of adding a first data feature to the selected sub-plurality of data features and deleting a second data feature from the selected sub-plurality of data features.
  - 14. The method of claim 13, further comprising identifying, based on the similarity measures, a cohort of data objects similar to the query object wherein the cohort of data objects are selected from the plurality of data objects.
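The iterative add and delete steps of claim 13, combined with the cohort identification of claim 14, can be sketched as follows in Python. The Euclidean distance over the selected features and the distance threshold are borrowed from claims 7 and 9 as one plausible similarity measure. The function names and the `threshold` parameter are hypothetical, and the sketch leaves out the graphical user interface entirely.

```python
import math


def euclidean(query, obj, selected):
    """Euclidean distance restricted to the selected feature indices."""
    return math.sqrt(sum((query[f] - obj[f]) ** 2 for f in selected))


def cohort_within(query, objects, selected, threshold):
    """Indices of objects whose distance to the query is at most threshold."""
    return [
        i for i, obj in enumerate(objects)
        if euclidean(query, obj, selected) <= threshold
    ]


def refine(selected, add=None, delete=None):
    """One interaction step: return a new feature set with one feature
    added and/or one feature removed, leaving the input unchanged."""
    updated = set(selected)
    if add is not None:
        updated.add(add)
    if delete is not None:
        updated.discard(delete)
    return updated
```

Each call to `refine` stands for one user decision informed by domain knowledge. Recomputing `cohort_within` after each step shows how the cohort changes as features enter or leave the selection.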

15. A non-transitory computer readable medium comprising executable instructions to:
- access, via a processor, a plurality of data objects, each data object comprising a plurality of numerical components, wherein each component represents a data feature of a plurality of data features;
  
  identify, for each data feature, a feature distribution of the components associated with the data feature;
  
  select a sub-plurality of data features of a query object, wherein a given data feature is selected if the component representing the given data feature is a peak for the feature distribution of the given data feature;
  
  determine, for the query object and a data object of the plurality of data objects, a similarity measure based on the sub-plurality of the data features, the similarity measure indicative of data features common to the query object and the data object;
  
  provide, to a computing device, an interactive visual representation of a distance histogram representing the feature distributions of the plurality of data features;
  
  iteratively process, based on the interactive distance histogram, selection of a sub-plurality of the data features, the selection based on domain knowledge; and
  
  identify, based on the similarity measures, a cohort of data objects similar to the query object.

Current Assignee: Hewlett-Packard Development Company, L.P. (HP Inc.)
Original Assignee: Hewlett-Packard Development Company, L.P. (HP Inc.)
Inventors: Hao, Ming C; Lee, Wei-Nchih; Chang, Nelson L; Hund, Michael; Keim, Daniel
Primary Examiner(s): Cao, Phuong Thao
Application Number: US15/519,734
Publication Number: US 20170316071A1
Time in Patent Office: 1,789 Days
Field of Search: 707722
US Class Current
CPC Class Codes:
G06F 16/2465   Query processing support fo...
G06F 16/248   Presentation of query results
G06F 16/26   Visual data mining; Browsin...
G06F 3/04847   Interaction techniques to c...
