Privacy compliant multiple dataset correlation system
Abstract
A system and method for using inverse mathematical principles in the analysis of compatible datasets so that correlations and trends within and between said datasets can be uncovered. The present invention is tailored to the analysis of datasets that are extremely large; that result from passive, privacy-secure, or anonymous data collection; and that are relatively unbiased. Correlations and trends uncovered by such analysis can be further examined by the data mining and prediction portions of the present invention, which uncover and make use of the interrelated rules that determine data structures. An embodiment directed toward the analysis of television viewership and marketing data while still respecting privacy concerns is disclosed. In a preferred embodiment, a satellite, Internet, cable, or other content provider can provide a viewer with a set-top box, which may be specially instrumented to allow monitoring, recording, and transmission of set-top box events. While the analysis of television viewership and marketing data is presently preferred, it will be apparent to one skilled in the art that the system and method herein can be applied to other data collection and data analysis scenarios. Other contemplated embodiments include, but are not limited to, privacy-secure actuarial analysis, radio and Internet market data collection, and consumer behavioral predictions for advanced marketing techniques.
80 Claims
1. A market data acquisition system, comprising:
a means for retrieving event and embedded content data from a plurality of set-top boxes;
a means for retrieving content attributes from a content attribute database;
a means for correlating retrieved set-top box event data with content attributes to produce data indicating which content was experienced through the plurality of set-top boxes;
a means for retrieving demographic information from a demographic information database; and
a means for correlating demographic information to data indicating which content was experienced through the plurality of set-top boxes to produce, in response to a query, data indicating content experienced by a demographic group or set of demographic groups.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
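The two correlation steps recited in claim 1 can be pictured as a pair of lookups followed by a grouped count. The following is only a minimal sketch under stated assumptions: the record layouts, the field names, the sample values, and the function `content_by_group` are all hypothetical and do not come from the patent, and attaching one demographic label directly to each set-top box is a simplification made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data; none of these layouts come from the patent.
events = [  # (box_id, channel, hour) tuning events
    ("box1", "ch5", 20),
    ("box2", "ch5", 20),
    ("box2", "ch7", 21),
]
content_attributes = {  # (channel, hour) -> content title
    ("ch5", 20): "News at Eight",
    ("ch7", 21): "Late Movie",
}
demographics = {"box1": "18-34", "box2": "35-54"}  # box_id -> group label


def content_by_group(events, content_attributes, demographics, groups):
    """Return {group: {title: view_count}} for the queried groups."""
    result = defaultdict(lambda: defaultdict(int))
    for box_id, channel, hour in events:
        # First correlation: event -> content that was experienced.
        title = content_attributes.get((channel, hour))
        # Second correlation: box -> demographic group.
        group = demographics.get(box_id)
        if title is None or group not in groups:
            continue
        result[group][title] += 1
    return {g: dict(titles) for g, titles in result.items()}


# Query: which content was experienced by the (assumed) 35-54 group?
by_group = content_by_group(events, content_attributes, demographics, {"35-54"})
```

Here the query is simply the set of group labels passed as `groups`; a real system would presumably query databases rather than in-memory dictionaries.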
20. A method of correlating dynamic and static datasets sharing at least one common characteristic and having an assumed relationship, and using such correlations to determine rule systems between the sets, comprising the steps of:
selecting subsets of said datasets sharing a common characteristic;
expressing the assumed relationship as a mathematical assumption;
defining an error function which describes the two datasets in terms of said mathematical assumption;
performing fitting procedures to account for errors in the assumed relationship; and
performing fitting procedures which account for errors in the definition of the common subsets.
Dependent claims: 21, 22, 23, 26, 27, 28, 29, 71
24. A method of testing assumptions pertaining to relationships between two disparate datasets sharing at least one common aspect, comprising the steps of:
entering such assumptions through a user interface;
selecting sample data from a first dataset;
determining correlations between said selected data and data stored in a second dataset; and
establishing assumption validity based on such correlations.
25. A method of determining individual characteristics by correlating dynamic and static datasets sharing at least one common characteristic and having an assumed relationship, comprising the steps of:
selecting subsets of said datasets sharing a common characteristic;
expressing the assumed relationship as a mathematical assumption;
defining an error function which describes the two datasets in terms of said mathematical assumption;
performing fitting procedures to account for errors in the assumed relationship;
storing such correlations in an individual-specific array; and
iteratively repeating this process.
30. A method of dynamically determining the demographic identity of an individual operating a set-top box, comprising the steps of:
monitoring set-top box events for a plurality of set-top boxes;
correlating set-top box events with demographic characteristics;
applying IDM calculation techniques to determine probabilities for demographic characteristic and set-top box event dataset correlations;
ascribing demographic characteristic probabilities to each set-top box over time based on observed set-top box events and their relationship to such IDM probabilities;
evaluating such ascribed demographic characteristic probabilities over time through statistical analysis;
fitting probabilities ascribed to demographic characteristics to statistically determine the most likely set of constant dataset possibilities for each set-top box; and
fitting set-top box possibility sets to IDM probability sets for a set-top box event.
Dependent claim: 31
32. A system for directing content to a specific demographic group, comprising:
an array identifying demographic identities associated with set-top boxes;
a means for entering a demographic group to be targeted;
a means for entering the content, or a reference to such content, to be directed to a demographic group;
a means for entering times and other properties indicating a preferred content delivery method; and
a means of delivering content to a set-top box corresponding to requested demographic information.
Dependent claims: 33, 35, 37, 39
34. A system for directing content to set-top boxes exhibiting a behavior or pattern of behaviors when a specified content type is presented, comprising:
a set of set-top box events with specific time recordings for each event;
a set of content properties;
a means for correlating set-top box events to content properties;
a means for entering desired set-top box event/content property correlations;
a means for delivering content to those set-top boxes corresponding to said set-top box event/content property correlations.
36. A method of determining the effect of content attributes on content ratings, comprising the steps of:
obtaining content attributes from embedded content information or from external sources;
recording set-top box events as content is experienced;
correlating set-top box events to content attributes; and
analyzing such correlations over time to determine the effect of content attributes on content ratings.
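One simple way to picture the final analysis step of claim 36 is to compare the average rating of content that carries an attribute with the average rating of content that does not. This is a rough sketch under stated assumptions, not the claimed method: the ratings are assumed to have been derived already from recorded set-top box events, and the attribute names, the sample numbers, and the function `attribute_effects` are hypothetical. The difference of means is an illustrative stand-in for whatever correlation analysis the specification actually uses.

```python
from statistics import mean

# Hypothetical programs: attribute sets plus a rating assumed to be
# derived beforehand from set-top box events (share of boxes tuned in).
programs = [
    {"attrs": {"comedy", "celebrity"}, "rating": 0.30},
    {"attrs": {"comedy"}, "rating": 0.20},
    {"attrs": {"drama", "celebrity"}, "rating": 0.25},
    {"attrs": {"drama"}, "rating": 0.10},
]


def attribute_effects(programs):
    """Mean rating with an attribute minus mean rating without it."""
    all_attrs = set().union(*(p["attrs"] for p in programs))
    effects = {}
    for attr in sorted(all_attrs):
        with_attr = [p["rating"] for p in programs if attr in p["attrs"]]
        without = [p["rating"] for p in programs if attr not in p["attrs"]]
        # Skip attributes that every program (or no program) carries.
        if with_attr and without:
            effects[attr] = mean(with_attr) - mean(without)
    return effects


effects = attribute_effects(programs)
```

A positive value suggests the attribute is associated with higher ratings in this sample; it says nothing about causation, and a sample this small is for illustration only.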
38. A method of determining the effect of content attributes on content ratings for a specific demographic group, comprising the steps of:
obtaining content attributes from embedded content information or from external sources;
recording set-top box events as content is experienced;
correlating set-top box events to content attributes;
correlating set-top box events and content attributes to demographic characteristics for each set-top box; and
analyzing such correlations over time to determine the effect of content attributes on content ratings for specific demographic groups.
40. A method of creating new content based on previously experienced content and content ratings, comprising the steps of:
obtaining content attributes from embedded content information or from external sources;
recording set-top box events as content is experienced;
correlating set-top box events to content attributes;
analyzing such correlations over time to determine the effect of content attributes on content ratings;
analyzing the effect of content attribute order on content ratings; and
determining a preferred content attribute set and content attribute presentation order.
Dependent claim: 41
42. A method of creating new content based on previously experienced content and content ratings, where such new content is directed toward a demographic group, comprising the steps of:
obtaining content attributes from embedded content information or from external sources;
recording set-top box events as content is experienced;
correlating set-top box events to content attributes;
correlating set-top box events and content attributes to demographic characteristics;
analyzing such correlations over time to determine the effect of content attributes on content ratings for a given demographic group;
analyzing the effect of content attribute order on content ratings for a given demographic group; and
determining a preferred content attribute set and content attribute presentation order for a given demographic group.
43. A system for predicting future events based on a proposed dataset, consisting of:
a dataset of past events;
a known dataset sharing at least one attribute with said dataset of past events, and with substantially similar attributes to said proposed dataset;
a means of correlating said dataset of past events with said known dataset to form a new dataset; and
a means of correlating said new dataset to said proposed dataset.
Dependent claims: 44, 45, 46, 47, 49, 50, 51, 52
48. A method of predicting future events given a proposed dataset, comprising the steps of:
monitoring past events;
correlating said past events with a dataset sharing at least one attribute with said past events, and with a substantially similar structure to the proposed dataset, the results of which are stored in an array;
correlating said array with said proposed dataset; and
reporting the results of said array/proposed dataset correlations as a prediction of future events.
53. A system of predicting future events for a given demographic segment, comprising:
a dataset of past events;
a demographic dataset sharing at least one attribute with said dataset of past events;
a means of correlating said dataset of past events with said demographic dataset and storing the result in an array;
a known dataset sharing at least one attribute with said demographic dataset, and with substantially similar attributes to said proposed dataset;
a means of correlating said array with said known dataset to form a new dataset; and
a means of correlating said new dataset to said proposed dataset.
Dependent claims: 54, 55, 56, 57, 59, 60, 61, 62
58. A method of predicting future events for a given demographic based on a proposed dataset, comprising the steps of:
monitoring past events;
correlating said past events with a demographic dataset and storing the result in an array;
correlating said array with a dataset sharing at least one attribute with said array, and with a substantially similar structure to the proposed dataset, the results of which are stored in an additional dataset;
correlating said additional dataset with said proposed dataset; and
reporting the results of such correlations as a prediction of future events.
63. A privacy-compliant data collection and data correlation system comprising:
a means of collecting individual-specific behavior data without knowing individual-specific demographic information pertaining to the individual about whom such data is collected;
a means of accessing demographic data for the region in which the individual resides; and
a means of correlating such individual-specific data with such demographic data to determine the demographic identity of each individual about whom data is collected.
Dependent claim: 64
65. A method of predicting behaviors of non-sampled demographic specifications based on sampled demographic specifications of a given level comprising the steps of:
monitoring past behavior and correlating such behavior with demographic characteristics monitored;
breaking a non-sampled demographic specification into sub-specifications for which sample data has been collected;
establishing the statistical effects of various rules on each sub-specification and those characterizations comprising them; and
statistically predicting non-sample behaviors based on such effects.
Dependent claims: 66, 68, 69, 70
67. A method of reducing the effect of sampling error and sample bias on data correlations determined between a dynamic dataset and a static dataset based on assumptions about the relationships between such data, such as:
creating equations to express such assumptions;
determining error functions which can assist in calculating values for each unknown variable in such equations;
creating a transformable matrix based on such functions;
inverting said matrix to apply a least-squares approach fitting method to the underlying data;
normalizing the results of said least-squares fit;
calculating Pearson-r correlations for such normalized results;
calculating aspect representation indices for each subset of data within said static dataset;
determining assumption validities for assumptions used as a basis for this process; and
combining said correlations, said aspect representation indices, and said assumption validities to create a set of data correlations and corresponding confidence intervals.
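The matrix-inversion, least-squares, and Pearson-r steps of claim 67 can be illustrated on a toy problem. The sketch below assumes, purely for illustration, that each region's observed viewing rate is a weighted sum of two unknown per-group rates, with the weights given by that region's demographic shares; it then solves the normal equations by explicitly inverting a 2x2 matrix and checks the fit with a Pearson correlation. The data, the two-group model, and the function names are hypothetical. The normalization, aspect representation index, and assumption validity steps are not covered, since this excerpt does not define them.

```python
import math


def least_squares_2(A, y):
    """Solve min ||A x - y||^2 for two unknowns via the normal equations."""
    # Build the symmetric 2x2 matrix A^T A and the vector A^T y.
    s11 = sum(r[0] * r[0] for r in A)
    s12 = sum(r[0] * r[1] for r in A)
    s22 = sum(r[1] * r[1] for r in A)
    b1 = sum(r[0] * v for r, v in zip(A, y))
    b2 = sum(r[1] * v for r, v in zip(A, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("normal matrix is singular")
    # Apply the explicit inverse of the 2x2 matrix to A^T y.
    return (s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


# Hypothetical regions: (share of group 1, share of group 2) per region,
# and the viewing rate observed in each region.
shares = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]
observed = [0.16, 0.25, 0.34]

x1, x2 = least_squares_2(shares, observed)       # estimated per-group rates
fitted = [a * x1 + b * x2 for a, b in shares]    # rates implied by the fit
r = pearson_r(fitted, observed)                  # agreement of fit and data
```

With these made-up numbers the observations are exactly consistent with per-group rates of 0.1 and 0.4, so the fit recovers them and the correlation is essentially 1; real data would leave a residual, which is where the error functions in the claim would come in.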
72. A method of increasing correlation result dataset specificity by reducing possibilities, consisting of the steps:
calculating correlation result dataset characterization values which fall within a predetermined confidence limit using aspect representation indices, inverse demographic matrices, recombination matrices, and specification similarity matrices;
creating a matrix of such values for all demographic characterizations for each method used;
utilizing mathematical expressions of the requirement of consistency for distinct value ranges for identical characterizations in the separate matrices, reducing each range for a given characterization to the greatest possible extent within a predetermined confidence interval;
thus producing one matrix with one value range for each characterization;
possibly transforming value ranges for all characterizations within said matrix to the same statistical confidence;
iteratively reducing all ranges to the greatest possible extent by utilizing both mathematical expressions of the requirement of consistency among all value ranges in said matrix as well as constraints given by actual characterization population numbers; and
adjusting the statistical confidence if necessary to allow for further value range reduction past the point of useful iteration at a previous statistical confidence.
Dependent claims: 73, 74, 75
76. A method of fitting by convergence and similarity between a static dataset and a dynamic dataset, comprising the steps of:
defining subsets of each dataset;
determining correlations between such datasets;
performing a time-based analysis of group representations and additional correlations within said correlations;
assigning weights to such representations and additional correlations; and
applying such weights and values to determine undefined correlation dataset values.
Dependent claims: 77, 78, 79
80. A method of invalidating set-top box events, comprising the steps of:
monitoring set-top box events;
storing such events in an array;
calculating trends in such events;
invalidating set-top box events which deviate in a statistically significant manner from observed set-top box event trends, or which match previously defined invalid set-top box events;
placing such invalidated set-top box events in an array; and
calculating trends in such invalidated set-top box events such that some long-term trends may be revalidated, and to identify new set-top box event categories to be ignored.
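The invalidation step of claim 80 might be sketched as a simple outlier filter. The example below is an assumption-laden illustration rather than the claimed procedure: it treats the "trend" as the mean dwell time of all stored events, uses a z-score threshold of 2.0 as a stand-in for "statistically significant", and relies on a hypothetical set of known-invalid event kinds. The event layout, the threshold, and the function `split_events` are all invented for this sketch, and the final step of analysing trends among invalidated events for possible revalidation is not shown.

```python
from statistics import mean, pstdev

# Hypothetical stored events: (kind, dwell_seconds).
events = [("channel_change", d) for d in (30, 32, 29, 31, 28, 30, 33, 29, 31, 30)]
events += [("channel_change", 600), ("power_glitch", 30)]

KNOWN_INVALID_KINDS = {"power_glitch"}  # assumed, previously defined
Z_THRESHOLD = 2.0                       # assumed significance cutoff


def split_events(events, known_invalid=KNOWN_INVALID_KINDS, z_max=Z_THRESHOLD):
    """Return (valid, invalid) lists of events."""
    dwell = [d for _, d in events]
    mu, sigma = mean(dwell), pstdev(dwell)
    valid, invalid = [], []
    for kind, d in events:
        # Deviation from the observed trend, in standard deviations.
        z = abs(d - mu) / sigma if sigma else 0.0
        if kind in known_invalid or z > z_max:
            invalid.append((kind, d))   # kept in a separate array
        else:
            valid.append((kind, d))
    return valid, invalid


valid, invalid = split_events(events)
```

Keeping the invalidated events in their own list, instead of discarding them, matches the claim's idea that they can be examined again later.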
Specification