Method and system for business intelligence analytics on unstructured data
First Claim
1. A machine-implemented method for a pipelined process of capture, classification and dimensioning of data from a plurality of data sources that include unstructured data having no explicit dimensions associated with the unstructured data to generate a domain-relevant classified data index that is useable by a plurality of different intelligence metrics to perform different kinds of business intelligence analytics, the method comprising:
using a data processing machine to collect ingested data as one or more documents from each of the plurality of data sources that include unstructured data and automatically generate and store an ingested data index representing the ingested data that includes at least a hyperlink and extracted metadata for each document;
using a data processing machine to automatically classify each of the one or more documents into one or more relevance classifications that are stored with the ingested data index for that document to form a domain-relevant classified data index representing the ingested data, wherein the relevance classifications are based on a plurality of dynamically generated topics that are generated in response to machine analysis that includes machine-defined classifiers and in response to user input that includes user-defined named-entities and user-defined keywords; and
using a data processing machine to automatically process the plurality of data sources with a plurality of different intelligence metric modules independent of and after the one or more documents have been initially ingested and classified by utilizing the domain-relevant classified data index to generate analytics results that are presented for a user, including processing at least one of the documents in the ingested data with each intelligence metric module based upon a plurality of dimensions abstracted from the relevance classifications and the extracted metadata that includes at least one implicit dimension derived from one or more of the user-defined named-entities, wherein the intelligence metric modules do not modify the ingested data index and the dynamically generated topics upon which the relevance classifications are based are not determined prior to using the data processing machine to collect ingested data based upon analytic requirements of the intelligence metric modules such that the relevance classifications are separated in the pipelined process from analytic requirements of any given intelligence metric module.
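The classification and analytics steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `IndexEntry` type, the substring-matching classifier, the two metric functions, and the sample documents are all hypothetical names and data chosen for illustration. It shows relevance classifications stored alongside each index entry, an implicit dimension derived from user-defined named entities, and metric modules that only read the classified index.

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IndexEntry:
    """One document in the index (hypothetical structure)."""
    hyperlink: str
    metadata: dict          # extracted metadata, e.g. {"source": "blog"}
    text: str
    relevance: tuple = ()   # relevance classifications (topic names)
    entities: tuple = ()    # user-defined named entities found in the text


def classify(entry, keywords_by_topic, named_entities):
    """Return a copy of the entry with classifications attached.

    Assumption: plain substring matching stands in for the
    machine-defined classifiers mentioned in the claim.
    """
    lowered = entry.text.lower()
    topics = tuple(sorted(
        topic for topic, keywords in keywords_by_topic.items()
        if any(k.lower() in lowered for k in keywords)
    ))
    found = tuple(sorted(e for e in named_entities if e.lower() in lowered))
    return replace(entry, relevance=topics, entities=found)


# Metric modules: each one reads the classified index and returns a
# result; neither writes back to the index.
def volume_by_topic(index):
    return Counter(t for entry in index for t in entry.relevance)


def mentions_by_entity_and_source(index):
    # "entity" is an implicit dimension (from user-defined named entities);
    # "source" is a dimension taken from extracted metadata.
    return Counter(
        (ent, entry.metadata.get("source"))
        for entry in index for ent in entry.entities
    )


raw = [
    IndexEntry("http://example.com/a", {"source": "blog"},
               "Acme launches a new phone with great battery"),
    IndexEntry("http://example.com/b", {"source": "forum"},
               "Battery problems reported for the Acme phone"),
    IndexEntry("http://example.com/c", {"source": "blog"},
               "Globex pricing update"),
]
user_keywords = {"battery": ["battery"], "pricing": ["pricing", "price"]}
user_entities = ["Acme", "Globex"]

# A tuple of frozen entries keeps the classified index read-only.
classified_index = tuple(
    classify(entry, user_keywords, user_entities) for entry in raw
)

topic_counts = volume_by_topic(classified_index)
entity_counts = mentions_by_entity_and_source(classified_index)
```

Because the metric functions take the index as input and return new results, further metric modules can be added later without re-running ingestion or classification, which is the separation the final "wherein" clause describes.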
Abstract
Various embodiments of the present invention disclose a method for computing Business Intelligence (BI) metrics on unstructured data. Data is collected as ingested data from numerous data sources that include unstructured data. The ingested data is indexed, and the index represents a hyperlink together with extracted data and metadata for each document. Thereafter, the ingested data is automatically classified into one or more relevance classes. Further, numerous analytics are performed on the classified data to generate business intelligence metrics that may be presented on an access device operated by a user.
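The collection and indexing step summarized in the abstract might look roughly like the following Python sketch. The `IngestedRecord` type, the `extract_metadata` helper, the chosen metadata fields (host, title, word count), and the sample URL are assumptions made for illustration; the source only states that the index holds a hyperlink and extracted metadata for each document.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class IngestedRecord:
    """One row of a hypothetical ingested data index."""
    hyperlink: str
    metadata: dict


def extract_metadata(url, text):
    # Assumed metadata fields; a real system would extract whatever
    # the data source exposes (author, date, language, and so on).
    lines = text.strip().splitlines()
    return {
        "host": urlparse(url).netloc,
        "title": lines[0].strip() if lines else "",
        "word_count": len(text.split()),
    }


def build_ingested_index(documents):
    """Build the index from an iterable of (url, text) pairs."""
    return [
        IngestedRecord(url, extract_metadata(url, text))
        for url, text in documents
    ]


docs = [
    ("https://news.example.com/post/1",
     "Acme phone review\nThe battery lasts two days."),
]
ingested_index = build_ingested_index(docs)
```

The index stores a link back to each document rather than a processed copy, so later classification and analytics stages can work from the index without changing the ingested data.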
16 Claims
Specification