Method and system for business intelligence analytics on unstructured data

US 20100114899A1
Filed: 10/07/2009
Published: 05/06/2010
Est. Priority Date: 10/07/2008
Status: Active Grant

Abstract

Various embodiments of the present invention disclose a method for Business Intelligence (BI) metrics on unstructured data. Data is collected as ingested data from numerous data sources that include unstructured data. The ingested data is indexed, and the index represents a hyperlink together with extracted data and metadata for each document. Thereafter, the ingested data is automatically classified into one or more relevance classes. Further, numerous analytics are performed on the classified data to generate business intelligence metrics that may be presented on an access device operated by a user.
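
The three stages named in the abstract (ingest and index, classify, then run analytics) can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `IndexEntry`, `ingest`, `classify`, `topic_counts` and `source_counts` are hypothetical, keyword matching stands in for the machine-defined classifiers, and the metadata fields are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class IndexEntry:
    """One row of the (hypothetical) ingested data index."""
    hyperlink: str
    metadata: MappingProxyType  # read-only view of the extracted metadata
    relevance: tuple = ()       # relevance classes, filled in by classify()


def ingest(sources):
    """Collect documents and index a hyperlink plus metadata for each one."""
    index = []
    for source_name, docs in sources.items():
        for url, text in docs:
            meta = {"source": source_name, "length": len(text), "text": text}
            index.append(IndexEntry(url, MappingProxyType(meta)))
    return tuple(index)


def classify(index, keywords_by_topic):
    """Attach relevance classes; keyword matching is a stand-in classifier."""
    classified = []
    for entry in index:
        text = entry.metadata["text"].lower()
        topics = tuple(sorted(
            topic for topic, words in keywords_by_topic.items()
            if any(word in text for word in words)
        ))
        classified.append(IndexEntry(entry.hyperlink, entry.metadata, topics))
    return tuple(classified)


def topic_counts(classified_index):
    """Metric module 1: number of documents per relevance class."""
    counts = {}
    for entry in classified_index:
        for topic in entry.relevance:
            counts[topic] = counts.get(topic, 0) + 1
    return counts


def source_counts(classified_index):
    """Metric module 2: number of documents per data source."""
    counts = {}
    for entry in classified_index:
        source = entry.metadata["source"]
        counts[source] = counts.get(source, 0) + 1
    return counts
```

Both metric functions only read the classified index, which is built from frozen entries, so neither can alter it. The topics are supplied after ingestion rather than fixed in advance by either metric, which is one way to keep classification separate from any particular analytic.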

16 Claims

1. A machine-implemented method for a pipelined process of capture, classification and dimensioning of data from a plurality of data sources that include unstructured data having no explicit dimensions associated with the unstructured data to generate a domain-relevant classified data index that is useable by a plurality of different intelligence metrics to perform different kinds of business intelligence analytics, the method comprising:
- using a data processing machine to collect ingested data as one or more documents from each of the plurality of data sources that include unstructured data and automatically generate and store an ingested data index representing the ingested data that includes at least a hyperlink and extracted meta data for each document;
  
  using a data processing machine to automatically classify each of the one or more documents into one or more relevance classifications that are stored with the ingested data index for that document to form a domain-relevant classified data index representing the ingested data, wherein the relevance classifications are based on a plurality of dynamically generated topics that are generated in response to machine analysis that includes machine-defined classifiers and in response to user input that includes user-defined named-entities and user-defined keywords; and
  
  using a data processing machine to automatically process the plurality of data sources with a plurality of different intelligence metric modules independent of and after the one or more documents have been initially ingested and classified by utilizing the domain-relevant classified data index to generate analytics results that are presented for a user, including processing at least one of the documents in the ingested data with each intelligence metric module based upon a plurality of dimensions abstracted from the relevance classifications and the extracted metadata that includes at least one implicit dimension derived from one or more of the user-defined named-entities, wherein the intelligence metric modules do not modify the ingested data index and the dynamically generated topics upon which the relevance classifications are based are not determined prior to using the data processing machine to collect ingested data based upon analytic requirements of the intelligence metric modules such that the relevance classifications are separated in the pipelined process from analytic requirements of any given intelligence metric module.
- - 2. The machine-implemented method of claim 1 further comprising:
    - obtaining user-feedback from the user in response to the analytic results that are presented for the user; and
      
      causing a data processing machine to adaptively utilize the user-feedback to modify the relevance classifications.
  - 3. The machine-implemented method of claim 1 wherein the plurality of data sources include text, images, video and audio and wherein using a data processing machine to collect ingested data includes:
    - using data source connectors to access the plurality of data sources, wherein the data source connectors include one or more of internal file system connectors, web site connectors, blog connectors, subscription connectors, email connectors, short-message-service connectors.
  - 4. The machine-implemented method of claim 1 wherein the plurality of data sources include text, images, video and audio and wherein using a data processing machine to collect ingested data includes:
    - using multi-modal scanning to identify and access the plurality of data sources.
  - 5. The machine-implemented method of claim 1 wherein using a data processing machine to collect ingested data further comprises:
    - using automated information extraction techniques to generate at least some of the extracted meta data for each document, wherein different automated information extraction techniques are used for different types of documents.
  - 6. The machine-implemented method of claim 5 wherein the different automated information extraction techniques used for different types of documents include:
    - video information extraction for video document types based on events, objects, activities or motion, image information extraction for image document types based on events, objects or activities, audio information extraction for audio type documents based on text translation or phonetics, text information extraction based on natural language processing or [SRL], or any combination thereof for documents of single or multiple types.
  - 7. The machine-implemented method of claim 1 wherein using a data processing machine to automatically classify each of the one or more documents into one or more relevance classifications further comprises:
    - using multi-modal indexing for at least some of the documents to generate index information that is stored with the ingested data index for that document to form the domain-relevant classified data index representing the ingested data.
  - 8. The machine-implemented method of claim 1 wherein using a data processing machine to automatically process the ingested data with the plurality of different intelligence metric modules includes:
    - using different data processing machines to perform different ones of the plurality of different intelligence metric modules.
  - 9. The machine-implemented method of claim 1 wherein using a data processing machine to automatically process the ingested data with the plurality of different intelligence metric modules includes:
    - reprocessing the one or more documents with at least one of the intelligence metric modules.
  - 10. The machine-implemented method of claim 1 wherein using a data processing machine to automatically process the ingested data with the plurality of different intelligence metric modules further comprises:
    - generating key performance indicator analytic data associated with the domain-relevant classified data index that differs for different types of documents including:
      
      video analytics for video document types based on events, objects, activities or motion, image analytics for image document types based on events, objects or activities, audio analytics for audio type documents based on text translation, phonetics or emotion extraction, text information extraction based on natural language processing, statistical processing, or event detection, or any combination thereof for documents of single or multiple types.
  - 11. The machine-implemented method of claim 10 wherein the plurality of different intelligence metric modules process the ingested data including the key performance indicator analytic data to generate a series of key performance indicator tables stored in a structured query language database.
  - 12. The machine-implemented method of claim 1 wherein the plurality of different intelligence metric modules include both intelligence metric modules that are developed by one or more non-users and customized intelligence metric modules that are defined by the user.
  - 13. The machine-implemented method of claim 2 wherein using a data processing machine to automatically process the ingested data with the plurality of different intelligence metric modules to generate analytics results that are presented for a user includes:
    - providing a query user interface accessible using the data processing machine; and
      
      providing a display user interface accessible using the data processing machine.
  - 14. The machine-implemented method of claim 13 wherein providing the query user interface accessible using the data processing machine includes:
    - providing a structured query user interface; and
      
      providing an ad hoc query user interface.
  - 15. The machine-implemented method of claim 13 wherein the obtaining user-feedback from the user in response to the analytic results that are presented for the user is accomplished using the query user interface.
  - 16. The machine-implemented method of claim 13 wherein providing the display user interface accessible using the data processing machine includes a non-text display, a report display, an alerts display, a dashboard display, or any combination thereof.
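
Claims 10 and 11 describe key performance indicator data that differs by document type and is written to tables in a structured query language database. The following is a minimal sketch of that idea using Python's standard `sqlite3` module. The table name `kpi_topic_by_type`, its columns, and the shape of the input records are assumptions made for illustration; the claims do not specify a schema.

```python
import sqlite3


def write_kpi_table(conn, classified_docs):
    """Aggregate document counts per (document type, topic) into a KPI table.

    `classified_docs` is assumed to be an iterable of dicts with a
    "doc_type" string and a "topics" list (hypothetical record shape).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kpi_topic_by_type ("
        " doc_type TEXT NOT NULL,"
        " topic TEXT NOT NULL,"
        " doc_count INTEGER NOT NULL,"
        " PRIMARY KEY (doc_type, topic))"
    )
    counts = {}
    for doc in classified_docs:
        for topic in doc["topics"]:
            key = (doc["doc_type"], topic)
            counts[key] = counts.get(key, 0) + 1
    # Re-running the module replaces earlier counts instead of duplicating rows.
    conn.executemany(
        "INSERT OR REPLACE INTO kpi_topic_by_type VALUES (?, ?, ?)",
        [(doc_type, topic, n) for (doc_type, topic), n in counts.items()],
    )
    conn.commit()


def query_kpi(conn, topic):
    """Return (doc_type, doc_count) rows for one topic, ordered by type."""
    cursor = conn.execute(
        "SELECT doc_type, doc_count FROM kpi_topic_by_type"
        " WHERE topic = ? ORDER BY doc_type",
        (topic,),
    )
    return cursor.fetchall()
```

A query interface of the kind described in claims 13 and 14 could call something like `query_kpi(conn, "pricing")` for a structured query, or run its own SQL against the same table for an ad hoc one.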

Current Assignee
Aumni Data Incorporated
Original Assignee
Aumni Data Incorporated
Inventors
Madireddi, Venky; Wu, Shumin; Guha, Aloke; Wrabetz, Joan

Granted Patent

US 8,266,148 B2
US Class Current

707/741
CPC Class Codes

G06F 16/9535 Search customisation based ...
