System and method of making unstructured data available to structured data analysis tools

  • US 7,849,048 B2
  • Filed: 07/05/2005
  • Issued: 12/07/2010
  • Est. Priority Date: 07/05/2005
  • Status: Active Grant
First Claim

1. A system for making unstructured data available to structured data tools comprising:

  • a core server computer, wherein the core server computer performs steps comprising:

    accessing a source of unstructured data;

    reading the unstructured data from the source of unstructured data;

    sending the unstructured data to one or more transformation tools;

    parsing, via a natural-language processing transformation tool, the unstructured data to extract sentences from the unstructured data and then further extract from the extracted sentences sentence-level natural-language processed entities, wherein the sentence-level natural-language processed entities are at least noun phrases;

    extracting, via a linguistic processing transformation tool, sentence-level linguistically-processed relationships, wherein the sentence-level linguistically-processed relationships comprise associations between the sentence-level natural-language processed entities;

    sending the sentence-level natural-language processed entities and the sentence-level linguistically-processed relationships from the one or more transformation tools to a categorization tool;

    determining, via the categorization tool, categorization data elements present in each extracted sentence, wherein the categorization data elements are based on the sentence-level natural-language processed entities and the sentence-level linguistically-processed relationships, and are placed within predetermined categories, and a confidence level for each categorization data element, wherein the confidence level for each categorization data element combines one or more data points linked to the sentence-level natural-language processed entities and the sentence-level linguistically-processed relationships to create a statistically-oriented calculation of confidence assigned to the categorization data element;

    outputting the confidence level for at least one of the categorization data elements for use in structured data tools; and

    wherein the one or more data points are selected from the group consisting of:

    confidence score of value provided by the one or more transformation tools, number of relationships found in the source of unstructured data compared to the size of the source of unstructured data, average number of relationships per kilobyte for relationships of the same type as a selected relationship, number of entities found to be associated with a relationship compared to an average number of entities for relationships in a same hierarchy, number of times similar relationships have been found in the past, number of entities that are grouped together to form a master entity, a number of times an entity occurred in the source of unstructured data compared to the average number of occurrences for entities in the same hierarchy, weighted confidences based on hierarchy of a relationship or entity, measures of data extraction confidence integrated with the system via an analysis schema, measures based on a fullness of a relationship's attributes, measures based on the confluence of a same finding by multiple transformation tools, measures based on the source of the unstructured data, and combinations thereof.
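
The claim says the confidence level "combines one or more data points" into a "statistically-oriented calculation" but does not state a formula. The sketch below is one possible reading, not the patented method: it assumes each data point has been normalized to the range 0 to 1 and combines them with a weighted mean. The names `DataPoint`, `combine_confidence`, and `density_ratio`, the weights, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    """One normalized input to the confidence calculation (hypothetical type)."""
    name: str
    value: float          # assumed to be normalized to the range [0, 1]
    weight: float = 1.0   # assumed relative importance; not taken from the claim


def combine_confidence(points):
    """Combine data points with a weighted mean, clamped to [0, 1].

    The weighted mean is an assumption chosen for illustration; the claim
    only requires that the data points be combined.
    """
    total_weight = sum(p.weight for p in points)
    if total_weight <= 0:
        raise ValueError("at least one data point with positive weight is required")
    score = sum(p.value * p.weight for p in points) / total_weight
    return max(0.0, min(1.0, score))


def density_ratio(relationship_count, source_bytes, avg_per_kb):
    """Relationships per kilobyte relative to an average for that type, capped at 1.0."""
    if source_bytes <= 0 or avg_per_kb <= 0:
        return 0.0
    per_kb = relationship_count / (source_bytes / 1024)
    return min(1.0, per_kb / avg_per_kb)


# Three of the data points named in the claim, with made-up values.
points = [
    DataPoint("tool_confidence", 0.9, weight=2.0),                  # score from a transformation tool
    DataPoint("relationship_density", density_ratio(4, 2048, 4.0)), # relationships vs. source size
    DataPoint("tool_agreement", 1.0),                               # same finding by multiple tools
]
confidence = combine_confidence(points)
```

A structured data tool could then filter or rank categorization data elements by `confidence`. Other combinations, such as a product of probabilities or a learned model, would fit the claim language equally well.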
