Text based schema discovery and information extraction

  • US 7,930,322 B2
  • Filed: 05/27/2008
  • Issued: 04/19/2011
  • Est. Priority Date: 05/27/2008
  • Status: Active Grant
First Claim

1. A method for creating a database schema from unstructured documents comprising the steps of:

    accessing an unstructured document;

    extracting information from the unstructured document using text mining, the extracted information comprising terms, phrases and sentiments;

    analyzing the extracted information to identify sections of the unstructured document;

    storing statistics regarding an occurrence of items in the unstructured document, the items comprising the extracted information and identified sections;

    repeating the accessing, extracting, analyzing, and storing steps for a plurality of unstructured documents;

    creating a probabilistic model based on the statistics stored for the plurality of unstructured documents;

    generating a database schema using the probabilistic model;

    receiving user modifications to the probabilistic model;

    updating the probabilistic model based upon the user modifications; and

    generating a database based on the database schema generated using the probabilistic model.
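
The steps of claim 1 can be illustrated with a small Python sketch. This is not the patented implementation, only one possible reading under stated assumptions: "text mining" is reduced to regular-expression tokenization, a section is assumed to be any line of the form "Name: value", the probabilistic model is simply the fraction of documents in which an item occurs, and sentiment extraction is omitted. All function names, the 0.5 threshold, and the use of SQLite are hypothetical choices made for illustration.

```python
import re
import sqlite3
from collections import Counter


def extract_items(text):
    """Extract terms and (assumed) section names from one unstructured document."""
    items = set()
    for line in text.splitlines():
        line = line.strip()
        # Assumption: a line shaped like "Name: value" marks a section.
        match = re.match(r"^([A-Za-z][A-Za-z ]*):", line)
        if match:
            name = match.group(1).strip().lower().replace(" ", "_")
            items.add("section:" + name)
        for term in re.findall(r"[a-z]+", line.lower()):
            items.add("term:" + term)
    return items


def collect_statistics(documents):
    """Repeat extraction over all documents and count how many contain each item."""
    counts = Counter()
    for text in documents:
        counts.update(extract_items(text))
    return counts


def build_model(counts, n_docs):
    """Toy probabilistic model: P(item occurs in a document) = count / n_docs."""
    return {item: count / n_docs for item, count in counts.items()}


def apply_user_modifications(model, overrides):
    """Return a copy of the model with user-supplied probabilities applied."""
    updated = dict(model)
    updated.update(overrides)
    return updated


def generate_schema(model, threshold=0.5):
    """Pick sections whose probability meets the threshold as column names."""
    return sorted(
        item.split(":", 1)[1]
        for item, prob in model.items()
        if item.startswith("section:") and prob >= threshold
    )


def create_database(columns, table="documents"):
    """Create an in-memory SQLite database with one TEXT column per schema entry."""
    conn = sqlite3.connect(":memory:")
    column_defs = ["id INTEGER PRIMARY KEY"] + [f'"{c}" TEXT' for c in columns]
    conn.execute(f'CREATE TABLE "{table}" ({", ".join(column_defs)})')
    return conn


if __name__ == "__main__":
    docs = [
        "Patient: Ann\nDiagnosis: flu\nfeels fine",
        "Patient: Bob\nDiagnosis: cold",
        "Patient: Cy\nNotes: none",
    ]
    model = build_model(collect_statistics(docs), len(docs))
    # A user promotes the rarely seen "notes" section before the database is built.
    model = apply_user_modifications(model, {"section:notes": 1.0})
    conn = create_database(generate_schema(model))
    print([row[1] for row in conn.execute('PRAGMA table_info("documents")')])
```

In this sketch the user modification acts directly on the model's probabilities, so regenerating the schema afterwards picks up the change before the database is created. A real system would likely use a richer model and a more careful section detector.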
