System and method for facts extraction and domain knowledge repository creation from unstructured and semi-structured documents
Abstract
Provided are methods and systems that extract facts from unstructured documents and build an oracle for various domains. The present invention addresses the problem of efficiently finding and extracting facts about a particular subject domain from semi-structured and unstructured documents. It infers new facts from the extracted facts and provides ways to verify them, thus becoming a source of knowledge about the domain that can be queried effectively. The methods and systems can also extract temporal information from unstructured and semi-structured documents, and can find and extract dynamically generated documents from the Deep or Dynamic Web.
20 Claims

1. A method for grammatically parsing, comprising:
building application specific context grammar rules;
building application specific local grammar rules; and
building a many-to-many mapping between context and local rules.
Dependent claims: 2, 3, 4, 5, 6, 7.
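Claim 1 does not say what the rules look like or how the mapping is used. The sketch below is a minimal illustration under stated assumptions: context rules are regular expressions that recognize a kind of passage, local rules are regular expressions with named groups that pull out fields, and a local rule is tried only inside the contexts mapped to it. Every rule name and pattern here (`executive_bio`, `appointment`, and so on) is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical context grammar rules: each recognises a kind of passage.
CONTEXT_RULES = {
    "executive_bio": re.compile(r"\b(?:CEO|chief executive|president)\b", re.I),
    "company_news": re.compile(r"\b(?:announced|acquired|appointed)\b", re.I),
}

# Hypothetical local grammar rules: each extracts named fields from a phrase.
LOCAL_RULES = {
    "person_title": re.compile(
        r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+), (?P<title>CEO|president)"),
    "appointment": re.compile(
        r"appointed (?P<person>[A-Z][a-z]+ [A-Z][a-z]+) as (?P<title>\w+)"),
}

# Many-to-many mapping: one context licenses several local rules,
# and one local rule ("appointment") is shared by several contexts.
CONTEXT_TO_LOCAL = {
    "executive_bio": ["person_title", "appointment"],
    "company_news": ["appointment"],
}


def local_to_contexts(mapping):
    """Invert the mapping so each local rule lists the contexts that license it."""
    inverse = defaultdict(set)
    for context, local_names in mapping.items():
        for name in local_names:
            inverse[name].add(context)
    return dict(inverse)


def parse(paragraph):
    """Run each local rule only inside the contexts mapped to it."""
    hits = {}
    for context, context_rule in CONTEXT_RULES.items():
        if not context_rule.search(paragraph):
            continue  # context absent: its local rules are not tried
        for name in CONTEXT_TO_LOCAL.get(context, []):
            for m in LOCAL_RULES[name].finditer(paragraph):
                hit = hits.setdefault(
                    (name, m.span()),
                    {"rule": name, "fields": m.groupdict(), "contexts": set()},
                )
                hit["contexts"].add(context)  # same match reached via several contexts
    return list(hits.values())
```

For "The board appointed Jane Smith as president." both contexts fire, and the shared `appointment` rule yields a single hit recorded under both of them; keying hits by rule and match span is one simple way to avoid double-counting a match reached through several contexts.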

8. A method for object, relationship and attribute verification, comprising:
dividing local grammar rules into strict and loose categories in response to their precision vs. recall ratio;
applying strict rules to all relevant page paragraphs and building a set of strict mappings; and
applying loose rules to all relevant page paragraphs and building a set of loose mappings.
Dependent claims: 9, 10, 11, 12, 13.
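Claim 8 does not define the cut-off between strict and loose rules or how precision and recall are measured. The sketch below assumes each local rule is a regular expression with named groups `subj` and `val`, that precision and recall were already estimated on a labeled sample, and that a rule counts as strict when its precision-to-recall ratio reaches a hypothetical threshold of 2.0. A mapping is taken to be a (subject, relation, value) triple; the rules and figures are illustrative only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalRule:
    name: str
    relation: str
    pattern: re.Pattern  # named groups "subj" and "val" (assumed convention)
    precision: float     # estimated on a labelled sample (assumed available)
    recall: float


# Hypothetical rules: a narrow, precise one and a broad, high-recall one.
RULES = [
    LocalRule("founded_strict", "founded_in",
              re.compile(r"(?P<subj>[A-Z]\w+) was founded in (?P<val>\d{4})"),
              0.95, 0.30),
    LocalRule("founded_loose", "founded_in",
              re.compile(r"(?P<subj>[A-Z]\w+)[^.]*?\b(?P<val>(?:19|20)\d{2})\b"),
              0.55, 0.85),
]


def split_rules(rules, ratio_threshold=2.0):
    """Strict rules favour precision over recall; everything else is loose."""
    strict, loose = [], []
    for rule in rules:
        ratio = rule.precision / rule.recall if rule.recall > 0 else float("inf")
        (strict if ratio >= ratio_threshold else loose).append(rule)
    return strict, loose


def build_mappings(rules, paragraphs):
    """Apply every rule to every paragraph, collecting (subject, relation, value) triples."""
    mappings = set()
    for paragraph in paragraphs:
        for rule in rules:
            for m in rule.pattern.finditer(paragraph):
                mappings.add((m.group("subj"), rule.relation, m.group("val")))
    return mappings
```

On the paragraphs "Acme was founded in 1999." and "Beta, started around 2005, grew fast." the strict set holds only the Acme triple, while the loose set holds both. A verification step could, for example, treat triples found by both sets as confirmed and loose-only triples as candidates; the dependent claims are not reproduced here, so that policy is an assumption.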

14. A method for automatic expansion of local grammar rules, comprising:
utilizing a pre-defined set of rules as a baseline; and
applying an iterative bootstrapping process that uses a combination of rule separator expansion and reduction, depending on the results of their application to an expanding set of documents.
Dependent claims: 15, 16, 17, 18, 19, 20.
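Claim 14 leaves "rule separators" undefined. The sketch below reads a local rule as two slots (a capitalized name and a four-digit year) joined by a literal separator phrase such as `was founded in`; that reading, the slot patterns, the leave-one-out precision score, and the 0.75 threshold are all assumptions. Each pass adds a batch of documents, expands the rule set with separators observed between pairs the current rules already extract, and reduces it by dropping non-baseline separators whose extractions disagree with the other rules. This is loosely in the style of DIPRE/Snowball pattern bootstrapping, not a procedure confirmed by the patent.

```python
import re

ENTITY = r"[A-Z][a-z]+"  # hypothetical subject slot: a capitalised name
YEAR = r"\d{4}"          # hypothetical value slot: a four-digit year


def apply(separators, docs):
    """Extract (entity, year) pairs with every rule '<ENTITY> <separator> <YEAR>'."""
    pairs = set()
    for sep in separators:
        pattern = re.compile(rf"\b({ENTITY}) {re.escape(sep)} ({YEAR})\b")
        for doc in docs:
            pairs.update(pattern.findall(doc))
    return pairs


def candidate_separators(pairs, docs, max_words=3):
    """Expansion: lowercase text seen between a known pair becomes a candidate."""
    found = set()
    for entity, year in pairs:
        pattern = re.compile(
            rf"\b{re.escape(entity)} ([a-z]+(?: [a-z]+)*) {re.escape(year)}\b")
        for doc in docs:
            for m in pattern.finditer(doc):
                if len(m.group(1).split()) <= max_words:
                    found.add(m.group(1))
    return found


def precision(sep, other_separators, docs):
    """Share of sep's pairs confirmed by the other rules, judged on entities they know."""
    reference = apply(other_separators, docs)
    known_entities = {entity for entity, _ in reference}
    judged = [pair for pair in apply({sep}, docs) if pair[0] in known_entities]
    if not judged:
        return None  # no evidence either way
    return sum(pair in reference for pair in judged) / len(judged)


def bootstrap(seeds, batches, min_precision=0.75):
    baseline = set(seeds)
    rules = set(seeds)
    docs = []
    for batch in batches:  # the document set grows on every iteration
        docs.extend(batch)
        # Expansion: try separators found between pairs the current rules extract.
        for sep in sorted(candidate_separators(apply(rules, docs), docs) - rules):
            p = precision(sep, rules, docs)
            if p is not None and p >= min_precision:
                rules.add(sep)
        # Reduction: drop non-baseline separators that now disagree with the rest.
        for sep in sorted(rules - baseline):
            p = precision(sep, rules - {sep}, docs)
            if p is not None and p < min_precision:
                rules.discard(sep)
    return rules


# Hypothetical document batches for illustration.
BATCH_1 = ["Acme was founded in 1999.", "Acme opened in 1999.",
           "Beta was founded in 2005.", "Beta was established in 2005."]
BATCH_2 = ["Gamma was established in 2010.", "Gamma opened in 2012.",
           "Beta opened in 2011.", "Delta opened in 2020."]
```

Starting from the baseline `{"was founded in"}`, the first batch expands the rule set with `opened in` and `was established in`. When the second batch arrives, `opened in` agrees with the other rules on only one of the three entities they know (precision 1/3), so the reduction step removes it, while `was established in` is kept. Sorting the candidates makes the outcome independent of set iteration order.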
Specification