System and method for facts extraction and domain knowledge repository creation from unstructured and semi-structured documents
4 Assignments
0 Petitions
Abstract
Provided are methods and systems that extract facts from unstructured and semi-structured documents and build an oracle for various domains. The present invention addresses the problem of efficiently finding and extracting facts about a particular subject domain from semi-structured and unstructured documents, inferring new facts from the extracted ones, and verifying the facts, so that the system becomes a source of knowledge about the domain that can be queried effectively. The methods and systems can also extract temporal information from unstructured and semi-structured documents, and can find and extract dynamically generated documents from the Deep or Dynamic Web.
23 Citations
21 Claims
1. A method for object identification and inference, comprising:
utilizing a three-level object presentation consisting of instance, denotatum, and denotatum class; and
applying application-dependent inference rules to determine a match between instances and objects.
Dependent claims: 2, 3, 4.
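A minimal Python sketch of claim 1, under stated assumptions: an instance is taken to be a single mention of an object in one document, a denotatum the object as that mention describes it, and a denotatum class the group of denotata currently believed to be one real-world object. The claim does not define these levels further, so these readings, the `same_name_no_conflict` rule, and the `identify` function are hypothetical illustrations, not the patented method.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Instance:
    """Level 1 (assumed meaning): one mention of an object in one document."""
    source: str
    attrs: Dict[str, str]

@dataclass
class Denotatum:
    """Level 2 (assumed meaning): the object as described by a single instance."""
    instance: Instance

    @property
    def attrs(self) -> Dict[str, str]:
        return self.instance.attrs

@dataclass
class DenotatumClass:
    """Level 3 (assumed meaning): denotata currently believed to be one real object."""
    members: List[Denotatum] = field(default_factory=list)

# An application-dependent rule compares a new denotatum with an existing one.
Rule = Callable[[Denotatum, Denotatum], bool]

def same_name_no_conflict(a: Denotatum, b: Denotatum) -> bool:
    """Hypothetical rule: names agree and no shared attribute has a different value."""
    if a.attrs.get("name") != b.attrs.get("name"):
        return False
    shared = a.attrs.keys() & b.attrs.keys()
    return all(a.attrs[k] == b.attrs[k] for k in shared)

def identify(inst: Instance, classes: List[DenotatumClass],
             rules: List[Rule]) -> DenotatumClass:
    """Attach the instance's denotatum to the first class whose members all pass
    every rule; otherwise open a new denotatum class for it."""
    den = Denotatum(inst)
    for cls in classes:
        if all(rule(den, m) for m in cls.members for rule in rules):
            cls.members.append(den)
            return cls
    new_cls = DenotatumClass([den])
    classes.append(new_cls)
    return new_cls

# Usage: three mentions of "J. Smith"; the third conflicts on employer.
classes: List[DenotatumClass] = []
rules: List[Rule] = [same_name_no_conflict]
c1 = identify(Instance("a.html", {"name": "J. Smith", "employer": "Acme"}), classes, rules)
c2 = identify(Instance("b.html", {"name": "J. Smith", "title": "CTO"}), classes, rules)
c3 = identify(Instance("c.html", {"name": "J. Smith", "employer": "Globex"}), classes, rules)
```

Here the first two mentions end up in one denotatum class and the third opens a second one. One plausible reason to keep denotata separate from their classes is that a wrong match can later be undone by moving denotata between classes instead of editing merged records.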
5. A method for recovery from incorrect object identification, comprising:
utilizing a contradiction between new and old facts, or a human request, as a trigger for reclassification of the affected facts; and
utilizing roll-forward transactions to eliminate and rearrange denotatum classes.
Dependent claims: 6, 7, 8, 13, 14, 15, 16.
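One possible reading of claim 5, sketched in Python: identification decisions are kept in an append-only log, and a trigger (a contradiction between a newer fact and older facts in the same class, or a human request) marks one decision as wrong. The denotatum classes are then rebuilt by rolling the log forward from an empty state while treating the rejected decision as "start a new class". The `Merge` record, the single-valued-attribute contradiction test, and all function names are assumptions, not the mechanism specified in the patent.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set

@dataclass(frozen=True)
class Merge:
    """Hypothetical log record: instance `inst` was identified with the object of
    `target` (None means the instance founded a new denotatum class)."""
    txn_id: int
    inst: str
    target: Optional[str]

def replay(log: List[Merge], rejected: Set[int]) -> Dict[str, Set[str]]:
    """Roll forward from an empty state, re-applying each logged decision in order.
    A rejected merge is replayed as 'found a new class', so instances attached
    later through it move together with it. Targets must appear earlier in the log."""
    owner: Dict[str, str] = {}            # instance -> class id
    classes: Dict[str, Set[str]] = {}     # class id -> member instances
    for t in log:
        if t.target is None or t.txn_id in rejected:
            cls_id = t.inst
        else:
            cls_id = owner[t.target]
        owner[t.inst] = cls_id
        classes.setdefault(cls_id, set()).add(t.inst)
    return classes

def first_contradiction(log: List[Merge], classes: Dict[str, Set[str]],
                        facts: Dict[str, Dict[str, str]]) -> Optional[int]:
    """Return the id of the first merge whose facts contradict earlier facts in the
    same class (every attribute is assumed single-valued), or None."""
    owner = {i: c for c, members in classes.items() for i in members}
    seen: Dict[str, Dict[str, str]] = {}  # class id -> attribute values so far
    for t in log:
        values = seen.setdefault(owner[t.inst], {})
        for attr, val in facts.get(t.inst, {}).items():
            if values.get(attr, val) != val:
                return t.txn_id
            values[attr] = val
    return None

def recover(log: List[Merge], facts: Dict[str, Dict[str, str]],
            human_rejects: Iterable[int] = ()) -> Dict[str, Set[str]]:
    """Each trigger (human request or detected contradiction) rejects one merge;
    the classes are then rebuilt by rolling the log forward again."""
    rejected = set(human_rejects)
    while True:
        classes = replay(log, rejected)
        culprit = first_contradiction(log, classes, facts)
        if culprit is None:
            return classes
        rejected.add(culprit)

# Usage: "b" was wrongly merged with "a" (different birth years); "c" was
# attached through "b", so it is rearranged together with "b".
log = [Merge(1, "a", None), Merge(2, "b", "a"), Merge(3, "c", "b"), Merge(4, "d", "a")]
facts = {"a": {"born": "1960"}, "b": {"born": "1975"},
         "c": {"employer": "Acme"}, "d": {"employer": "Acme"}}
auto = recover(log, facts)
manual = recover(log, {}, human_rejects=[4])
```

In this sketch `auto` splits the original single class into `{"a", "d"}` and `{"b", "c"}`, while `manual` shows a human request detaching `d` alone.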
9. A method to convert unstructured and semi-structured information into a structured format, comprising:
crawling the Internet and intranets to generate a set of pages for further analysis;
applying different knowledge agents, in different orders, to each page to extract application-dependent candidate facts;
building new candidate facts from the extracted facts using logical inference;
verifying the correctness of the candidate facts using recursive verification, recursive bootstrapping, and deferred decision methods; and
storing the verified facts in structured form in a data repository;
wherein a method of building a business information network database is provided, comprising:
collecting documents from the Internet and other sources;
applying surface and deep web crawling to collect these documents;
extracting business information facts from each document;
filtering out incorrect or irrelevant facts;
applying consistency checks, directly and recursively, to solidify the correctness of the facts;
storing the facts in the business information network database; and
providing different online users with access to the information in the business information network database.
Dependent claims: 10, 11, 12, 17, 18, 19, 20, 21.
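The steps of claim 9 can be sketched end to end in Python, with heavy simplifying assumptions: a dictionary of toy pages stands in for crawler output, two regular-expression functions play the role of knowledge agents, a single hand-written rule performs the logical inference, and verification is read as a simple iterative loop (a fact is accepted when enough pages or any trusted page support it; pages behind accepted facts become trusted, which is one interpretation of bootstrapping; undecided facts are deferred to the next round). The consistency check rejects facts that contradict a verified single-valued relation, and verified facts are stored in an in-memory SQLite table. Every agent, relation name, and threshold here is hypothetical.

```python
import re
import sqlite3
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

Fact = Tuple[str, str, str]                  # (subject, relation, object)
Agent = Callable[[str], Iterable[Fact]]

def ceo_agent(text: str) -> Iterable[Fact]:
    """Hypothetical knowledge agent: matches 'First Last is CEO of Company'."""
    for person, company in re.findall(r"([A-Z]\w+ [A-Z]\w+) is CEO of ([A-Z]\w+)", text):
        yield (person, "ceo_of", company)

def hq_agent(text: str) -> Iterable[Fact]:
    """Hypothetical knowledge agent: matches 'Company is headquartered in City'."""
    for company, city in re.findall(r"([A-Z]\w+) is headquartered in ([A-Z]\w+)", text):
        yield (company, "hq_in", city)

def extract(pages: Dict[str, str], agents: List[Agent]) -> Dict[Fact, Set[str]]:
    """Run every agent on every page; map each candidate fact to its supporting pages."""
    support: Dict[Fact, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for agent in agents:
            for fact in agent(text):
                support[fact].add(url)
    return support

def infer(support: Dict[Fact, Set[str]]) -> Dict[Fact, Set[str]]:
    """Assumed inference rule: the CEO of a company works for it (sources inherited)."""
    inferred: Dict[Fact, Set[str]] = defaultdict(set)
    for (s, r, o), srcs in support.items():
        if r == "ceo_of":
            inferred[(s, "works_for", o)] |= srcs
    return inferred

def verify(support: Dict[Fact, Set[str]], trusted: Set[str], min_sources: int = 2,
           single_valued: frozenset = frozenset({"hq_in"})):
    """Return (verified, deferred, rejected) fact sets."""
    trusted = set(trusted)
    pending = dict(support)
    verified: Set[Fact] = set()
    rejected: Set[Fact] = set()
    while True:
        newly = {f for f, srcs in pending.items()
                 if len(srcs) >= min_sources or srcs & trusted}
        if not newly:                        # nothing decidable: the rest stays deferred
            return verified, set(pending), rejected
        for f in newly:
            verified.add(f)
            trusted |= pending.pop(f)        # bootstrapping: trust the supporting pages
        for f in list(pending):              # consistency check against verified facts
            if f[1] in single_valued and any(v[:2] == f[:2] for v in verified):
                rejected.add(f)
                del pending[f]

def store(facts: Set[Fact]) -> sqlite3.Connection:
    """Store verified facts in structured form (an in-memory SQLite table here)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE facts (subject TEXT, relation TEXT, object TEXT)")
    db.executemany("INSERT INTO facts VALUES (?, ?, ?)", sorted(facts))
    return db

# Usage with toy pages standing in for crawler output (crawling itself is not shown).
pages = {
    "p1": "Ann Lee is CEO of Acme. Acme is headquartered in Boston.",
    "p2": "Ann Lee is CEO of Acme.",
    "p3": "Acme is headquartered in Denver.",
    "p4": "Bob Ray is CEO of Globex.",
}
support = extract(pages, [ceo_agent, hq_agent])
for fact, srcs in infer(support).items():
    support[fact] |= srcs
verified, deferred, rejected = verify(support, trusted=set())
db = store(verified)
hq = db.execute("SELECT object FROM facts WHERE subject = ? AND relation = ?",
                ("Acme", "hq_in")).fetchall()
```

In this toy run the Boston headquarters fact is accepted only after page `p1` becomes trusted, the conflicting Denver fact is then rejected, and the single-source Globex facts remain deferred until more evidence arrives. The sketch does not model agent ordering, deep web crawling, or temporal information.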
Specification