System and method for facts extraction and domain knowledge repository creation from unstructured and semi-structured documents
Abstract
Provided are methods and systems that extract facts from unstructured documents and build an oracle for various domains. The present invention addresses the problem of efficiently finding and extracting facts about a particular subject domain from semi-structured and unstructured documents. It infers new facts from the extracted facts, provides ways of verifying those facts, and thus becomes a source of knowledge about the domain that can be queried effectively. The methods and systems can also extract temporal information from unstructured and semi-structured documents, and can find and extract dynamically generated documents from the Deep or Dynamic Web.
10 Claims
1. A method of time stamp extraction and verification, comprising:
- parsing a page and representing it as a sequence of paragraphs; and
- building list of candidates for time stamp for each paragraph by extraction of valid triads representing year, month and day.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
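The two steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the blank-line paragraph split, the numeric year-month-day pattern, and the function names `split_paragraphs`, `timestamp_candidates`, and `extract_candidates` are all assumptions made for illustration. A triad is treated as valid here only if it names a real calendar date.

```python
from __future__ import annotations

import re
from datetime import date

# Assumption: only numeric triads such as 2009-03-01, 2009/3/1 or 2009.03.01
# are recognized. The claim does not specify the surface formats.
TRIAD = re.compile(r"\b(\d{4})[-/.](\d{1,2})[-/.](\d{1,2})\b")


def split_paragraphs(page: str) -> list[str]:
    """Represent a page as a sequence of paragraphs (blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", page) if p.strip()]


def timestamp_candidates(paragraph: str) -> list[date]:
    """Collect the valid (year, month, day) triads found in one paragraph."""
    candidates = []
    for year, month, day in TRIAD.findall(paragraph):
        try:
            candidates.append(date(int(year), int(month), int(day)))
        except ValueError:
            # Not a real calendar date (for example February 30): skip it.
            continue
    return candidates


def extract_candidates(page: str) -> list[tuple[str, list[date]]]:
    """Pair every paragraph of the page with its time stamp candidates."""
    return [(p, timestamp_candidates(p)) for p in split_paragraphs(page)]


if __name__ == "__main__":
    sample = (
        "Posted 2009-02-30 and updated 2009-03-01.\n\n"
        "No dates here.\n\n"
        "Filed 2010/12/5."
    )
    for paragraph, found in extract_candidates(sample):
        print(paragraph, found)
```

Delegating validity to `datetime.date` keeps the month-length and leap-year rules out of the pattern itself. The verification part of the claim is not covered by this sketch.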
Specification