System for assembling large databases through information extracted from text sources
Abstract
Traditional information extraction processes are usually implemented on a programmed general purpose computer. The process looks for certain information and organizes it into a database record. The database created is usually stored in a searchable format, such as a structured relational database or an object-oriented structured database, which can be accessed, researched, and analyzed by computer-implemented database research systems. However, generic information extraction processes only input the extracted information into the database in the last step of the process and do not address the problem of compiling a large and comprehensive database from a plurality of source documents. Furthermore, information extraction processes are not focused on how the extracted information will be used in the construction of a large database. It would be desirable to have an information extraction system with the ability to assemble extracted information and to recognize any conflicts between the extracted information and the contents of an existing database. Accordingly, the invention is an information indexing process with the above features, having the ability to construct a database with a high degree of integrity from a plurality of source documents.
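The idea of recognizing conflicts between extracted information and an existing database can be illustrated with a minimal sketch. It assumes, hypothetically, that an extracted item and a database record are both flat mappings from attribute name to a set of values, and that a conflict means both sides hold values for an attribute but share none. The names `find_conflicts`, `extracted`, and `existing` are illustrative and do not come from the patent.

```python
def find_conflicts(extracted, existing):
    """Return attributes whose extracted values disagree with the database.

    Both arguments are hypothetical dicts: attribute name -> set of values.
    Assumption: an attribute conflicts when both sides have values
    and the two value sets have nothing in common.
    """
    conflicts = {}
    for attr, new_values in extracted.items():
        old_values = existing.get(attr, set())
        if old_values and new_values and not (old_values & new_values):
            conflicts[attr] = {"database": old_values, "extracted": new_values}
    return conflicts


# Illustrative data only.
existing = {"name": {"Acme Corp"}, "city": {"Boston"}}
extracted = {"name": {"Acme Corp"}, "city": {"Chicago"}, "ceo": {"J. Smith"}}
conflicts = find_conflicts(extracted, existing)
```

Under these assumptions only `city` is reported: `name` agrees, and `ceo` is new information rather than a disagreement.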
398 Citations
16 Claims
1. A method for combining types of items of information from a plurality of text-based information sources in a plurality of formats, into items in a database, with at least one information source including unstructured written text and structured text, and at least one item of information including one or more attributes, with each attribute having at least one value, the method comprising the steps of:
- for each information source, extracting and organizing items of information from the structured and unstructured written text of the information source, to generate an index plan, with each item in the plan organized as a hierarchic data structure representing the item's attributes, their values and the locations of the text in the information source supporting those values; and
- researching and consolidating the items of information in the index plan into items in the database to avoid duplication of items in the database.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
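The hierarchic data structure recited in the claim (an item, its attributes, their values, and the text locations supporting each value) might be modeled roughly as follows. This is a sketch under stated assumptions, not the patent's implementation: the class names `TextLocation`, `AttributeValue`, and `PlanItem`, and the use of character offsets to mark locations, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TextLocation:
    source_id: str  # which information source the text came from
    start: int      # character offset where the supporting text begins
    end: int        # character offset just past the supporting text


@dataclass
class AttributeValue:
    value: str
    locations: list = field(default_factory=list)  # supporting TextLocation objects


@dataclass
class PlanItem:
    item_type: str
    attributes: dict = field(default_factory=dict)  # name -> list of AttributeValue

    def add(self, name, value, location):
        """Record a value for an attribute; a repeated value gains a location."""
        values = self.attributes.setdefault(name, [])
        for known in values:
            if known.value == value:
                known.locations.append(location)
                return
        values.append(AttributeValue(value, [location]))


# Illustrative source text and hand-picked spans (no real extractor is shown).
source_text = "Acme Corp is based in Boston. Acme Corp was founded in 1990."
item = PlanItem("company")
first = source_text.index("Acme Corp")
second = source_text.index("Acme Corp", first + 1)
item.add("name", "Acme Corp", TextLocation("doc-1", first, first + 9))
item.add("name", "Acme Corp", TextLocation("doc-1", second, second + 9))
city = source_text.index("Boston")
item.add("city", "Boston", TextLocation("doc-1", city, city + 6))
index_plan = [item]
```

Keeping the locations next to each value means any value in the plan can be traced back to the passage that supports it, which is what the later research step would need when weighing a value against the database.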
10. A system for combining types of items of information from a plurality of text-based information sources in a plurality of formats, into items in a database, with at least one information source including unstructured written text and structured text, and at least one item of information including one or more attributes, with each attribute having at least one value, the system comprising:
- an extractor configured to extract and organize items of information from the structured and unstructured written text of each information source, to generate an index plan, with each item in the plan organized as a hierarchic data structure representing the item's attributes, their values and the locations of the text in the information source supporting those values; and
- a research mechanism configured to research and consolidate the items of information in the index plan into items in the database to avoid duplication of items in the database.
Dependent claims: 11, 12, 13, 14, 15, 16
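The research mechanism's role, looking up whether an item already exists and merging into it rather than creating a duplicate, can be sketched as below. The matching rule (a case-folded, whitespace-normalized name) and the flat dict-of-sets representation are assumptions made for illustration; the claim does not specify how matching is performed, and `normalize` and `consolidate` are hypothetical names.

```python
def normalize(name):
    """Hypothetical matching key: case-folded name with collapsed whitespace."""
    return " ".join(name.casefold().split())


def consolidate(index_plan, database):
    """Merge each plan item into `database`, keyed by normalized name.

    `index_plan` is a list of dicts: attribute name -> set of values.
    `database` maps a normalized name to the same kind of dict.
    Assumption: every plan item carries at least one "name" value.
    """
    for item in index_plan:
        key = normalize(next(iter(item["name"])))
        record = database.get(key)  # research: look for an existing item
        if record is None:
            database[key] = {attr: set(vals) for attr, vals in item.items()}
            continue
        for attr, values in item.items():  # consolidate: merge into the match
            record.setdefault(attr, set()).update(values)
    return database


# Illustrative data only.
database = {"acme corp": {"name": {"Acme Corp"}, "city": {"Boston"}}}
index_plan = [
    {"name": {"ACME  Corp"}, "founded": {"1990"}},
    {"name": {"Globex"}, "city": {"Springfield"}},
]
consolidate(index_plan, database)
```

In this sketch the first plan item is folded into the existing "acme corp" record and the second creates a new record, so the database ends with two items instead of three.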
Specification