Information extraction from a database
Abstract
Techniques for extracting information from a database are provided. A database such as the Web is searched for occurrences of tuples of information. The occurrences of the tuples of information that were found in the database are analyzed to identify a pattern in which the tuples of information were stored. Additional tuples of information can then be extracted from the database utilizing the pattern. This process can be repeated with the additional tuples of information, if desired.
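The iterative process summarized in the abstract can be illustrated with a small sketch. The code below is only an assumption-laden illustration, not the patented implementation: the function names (`find_contexts`, `extract`, `bootstrap`), the fixed context width, the use of regular expressions, and the sample documents are all hypothetical. A "pattern" here is simply the text before, between, and after the two fields of one occurrence.

```python
import re

def find_contexts(docs, tup, width=4, max_gap=20):
    """Find occurrences of a (first, second) tuple and return their contexts.

    Each context is (prefix, middle, suffix): up to `width` characters before
    the occurrence, the text between the two fields, and up to `width`
    characters after the occurrence.
    """
    first, second = tup
    rx = re.compile(re.escape(first) + ("(.{1,%d}?)" % max_gap) + re.escape(second))
    contexts = set()
    for doc in docs:
        for m in rx.finditer(doc):
            prefix = doc[max(0, m.start() - width):m.start()]
            suffix = doc[m.end():m.end() + width]
            if prefix and suffix:  # skip occurrences at the very edge of a document
                contexts.add((prefix, m.group(1), suffix))
    return contexts

def extract(docs, pattern):
    """Return every (first, second) pair in docs that matches the pattern."""
    prefix, middle, suffix = pattern
    rx = re.compile(
        re.escape(prefix) + "(.+?)" + re.escape(middle) + "(.+?)" + re.escape(suffix)
    )
    return {m.groups() for doc in docs for m in rx.finditer(doc)}

def bootstrap(docs, seeds, rounds=2):
    """Alternate between deriving patterns from known tuples and extracting new tuples."""
    store = set(seeds)  # stands in for the data storage
    for _ in range(rounds):
        patterns = set()
        for tup in store:
            patterns |= find_contexts(docs, tup)
        for pattern in patterns:
            store |= extract(docs, pattern)
    return store

# Hypothetical documents and seed tuple, in (author, title) order.
DOCS = [
    "<li>Frank Herbert, <i>Dune</i></li><li>Jane Austen, <i>Emma</i></li>",
    "<p>Jane Austen wrote Emma.</p><p>Mary Shelley wrote Frankenstein.</p>",
]
SEEDS = {("Frank Herbert", "Dune")}

if __name__ == "__main__":
    for author, title in sorted(bootstrap(DOCS, SEEDS)):
        print(author, "|", title)
```

In this toy run the seed tuple yields a list-item pattern that finds a second tuple in the first document; the second tuple also occurs in the second document, which yields a different pattern ("wrote") that finds a third tuple. A realistic system would also need to score or filter patterns, since overly general ones extract noise.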
22 Claims
1. A method comprising:
searching a plurality of documents to identify a plurality of occurrences of a first tuple in text of the plurality of documents and a respective context for each occurrence of the first tuple in the text of the plurality of documents;
analyzing the identified plurality of occurrences and the respective context for each occurrence to identify a first data pattern that corresponds to the first tuple;
extracting a second tuple from the text of the plurality of documents, using the first data pattern;
storing the first tuple and the second tuple in a data storage;
identifying a second data pattern that corresponds to the second tuple;
identifying, in the plurality of documents, an occurrence of a third tuple, that matches the identified second data pattern and that is different from the first and second tuples; and
storing the third tuple in the data storage.
Dependent claims: 4, 5, 6, 7, 8, 9, 20
2. A system, comprising:
a data storage; and
one or more processors to:
search a plurality of documents to identify a plurality of occurrences of a first tuple in text of the plurality of documents and a respective context for each occurrence of the first tuple in the text of the plurality of documents;
identify a first data pattern that corresponds to the first tuple based on the identified plurality of occurrences and the respective context for each occurrence;
locate a second tuple, from the text of the plurality of documents, using the first data pattern, where the second tuple is different from the first tuple;
store the first tuple and the second tuple in the data storage;
identify a second data pattern that corresponds to the second tuple;
locate a third tuple, from the text of the plurality of documents, using the identified second data pattern, where the third tuple is different from the first and second tuples; and
store the third tuple in the data storage.
Dependent claims: 10, 11, 12, 13, 21
3. A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions comprising:
one or more instructions to identify a plurality of occurrences of a first tuple in text of a plurality of documents and a respective context for each occurrence of the first tuple in the text of the plurality of documents;
one or more instructions to identify a first data pattern that corresponds to the first tuple based on the identified plurality of occurrences and the respective context for each occurrence;
one or more instructions to extract a second tuple from the text of the plurality of documents, using the first data pattern;
one or more instructions to store the first tuple and the second tuple in a data storage;
one or more instructions to identify a second data pattern that corresponds to the second tuple;
one or more instructions to extract a third tuple from the text of the documents, using the second data pattern, where the third tuple is different from the first and second tuples; and
one or more instructions to store the third tuple in the data storage.
Dependent claims: 14, 15, 16, 17, 18, 19, 22
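The independent claims recite identifying a data pattern from a plurality of occurrences and their respective contexts, without fixing how the occurrences are combined. One possible reading, offered only as a hedged sketch, is to keep whatever text the contexts share: the common ending of the text preceding each occurrence, the common beginning of the text following it, and an identical middle. The function name `generalize`, the (prefix, middle, suffix) representation, and the sample contexts are assumptions made for illustration.

```python
from os.path import commonprefix  # character-wise common prefix of a list of strings

def generalize(contexts):
    """Merge several (prefix, middle, suffix) contexts into one pattern.

    Returns None when the contexts disagree on the middle text or share no
    surrounding text, since such a pattern would match almost anything.
    """
    middles = {middle for _, middle, _ in contexts}
    if len(middles) != 1:
        return None
    # Longest shared ending of the prefixes: reverse, take common prefix, reverse back.
    shared_prefix = commonprefix([p[::-1] for p, _, _ in contexts])[::-1]
    # Longest shared beginning of the suffixes.
    shared_suffix = commonprefix([s for _, _, s in contexts])
    if not shared_prefix or not shared_suffix:
        return None
    return (shared_prefix, middles.pop(), shared_suffix)

# Hypothetical contexts of one tuple type observed at two places in a list.
CONTEXTS = [
    ("<ul><li>", ", <i>", "</i> (1965)"),
    ("</li><li>", ", <i>", "</i> (1815)"),
]

if __name__ == "__main__":
    print(generalize(CONTEXTS))
```

Keeping only the shared text makes the pattern more general than any single occurrence, while the `None` cases guard against patterns with no distinguishing context.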
Specification