Aggregating data from difference sources
Abstract
A method and system that aggregate data associated with one or more entities from different data sources are provided. The data sources include documents, web pages, or images that have information about one or more entities. The information is extracted from the data sources based on criteria that define the entities. The extracted information is utilized to generate a hash identifier that corresponds to each entity and one or more storage locations. The one or more storage locations and associated hash identifiers are utilized to store the extracted information corresponding to the entities, and the extracted information for each entity is structured as a virtual page that is stored in an index having references to the data sources.
20 Claims
1. A method to aggregate data from different sources, the method comprising:
receiving a collection of criteria that represent an entity;
crawling a corpus of documents having data related to the entity;
storing each document or a reference to each document in an index;
extracting information from each document based on the collection of criteria;
generating a hash identifier from the collection of criteria;
creating a storage location for the hash identifier;
storing extracted information in the corresponding storage locations based on the hash identifier; and
incorporating the extracted information into the index.
(Dependent claims: 2, 3, 4, 5, 6, 7)
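The steps of claim 1 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the claim: the choice of SHA-256, the in-memory dictionaries standing in for the index and storage locations, the sentence-level substring matching used as "extraction", and the function names `hash_identifier`, `extract`, and `aggregate`.

```python
import hashlib
import re


def hash_identifier(criteria):
    """Derive a stable identifier from a collection of criteria.

    Assumption: criteria are strings; order and case are ignored.
    """
    canonical = "|".join(sorted(c.strip().lower() for c in criteria))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def extract(document_text, criteria):
    """Return the sentences that mention every criterion.

    This is a stand-in for real information extraction.
    """
    sentences = re.split(r"(?<=[.!?])\s+", document_text)
    return [s for s in sentences
            if all(c.lower() in s.lower() for c in criteria)]


def aggregate(criteria, corpus):
    """Run the claimed steps over an in-memory corpus.

    corpus: mapping of document reference -> text, standing in for
    the documents a crawler would fetch.
    """
    index = {"documents": {}, "entities": {}}
    storage = {}

    hid = hash_identifier(criteria)       # generate the hash identifier
    storage.setdefault(hid, [])           # create its storage location

    for ref, text in corpus.items():      # "crawl" the corpus
        index["documents"][ref] = ref     # store a reference in the index
        for fact in extract(text, criteria):
            storage[hid].append({"source": ref, "text": fact})

    # incorporate the extracted information into the index
    index["entities"][hid] = storage[hid]
    return hid, storage, index
```

Because the identifier is computed from the criteria rather than from any single document, facts found in different sources for the same entity land in the same storage location.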
8. A method to generate virtual pages, the method comprising:
receiving a set of criteria;
parsing a corpus of documents based on the set of criteria;
generating a hash identifier based on the set of criteria;
associating the hash identifier with a storage location;
aggregating parsed information based on the hash identifier; and
generating virtual pages for each hash identifier.
(Dependent claims: 9, 10, 11, 12, 13, 14, 15)
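As a rough illustration of claim 8, the sketch below builds one virtual page per hash identifier when several sets of criteria are supplied. The page layout (`title`, `sources`, `body`), the line-by-line parsing, and the name `build_virtual_pages` are hypothetical choices made for this example; the claim does not prescribe them.

```python
import hashlib
from collections import defaultdict


def hash_identifier(criteria):
    """Stable identifier for a set of criteria (assumed: SHA-256 of sorted terms)."""
    canonical = "|".join(sorted(c.lower() for c in criteria))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_virtual_pages(criteria_sets, corpus):
    """Generate one virtual page per hash identifier.

    criteria_sets: list of criteria lists, one per entity.
    corpus: mapping of document reference -> text.
    """
    buckets = defaultdict(list)   # hash identifier -> (source, snippet) pairs
    labels = {}                   # hash identifier -> the criteria behind it

    for criteria in criteria_sets:
        hid = hash_identifier(criteria)
        labels[hid] = sorted(criteria)
        # Parse the corpus: keep lines that mention every criterion.
        for ref, text in corpus.items():
            for line in text.splitlines():
                if all(c.lower() in line.lower() for c in criteria):
                    buckets[hid].append((ref, line.strip()))

    # Aggregate the parsed information into a page per identifier.
    pages = {}
    for hid, criteria in labels.items():
        snippets = buckets[hid]
        pages[hid] = {
            "title": " ".join(criteria),
            "sources": sorted({ref for ref, _ in snippets}),
            "body": [line for _, line in snippets],
        }
    return pages
```

Each page keeps the references to its source documents, so a reader of the virtual page can trace every line of the body back to where it was found.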
16. A system to generate virtual pages, the system comprising:
a set of crawlers to gather information about a collection of documents that provides information about one or more entities;
an extracting component to extract data from the collection of documents, wherein predefined criteria associated with the one or more entities specify what information is extracted from the documents;
a hashing component to generate hash identifiers and associate the hash identifiers with storage locations that store information about the one or more entities;
a virtual page component to aggregate and structure the extracted information into virtual pages based on the hash identifiers; and
an index to provide access to the virtual pages.
(Dependent claims: 17, 18, 19, 20)
Specification