Adaptive gathering of structured and unstructured data system and method
First Claim
1. A computer implemented method of managing a prioritized Uniform Resource Identifier (“URI”) queue comprising:
by a first computer processor, utilizing a first URI to access a first content in a first communication session with a first webserver associated with a first merchant at a first URI access time and utilizing the first URI to access a second content in a second communication session with the first webserver at a second URI access time subsequent to the first URI access time;
by the first or a second computer processor, parsing the first content for first price and product attribute values, saving the result as a first parse result associated with a first product in a first computer memory, and associating the first parse result with a first URI-specific product identifier;
by the first or the second computer processor, parsing the second content for second price and product attribute values, saving the result as a second parse result associated with the first product in the first computer memory, and associating the second parse result with a first merchant-specific identifier;
by the first or the second computer processor, determining at least one difference between the first price and product attribute values in the first parse result and the second price and product attribute values in the second parse result; and
setting a time to next check of the first URI in the prioritized URI queue at least according to the determined difference between the first price and product attribute values in the first parse result and the second price and product attribute values in the second parse result.
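The final two steps of the claim (determining a difference between two parse results, then setting the time to next check from that difference) can be illustrated with a minimal sketch. The claim does not state how the difference maps to a recheck time, so the interval bounds, the scaling factors, and the names `diff_parse_results`, `next_interval`, and `PrioritizedUriQueue` below are all assumptions made for illustration, not the patented method.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical bounds in seconds; the claim does not specify any values.
MIN_INTERVAL = 3600.0        # one hour
MAX_INTERVAL = 7 * 86400.0   # one week


def diff_parse_results(first: dict, second: dict) -> dict:
    """Return {attribute: (old, new)} for every attribute whose value differs."""
    keys = first.keys() | second.keys()
    return {k: (first.get(k), second.get(k))
            for k in keys if first.get(k) != second.get(k)}


def next_interval(current: float, differences: dict) -> float:
    """Assumed policy: recheck sooner after a change, later when nothing changed."""
    if "price" in differences:
        proposed = current / 4   # a price change is treated as most urgent
    elif differences:
        proposed = current / 2   # some other product attribute changed
    else:
        proposed = current * 2   # no change observed, so back off
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))


@dataclass(order=True)
class QueueEntry:
    next_check: float                     # only field used for ordering
    uri: str = field(compare=False)
    interval: float = field(compare=False)


class PrioritizedUriQueue:
    """Min-heap keyed on next check time, so the most overdue URI pops first."""

    def __init__(self):
        self._heap = []

    def schedule(self, uri: str, now: float, interval: float) -> None:
        heapq.heappush(self._heap, QueueEntry(now + interval, uri, interval))

    def pop_next(self) -> QueueEntry:
        return heapq.heappop(self._heap)
```

Under these assumptions, a URI whose price changed between two visits moves toward the front of the queue, while a URI whose page is stable is visited less and less often, up to the maximum interval.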
3 Assignments
0 Petitions
Abstract
Content is obtained from a webpage accessed via a URI, which URI is obtained from a URI queue. The content is parsed for price and product information according to a parse map, with the resulting parse result being stored. The priority of URIs in the URI queue is adjusted based on analysis of the parse result for changes in price and product attributes and according to other criteria. The parse map may be one associated with the URI or a general-purpose parse map. The parse result may be validated by human- and machine-based systems, including by graphically labeling price and product information in the content for human confirmation or correction.
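As a rough illustration of parsing content according to a parse map, the sketch below models a parse map as a dictionary from attribute name to a regular expression with one capture group, and falls back to a general-purpose map when no URI-specific map exists. The abstract does not describe the parse map's actual representation, so this structure, the example URI, the patterns, and the function names are all hypothetical.

```python
import re

# Hypothetical general-purpose parse map: attribute name -> regex with one group.
GENERAL_PARSE_MAP = {
    "price": r"\$\s*(\d+(?:\.\d{2})?)",
    "title": r"<h1[^>]*>(.*?)</h1>",
}

# Hypothetical URI-specific parse maps, keyed by the URI they are associated with.
URI_PARSE_MAPS = {
    "https://shop.example/widget": {
        "price": r'<span class="price">\$(\d+\.\d{2})</span>',
        "title": r"<h1[^>]*>(.*?)</h1>",
        "color": r'data-color="([^"]+)"',
    },
}


def select_parse_map(uri: str) -> dict:
    """Prefer the map associated with the URI, else the general-purpose map."""
    return URI_PARSE_MAPS.get(uri, GENERAL_PARSE_MAP)


def parse_content(content: str, parse_map: dict) -> dict:
    """Apply each pattern to the content and collect the values that match."""
    result = {}
    for attribute, pattern in parse_map.items():
        match = re.search(pattern, content, flags=re.DOTALL)
        if match:
            result[attribute] = match.group(1).strip()
    return result
```

Attributes whose pattern does not match are simply left out of the parse result, which is one way a downstream validation step could notice that a parse map no longer fits a page.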
6 Citations
20 Claims
-
1. A computer implemented method of managing a prioritized Uniform Resource Identifier (“URI”) queue comprising:
by a first computer processor, utilizing a first URI to access a first content in a first communication session with a first webserver associated with a first merchant at a first URI access time and utilizing the first URI to access a second content in a second communication session with the first webserver at a second URI access time subsequent to the first URI access time;
by the first or a second computer processor, parsing the first content for first price and product attribute values, saving the result as a first parse result associated with a first product in a first computer memory, and associating the first parse result with a first URI-specific product identifier;
by the first or the second computer processor, parsing the second content for second price and product attribute values, saving the result as a second parse result associated with the first product in the first computer memory, and associating the second parse result with a first merchant-specific identifier;
by the first or the second computer processor, determining at least one difference between the first price and product attribute values in the first parse result and the second price and product attribute values in the second parse result; and
setting a time to next check of the first URI in the prioritized URI queue at least according to the determined difference between the first price and product attribute values in the first parse result and the second price and product attribute values in the second parse result.
Dependent claims: 7, 8, 9, 10, 14, 15, 16, 18, 19, 20
-
2. A computer apparatus for managing a prioritized Uniform Resource Identifier (“URI”) queue comprising:
a first computer processor and memory comprising a crawl agent software routine and a second computer processor and memory comprising a URI queue manager software routine;
wherein the crawl agent software routine is to utilize a first URI to access a first content in a first communication session with a first webserver associated with a first merchant at a first URI access time and utilize the first URI to access a second content in a second communication session with the first webserver at a second URI access time subsequent to the first URI access time;
wherein the URI queue manager software routine is to:
parse the first content for first price and product attribute values, save as a first parse result associated with a first product in a first computer memory, and associate the first parse result with a first URI-specific product identifier;
parse the second content for second price and product attribute values, save as a second parse result associated with the first product in the first computer memory, and associate the second parse result with a first merchant-specific identifier;
determine at least one difference between the first price and product attribute values in the first parse result and the second price and product attribute values in the second parse result; and
set a time to next check of the first URI in the prioritized URI queue according to the determined difference between the first price and product attribute values in the first parse result and the second price and product attribute values in the second parse result.
-
3. One or more computer-readable media comprising instructions that cause a computer device, in response to execution of the instructions by one or more processors of the computer device to perform the following:
utilize a first URI to access a first content in a first communication session with a first webserver associated with a first merchant at a first URI access time and utilize the first URI to access a second content in a second communication session with the first webserver at a second URI access time subsequent to the first URI access time;
parse the first content for first price and product attribute values, save as a first parse result associated with a first product in a first computer memory, and associate the first parse result with a first URI-specific product identifier;
parse the second content for second price and product attribute values, save as a second parse result associated with the first product in the first computer memory, and associate the second parse result with a first merchant-specific identifier;
determine at least one difference between the first price and product attribute values in the first parse result and the second price and product attribute values in the second parse result; and
set a time to next check of the first URI in the prioritized URI queue at least according to the determined difference between the first price and product attribute values in the first parse result and the second price and product attribute values in the second parse result.
-
4. A computer-implemented method for determining if products on different websites can be assigned same or different product identifiers, comprising:
by a first computer processor, receiving a first content from a first webpage from a first webserver associated with a first merchant;
by the first computer processor, parsing the first content for a first set of product attribute values, saving as a first parse result, and associating the first parse result with a first URI-specific product identifier;
by the first computer processor, receiving a second content from a second webpage from a second webserver associated with a second merchant;
by the first computer processor, parsing the second content for a second set of product attribute values, saving as a second parse result, and associating the second parse result with a second URI-specific product identifier;
for a set of parse results comprising the first and second parse result, identifying at least one product attribute cluster in the set of parse results, and assigning a URI-independent product identifier to the product attribute cluster, wherein the product attribute cluster comprises parse results comprising a first product attribute value within a cluster range.
Dependent claims: 11, 12, 13, 17
-
5. A computer apparatus for determining if products on different websites can be assigned same or different product identifiers, comprising:
a first computer processor and memory comprising an identifier assignment routine wherein the identifier assignment routine is to:
receive a first content from a first webpage from a first webserver associated with a first merchant;
parse the first content for a first set of product attribute values, save as a first parse result, and associate the first parse result with a first URI-specific product identifier;
receive a second content from a second webpage from a second webserver associated with a second merchant;
parse the second content for a second set of product attribute values, save as a second parse result, and associate the second parse result with a second URI-specific product identifier;
for a set of parse results comprising the first and second parse result, identify at least one product attribute cluster in the set of parse results, and assign a URI-independent product identifier to the product attribute cluster, wherein the product attribute cluster comprises parse results comprising a first product attribute value within a cluster range.
-
6. One or more computer-readable media comprising instructions that cause a computer device, in response to execution of the instructions by one or more processors of the computer device to perform the following:
receive a first content from a first webpage from a first webserver associated with a first merchant;
parse the first content for a first set of product attribute values, save as a first parse result, and associate the first parse result with a first URI-specific product identifier;
receive a second content from a second webpage from a second webserver associated with a second merchant;
parse the second content for a second set of product attribute values, save as a second parse result, and associate the second parse result with a second URI-specific product identifier;
for a set of parse results comprising the first and second parse result, identify at least one product attribute cluster in the set of parse results, and assign a URI-independent product identifier to the product attribute cluster, wherein the product attribute cluster comprises parse results comprising a first product attribute value within a cluster range.
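Claims 4 through 6 describe grouping parse results into a product attribute cluster whose members share an attribute value within a cluster range, then assigning one URI-independent product identifier to that cluster. The claims do not name a clustering algorithm, so the following is only a simple one-dimensional sketch: results are sorted by a single numeric attribute, and a new cluster starts whenever a value falls more than `cluster_range` above the first value of the current cluster. The function name, the `uri_product_id` key, and the `P1`, `P2` identifier format are assumptions for illustration.

```python
from itertools import count


def assign_uri_independent_ids(parse_results, attribute, cluster_range):
    """Map each URI-specific product identifier to a URI-independent identifier.

    parse_results: list of dicts, each holding a 'uri_product_id' key and a
    numeric value under `attribute` (both key names are assumptions).
    """
    ordered = sorted(parse_results, key=lambda r: r[attribute])
    ids = {}
    id_numbers = count(1)
    cluster_id = None
    cluster_start = None
    for result in ordered:
        value = result[attribute]
        # Open a new cluster when the value leaves the current cluster's range.
        if cluster_start is None or value - cluster_start > cluster_range:
            cluster_id = f"P{next(id_numbers)}"
            cluster_start = value
        ids[result["uri_product_id"]] = cluster_id
    return ids
```

For example, two merchants listing an item at 500 g and 502 g would share one identifier with a cluster range of 5 g, while an item at 750 g would receive a different one. A real system would likely cluster on several attributes at once; this sketch uses one attribute only to keep the idea visible.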
Specification