Filtering pipeline optimizations for unstructured data
1 Assignment
0 Petitions
Abstract
Unstructured data items are stored at an object storage service. A filtering criterion to be used to generate a result set for an access request is determined. A test that can be used to determine, without completing parsing of a record identified in an unstructured data item, whether the record satisfies the filtering criterion is identified. Parsing of a particular record is abandoned in response to determining, using the test, that the record does not satisfy the filtering criterion. A response to the access request is determined using a subset of records that satisfy the filtering criterion.
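The early-abandonment idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes, purely for illustration, that records are newline-separated, comma-delimited lines with no quoting or escaping, and that the filtering criterion is "the field at index `column` equals the string `expected`". The names `passes_char_test` and `filter_records` are hypothetical.

```python
from typing import List, Tuple


def passes_char_test(record: str, column: int, expected: str,
                     delimiter: str = ",") -> bool:
    """Scan a raw record character by character and stop at the first
    character that proves the target field cannot equal `expected`."""
    field = 0  # index of the field currently being scanned
    pos = 0    # number of characters matched so far inside the target field
    for ch in record:
        if ch == delimiter:
            if field == column:
                # Target field ended: it matches only if every expected
                # character was consumed.
                return pos == len(expected)
            field += 1
            continue
        if field == column:
            if pos >= len(expected) or ch != expected[pos]:
                return False  # mismatch: abandon the rest of the record
            pos += 1
    # Record ended: it matches only if the target field was the last one.
    return field == column and pos == len(expected)


def filter_records(data: str, column: int, expected: str,
                   delimiter: str = ",") -> Tuple[List[List[str]], int]:
    """Return (fully parsed matching records, count of abandoned records)."""
    matches: List[List[str]] = []
    abandoned = 0
    for line in data.splitlines():
        if not line:
            continue
        if not passes_char_test(line, column, expected, delimiter):
            abandoned += 1  # never split into fields
            continue
        matches.append(line.split(delimiter))  # full parse only for survivors
    return matches, abandoned
```

Under these assumptions, only records that pass the character-level test are split into fields, so the cost of full parsing is paid for the surviving subset alone. Real delimited formats with quoting would need a more careful scanner.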
18 Citations
20 Claims
1. A system, comprising:
one or more computing devices of an object storage service;
wherein the one or more computing devices include instructions that upon execution on a processor cause the one or more computing devices to:
store, in response to one or more programmatic requests, a plurality of unstructured data items of an item collection, including a first unstructured data item;
determine a first query comprising one or more predicates to be used to filter data for inclusion in a response to a first access request directed to the item collection;
based at least in part on an examination of the one or more predicates, identify a first character-level test that can be used to determine, without completing parsing of a particular record identified within the first unstructured data item, whether the particular record satisfies the first query;
abandon parsing of a first record of the first unstructured data item in response to determining, using the first character-level test, that the first record does not satisfy the first query; and
determine, using at least a subset of records identified in the first unstructured data item, a response to the first access request, wherein the subset does not include the first record.
Dependent claims: 2, 3, 4, 5
6. A method, comprising:
performing, by one or more computing devices of an object storage service:
storing a plurality of unstructured data items of an item collection, including a first unstructured data item;
determining a first filtering criterion for inclusion of data of the collection in a result set to be used to generate a response to a first access request;
identifying a first test that can be used to determine, without completing parsing of a particular record identified within the first unstructured data item of the item collection, whether the particular record satisfies the first filtering criterion;
abandoning parsing of a first record identified within the first unstructured data item in response to determining, using the first test, that the first record does not satisfy the first filtering criterion; and
determining, using a subset of records identified within the first unstructured data item, a response to the first access request, wherein the subset does not include the first record.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors cause the one or more processors to:
store a plurality of unstructured data items of an item collection, including a first unstructured data item;
determine a first filtering criterion for inclusion of data of the collection in a result set to be used to generate a response to a first access request;
identify a first test that can be used to determine, without completing parsing of a particular record identified within the first unstructured data item of the item collection, whether the particular record satisfies the first filtering criterion;
abandon parsing of a first record identified within the first unstructured data item in response to determining, using the first test, that the first record does not satisfy the first filtering criterion; and
determine, using a subset of records identified within the first unstructured data item, a response to the first access request, wherein the subset does not include the first record.
Dependent claims: 17, 18, 19, 20
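The sequence recited in the independent claims (examine the predicates, identify a test, abandon records that fail it, answer from the remaining subset) can be sketched as follows. This is an illustrative Python sketch and not the claimed implementation. It assumes comma-delimited records without quoting, predicates combined with AND, and only two operators, `==` on strings and `>` on numbers. The test derived here is a necessary condition only: if an equality literal does not occur anywhere in the raw record text, the record cannot satisfy the query, so field parsing is skipped. `Predicate`, `identify_test`, `satisfies`, and `respond` are hypothetical names.

```python
from typing import Callable, List, NamedTuple, Optional


class Predicate(NamedTuple):
    column: int   # zero-based field index
    op: str       # "==" (string equality) or ">" (numeric) in this sketch
    literal: str


def identify_test(predicates: List[Predicate]) -> Optional[Callable[[str], bool]]:
    """Examine the predicates and derive a cheap raw-text test, if any.

    Only equality predicates contribute; None means no test could be derived
    and every record must be fully parsed.
    """
    needles = [p.literal for p in predicates if p.op == "=="]
    if not needles:
        return None
    return lambda raw: all(needle in raw for needle in needles)


def satisfies(fields: List[str], predicates: List[Predicate]) -> bool:
    """Evaluate the full query against a fully parsed record."""
    for p in predicates:
        if p.column >= len(fields):
            return False
        value = fields[p.column]
        if p.op == "==":
            if value != p.literal:
                return False
        elif p.op == ">":
            if not float(value) > float(p.literal):
                return False
        else:
            raise ValueError(f"unsupported operator: {p.op}")
    return True


def respond(raw_records: List[str], predicates: List[Predicate]) -> List[List[str]]:
    """Build the response from the subset of records that satisfy the query."""
    test = identify_test(predicates)
    result: List[List[str]] = []
    for raw in raw_records:
        if test is not None and not test(raw):
            continue  # abandoned before any field parsing
        fields = raw.split(",")  # full parse only for records passing the test
        if satisfies(fields, predicates):
            result.append(fields)
    return result
```

Because the derived test can pass records that later fail the full query (for example, when the literal occurs in a different column), the full predicate check is still applied after parsing. The test only removes records that are certain not to match.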
Specification