Method and system for high performance integration, processing and searching of structured and unstructured data using coprocessors
First Claim
1. A method of providing integrated access to both structured and unstructured data, the method comprising:
maintaining metadata in a database of structured data, the metadata comprising data about unstructured data;
receiving a query directed toward a combination of structured and unstructured data, the query having a portion directed toward unstructured data, wherein the portion comprises at least one term for searching within unstructured data;
retrieving structured data from a database of structured data in accordance with the received query, the retrieved structured data comprising at least a portion of the metadata;
processing the retrieved structured data to identify a scope of unstructured data for retrieval from a data store of unstructured data;
retrieving unstructured data from the data store of unstructured data corresponding to the identified scope of unstructured data;
streaming the retrieved unstructured data through a processing device in a computing system other than a main processor for the system; and
searching the retrieved streaming unstructured data based on the at least one query term using the processing device to find unstructured data within the retrieved streaming unstructured data that is responsive to the query.
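The steps recited in claim 1 can be illustrated with a minimal software sketch. It is not the patented implementation: the table name `doc_meta`, the in-memory `DOCUMENT_STORE`, and every function name below are hypothetical, and the function `search_stream` is an ordinary software stand-in for the processing device other than the main processor (the claim contemplates a separate device such as a coprocessor).

```python
import sqlite3
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical data store of unstructured data: document id -> raw text.
DOCUMENT_STORE: Dict[int, str] = {
    1: "Quarterly report: revenue grew while fraud losses fell.",
    2: "Lunch menu for the cafeteria.",
    3: "Audit memo describing a suspected fraud case.",
}


def build_metadata_db() -> sqlite3.Connection:
    """Maintain metadata about the unstructured data in a structured database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_meta (doc_id INTEGER PRIMARY KEY, author TEXT, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO doc_meta VALUES (?, ?, ?)",
        [(1, "finance", 2006), (2, "facilities", 2006), (3, "finance", 2005)],
    )
    return conn


def identify_scope(conn: sqlite3.Connection, author: str, year: int) -> List[int]:
    """Retrieve structured data (metadata) and derive the scope of documents."""
    rows = conn.execute(
        "SELECT doc_id FROM doc_meta WHERE author = ? AND year = ? ORDER BY doc_id",
        (author, year),
    )
    return [row[0] for row in rows]


def stream_documents(doc_ids: Iterable[int]) -> Iterator[Tuple[int, str]]:
    """Retrieve only the in-scope documents and yield them one at a time."""
    for doc_id in doc_ids:
        yield doc_id, DOCUMENT_STORE[doc_id]


def search_stream(stream: Iterable[Tuple[int, str]], term: str) -> List[int]:
    """Software stand-in for the search device: case-insensitive term match."""
    needle = term.lower()
    return [doc_id for doc_id, text in stream if needle in text.lower()]


def run_query(conn: sqlite3.Connection, author: str, year: int, term: str) -> List[int]:
    """Structured portion narrows the scope; unstructured portion searches it."""
    scope = identify_scope(conn, author, year)
    return search_stream(stream_documents(scope), term)
```

In this sketch, the structured predicates (`author`, `year`) play the role of the query portion applied to the database, and `term` plays the role of the query term searched within the streamed unstructured data. Only documents inside the identified scope are ever passed to the search step.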
Abstract
Disclosed herein is a method and system for integrating an enterprise's structured and unstructured data to provide users and enterprise applications with efficient and intelligent access to that data. Queries can be directed toward both an enterprise's structured and unstructured data using standardized database query formats such as SQL commands. A coprocessor can be used to hardware-accelerate data processing tasks (such as full-text searching) on unstructured data as necessary to handle a query. Furthermore, traditional relational database techniques can be used to access structured data stored by a relational database to determine which portions of the enterprise's unstructured data should be delivered to the coprocessor for hardware-accelerated data processing.
60 Claims
1. A method of providing integrated access to both structured and unstructured data, the method comprising:
maintaining metadata in a database of structured data, the metadata comprising data about unstructured data;
receiving a query directed toward a combination of structured and unstructured data, the query having a portion directed toward unstructured data, wherein the portion comprises at least one term for searching within unstructured data;
retrieving structured data from a database of structured data in accordance with the received query, the retrieved structured data comprising at least a portion of the metadata;
processing the retrieved structured data to identify a scope of unstructured data for retrieval from a data store of unstructured data;
retrieving unstructured data from the data store of unstructured data corresponding to the identified scope of unstructured data;
streaming the retrieved unstructured data through a processing device in a computing system other than a main processor for the system; and
searching the retrieved streaming unstructured data based on the at least one query term using the processing device to find unstructured data within the retrieved streaming unstructured data that is responsive to the query.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 50, 53, 54
16. A system for processing data, the system comprising:
a main processor;
a processing device other than the main processor;
a data store of unstructured data in communication with the main processor and the processing device;
a data store of structured data in communication with the main processor and the processing device, the structured data comprising metadata about at least a portion of the unstructured data;
wherein the main processor is configured to (1) receive a query, the query comprising at least one term for searching within unstructured data, (2) process at least a portion of the query against the structured data in the data store of structured data to identify a subset of unstructured data in the data store of unstructured data, and (3) request that the subset of unstructured data be delivered to the processing device; and
wherein the processing device is configured to (1) receive the subset of unstructured data and (2) search the received subset of unstructured data based on the at least one query term to determine whether any of the unstructured data within the received subset of unstructured data matches the at least one query term.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 51, 52, 55
31. A method of providing integrated access to both structured and unstructured data, the method comprising:
maintaining metadata in a database of structured data, the metadata comprising data about unstructured data;
receiving a query directed toward a combination of structured and unstructured data, the query having a portion directed toward unstructured data wherein the portion comprises at least one term for processing against the unstructured data;
retrieving structured data from a database of structured data in accordance with the received query, the structured data comprising at least a portion of the metadata;
identifying which unstructured data in a data store of unstructured data is to be retrieved based on the retrieved structured data;
retrieving the identified unstructured data from the data store of unstructured data;
streaming the retrieved unstructured data through firmware deployed on a reconfigurable logic device; and
performing a query-specified data processing operation on the streaming retrieved unstructured data using the firmware to find a set of unstructured data within the streaming retrieved unstructured data that is relevant to the query in view of the at least one query term.
Dependent claims: 56, 57
32. A system for processing data, the system comprising:
a reconfigurable logic device having firmware deployed thereon;
a data store of unstructured data in communication with the reconfigurable logic device;
a data store of structured data in communication with the reconfigurable logic device, the structured data comprising metadata about at least a portion of the unstructured data;
a processor configured to execute an interface, the interface configured to (1) receive a query, the query comprising at least one term for processing against unstructured data, (2) process at least a portion of the query against the structured data in the data store of structured data to identify a subset of unstructured data in the data store of unstructured data, and (3) request that the subset of unstructured data be delivered to the firmware on the reconfigurable logic device; and
wherein the firmware is configured to perform a query-specified data processing operation on the subset of unstructured data using the at least one query term.
Dependent claims: 58, 59
33. A processor-readable storage medium for interfacing a coprocessor with a software application, the processor-readable storage medium comprising:
processor-executable code for
(1) receiving a query from a software application,
(2) determining a portion of the query to be directed against structured data stored by a data store of structured data,
(3) determining portion of the query to be directed toward unstructured data stored by a data store of unstructured data, the determined query portion for unstructured data comprising at least one term for searching within unstructured data,
(4) applying the determined query portion for structured data against the data store of structured data, the structured data comprising metadata about at least a portion of the unstructured data,
(5) in response to the application of the determined query portion for structured data, receiving metadata from the data store of structured data that is representative of a subset of the stored unstructured data,
(6) delivering a command to the coprocessor to thereby configure the coprocessor to perform a data analysis operation specified by the determined query portion for unstructured data, the data analysis operation comprising a searching operation based on the at least one query term, and
(7) directing a delivery of the subset of unstructured data from the data store of unstructured data to the coprocessor for performance of the data analysis operation on the subset of unstructured data,
wherein the code is resident on the processor-readable storage medium.
Dependent claims: 34, 35
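Steps (2), (3) and (6) of claim 33 describe splitting a query into a structured portion and an unstructured portion and then configuring the coprocessor with a command. The following is a minimal sketch of that idea only. The toy query syntax (clauses joined by `AND`, with `CONTAINS('term')` marking the unstructured portion), the `SplitQuery` type, and the command dictionary layout are all assumptions made for illustration; the claim does not specify any of them.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SplitQuery:
    structured: Dict[str, str]  # field -> value predicates for the structured store
    search_terms: List[str]     # terms for searching within unstructured data


# Hypothetical marker for the unstructured portion of the toy query syntax.
_CONTAINS = re.compile(r"^CONTAINS\('([^']*)'\)$")


def split_query(query: str) -> SplitQuery:
    """Separate structured predicates from unstructured search terms."""
    structured: Dict[str, str] = {}
    terms: List[str] = []
    for clause in (c.strip() for c in query.split(" AND ")):
        match = _CONTAINS.match(clause)
        if match:
            terms.append(match.group(1))
        else:
            # Treat anything else as a simple field=value predicate.
            field, _, value = clause.partition("=")
            structured[field.strip()] = value.strip()
    return SplitQuery(structured, terms)


def build_coprocessor_command(split: SplitQuery, doc_ids: List[int]) -> dict:
    """Describe the search operation and the subset of documents to deliver."""
    return {"operation": "search", "terms": split.search_terms, "doc_ids": doc_ids}
```

For example, `split_query("author=finance AND year=2006 AND CONTAINS('fraud')")` yields the structured predicates `author` and `year` and the single search term `fraud`. The structured predicates would be applied to the metadata store first, and the resulting document identifiers would be passed to `build_coprocessor_command`.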
36. An enterprise computing system, the system comprising:
a computer;
an appliance in communication with the computer over a network, the appliance comprising a processor, a coprocessor, and a data store;
wherein the computer is configured to execute a software application to deliver a query to the appliance;
wherein the appliance is configured to receive the query;
wherein the processor is configured to selectively apply a portion of the received query to structured data stored by the data store and identify a subset of unstructured data in response to the applied query, wherein the structured data comprises metadata about at least a portion of the unstructured data;
wherein the processor is further configured to define a searching operation for the coprocessor based on another portion of the received query, the another portion including at least one query term for searching unstructured data;
wherein the appliance is further configured to stream the identified subset of unstructured data through the coprocessor;
wherein the coprocessor is configured to perform the defined searching operation on the identified subset of unstructured data streamed therethrough based on the at least one query term to thereby generate a result set corresponding to the received query; and
wherein the processor is further configured to (1) formulate a response to the query based on the generated result set and (2) communicate the query response to the computer.
Dependent claims: 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 60
Specification