Optimized storage solution for real-time queries and data modeling

US 10,599,648 B2
Filed: 07/24/2015
Issued: 03/24/2020
Est. Priority Date: 09/26/2014
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Embodiments presented herein provide techniques for managing data in manufacturing systems. One embodiment includes receiving a set of data from a plurality of devices operating in a manufacturing environment. A portion of the set of data is written by a data management application to both a relational database and a distributed storage cluster that includes a plurality of storage nodes in a distributed computing environment. Upon receiving a query to extract a subset of data from the set of data, the query is analyzed to determine attributes of the query. Based, in part, on the analysis, one of the relational database and the distributed storage cluster is selected for processing the query.
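The write step in the abstract, together with the data-type rule of dependent claims 4 and 5, can be sketched as follows. This is a minimal sketch, not the patented implementation: in-memory lists stand in for the relational database and the distributed storage cluster, and `Record`, `InMemoryStore`, `RELATIONAL_ONLY_TYPES`, and `write_record` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Record:
    """One reading received from a manufacturing device (hypothetical shape)."""
    device_id: str
    data_type: str  # assumed labels, e.g. "sensor" or "configuration"
    payload: Any


@dataclass
class InMemoryStore:
    """Stand-in for the relational database or the storage cluster."""
    name: str
    rows: List[Record] = field(default_factory=list)

    def write(self, record: Record) -> None:
        self.rows.append(record)


# Data types written only to the relational database (claims 4 and 5).
RELATIONAL_ONLY_TYPES = {"configuration"}


def write_record(record: Record, relational: InMemoryStore, cluster: InMemoryStore) -> None:
    """Write a record to the relational store and, unless its type is
    relational-only, separately write the same record to the cluster."""
    relational.write(record)
    if record.data_type not in RELATIONAL_ONLY_TYPES:
        cluster.write(record)


if __name__ == "__main__":
    rdb = InMemoryStore("relational")
    cluster = InMemoryStore("cluster")
    write_record(Record("tool-1", "sensor", {"temp_c": 412.5}), rdb, cluster)
    write_record(Record("tool-1", "configuration", {"recipe": "A"}), rdb, cluster)
    print(len(rdb.rows), len(cluster.rows))  # 2 1
```

Writing to each store directly, rather than copying from the database to the cluster afterward, mirrors the "separately writing" language of claim 1. Claim 10 adds that configuration data may also be replicated into the cluster, which this sketch omits.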

9 Citations

18 Claims

1. A method, comprising:
- receiving a set of data from a plurality of devices operating in a manufacturing environment;
  
  separately writing a first portion of the set of data to both a relational database and a distributed storage cluster, the distributed storage cluster comprising a plurality of storage nodes in a distributed computing environment;
  
  upon receiving a query to be processed from the set of data:
  
  analyzing the query to determine an application from which the query was received;
  
  selecting one of the relational database and the distributed storage cluster for processing the query, based on a mapping rule that defines a predefined relationship between a type of the application from which the query was received and the selected one of the relational database and the distributed storage cluster, wherein the mapping rule specifies that queries from applications related to real-time operations are to be processed by the relational database; and
  
  submitting the query to the selected one of the relational database and the distributed storage cluster for execution;
  
  purging the first portion of the set of data from the relational database upon the stored first portion of the set of data in the relational database reaching a first age; and
  
  purging the first portion of the set of data from the distributed storage cluster upon the stored first portion of the set of data in the distributed storage cluster reaching a second age, wherein the first age and the second age are different, and wherein the first age is lower than the second age.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The method of claim 1, wherein the distributed storage cluster comprises a Hadoop Distributed File System (HDFS).
  - 3. The method of claim 1, wherein a data management application directly writes the first portion of the set of data to both the relational database and the distributed storage cluster, and wherein the first portion of the set of data comprises data representative of real-time operation of the manufacturing environment and data obtained from analyzing the real-time data.
  - 4. The method of claim 1, further comprising:
    - selectively writing a second portion of the set of data to only the relational database, based on a predefined relationship between a data type corresponding to the second portion of the set of data and the relational database.
  - 5. The method of claim 4, wherein the second portion of the set of data comprises configuration data.
  - 6. The method of claim 1, wherein selecting one of the relational database and the distributed storage cluster for processing the query comprises:
    - upon determining that the query is for at least one of real-time analysis or control of the manufacturing environment, selecting the relational database for processing the query; and
      
      upon determining that the query is for at least one of reporting or simulation of the manufacturing environment, selecting the distributed storage cluster for processing the query.
  - 7. The method of claim 1, further comprising:
    - upon receiving the set of data, writing the set of data to a buffer; and
      
      upon determining that a threshold amount of data is stored within the buffer:
      
      selecting one of the relational database and the distributed storage cluster; and
      
      writing the data stored in the buffer to the selected one of the relational database and the distributed storage cluster.
  - 8. The method of claim 1, further comprising:
    - upon receiving the set of data:
      
      writing the data to a buffer; and
      
      initiating a timer configured to expire after a predefined period of time; and
      
      upon expiration of the timer:
      
      selecting one of the relational database and the distributed storage cluster; and
      
      writing the data stored in the buffer to the selected one of the relational database and the distributed storage cluster.
  - 9. The method of claim 1, wherein writing the first portion of the set of data to the distributed storage cluster comprises replicating the first portion of the set of data across the plurality of storage nodes in the distributed storage cluster.
  - 10. The method of claim 5, further comprising:
    - replicating the configuration data; and
      
      writing the replicated configuration data to the distributed storage cluster.
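Claims 7 and 8 describe buffering incoming data and flushing it either when a threshold amount has accumulated or when a timer expires. A minimal sketch, assuming the timer is modelled as a deadline checked on each call (instead of a background thread such as `threading.Timer`), and assuming the store-selection policy is passed in as a callable because the claims do not specify it. `FlushingBuffer` and its parameters are hypothetical.

```python
import time
from typing import Any, Callable, List, Optional


class FlushingBuffer:
    """Buffers incoming data and writes it to a selected store.

    A flush happens when `threshold` items are buffered (claim 7) or when
    `period_s` seconds have passed since the first item entered an empty
    buffer (claim 8). The timer is a deadline checked on add() and poll().
    """

    def __init__(self, threshold: int, period_s: float,
                 select_store: Callable[[List[Any]], List[Any]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.period_s = period_s
        self.select_store = select_store  # returns the target store for a batch
        self.clock = clock  # injectable so the timer can be tested
        self.items: List[Any] = []
        self.deadline: Optional[float] = None

    def add(self, item: Any) -> None:
        if not self.items:
            # Start the timer when the first item lands in an empty buffer.
            self.deadline = self.clock() + self.period_s
        self.items.append(item)
        if len(self.items) >= self.threshold:
            self.flush()
        else:
            self.poll()

    def poll(self) -> None:
        """Flush if the timer has expired; call this periodically."""
        if self.items and self.deadline is not None and self.clock() >= self.deadline:
            self.flush()

    def flush(self) -> None:
        """Write all buffered items to the selected store and reset."""
        target = self.select_store(self.items)
        target.extend(self.items)
        self.items = []
        self.deadline = None


if __name__ == "__main__":
    fake_now = [0.0]
    store: List[Any] = []
    buf = FlushingBuffer(threshold=3, period_s=5.0,
                         select_store=lambda items: store,
                         clock=lambda: fake_now[0])
    buf.add("a")
    buf.add("b")
    print(store)  # []
    fake_now[0] = 6.0
    buf.poll()
    print(store)  # ['a', 'b']
```

Combining both triggers in one buffer bounds both batch size and write latency; a real deployment would call `poll()` from a scheduler so that a quiet buffer still flushes on time.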

11. A non-transitory computer-readable medium containing computer program code that, when executed, performs an operation comprising:
- receiving a set of data from a plurality of devices operating in a manufacturing environment;
  
  separately writing a first portion of the set of data to both a relational database and a distributed storage cluster, the distributed storage cluster comprising a plurality of storage nodes in a distributed computing environment;
  
  upon receiving a query to be processed from the set of data:
  
  analyzing the query to determine an application from which the query was received;
  
  selecting one of the relational database and the distributed storage cluster for processing the query, based on a mapping rule that defines a predefined relationship between a type of the application from which the query was received and the selected one of the relational database and the distributed storage cluster, wherein the mapping rule specifies that queries from applications related to real-time operations are to be processed by the relational database; and
  
  submitting the query to the selected one of the relational database and the distributed storage cluster for execution;
  
  purging the first portion of the set of data from the relational database upon the stored first portion of the set of data in the relational database reaching a first age; and
  
  purging the first portion of the set of data from the distributed storage cluster upon the stored first portion of the set of data in the distributed storage cluster reaching a second age, wherein the first age and the second age are different, and wherein the first age is lower than the second age.
- Dependent Claims (12, 13, 14, 15)
- - 12. The non-transitory computer-readable medium of claim 11, further comprising:
    - selectively writing a second portion of the set of data to only the relational database, based on a predefined relationship between a data type corresponding to the second portion of the set of data and the relational database.
  - 13. The non-transitory computer-readable medium of claim 11, wherein selecting one of the relational database and the distributed storage cluster for processing the query comprises:
    - upon determining that the query is for at least one of real-time analysis or control of the manufacturing environment, selecting the relational database for processing the query; and
      
      upon determining that the query is for at least one of reporting or simulation of the manufacturing environment, selecting the distributed storage cluster for processing the query.
  - 14. The non-transitory computer-readable medium of claim 11, wherein writing the first portion of the set of data to the distributed storage cluster comprises replicating the first portion of the set of data across the plurality of storage nodes in the distributed storage cluster.
  - 15. The non-transitory computer-readable medium of claim 12, further comprising:
    - replicating the second portion of the set of data; and
      
      writing the replicated second portion of the set of data to the distributed storage cluster.
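The two purge steps in claim 11 (and in claims 1 and 16) keep the relational copy for a shorter time than the cluster copy. A minimal sketch, assuming each stored row carries a write timestamp; `StoredRow`, `purge_older_than`, and the 7-day and 365-day retention ages are hypothetical, since the claims only require the first age to be lower than the second.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class StoredRow:
    key: str
    written_at: datetime


def purge_older_than(rows: List[StoredRow], max_age: timedelta, now: datetime) -> List[StoredRow]:
    """Keep only rows whose age is below max_age; rows that reach it are purged."""
    return [r for r in rows if now - r.written_at < max_age]


# Hypothetical retention ages: relational (first age) < cluster (second age).
RELATIONAL_MAX_AGE = timedelta(days=7)
CLUSTER_MAX_AGE = timedelta(days=365)
assert RELATIONAL_MAX_AGE < CLUSTER_MAX_AGE


if __name__ == "__main__":
    now = datetime(2020, 3, 24)
    row = StoredRow("lot-42", now - timedelta(days=30))
    relational = purge_older_than([row], RELATIONAL_MAX_AGE, now)
    cluster = purge_older_than([row], CLUSTER_MAX_AGE, now)
    print(len(relational), len(cluster))  # 0 1
```

The same row thus disappears from the low-latency store first while remaining available in the cluster for longer-horizon work such as reporting.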

16. A manufacturing system comprising:
- a plurality of tools for manufacturing one or more semiconductor devices;
  
  a first storage system comprising a relational database;
  
  a second storage system comprising a distributed storage cluster, the distributed storage cluster comprising a plurality of storage nodes in a distributed computing environment;
  
  at least one processor; and
  
  a memory containing a program that, when executed by the at least one processor, performs an operation comprising:
  
  receiving a set of data from the plurality of tools;
  
  separately writing a first portion of the set of data to both the relational database and the distributed storage cluster;
  
  upon receiving a query to be processed from the set of data:
  
  analyzing the query to determine an application from which the query was received;
  
  selecting one of the relational database and the distributed storage cluster for processing the query, based on a mapping rule that defines a predefined relationship between a type of the application from which the query was received and the selected one of the relational database and the distributed storage cluster, wherein the mapping rule specifies that queries from applications related to real-time operations are to be processed by the relational database; and
  
  submitting the query to the selected one of the relational database and the distributed storage cluster for execution;
  
  purging the first portion of the set of data from the relational database upon the stored first portion of the set of data in the relational database reaching a first age; and
  
  purging the first portion of the set of data from the distributed storage cluster upon the stored first portion of the set of data in the distributed storage cluster reaching a second age, wherein the first age and the second age are different, and wherein the first age is lower than the second age.
- Dependent Claims (17, 18)
- - 17. The manufacturing system of claim 16, wherein a data management application directly writes the first portion of the set of data to both the relational database and the distributed storage cluster, and wherein the operation further comprises:
    - selectively writing a second portion of the set of data to only the relational database.
  - 18. The manufacturing system of claim 16, wherein selecting one of the relational database and the distributed storage cluster for processing the query comprises:
    - upon determining that the query is for at least one of real-time analysis or control of the manufacturing environment, selecting the relational database for processing the query; and
      
      upon determining that the query is for at least one of reporting or simulation of the manufacturing environment, selecting the distributed storage cluster for processing the query.
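Claims 16 and 18 route each query by the type of application that submitted it: real-time analysis or control goes to the relational database, while reporting or simulation goes to the distributed storage cluster. A minimal sketch, assuming the submitting application's name travels with the query and a lookup table maps names to application types; the application names, `APP_TYPES`, `MAPPING_RULE`, and `select_backend` are hypothetical, and raising an error for an unmapped application is a design choice the claims do not address.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    RELATIONAL = "relational database"
    CLUSTER = "distributed storage cluster"


@dataclass
class Query:
    text: str
    source_app: str  # assumed to be tagged on the query by the caller


# Hypothetical registry of application names to application types.
APP_TYPES = {
    "live-dashboard": "real_time_analysis",
    "process-controller": "control",
    "monthly-report": "reporting",
    "fab-simulator": "simulation",
}

# Mapping rule: application type -> backend (claims 16 and 18).
MAPPING_RULE = {
    "real_time_analysis": Backend.RELATIONAL,
    "control": Backend.RELATIONAL,
    "reporting": Backend.CLUSTER,
    "simulation": Backend.CLUSTER,
}


def select_backend(query: Query) -> Backend:
    """Determine the submitting application's type and apply the mapping rule."""
    app_type = APP_TYPES.get(query.source_app)
    if app_type is None:
        raise ValueError(f"no mapping rule for application {query.source_app!r}")
    return MAPPING_RULE[app_type]


if __name__ == "__main__":
    q = Query("SELECT ...", source_app="process-controller")
    print(select_backend(q).value)  # relational database
```

Keeping the rule as data rather than branching code lets an operator reassign an application type to a different store without changing the router.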

Current Assignee
Applied Materials Incorporated
Original Assignee
Applied Materials Incorporated
Inventors
Samantaray, Jamini; Sutrave, Pramode Kumar; Patel, Jigar Bhadriklal; Bowyer, Thomas; Ramalingam, Muthukumar
Primary Examiner(s)
Mofiz, Apu M
Assistant Examiner(s)
Nguyen, Cindy

Application Number
US14/808,730
Publication Number
US 20160092510A1
Time in Patent Office
1,705 Days
CPC Class Codes

G06F 16/24542   Plan optimisation

G06F 16/2455   Query execution

G06F 16/285   Clustering or classification
