Data analysis in distributed data processing system
Abstract
A distributed data processing system that uniformly manages distributed data and program files. A resource management database associates identifiers of resource files to be used in analytical processes with their respective storage locations. The identifiers are unique within the distributed environment, so every resource file can be distinguished from all others by its identifier. A request for execution of a particular analytical process specifies the necessary resource files by their identifiers. A process execution unit in the system first creates a work area, which can also be used to store intermediate data files created during the execution. The unit then consults the resource management database to find the locations of the necessary resource files and creates links in the work area to reach those files. The process execution unit executes the requested analytical process while accessing the resource files via the links in the work area.
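The work-area step described in the abstract can be pictured with a minimal sketch. It assumes, purely for illustration, that the resource management database is a plain dictionary from identifier to file path, that identifiers are valid file names, and that the links are symbolic links; the function name `build_work_area` is hypothetical and not taken from the patent.

```python
import os
import tempfile


def build_work_area(resource_db, needed_ids, base_dir=None):
    """Create a fresh work area and link each needed resource file into it.

    resource_db: dict mapping a unique identifier to a storage location
                 (an assumed, simplified stand-in for the database).
    needed_ids:  identifiers of the files the analytical process requires.
    """
    # The work area is a new directory; intermediate files can also go here.
    work_area = tempfile.mkdtemp(prefix="work_", dir=base_dir)
    for rid in needed_ids:
        location = resource_db[rid]  # raises KeyError for an unknown identifier
        # The link is named after the identifier (an assumption of this sketch).
        os.symlink(location, os.path.join(work_area, rid))
    return work_area
```

A process launched with the work area as its working directory could then open each resource by its identifier without knowing where the file actually resides.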
46 Citations
2 Claims
1. A distributed data processing system for analyzing data with a plurality of computers in a distributed environment, comprising:
management database storage means for storing a resource management database that associates identifiers of resource files, which are data and/or program files, for analytical processes with actual storage locations of the resource files, the identifiers being determined to identify the resource files uniquely in the distributed data processing system; and
process execution means, responsive to a request for execution of a specific analytical process whose necessary resource files are identified by one of the identifiers, for selecting a computer that ranks first in terms of the number of necessary resource files stored therein and executing on the selected computer the requested analytical process by using the resource files whose storage locations are retrieved from the resource management database in said management database storage means.
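The selection rule in claim 1 (pick the computer that holds the most of the necessary resource files) can be illustrated with a short sketch. It assumes the resource management database is a dictionary from identifier to a `(host, path)` pair; the name `select_computer` and the alphabetical tie-break are assumptions of this example, since the claim does not say how ties are resolved.

```python
from collections import Counter


def select_computer(resource_db, needed_ids):
    """Return the host that stores the largest number of the needed files.

    resource_db: dict mapping identifier -> (host, path); an assumed layout.
    needed_ids:  identifiers of the files the requested process requires.
    """
    # Count, per host, how many of the needed files are stored there.
    counts = Counter(resource_db[rid][0] for rid in needed_ids)
    # Highest count wins; ties fall back to host name (illustrative choice).
    return min(counts, key=lambda host: (-counts[host], host))
```

Running the process on the host chosen this way keeps the number of files that must be reached over the network as small as possible.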
2. A distributed data processing system for analyzing data with a plurality of computers in a distributed environment, comprising:
management database storage means for storing a resource management database that associates identifiers of resource files, which are data and/or program files, for analytical processes with actual storage locations of the resource files, the identifiers being determined to identify the resource files uniquely in the distributed data processing system;
cache management table storage means for storing a cache management table to collect records of cached resource files that have been fetched from remote computers and stored temporarily; and
process execution means, responsive to a request for execution of a specific analytical process whose necessary resource files are identified by one of the identifiers, for selecting a computer that ranks first in terms of the number of necessary resource files stored therein by examining the resource management database and the cache management table, and executing on the selected computer the requested analytical process by using the resource files whose storage locations are retrieved from the resource management database in said management database storage means.
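Claim 2 extends the ranking so that cached copies also count. A minimal sketch, assuming the cache management table is a list of `(host, identifier)` records and the resource management database maps identifier to `(host, path)`; the function name and the tie-break by host name are hypothetical choices made for this example.

```python
from collections import defaultdict


def select_computer_with_cache(resource_db, cache_table, needed_ids):
    """Return the host with the most needed files, counting cached copies.

    resource_db: dict mapping identifier -> (host, path); an assumed layout.
    cache_table: iterable of (host, identifier) records for cached files.
    needed_ids:  identifiers of the files the requested process requires.
    """
    needed = set(needed_ids)
    available = defaultdict(set)  # host -> needed identifiers present there
    for rid in needed:
        host, _path = resource_db[rid]
        available[host].add(rid)
    for host, rid in cache_table:
        if rid in needed:
            available[host].add(rid)  # a set avoids double counting a file
    # Highest count wins; ties fall back to host name (illustrative choice).
    return min(available, key=lambda h: (-len(available[h]), h))
```

With an empty cache table this reduces to the ranking of claim 1; once a host has fetched and cached enough remote files, it can overtake the host that holds the originals.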
Specification