Harvesting file system metadata
Abstract
A harvester is disclosed for harvesting metadata of managed objects (files and directories) across file systems which are generally not interoperable in an enterprise environment. Harvested metadata may include 1) file system attributes such as size, owner, recency; 2) content-specific attributes such as the presence or absence of various keywords (or combinations of keywords) within documents as well as concepts comprised of natural language entities; 3) synthetic attributes such as mathematical checksums or hashes of file contents; and 4) high-level semantic attributes that serve to classify and categorize files and documents. The classification itself can trigger an action in compliance with a policy rule. Harvested metadata are stored in a metadata repository to facilitate the automated or semi-automated application of policies.
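The four attribute categories listed in the abstract can be pictured as fields of a single record. The following is only a minimal sketch under stated assumptions: the `MetadataRecord` class, the `harvest` function, the keyword list, and the toy classification rule are hypothetical names and choices made for illustration, and SHA-256 stands in for whichever checksum or hash an actual harvester might use.

```python
import hashlib
import os
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical record layout; not taken from the patent text."""
    path: str
    file_system: dict = field(default_factory=dict)  # 1) size, owner, recency
    content: dict = field(default_factory=dict)      # 2) keyword presence
    synthetic: dict = field(default_factory=dict)    # 3) checksums or hashes
    semantic: dict = field(default_factory=dict)     # 4) classification


def harvest(path, keywords=("confidential",)):
    """Build one record for a single local file (illustrative only)."""
    st = os.stat(path)
    with open(path, "rb") as fh:
        data = fh.read()
    text = data.decode("utf-8", errors="ignore").lower()

    record = MetadataRecord(path=path)
    record.file_system = {
        "size": st.st_size,
        "owner": st.st_uid,
        "mtime": st.st_mtime,
    }
    # Content-specific attribute: presence or absence of each keyword.
    record.content = {kw: kw in text for kw in keywords}
    # Synthetic attribute: a hash of the file contents (SHA-256 is an assumption).
    record.synthetic = {"sha256": hashlib.sha256(data).hexdigest()}
    # Semantic attribute: an assumed toy rule, standing in for real classification.
    hit = any(record.content.values())
    record.semantic = {"class": "sensitive" if hit else "general"}
    return record
```

A policy rule could then key off `record.semantic["class"]`, which is the kind of classification-triggered action the abstract mentions.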
18 Claims
1. A computer program product comprising one or more non-transitory computer readable storage media storing instructions translatable by one or more processors to perform:
- accessing network file systems on one or more machines, the network file systems operate under various file system protocols at one or more physical locations in a network environment, the network file systems containing files and directories being managed by a network file system management system, wherein the network file system management system comprises a plurality of discrete components and a plurality of queues, wherein the plurality of discrete components comprises a first component, a second component, and a third component, wherein the plurality of queues comprises a first queue and a second queue, and wherein the first component, the second component, and the third component operate coordinate by way of the first queue and the second queue;
- the first component collecting file system metadata of the files and directories from the network file systems;
- the first component transforming the collected file system metadata in real time into metadata records having a common representation;
- the first component storing the metadata records in the first queue, each metadata record of the metadata records comprises a set of attributes associated with a file or directory residing on the network file systems;
- for each metadata record read from the first queue, the second component synthesizing or calculating one or more attributes from the collected file system metadata to improve the set of attributes associated with the metadata record;
- the second component storing the metadata record with the improved set of attributes in the second queue;
- the third component reading one or more of the metadata records from the second queue; and
- the third component storing the one or more of the metadata records in a metadata repository according to a scheduling heuristic, the metadata repository being accessible by the network file system management system over a network connection.
Dependent claims: 2, 3, 4, 5, 6, 7
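The arrangement recited in claim 1, three components coordinating through two queues, can be sketched roughly as follows. This is an illustrative toy, not the claimed implementation: the `RAW` listings, the per-protocol field names, the derived attributes (`extension`, `size_kib`), the in-memory list standing in for the metadata repository, and the fixed batch size standing in for the scheduling heuristic are all assumptions.

```python
import queue
import threading

SENTINEL = None  # marks end of stream on each queue

# Hypothetical raw listings; field names differ per protocol on purpose.
RAW = {
    "nfs": [{"name": "/export/a.txt", "st_size": 10, "st_uid": 501}],
    "cifs": [{"Path": r"\\srv\share\b.doc", "Length": 2048, "Owner": "alice"}],
}


def to_common(protocol, raw):
    """Transform protocol-specific metadata into one common representation."""
    if protocol == "nfs":
        return {"path": raw["name"], "size": raw["st_size"],
                "owner": str(raw["st_uid"])}
    return {"path": raw["Path"], "size": raw["Length"], "owner": raw["Owner"]}


def first_component(q1):
    """Collect, transform, and store records in the first queue."""
    for protocol, entries in RAW.items():
        for raw in entries:
            q1.put(to_common(protocol, raw))
    q1.put(SENTINEL)


def second_component(q1, q2):
    """Calculate extra attributes from the collected metadata."""
    while (rec := q1.get()) is not SENTINEL:
        rec["extension"] = rec["path"].rsplit(".", 1)[-1].lower()
        rec["size_kib"] = rec["size"] / 1024
        q2.put(rec)
    q2.put(SENTINEL)


def third_component(q2, repository, batch_size=2):
    """Store records in batches (assumed stand-in for the heuristic)."""
    batch = []
    while (rec := q2.get()) is not SENTINEL:
        batch.append(rec)
        if len(batch) >= batch_size:
            repository.extend(batch)
            batch.clear()
    repository.extend(batch)  # write whatever is left at end of stream


def run_pipeline():
    q1, q2, repository = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=first_component, args=(q1,)),
        threading.Thread(target=second_component, args=(q1, q2)),
        threading.Thread(target=third_component, args=(q2, repository)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return repository
```

Because each stage only touches its queues, the stages run concurrently and a slow stage simply lets its input queue grow, which is one plausible reading of why the claim separates the components this way.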
8. A method for harvesting file system metadata, comprising:
- accessing network file systems on one or more machines, the network file systems operate under various file system protocols at one or more physical locations in a network environment, the network file systems containing files and directories being managed by a network file system management system, wherein the network file system management system comprises a plurality of discrete components and a plurality of queues, wherein the plurality of discrete components comprises a first component, a second component, and a third component, wherein the plurality of queues comprises a first queue and a second queue, and wherein the first component, the second component, and the third component operate coordinate by way of the first queue and the second queue;
- the first component collecting file system metadata of the files and directories from the network file systems;
- the first component transforming the collected file system metadata in real time into metadata records having a common representation;
- the first component storing the metadata records in the first queue, each metadata record of the metadata records comprises a set of attributes associated with a file or directory residing on the network file systems;
- for each metadata record read from the first queue, the second component synthesizing or calculating one or more attributes from the collected file system metadata to improve the set of attributes associated with the metadata record;
- the second component storing the metadata record with the improved set of attributes in the second queue;
- the third component reading one or more of the metadata record from the second queue; and
- the third component storing the one or more of the metadata record in a metadata repository according to a scheduling heuristic, the metadata repository being accessible by the network file system management system over a network connection.
Dependent claims: 9, 10, 11, 12, 13
14. A system for harvesting file system metadata, the system comprising one or more non-transitory computer readable storage media storing instructions implementing a network file system management system, the network file system management system comprising a plurality of discrete components and a plurality of queues, wherein the plurality of queues comprises a first queue and a second queue, and wherein the plurality of discrete components comprises:
- a grazer module executing on the system and capable of:
  - accessing network file systems on one or more machines, the network file systems operate under various file system protocols at one or more physical locations in a network environment, the network file systems containing files and directories being managed by the network file system management system;
  - collecting file system metadata of the files and directories from the network file systems;
  - transforming the collected file system metadata in real time into metadata records having a common representation; and
  - placing the metadata records in the first queue, each metadata record of the metadata records comprises a set of attributes associated with a file or directory residing on the network file systems;
- an improver module executing on the system and capable of:
  - for each metadata record read from the first queue synthesizing or calculating one or more attributes from the collected file system metadata to improve the set of attributes associated with the metadata record; and
  - placing the metadata record with the improved set of attributes in the second queue; and
- a populator module executing on the system and capable of:
  - reading one or more of the metadata records from the second queue; and
  - storing the one or more of the metadata records with the improved set of attributes in a metadata repository according to a scheduling heuristic, the metadata repository being accessible by the network file system management system over a network connection.
Dependent claims: 15, 16, 17, 18
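The claims do not say what the populator's "scheduling heuristic" is. One plausible reading, shown here purely as an assumption, is to buffer records and write them when either a batch fills up or the oldest buffered record has waited too long. The `Populator` class, its parameters, the table schema, and the use of a local SQLite database in place of a network-accessible repository are all hypothetical.

```python
import sqlite3
import time


class Populator:
    """Hypothetical populator: buffers records and writes them in batches."""

    def __init__(self, conn, max_batch=100, max_wait_s=5.0,
                 clock=time.monotonic):
        self.conn = conn
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.clock = clock          # injectable so the timing can be tested
        self.buffer = []
        self.oldest = None          # arrival time of the oldest buffered record
        conn.execute(
            "CREATE TABLE IF NOT EXISTS metadata "
            "(path TEXT PRIMARY KEY, size INTEGER, owner TEXT)"
        )

    def submit(self, record):
        """Accept one record read from the second queue."""
        if not self.buffer:
            self.oldest = self.clock()
        self.buffer.append(record)
        if self._due():
            self.flush()

    def _due(self):
        # Assumed heuristic: flush on batch size or on age of oldest record.
        return (len(self.buffer) >= self.max_batch
                or self.clock() - self.oldest >= self.max_wait_s)

    def flush(self):
        """Write all buffered records in one transaction."""
        if not self.buffer:
            return
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT OR REPLACE INTO metadata (path, size, owner) "
                "VALUES (:path, :size, :owner)",
                self.buffer,
            )
        self.buffer.clear()
```

In this sketch the age check only runs when a new record arrives, so a caller would still need to call `flush()` at shutdown or on a timer to avoid leaving a partial batch unwritten.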
Specification