Storage system
Abstract
The present invention relates to a storage system including a de-duplicate function and a full-text search function or the like, and reduces the amount of index information for full-text search to save storage resources. In this system, a storage apparatus includes a processing unit for de-duplicating a plurality of files having the same content in a file group of data inputted/outputted through a host apparatus. A full-text search processing server performs full-text search processing on the file group and includes a processing unit that makes the full-text search processing correspond to the de-duplication. The index information creation processing that the full-text search processing unit would perform on a plurality of target files having the same content is inhibited according to the de-duplicate status of the file group set by the de-duplicating processing unit. The amount of index information can thereby be reduced.
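As an informal illustration of the idea in the abstract (not the patented implementation), the sketch below builds a toy inverted index and skips position-list creation for any file that the de-duplication status marks as a copy of a representative file. The names `build_index`, `representative_of`, and `posting_count`, the whitespace tokenizer, and the sample paths are all assumptions made for this example.

```python
from collections import defaultdict


def build_index(files, representative_of):
    """Build keyword -> {path: [positions]} for a toy corpus.

    files:             {path: text}
    representative_of: {path: representative path} for de-duplicated files
                       (hypothetical stand-in for the de-duplication status).
    """
    index = defaultdict(dict)
    for path, text in files.items():
        # A file whose representative is another file is a duplicate:
        # its keyword position information is not created (inhibited).
        if representative_of.get(path, path) != path:
            continue
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(path, []).append(position)
    return dict(index)


def posting_count(index):
    """Total number of stored keyword positions, a rough size measure."""
    return sum(len(p) for postings in index.values() for p in postings.values())


files = {
    "/a/report.txt": "storage system index",
    "/b/copy.txt": "storage system index",  # same content as /a/report.txt
    "/c/notes.txt": "index notes",
}
representative_of = {"/b/copy.txt": "/a/report.txt"}

full_index = build_index(files, {})                   # no de-duplication awareness
dedup_index = build_index(files, representative_of)   # duplicates skipped
```

In this toy corpus the de-duplication-aware index stores fewer keyword positions because the duplicate file contributes none; how the real system measures or stores index information is not specified here.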
10 Claims
1. A storage system which stores files of data inputted in/outputted from a host apparatus in a storage area, comprising:

a file server apparatus which includes a de-duplicate processing unit which performs de-duplicate processing to a plurality of files having the same content in a file group stored in the storage area and which creates de-duplication information which is stored in the storage area, and which comprises control information indicating a de-duplicate status of the file server apparatus including presence/absence of files having the same content and information of a representative file in the storage area; and

a full-text search processing apparatus which performs a full-text search processing including an index information creation processing to the file group stored in the storage area to create index information and which performs a de-duplicate group information creation processing to create de-duplicate group information based on said de-duplication information,

wherein the de-duplicate group information indicates a group of files having the same content, representative files in the group of files and a link between the files,

wherein the index information includes keyword occurrence position information in a data body of the file and further includes said de-duplicate group information,

wherein the index information is de-duplicated by inhibiting the keyword occurrence position information creation processing performed to the plurality of files having the same content by the full-text search processing apparatus according to a status of the de-duplicate processing to the file group performed by the de-duplicate processing unit of the file server apparatus,

wherein the full-text search processing apparatus executes de-duplicate correspondence processing based on the index information,

wherein, said full-text search server apparatus responds to the host apparatus by providing search result information comprising information regarding a representative file included in a search result, and further provides, by referring to said de-duplicate group information, information of another file which belongs to a de-duplicate group of the representative file and which has the same content as said representative file, said representative file being searched by said de-duplicate correspondence processing, and

wherein the de-duplicate correspondence processing performed by the full text search processing apparatus is a separate function from the de-duplicate processing performed by the de-duplicate processing unit of the file server apparatus, and

wherein the de-duplicate correspondence processing performed by the full text search processing apparatus is controlled by the de-duplicate information created by the de-duplicate processing unit of the file server apparatus.
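The search-response limitation of claim 1 can be pictured with a small sketch: a keyword lookup hits only representative files (because duplicates were never indexed), and the response is then extended with the other files of the same de-duplicate group. This is a minimal sketch under stated assumptions; `DedupGroup`, `search`, the result dictionary layout, and the sample data are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class DedupGroup:
    """Hypothetical de-duplicate group record: a representative file
    plus the other files that share its content."""
    representative: str
    members: list = field(default_factory=list)


def search(index, groups, keyword):
    """Return one result per indexed (representative) file that contains
    the keyword, with same-content files added from the group information."""
    results = []
    for path, positions in sorted(index.get(keyword.lower(), {}).items()):
        group = groups.get(path)
        results.append({
            "file": path,
            "positions": positions,
            # Files never indexed themselves, found via the group link.
            "same_content": list(group.members) if group else [],
        })
    return results


# Toy index in which only representative files carry position lists.
index = {
    "storage": {"/a/report.txt": [0]},
    "notes": {"/c/notes.txt": [1]},
}
groups = {"/a/report.txt": DedupGroup("/a/report.txt", ["/b/copy.txt"])}

hits = search(index, groups, "storage")
```

Here `hits` reports `/a/report.txt` as the matching file and lists `/b/copy.txt` as a file with the same content, even though `/b/copy.txt` has no entry in the index.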
2. A storage system which stores files of data inputted in/outputted from a host apparatus in a storage area, comprising:

a first storage apparatus which is connected to the host apparatus via a network and receives input/output of the files from the host apparatus;

a second storage apparatus which is connected to the first storage apparatus and stores the files in a storage area of the second storage apparatus itself according to access from the first storage apparatus,

wherein the first storage apparatus includes a file server apparatus including a de-duplicate processing unit which performs a processing for de-duplicating a plurality of files having the same content regarding a file group stored in the storage area of the second storage apparatus and which creates de-duplication information which is stored in the storage area, and which comprises control information indicating a de-duplicate status of the file server apparatus including presence/absence of files having the same content and information of a representative file in the storage area; and

a full-text search server apparatus which performs a full-text search processing including an index information creation processing to a file group stored in the storage area of the second storage apparatus to create index information, which performs a search processing for performing a keyword search for the index information in response to an instruction from the host apparatus to return search result information back to the host apparatus, and which performs a de-duplicate group information creation processing to create de-duplicate group information based on said de-duplication information,

wherein the de-duplicate group information indicates a group of files having the same content, representative files in the group of files and a link between the files,

wherein the index information includes keyword occurrence position information in a data body of the file and further includes said de-duplicate group information,

wherein the full-text search server apparatus inhibits the keyword occurrence position information creation processing to a plurality of files having the same content according to a status of the de-duplicate processing of the file group performed by the de-duplicate processing unit of the file server apparatus, except for a representative file,

wherein the full-text search server apparatus executes de-duplicate correspondence processing based on the index information,

wherein, said full-text search server apparatus responds to the host apparatus by providing search result information comprising information regarding a representative file included in a search result, and further provides, by referring to said de-duplicate group information, information of another file which belongs to a de-duplicate group of the representative file and which has the same content as said representative file, said representative file being searched by said de-duplicate correspondence processing, and

wherein the de-duplicate correspondence processing performed by the full text search processing apparatus is a separate function from the de-duplicate processing performed by the de-duplicate processing unit of the file server apparatus, and

wherein the de-duplicate correspondence processing performed by the full text search processing apparatus is controlled by the de-duplicate information created by the de-duplicate processing unit of the file server apparatus.

Dependent claims: 3
4. A method of operating a storage system which stores files of data inputted in/outputted from a host apparatus in a storage area, the storage system including a first storage apparatus which is connected to the host apparatus via a network and receives input/output of the files from the host apparatus, a second storage apparatus which is connected to the first storage apparatus and stores the files in a storage area of the second storage apparatus itself according to access from the first storage apparatus, wherein the first storage apparatus includes a file server apparatus including a de-duplicate processing unit, and a full-text search server apparatus, the method comprising:

performing a processing with the file server apparatus for unifying a plurality of files having the same content to a single instance regarding a file group stored in the storage area of the second storage apparatus and which creates de-duplication information which is stored in the storage area, and which comprises control information indicating a de-duplicate status of the file server apparatus including presence/absence of files having the same content and information of a representative file in the storage area,

performing, with the full-text search server apparatus a full-text search processing including an index information creation processing to a file group stored in the storage area of the second storage apparatus to create index information, and which performs a search processing for performing a keyword search for the index information in response to an instruction from the host apparatus to return a search result information back to the host apparatus, and which performs a de-duplicate group information creation processing to create de-duplicate group information based on said de-duplication information,

wherein the de-duplicate group information indicates a group of files having the same content, representative files in the group of files and a link between the files,

wherein the index information includes a keyword occurrence position in a data body of the file, and further includes said de-duplicate group information,

inhibiting, with the full-text search server apparatus, the keyword occurrence position information creation processing regarding a plurality of files having the same content according to a status of the unification of the file group to the single instance performed by the de-duplicate processing unit of the file server apparatus except for a representative file,

executing, with the full-text search server apparatus, de-duplicate correspondence processing based on the index information, and

responding, with said full-text search server apparatus, to the host apparatus by providing search result information comprising information regarding the representative file included in a search result and further providing, by referring to said de-duplicate group information, information of another file which belongs to a de-duplicate group of the representative file and which has the same content as said representative file, and

wherein the de-duplicate correspondence processing performed by the full text search processing apparatus is a separate function from the de-duplicate processing performed by the de-duplicate processing unit of the file server apparatus, and

wherein the de-duplicate correspondence processing performed by the full text search processing apparatus is controlled by the de-duplicate information created by the de-duplicate processing unit of the file server apparatus.

Dependent claims: 5, 6, 7, 8, 9, 10
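The claims recite that the search server derives de-duplicate group information (groups of same-content files, their representative files, and links between the files) from the de-duplication information created by the file server. One way this derivation might look is sketched below. The record layout with `has_duplicates` and `representative` fields is an assumption standing in for the claimed control information, and `build_group_info` is a hypothetical helper, not a function described in the patent.

```python
def build_group_info(dedup_info):
    """Derive group information from per-file de-duplication records.

    dedup_info: {path: {"has_duplicates": bool, "representative": path or None}}
                (assumed shape of the file server's control information).
    Returns (groups, links):
      groups: {representative: {"representative": ..., "files": [...]}}
      links:  {path: representative} for every file in some group.
    """
    groups = {}
    for path, record in sorted(dedup_info.items()):
        if not record["has_duplicates"]:
            continue  # unique file: belongs to no de-duplicate group
        rep = record["representative"]
        group = groups.setdefault(rep, {"representative": rep, "files": [rep]})
        if path != rep:
            group["files"].append(path)
    # Link every grouped file back to its representative.
    links = {f: g["representative"] for g in groups.values() for f in g["files"]}
    return groups, links


dedup_info = {
    "/a/report.txt": {"has_duplicates": True, "representative": "/a/report.txt"},
    "/b/copy.txt": {"has_duplicates": True, "representative": "/a/report.txt"},
    "/c/notes.txt": {"has_duplicates": False, "representative": None},
}
groups, links = build_group_info(dedup_info)
```

Keeping the derivation on the search-server side mirrors the claim language that the correspondence processing is a separate function that is merely controlled by the file server's de-duplication information.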
Specification