STORAGE SYSTEM
Abstract
The present invention relates to a storage system including a de-duplicate function and a full-text search function or the like, and reduces the amount of index information for full-text search to save storage resources. In this system, a storage apparatus includes a processing unit for de-duplicating a plurality of files having the same content in a file group of data inputted/outputted through a host apparatus. A full-text search processing server performs full-text search processing on the file group and includes a processing unit for making the full-text search processing correspond to the de-duplication. Index information creation processing performed by the full-text search processing unit on a plurality of target files having the same content is inhibited according to the de-duplication status of the file group set by the de-duplicating processing unit. Thereby, the amount of index information can be reduced.
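As an illustration of the index reduction described in the abstract, the following is a minimal sketch, not the patented implementation. It assumes a hypothetical in-memory mapping `representative_of` (file path to representative path) standing in for the de-duplication status, and a simple whitespace tokenizer; the function name `build_index` and the sample paths are likewise assumptions.

```python
from collections import defaultdict


def build_index(files, representative_of):
    """Build keyword -> path -> [positions], skipping non-representative duplicates.

    files: dict mapping path -> text (hypothetical stand-in for the file group).
    representative_of: dict mapping path -> representative path (assumed format
    for the de-duplication status; files absent from it are treated as unique).
    """
    postings = defaultdict(lambda: defaultdict(list))
    for path, text in files.items():
        # Inhibit position-information creation for any file that is a
        # duplicate of some other representative file.
        if representative_of.get(path, path) != path:
            continue
        for position, word in enumerate(text.lower().split()):
            postings[word][path].append(position)
    return postings


files = {
    "/a/report.txt": "storage system index",
    "/b/copy.txt": "storage system index",
    "/c/notes.txt": "index notes",
}
representative_of = {
    "/a/report.txt": "/a/report.txt",
    "/b/copy.txt": "/a/report.txt",
}
index = build_index(files, representative_of)
```

In this sketch the duplicate `/b/copy.txt` contributes no postings, so the index holds position lists for two files instead of three.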
9 Claims
1. A storage system which stores files of data inputted in/outputted from a host apparatus in a storage area, comprising:
a file server apparatus which includes a de-duplicate processing unit which performs de-duplicate processing to a plurality of files having the same content in a file group stored in the storage area and which creates de-duplication information which is stored in the storage area, and which comprises control information indicating a de-duplicate status of the file server apparatus including presence/absence of files having the same content and information of a representative file in the storage area; and
a full-text search processing apparatus which performs a full-text search processing including an index information creation processing to the file group stored in the storage area to create index information and which performs a de-duplicate group information creation processing to create de-duplicate group information based on said de-duplication information,
wherein the de-duplicate group information indicates a group of files having the same content, representative files in the group of files and a link between the files,
wherein the index information includes keyword occurrence position information in a data body of the file and further includes said de-duplicate group information,
wherein the index information is de-duplicated by inhibiting the keyword occurrence position information creation processing performed to the plurality of files having the same content by the full-text search processing apparatus according to a status of the de-duplicate processing to the file group performed by the de-duplicate processing unit of the file server apparatus,
wherein the full-text search processing apparatus executes de-duplicate correspondence processing based on the index information,
wherein, said full-text search server apparatus responds to the host apparatus by providing search result information comprising information regarding a representative file included in a search result, and further provides, by referring to said de-duplicate group information, information of another file which belongs to a de-duplicate group of the representative file and which has the same content as said representative file, said representative file being searched by said de-duplicate correspondence processing,
wherein the file server apparatus is coupled between the host apparatus and the storage area,
wherein the full-text search processing apparatus is coupled between the host apparatus and the index information, and
wherein the host apparatus and the file server apparatus are coupled between the full-text search server apparatus and the storage area, and
wherein the de-duplicate correspondence processing performed by the full text search processing apparatus is a separate function from the de-duplicate processing performed by the de-duplicate processing unit of the file server apparatus, and
wherein the de-duplicate correspondence processing performed by the full text search processing apparatus is controlled by the de-duplicate information created by the de-duplicate processing unit of the file server apparatus.
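The search response recited in claim 1 (a representative file plus the other files of its de-duplicate group) might be sketched as follows. This is an assumption-laden illustration only: the `postings` and `groups` dictionaries, the `search` function, and the result field names are hypothetical and are not taken from the specification.

```python
def search(keyword, postings, groups):
    """Return one hit per representative file, expanded with its group members.

    postings: dict keyword -> {representative path: [positions]} (assumed shape
    of the de-duplicated index information).
    groups: dict representative path -> list of all paths with the same content
    (assumed shape of the de-duplicate group information).
    """
    results = []
    for representative in sorted(postings.get(keyword, {})):
        # Look up the other files that share the representative's content.
        others = [p for p in groups.get(representative, []) if p != representative]
        results.append({"representative": representative, "same_content": others})
    return results


postings = {
    "storage": {"/a/report.txt": [0]},
    "notes": {"/c/notes.txt": [1]},
}
groups = {"/a/report.txt": ["/a/report.txt", "/b/copy.txt"]}
hits = search("storage", postings, groups)
```

Only the representative file is matched against the index; the duplicate is reported through the group lookup, which is why it needs no postings of its own.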
2. A storage system which stores files of data inputted in/outputted from a host apparatus in a storage area, comprising:
a first storage apparatus which is connected to the host apparatus via a network and receives input/output of the files from the host apparatus;
a second storage apparatus which is connected to the first storage apparatus and stores the files in a storage area of the second storage apparatus itself according to access from the first storage apparatus,
wherein the first storage apparatus includes a file server apparatus including a de-duplicate processing unit which performs a processing for de-duplicating a plurality of files having the same content regarding a file group stored in the storage area of the second storage apparatus and which creates de-duplication information which is stored in the storage area, and which comprises control information indicating a de-duplicate status of the file server apparatus including presence/absence of files having the same content and information of a representative file in the storage area; and
a full-text search server apparatus which performs a full-text search processing including an index information creation processing to a file group stored in the storage area of the second storage apparatus to create index information, which performs a search processing for performing a keyword search for the index information in response to an instruction from the host apparatus to return search result information back to the host apparatus, and which performs a de-duplicate group information creation processing to create de-duplicate group information based on said de-duplication information,
wherein the de-duplicate group information indicates a group of files having the same content, representative files in the group of files and a link between the files,
wherein the index information includes keyword occurrence position information in a data body of the file and further includes said de-duplicate group information,
wherein the full-text search server apparatus inhibits the keyword occurrence position information creation processing to a plurality of files having the same content according to a status of the de-duplicate processing of the file group performed by the de-duplicate processing unit of the file server apparatus, except for a representative file,
wherein the full-text search server apparatus executes de-duplicate correspondence processing based on the index information,
wherein, said full-text search server apparatus responds to the host apparatus by providing search result information comprising information regarding a representative file included in a search result, and further provides, by referring to said de-duplicate group information, information of another file which belongs to a de-duplicate group of the representative file and which has the same content as said representative file, said representative file being searched by said de-duplicate correspondence processing,
wherein the file server apparatus is coupled between the host apparatus and the storage area,
wherein the full-text search processing apparatus is coupled between the host apparatus and the index information, and
wherein the host apparatus and the file server apparatus are coupled between the full-text search server apparatus and the storage area, and
wherein the de-duplicate correspondence processing performed by the full text search processing apparatus is a separate function from the de-duplicate processing performed by the de-duplicate processing unit of the file server apparatus, and
wherein the de-duplicate correspondence processing performed by the full text search processing apparatus is controlled by the de-duplicate information created by the de-duplicate processing unit of the file server apparatus.
Dependent Claims: 4
3. A method of operating a storage system which stores files of data inputted in/outputted from a host apparatus in a storage area, the storage system including a first storage apparatus which is connected to the host apparatus via a network and receives input/output of the files from the host apparatus, a second storage apparatus which is connected to the first storage apparatus and stores the files in a storage area of the second storage apparatus itself according to access from the first storage apparatus, wherein the first storage apparatus includes a file server apparatus including a de-duplicate processing unit, and a full-text search server apparatus, the method comprising:
performing a processing with the file server apparatus for unifying a plurality of files having the same content to a single instance regarding a file group stored in the storage area of the second storage apparatus and which creates de-duplication information which is stored in the storage area, and which comprises control information indicating a de-duplicate status of the file server apparatus including presence/absence of files having the same content and information of a representative file in the storage area,
performing, with the full-text search server apparatus a full-text search processing including an index information creation processing to a file group stored in the storage area of the second storage apparatus to create index information, and which performs a search processing for performing a keyword search for the index information in response to an instruction from the host apparatus to return a search result information back to the host apparatus, and which performs a de-duplicate group information creation processing to create de-duplicate group information based on said de-duplication information,
wherein the de-duplicate group information indicates a group of files having the same content, representative files in the group of files and a link between the files,
wherein the index information includes a keyword occurrence position in a data body of the file, and further includes said de-duplicate group information,
inhibiting, with the full-text search server apparatus, the keyword occurrence position information creation processing regarding a plurality of files having the same content according to a status of the unification of the file group to the single instance performed by the de-duplicate processing unit of the file server apparatus except for a representative file,
executing, with the full-text search server apparatus, de-duplicate correspondence processing based on the index information, and
responding, with said full-text search server apparatus, to the host apparatus by providing search result information comprising information regarding the representative file included in a search result and further providing, by referring to said de-duplicate group information, information of another file which belongs to a de-duplicate group of the representative file and which has the same content as said representative file,
wherein the file server apparatus is coupled between the host apparatus and the storage area,
wherein the full-text search processing apparatus is coupled between the host apparatus and the index information, and
wherein the host apparatus and the file server apparatus are coupled between the full-text search server apparatus and the storage area, and
wherein the de-duplicate correspondence processing performed by the full text search processing apparatus is a separate function from the de-duplicate processing performed by the de-duplicate processing unit of the file server apparatus, and
wherein the de-duplicate correspondence processing performed by the full text search processing apparatus is controlled by the de-duplicate information created by the de-duplicate processing unit of the file server apparatus.
Dependent Claims: 5, 6, 7, 8, 9
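The first step of the method in claim 3, in which the file server produces de-duplication information recording presence/absence of same-content files and a representative file, could be sketched as below. The claims do not state how identical content is detected; grouping by a SHA-256 content digest and picking the lexicographically first path as representative are assumptions made here purely for illustration, as are the function name `create_dedup_info` and the record fields.

```python
import hashlib


def create_dedup_info(files):
    """Return path -> {"has_duplicates": bool, "representative": path}.

    files: dict mapping path -> text. Content equality is approximated by a
    SHA-256 digest (an assumption; the source does not specify the mechanism).
    """
    by_digest = {}
    for path in sorted(files):
        digest = hashlib.sha256(files[path].encode("utf-8")).hexdigest()
        by_digest.setdefault(digest, []).append(path)

    info = {}
    for members in by_digest.values():
        representative = members[0]  # assumed rule: first path in sorted order
        for path in members:
            info[path] = {
                "has_duplicates": len(members) > 1,
                "representative": representative,
            }
    return info


files = {
    "/a/report.txt": "storage system index",
    "/b/copy.txt": "storage system index",
    "/c/notes.txt": "index notes",
}
dedup_info = create_dedup_info(files)
```

A search server in this sketch would read `dedup_info` to decide which files to skip during indexing and to build its group information, keeping the two functions separate as the claim requires.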
Specification