Method and apparatus for the naming of database component files to avoid duplication of files
Abstract
In the method and apparatus of the present invention, a file to be added to the database is given a unique name that depends on the contents of the file, so that when the contents of the source file change, the name of the database component file to be added to the database also changes. Conversely, if two files of the same name contain the same information, the same file name is generated, and duplication of information in the database is prevented by a simple test that checks for the existence of the database file name before the new file is generated and added to the database. If the file name exists in the database, the information is already contained in the database, and the file is not generated or added. Preferably, the name of the file is generated by computing a hash value from the contents of the file and concatenating the hash value to the name of the source file. Because the source file name is used in conjunction with the hash value to construct the database file name, the hash value does not have to be unique across all files, only among source files having the same name.
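The naming scheme and existence test described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `component_name` and `add_to_database`, the use of a truncated SHA-256 digest as the hash, the `.` separator, and the placeholder file body are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def component_name(source_path: Path) -> str:
    """Derive a component file name: source file name + hash of its contents.

    Assumption: SHA-256 truncated to 16 hex digits stands in for the hash;
    the abstract does not prescribe a particular hash function.
    """
    digest = hashlib.sha256(source_path.read_bytes()).hexdigest()[:16]
    return f"{source_path.name}.{digest}"


def add_to_database(source_path: Path, db_dir: Path) -> bool:
    """Generate a component file only if its derived name is not yet present.

    Returns True when a new component file was written, False when an
    identically named file already existed and generation was skipped.
    """
    db_dir.mkdir(parents=True, exist_ok=True)
    target = db_dir / component_name(source_path)
    if target.exists():
        # Same source name and same contents: information is already stored.
        return False
    # Placeholder body (hypothetical); a real component file would hold
    # the derived database information for the source file.
    target.write_text(f"component for {source_path.name}\n")
    return True
```

Under these assumptions, two copies of `main.c` with identical contents in different directories map to the same component name, so the second call returns `False` and nothing is written; editing either copy changes the digest and therefore the name.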
18 Claims
1. In a computer system comprising a CPU, input/output means and memory containing a file system, said file system comprising at least one source file comprising text, a process for generating a database comprising at least one database component file derived from the source files such that one database component file is generated for each unique source file regardless of the number of copies of the source file occurring in a file system, said process comprising the steps of:
generating a unique name for the database component file, said name generated by concatenating the source file name with a hash value computed according to the contents of the source file, whereby if the contents of the source file changes, the hash value changes and a different database component file name is generated;
searching the file system for a database component file having the same name as the generated database component file name;
if a database component file having the same name as the generated database component file does not exist in the file system, generating a database component file for the source file comprising a listing of symbols and line numbers in the source file where the symbol occurs;
whereby if a database file having the same name as the generated database file exists, a database component file is not generated thereby eliminating the duplication of database component files and the system usage required to write the duplicate file.
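The component file contents recited in claim 1, a listing of symbols and the line numbers where each occurs, can be sketched as follows. This is an illustrative assumption rather than the claimed method: the helper name `symbol_listing` is hypothetical, and a C-style identifier pattern stands in for whatever symbol recognition an actual implementation would use.

```python
import re
from collections import defaultdict

# Assumption: a "symbol" is approximated by a C-style identifier.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def symbol_listing(text: str) -> dict[str, list[int]]:
    """Map each symbol in the source text to the line numbers where it occurs."""
    listing: dict[str, list[int]] = defaultdict(list)
    for line_number, line in enumerate(text.splitlines(), start=1):
        for symbol in IDENTIFIER.findall(line):
            # Record each line at most once per symbol.
            if line_number not in listing[symbol]:
                listing[symbol].append(line_number)
    return dict(listing)
```

A component file would then store this mapping in some serialized form; the on-disk layout is not specified in the claim and is left open here.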
4. In a computer system comprising a CPU, input/output means and memory containing a file system, said file system comprising at least one source file comprising text, an apparatus for generating a database comprising at least one database component file derived from the source files such that one database component file is generated for each unique source file regardless of the number of copies of the source file occurring in the file system, said apparatus comprising:
means for generating a unique name for the database component file, said name generated by concatenating the source file name with a hash value computed according to the contents of the source file, whereby if the contents of the source file changes, the hash value changes and a different database component file name is generated;
means for searching the file system for a database component file having the same name as the generated database component file name;
if a database component file having the same name as the generated database component file name does not exist in the file system, means for generating a database component file for the source file comprising a listing of symbols and line numbers in the source file where the symbol occurs;
13. The apparatus according to claim 5, wherein the database component file for a source file is located in a directory separate from a directory where the source file is located.
14. The apparatus according to claim 13, wherein the database component file is located in a directory specified in a second directory listing, said second directory listing comprising source file names and the location in the file system of corresponding database component file names.
15. The apparatus according to claim 14, wherein the second directory listing is located in a predetermined file referred to when a database component file is generated.
16. The apparatus according to claim 4, wherein the hash value is computed according to the sum of the bytes in the database component file.
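A byte-sum hash of the kind recited in claim 16 can be sketched in a few lines. The function name `byte_sum_hash` and the 32-bit modulus are assumptions for illustration; the claim states only that the hash is computed from the sum of the bytes.

```python
def byte_sum_hash(data: bytes, modulus: int = 2**32) -> int:
    """Return the sum of all byte values in data.

    Assumption: the sum is reduced modulo 2**32 so that it fits a
    fixed-width field; the claim does not specify a width.
    """
    return sum(data) % modulus


# Hypothetical naming use: append the hash, as hex, to a source file name.
name = f"main.c.{byte_sum_hash(b'int x;'):08x}"
```

A plain byte sum is insensitive to byte order, so `b"ab"` and `b"ba"` hash to the same value. That weakness is tolerable in this scheme because, as the abstract notes, the hash only has to distinguish source files that share the same name.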
17. The apparatus of claim 7, wherein said means for generating a hash value generates the hash value as the sum of:
the hash value generated for the database component file information section comprising the sum of the first predetermined number of bytes in the database component file, source type ID, the major and minor version numbers of the file, line indicator and case indicator and each character comprising the name of the language the source is written in;
the hash value generated for the source name section comprising the sum of the value of each character of the file name of the source file;
the hash value generated for the referenced file section comprising the sum of the values of the names of the referenced files;
the hash value generated for the symbol table section comprising the sum of the values of the characters comprising the symbols in the symbol table;
the hash value generated for the semantic table section comprising the sum of the record type ID, line number and semantic tag for each symbol reference; and
the hash value generated for the line identification section comprising the sum of the line number, line length, hash value and inactive indicator of each line of text in the source file.
18. The apparatus according to claim 5, said means for generating an index file further comprising a split function means comprising:
means for determining the size of the index file and comparing the size of the index file to be generated to a maximum index file size;
if the size of the index file is greater than or equal to the maximum index file size;
means for establishing a first sub-directory comprising database component files which have not changed subsequent to a prior generation of the index file and the index file generated prior;
means for establishing a second sub-directory comprising database component files which have changed subsequent to a prior generation of the index file;
means to control said browser means to generate a new index file only for those source files which have changed subsequent to the prior generation of the index file;
whereby the size of the index files generated are maintained below the maximum index file size thereby preventing decrease in processing speed due to the generation of large index files.
Specification