V0.6.7 - Apr 26, 2003
- If the user requested DatesEqual, add mtime to the equivalence class.

V0.6.6 - Apr 26, 2003
- InodeOfFile used to be used in Md5SumOf (could stand on its own), was created and cleared in IndexFile, and was used heavily in LinkFiles. Md5SumOf now creates an entry on demand, and the create and clear are moved entirely to LinkFiles. In fact, we don't clear it at all now.

V0.6.5 - Apr 26, 2003
- Start using size+uid+gid+mode as an equivalence class instead of size in the InodesOfSize array. All the same tests are performed as before, only we have fewer inodes to which the current one is compared.
- Stop reconstructing the InodesOfSize{size} on every inode. Just replace an entry in this array if it becomes necessary.
- Quiet the "Tried to link identical Inodes" message.

V0.6.4 - Apr 16, 2003
- Brown paper bag time. I compared a new inode to each of the inodes of its size, trying to find a link. The problem was, if it didn't match any of them, I failed to add it to the list of inodes of that size so it might match future inodes. For example, if I have two pairs of identical files, the first pair gets linked, the second does not. The regression test was updated to check that this works in the future. My sincere thanks to Martin Sheppard and Milton Yates at csiro.au for debugging, finding, and sending in a flawless fix for, this bug.

V0.6.3 - Mar 9, 2003
- Reasonably big change; we're now processing files immediately as they're read from disk rather than waiting until everyone's read into memory. I'm hoping this will allow one to actually make some headway even in the case where there's a huge number of files. I also suspect it'll go faster as we're down from order ~1.5x num_files to 1x num_files. This loses a bit of disk cache locality in processing the nodes, but the lack of seeks and the fact that we don't have to wait until everything's read in to make some progress should more than make up for that.
- Because of the above, I no longer figure out how many nodes are solitary versus multiple.
- Minor fixes.

V0.6.2 - Feb 26, 2003
- Slight modification. All calculated sums get written to KnownMd5sums and NewMd5sums. We use KnownMd5sums for all internal work; NewMd5sums is only used for appending new sums to the cache at the end.

V0.6.1 - Feb 22, 2003
- Break md5sums into known and new md5sums. Known sums came from the cache and therefore don't need to be written out. New sums were calculated on this run and are appended to the cache at the end.
- Minor typos and fixes.

V0.6.0 - Dec 9, 2002
- Cleanups of a stable 0.5.9. Removed a few variables. Old debugging code removed. Move \n into Debug.
- LinkInodes doesn't call LinkFiles if ActuallyLink=no any more (there's a mini version embedded in LinkInodes now that does the Debug prints).

V0.5.9 - Dec 9, 2002
- Changed Inodespec storage format from slash delimited string to packed SLSSSLLL format. Runtime peak memory for a 219800 file run went from 87.8M to 78.8M; 10% memory savings. Informal numbers show it about 15% faster as well.
- Wow. Instead of loading InodeOfFile during the initial file scan, I leave it blank until we've discarded solitary inodes, and then I load it with _just_ the files and inodes of the currently-being-worked-on size. This brings the peak memory use for that same 219800 file run down to 41.8M. Woah.
- For reference, given that v0.5.6 needed 1.08x ram for v0.5.7, v0.5.6 would have needed 94.8M. We've saved 56% of our peak ram requirements.
- On a P3-1500, I can process 219800 files (whose directory entries are in disk cache and that are already linked) in 80 seconds. 2,747 files/second.
- New regression tests, including a full link of two copies of a kernel source tree and a diff afterwards.
- The truly verbose debugs show garbage if they try to print md5sums or inodespecs, sorry. I'm guessing I'm the only person that sees them anyways.
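
For illustration, the V0.5.9 move from a slash delimited Inodespec string to a packed record can be sketched as below. This is Python, not the Perl that freedups itself is written in. It assumes that the S and L codes in the SLSSSLLL template mean unsigned 16-bit and unsigned 32-bit integers, and the field names and their order are hypothetical, since the changelog gives only the template.

```python
import struct

# Python counterpart of a Perl "SLSSSLLL" pack template, assuming
# S = unsigned 16-bit and L = unsigned 32-bit, with no padding ("=").
INODESPEC = struct.Struct("=HLHHHLLL")

# Hypothetical field names and order; the changelog does not list them.
FIELDS = ("dev", "ino", "mode", "uid", "gid", "size", "mtime", "nlink")


def pack_inodespec(**values):
    """Pack the eight fields into one fixed-width binary record."""
    return INODESPEC.pack(*(values[name] for name in FIELDS))


def unpack_inodespec(blob):
    """Recover the fields from a packed record as a dict."""
    return dict(zip(FIELDS, INODESPEC.unpack(blob)))


def slash_inodespec(**values):
    """The older style: the same fields joined into one delimited string."""
    return "/".join(str(values[name]) for name in FIELDS)


# Made-up sample values, chosen only to fit the assumed field widths.
example = dict(dev=2049, ino=1234567, mode=0o100644, uid=1000,
               gid=1000, size=1048576, mtime=1039392000, nlink=3)
packed = pack_inodespec(**example)
text = slash_inodespec(**example)
```

Under these assumptions every packed record is exactly 24 bytes, while the delimited string grows with the number of digits in each field, which is consistent with the direction of the memory savings reported above.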

V0.5.8 - Nov 30, 2002
- Print a reasonably accurate estimate of how much space would have been saved on dry runs (-a turned off).
- Slightly restructure LinkInodes to reduce code repetition.

V0.5.7 - Nov 27, 2002
- Don't use IndexedFiles{File}=0 test to guarantee unique files anymore, use defined(@InodeOfFile{File}) which we have already. Saves 8% of memory usage. :-)

V0.5.6 - Nov 24, 2002
- Use separate cache for every user - safer.
- Ignore files we can't stat for some reason.
- Added regression test, to be run on every new version.

V0.5.5 - Nov 22, 2002
- Added code overview at the top.
- Show the size we're working on at each new link (if size has changed from last time).
- Ignore blank md5sums.
- Discard the md5sum of an inode if we perform the last unlink on that inode.

V0.5.4 - Nov 20, 2002
- Slightly different array syntax, per Ross Carlson.

V0.5.3 - Nov 17, 2002
- Stop using :::: as a separator between the filenames in FilesOfInode; make the FilesOfInode values real arrays.

V0.5.2 - Nov 4, 2002
- Discard solitary inodes early to (theoretically) save memory (note that perl (5, at least) doesn't actually return memory to the OS if the app undef's it).

V0.5.1 - Nov 3, 2002
- IndexFile function to load all arrays.
- Load md5sum cache late.

V0.5 - Nov 3, 2002
- Freedups has been rewritten in perl.
- First perl release with the following features:
  - (shared) md5 checksum cache
  - Read filenames in and stat them, storing inode info in internal arrays.
  - Do comparison of _inodes_, not filenames.
  - If a given size has a single inode, discard it as there's no chance of linking.

V0.4 - May 6, 2001
- v0.3 and below were spending a _lot_ of time forking basename, even when we didn't need to test for basename. By removing that and grouping files with identical md5sums together, it processes large numbers of files in about a tenth of the time. It does need to read all the files now, some twice, but it's worth it for the speedup.
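
The approach listed under V0.5 (stat every file, compare inodes rather than filenames, and drop any size that has only one inode) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not freedups' Perl code: the function names index_files, md5_of and link_candidates are hypothetical, md5 sums are computed directly instead of going through a cache, and grouping is by size only, as in the V0.5 description.

```python
import hashlib
import os
import stat
from collections import defaultdict


def index_files(paths):
    """Stat each file; map inode -> filenames and size -> set of inodes."""
    files_of_inode = defaultdict(list)
    inodes_of_size = defaultdict(set)
    for path in paths:
        try:
            st = os.lstat(path)
        except OSError:
            continue  # ignore files we cannot stat
        if not stat.S_ISREG(st.st_mode):
            continue  # only regular files are candidates
        inode = (st.st_dev, st.st_ino)
        files_of_inode[inode].append(path)
        inodes_of_size[st.st_size].add(inode)
    return files_of_inode, inodes_of_size


def md5_of(path):
    """Hex md5 of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_candidates(paths):
    """Return groups of distinct inodes sharing both size and md5 sum."""
    files_of_inode, inodes_of_size = index_files(paths)
    groups = []
    for inodes in inodes_of_size.values():
        if len(inodes) < 2:
            continue  # a solitary inode of this size can never be linked
        by_sum = defaultdict(list)
        for inode in inodes:
            # Any one filename of the inode is enough to read its contents.
            by_sum[md5_of(files_of_inode[inode][0])].append(inode)
        groups.extend(sorted(g) for g in by_sum.values() if len(g) > 1)
    return groups
```

Files that are already hard linked share one inode, so they are read once and never reported as candidates against each other. The sketch only reports candidates; it does not link anything.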

V0.3 - Mar 11, 2001
- Handles command line parameters now. Setting options via environment variables works for the moment, but will be removed in a future version.
- Updated documentation. List of apps, more verbose answers to questions.
- GPL text block added.
- Other minor fixes and cleanups.

V0.2.1 - Mar 02, 2001
- Added README and Changelog to package.
- Don't debug by default in shipping version.
- Clean out more debugging code.
- Minor code cleanups.
- Add examples to Usage output.
- Use mktemp if available for temporary signature file.

V0.2 - Feb 23, 2001
- Removal of a lot of forks and simplification of tests.
- More equivalency testing done in find's output.
- Link to the older of the two files or the file with the most links.

V0.1 - Feb 19, 2001
- Basic search and link functionality.
- Environment variables available:
  - ACTUALLYLINK=YES #Just reports on potential savings if anything but YES.
  - VERBOSE=YES #Show directory listing and wait before linking if YES.
  - CHECKDATE=YES #Modified date and time must be equal to be considered for linking if YES.
  - FILENAMESEQUAL=YES #Files must have the same name (in different directories) to be considered for linking.
  - MINSIZE=size #Files must be larger than this size (in bytes) to be considered for linking.
- Not publicly released.
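
As a rough illustration of the V0.1 environment variable settings, the sketch below reads them into a dictionary. It is written in Python rather than the language of the original script, the function name read_options and the dictionary keys are hypothetical, and the behaviour for unset variables (flags off, MINSIZE of 0) is an assumption, since the changelog does not state the defaults.

```python
import os


def read_options(environ=os.environ):
    """Read V0.1-style settings; a flag counts as on only if it is exactly YES."""
    def flag(name):
        # Assumption: an unset variable behaves like any value other than YES.
        return environ.get(name, "") == "YES"

    return {
        "actually_link": flag("ACTUALLYLINK"),      # otherwise only report savings
        "verbose": flag("VERBOSE"),
        "check_date": flag("CHECKDATE"),
        "filenames_equal": flag("FILENAMESEQUAL"),
        # Assumption: a missing MINSIZE means no minimum (0 bytes).
        "min_size": int(environ.get("MINSIZE", "0")),
    }
```

Passing a plain dictionary instead of os.environ, for example read_options({"ACTUALLYLINK": "YES", "MINSIZE": "4096"}), makes it easy to try the settings without touching the real environment.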