ChangeLog 7474 Apr 26 2003
README 11711 Feb 13 2004
README-freedups-v0.5.pl 460 Jan 2 2004
archives Jan 2 2004
favicon.ico 20 Aug 28 2005
filelist.html
freedups 13612 May 6 2001
freedups-0.4-0.noarch.rpm 12622 May 9 2001
freedups-0.4-0.src.rpm 13193 May 9 2001
freedups-0.4.spec 1747 May 9 2001
freedups-0.4.tar.gz 9785 May 9 2001
freedups-0.5.9-0.noarch.rpm 19126 Dec 9 2002
freedups-0.5.9-0.src.rpm 19831 Dec 9 2002
freedups-0.5.9.spec 1991 Dec 9 2002
freedups-0.5.9.tar.gz 16017 Dec 9 2002
freedups-0.6.0-0.noarch.rpm 18885 Dec 9 2002
freedups-0.6.0-0.src.rpm 19540 Dec 9 2002
freedups-0.6.0.spec 2073 Dec 9 2002
freedups-0.6.0.tar.gz 15670 Dec 9 2002
freedups-0.6.14-0.noarch.rpm 21735 Mar 14 2004
freedups-0.6.14-0.src.rpm 22420 Mar 14 2004
freedups-0.6.14.spec 2157 Mar 14 2004
freedups-0.6.14.tar.gz 18454 Mar 14 2004
freedups-regression-test 6042 Mar 14 2004
freedups-v0.6.0.pl 32210 Dec 9 2002
freedups-v0.6.1.pl 33221 Feb 22 2003
freedups-v0.6.10.pl 35898 Apr 27 2003
freedups-v0.6.11.pl 35921 Apr 27 2003
freedups-v0.6.12.pl 36421 Apr 27 2003
freedups-v0.6.13.pl 37208 May 3 2003
freedups-v0.6.14.pl 37783 Mar 14 2004
freedups-v0.6.2.pl 33125 Feb 26 2003
freedups-v0.6.3.pl 32920 Mar 9 2003
freedups-v0.6.4.pl 33339 Apr 16 2003
freedups-v0.6.5.pl 33692 Apr 26 2003
freedups-v0.6.6.pl 33689 Apr 26 2003
freedups-v0.6.7.pl 34007 Apr 26 2003
freedups-v0.6.8.pl 33963 Apr 26 2003
freedups-v0.6.9.pl 35425 Apr 27 2003
freedups.pl 37783 Mar 14 2004
freedups.sh 13612 May 6 2001
freedups.spec 2157 Mar 14 2004
freedups.v0.4 13612 May 6 2001
index.html
internal-gopher-menu 29 Aug 28 2005
internal-gopher-unknown 32 Aug 28 2005



README

	Freedups searches through the directories you specify.  When it
finds two or more identical files, it hard links them together (with -a;
see below).  The files still exist in their respective directories, but
only one copy of the data is stored on disk; all of their directory
entries point to the same data blocks.
	This allows you to reclaim space on your drive.  It's that 
simple.  Run it every night from a cron job.
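
	The linking step itself can be sketched in a few lines of Python.
This is an illustration, not freedups' own code (freedups is written in
bash and Perl), and the helper name replace_with_hardlink is
hypothetical.  It assumes the two files are already known to be identical
and to live on the same filesystem.  The new link is created under a
temporary name and then renamed over the duplicate, so the duplicate's
name never goes missing if something fails partway through.

```python
import os
import tempfile


def replace_with_hardlink(keep: str, dup: str) -> None:
    """Turn `dup` into another directory entry for the inode behind `keep`.

    Assumes the files are identical and on the same filesystem.
    """
    dup_dir = os.path.dirname(os.path.abspath(dup))
    # Pick an unused temporary name next to `dup`, then free it up again,
    # because os.link() refuses to overwrite an existing name.
    fd, tmp = tempfile.mkstemp(dir=dup_dir, prefix=".hardlink-")
    os.close(fd)
    os.unlink(tmp)
    os.link(keep, tmp)
    try:
        os.replace(tmp, dup)    # atomically swap the new link in for `dup`
    except OSError:
        os.unlink(tmp)          # don't leave the temporary link behind
        raise
```

	Because the surviving inode keeps the first file's owner, group and
mode, this is only safe when those already match.  That is one reason
freedups insists on identical owner, group and mode.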


Why you'd want to use it:
	- You have multiple copies of a source code tree on your system.
Freedups will link any identical files together and ignore any files
that changed between versions.
	- You have multiple copies of the file COPYING in /usr/doc or
/usr/share/doc.
	- Depending on your system, the following might be good places
to try linking (the size in parentheses is the amount saved on a very
basic Red Hat 7.3 install; you'll probably save even more):
freedups /lib/kbd					(463K)
freedups /usr/doc /usr/share/doc
freedups /usr/src/linux*
freedups /usr/src/pcmcia-cs*
freedups /usr/share					(8.6M)
freedups /usr/lib					(97K)
freedups /usr/man /usr/share/man
freedups /usr/share/locale /etc/locale			(652K)
freedups /usr/share/scrollkeeper /var/lib/scrollkeeper	(719K)
	- Directories holding files that are only read are good
candidates.
	You might also save some space by deleting the
/usr/share/locale/country_code/LC_MESSAGES/*.mo files for languages
you don't need.


Things to watch out for:
	- You'll need to use the _full path_, starting with /, to any
files or directories you want freedups to search.  If you don't, you'll
likely get an error like "cannot stat file".
	- Remember that you now have multiple directory entries pointing
at one copy of the data.  Depending on what editor you use, when you change
one of the files you may be changing the others as well.  See the
application list below for which programs preserve hardlinks when saving
and which break them.
	- For the above reason, you probably don't want to create links
to any backup copies on the drive.
	- If the files are on different partitions, it's not possible to
create a hardlink between them.  Freedups handles this gracefully.
	- Directories holding files that might be written to are
generally not good candidates.  Similarly, avoid directories holding
security-related files.  /etc is a bad choice on both counts.
	- If you run freedups without the --datesequal=yes option,
freedups may link files with different modification times together.  If
you later use "rpm -Va" (or the equivalent Debian system verify
command), it may report that the timestamps on some files have changed.
If this is _all_ that has changed, this is a cosmetic problem only.  For
example, the following is cosmetic and not indicative of a modified
file:

.......T   /usr/share/automake/COPYING


Can I run this and just see what would have been linked together without
modifying anything?
	Sure.  In fact, unless you put -a on the command line, that's
_all_ freedups will do.  By default it won't link anything; it just
reports the approximate space savings.
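
	The savings estimate is simple arithmetic.  Suppose k distinct
inodes on one filesystem hold the same s bytes of data.  Linking them down
to a single inode frees roughly (k - 1) * s bytes, ignoring block
rounding, and names that already share an inode count only once.  Here is
a minimal Python sketch with a hypothetical helper name.  It is not how
freedups computes its own total, which its output warns can be an
overestimate.

```python
from typing import Iterable, Tuple


def estimated_savings(groups: Iterable[Iterable[Tuple[int, int]]]) -> int:
    """Bytes freed by hardlinking each group of identical files together.

    Each group lists (inode, size) pairs for files whose contents match,
    all on the same filesystem.  Names already sharing an inode count once.
    """
    total = 0
    for group in groups:
        distinct = {inode: size for inode, size in group}
        if len(distinct) > 1:
            size = next(iter(distinct.values()))
            total += (len(distinct) - 1) * size
    return total


# Three names for a 10-byte file, two of them already linked: saves 10.
# Three separate copies of a 4096-byte file: saves 8192.
print(estimated_savings([[(101, 10), (102, 10), (102, 10)],
                         [(201, 4096), (202, 4096), (203, 4096)]]))  # 8202
```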


Does this really save any space?
	It really depends on whether you have duplicate files on your
filesystem or not.  I've personally recovered ~3G on my main drive from
hardlinking identical files in the various kernel trees I have there.
One user reports saving ~2G simply from hardlinking identical files
downloaded by a p2p file sharing program.


Does this slow down the system like the drive compression programs?
	No.  No files are compressed with this tool.  It only instructs
the filesystem to keep one copy of two or more identical files and have
all their directory entries point at the sole copy of the actual file
data.  In fact, for certain operations (such as using diff between two
freedup'd directory trees), the system runs much, much faster.
	File reads should _not_ become slower.
	Running freedups can take quite a while, but it can certainly
be run off-hours or when the system is generally idle.  It can be run
under nice to give other programs priority.


Do I have to run this as root?
	Not at all.  As long as you own the files, freedups runs just
fine as a normal user.


What has to be true for two files to get linked together?
	- They have to be regular files (i.e. not character or block
devices, pipes, directories, or symlinks).
	- They have to have at least one byte.  I don't want to link
all 0 byte files on the system together.
	- They have to have the same size.
	- They have to have the same user owner, group owner and mode.
Skirting this requirement would raise _serious_ security considerations.
If you want to link two files that currently differ in owner or mode, 
use chown or chmod to make their owners or modes identical and re-run
freedups.
	- They have to be readable by the current user.
	- The contents of the files have to be identical.
	- Optionally (--minsize=1000), the files have to be larger than
the given number of bytes.
	- Optionally (--datesequal=yes), the files have to have identical
modification timestamps.
	- Optionally (--filenamesequal=yes), the filenames have to be 
identical (in different directories, obviously).
	- They have to be on the same partition.
	- That partition must support hardlinks.  Ext2, ext3 and
reiserfs do; fat/vfat/msdos do not.  If you know whether another Linux
filesystem supports hardlinks or not, please let me know.
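
	The criteria above can be collected into a single check.  The
Python sketch below is illustrative only.  can_link and its keyword
arguments are hypothetical names that loosely mirror --minsize,
--datesequal and --filenamesequal.  The sketch cannot tell in advance
whether a filesystem supports hardlinks; os.link() would simply fail on
one that doesn't.  It also rejects pairs that already share an inode,
since there is nothing left to do for them.

```python
import os
import stat


def can_link(a: str, b: str, minsize: int = 0,
             dates_equal: bool = False, filenames_equal: bool = False) -> bool:
    """Return True if a and b pass the linking criteria (illustrative only)."""
    sa, sb = os.lstat(a), os.lstat(b)          # lstat: don't follow symlinks
    if not (stat.S_ISREG(sa.st_mode) and stat.S_ISREG(sb.st_mode)):
        return False                            # regular files only
    if sa.st_dev != sb.st_dev:
        return False                            # must be on the same filesystem
    if sa.st_ino == sb.st_ino:
        return False                            # already linked; nothing to do
    if sa.st_size == 0 or sa.st_size != sb.st_size:
        return False                            # non-empty and the same size
    if sa.st_size <= minsize:
        return False                            # must be larger than minsize
    if (sa.st_uid, sa.st_gid, sa.st_mode) != (sb.st_uid, sb.st_gid, sb.st_mode):
        return False                            # same owner, group and mode
    if dates_equal and sa.st_mtime_ns != sb.st_mtime_ns:
        return False                            # identical modification times
    if filenames_equal and os.path.basename(a) != os.path.basename(b):
        return False                            # identical file names
    if not (os.access(a, os.R_OK) and os.access(b, os.R_OK)):
        return False                            # readable by the current user
    # Finally, compare the contents byte for byte.
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(65536), fb.read(65536)
            if ca != cb:
                return False
            if not ca:                          # both files hit EOF together
                return True
```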


I think I have a bunch of files that should be linked together, but
freedups doesn't link them.  Why not?
	Walk through the above list of criteria for a given pair of
files in question.  Which one fails?
	To examine a pair of files, look at the output from:

ls -ali firstfile secondfile

	which looks like:

2097229 -rw-rw-r--    1 wstearns wstearns        4 Mar 11 16:09 firstfile
2097673 -rw-------    1 nobody   nobody          5 Mar 11 16:10 secondfile

	The columns are: inode number, file mode, number of links to
this inode, user owner, group owner, file size, modification date,
modification time, and filename.  The above two files wouldn't be linked
because their modes differ, they're owned by different users and
different groups, and they have different sizes (so their contents must
differ too).  Depending on options, they may also be disqualified because
their modification times and filenames are different.
	That said, if you do come up with files that legitimately should
be linked but aren't, please email me so I can fix freedups.
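
	The same fields can also be read from a script.  Below is a minimal
Python sketch with a hypothetical helper name, using os.lstat() and
stat.filemode().  Owners are left as numeric IDs here, whereas ls maps
them to user and group names.

```python
import os
import stat
import sys
import time


def ls_ali_fields(path: str) -> tuple:
    """Return (inode, mode, links, uid, gid, size, mtime) as shown by `ls -ali`."""
    st = os.lstat(path)                      # lstat: report on a symlink itself
    return (
        st.st_ino,                           # inode number
        stat.filemode(st.st_mode),           # e.g. '-rw-rw-r--'
        st.st_nlink,                         # number of links to this inode
        st.st_uid,                           # user owner (numeric)
        st.st_gid,                           # group owner (numeric)
        st.st_size,                          # size in bytes
        time.strftime("%b %d %H:%M", time.localtime(st.st_mtime)),
    )


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(*ls_ali_fields(name), name)
```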


Can this be safely run more than once?
	Definitely.  Freedups is smart enough to recognize that two
files are already linked together and just moves on to the next pair.
	For this reason, running it twice on the exact same set of
files won't save any more space.
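
	Recognizing an existing link only takes a comparison of device and
inode numbers: two names refer to the same file exactly when both match.
Here is a minimal Python sketch with a hypothetical helper name.
os.path.samefile() performs a similar test but follows symlinks.

```python
import os


def already_linked(a: str, b: str) -> bool:
    """True if a and b are two directory entries for the same inode."""
    sa, sb = os.lstat(a), os.lstat(b)   # lstat: don't follow symlinks
    # Inode numbers are only unique within one device (filesystem).
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)
```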

Are there different ways to do this?
	Sure.
	- Rewrite this in a more efficient language.
	- When copying a directory tree, hard link the files during the
copy:

cp -av --link linux-2.1.anything.orig linux-2.1.anything

	Many thanks to the Kernel FAQ and Janos Farkas for that trick.

	- Delete truly unneeded files.
	- Use CVS or BitKeeper; the latter, at least, can save
substantial amounts of space.
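
	The cp --link trick above works by recreating the directories and
hardlinking each file instead of copying it.  The Python version below is
a rough equivalent for illustration only, and link_tree is a hypothetical
name.  Unlike cp -a, it does not copy directory permissions or timestamps,
and dst must be on the same filesystem as src.

```python
import os


def link_tree(src: str, dst: str) -> None:
    """Recreate the tree `src` at `dst`, hardlinking files instead of copying."""
    for root, dirs, files in os.walk(src):     # does not descend into symlinks
        target_dir = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
        os.makedirs(target_dir, exist_ok=True)
        for name in dirs + files:
            s = os.path.join(root, name)
            t = os.path.join(target_dir, name)
            if os.path.islink(s):
                os.symlink(os.readlink(s), t)  # recreate the symlink itself
            elif not os.path.isdir(s):
                os.link(s, t)                  # share the original's inode
            # real subdirectories are created when os.walk reaches them
```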


How can I test that the program is working?
	Try the following:
[wstearns@sparrow wstearns]$ cd /tmp
[wstearns@sparrow /tmp]$ mkdir duptest
[wstearns@sparrow /tmp]$ cd duptest
[wstearns@sparrow duptest]$ echo Hi there. >test1
[wstearns@sparrow duptest]$ cp -p test1 test2
[wstearns@sparrow duptest]$ ls -ali test1 test2
1885113 -rw-rw-r--    1 wstearns wstearns       10 Feb 28 00:55 test1
1885114 -rw-rw-r--    1 wstearns wstearns       10 Feb 28 00:55 test2

	Note the different inode numbers: the two files use 20 bytes of
data in total (actually 2 filesystem blocks, but that's a detail).

[wstearns@sparrow duptest]$ freedups ./test1 ./test2
Options chosen: None 
About to check for links in " ./test1 ./test2"
10: Would have linked ./test2 and ./test1
Total space would have saved: 10 (An overestimate if more than two files would have been linked together.)

	By default, it just reports what the savings would have been.

[wstearns@sparrow duptest]$ freedups -a ./test1 ./test2
Options chosen: ActuallyLink 
About to check for links in " ./test1 ./test2"
10 Linked ./test2 and ./test1
Total space saved: 10 (Small risk of overcounting space saved if linked files have different times.)
[wstearns@sparrow duptest]$ ls -ali test1 test2
1885114 -rw-rw-r--    2 wstearns wstearns       10 Feb 28 00:55 test1
1885114 -rw-rw-r--    2 wstearns wstearns       10 Feb 28 00:55 test2

	Now both files share a single inode (note the link count of 2),
so the second copy's data is freed and the free space rises accordingly.
	For more examples, run freedups with the "-h" help option.


Application list
	This list shows whether each application preserves the hardlink
or unlinks the file when saving it.  For each one I tried to find an option
that changes this behavior, but may have missed some.
	Contributions and corrections are gratefully accepted.  Here's
how to test:

[wstearns@sparrow wstearns]$ cd /tmp
[wstearns@sparrow /tmp]$ mkdir linktest
[wstearns@sparrow /tmp]$ cd linktest
[wstearns@sparrow linktest]$ echo Hi there >test1
[wstearns@sparrow linktest]$ ln -f test1 test2
[wstearns@sparrow linktest]$ ls -ali test*
1885112 -rw-rw-r--    2 wstearns wstearns        9 Mar  5 12:52 test1
1885112 -rw-rw-r--    2 wstearns wstearns        9 Mar  5 12:52 test2
[wstearns@sparrow linktest]$ myprogram test1

#Replace myprogram with the program under test.
#In this program, add some characters to the file and save your changes.

[wstearns@sparrow linktest]$ ls -ali test*
1885112 -rw-rw-r--    2 wstearns wstearns       19 Mar  5 12:54 test1
1885112 -rw-rw-r--    2 wstearns wstearns       19 Mar  5 12:54 test2

	The fact that the two files still share an inode and both
changed in content means that the link between test1 and test2 was
preserved.  If, instead, you get:

[wstearns@sparrow linktest]$ ls -ali test*
2236994 -rw-rw-r--    1 wstearns wstearns       19 Mar  5 12:54 test1
1885112 -rw-rw-r--    1 wstearns wstearns        9 Mar  5 12:52 test2

	then the program unlinked test1 before saving the changes.
	Note that neither behavior is "correct"; it's just that you
may prefer one over the other while working on a given file.
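
	The two outcomes correspond to two common ways of saving a file.
Rewriting the existing file in place keeps its inode, so every hardlink
sees the change.  Writing a new file and renaming it over the old name
gives that name a fresh inode and leaves the other links pointing at the
old data.  The Python sketch below reproduces the test above with both
strategies.  The function names are hypothetical and are not taken from
any of the applications listed.

```python
import os
import tempfile


def save_in_place(path: str, text: str) -> None:
    # Truncate and rewrite the existing inode: hardlinks are preserved.
    with open(path, "w") as f:
        f.write(text)


def save_via_rename(path: str, text: str) -> None:
    # Write a new file, then rename it over the old name.  The name now
    # refers to a new inode, so its link to the other name is broken.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(text)
    os.replace(tmp, path)


def link_survives(save, workdir: str) -> bool:
    test1 = os.path.join(workdir, "test1")
    test2 = os.path.join(workdir, "test2")
    with open(test1, "w") as f:
        f.write("Hi there\n")
    os.link(test1, test2)              # like `ln -f test1 test2` here
    save(test1, "Hi there, again\n")   # the "edit and save" step
    same = os.stat(test1).st_ino == os.stat(test2).st_ino
    os.unlink(test1)
    os.unlink(test2)
    return same


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("in place:", link_survives(save_in_place, d))            # True
        print("new file + rename:", link_survives(save_via_rename, d))  # False
```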

Application		Action on save	Notes
abiword-0.7.11		preserves link
bash-1.14.7's ">"	preserves link
bash-1.14.7's ">>"	preserves link
emacs-20.7		preserves link
gedit-0.9.2		preserves link
gnotepad+-1.3.1		preserves link	#When "write backup file" turned off
gnotepad+-1.3.1		unlinks		#When "write backup file" turned on
gnumeric-0.58		preserves link
gxedit-1.23		preserves link
jove-4.16.0.24		preserves link
kedit-1.1.2		preserves link	#When "Backup Copies" turned off
kedit-1.1.2		unlinks		#When "Backup Copies" turned on
lyx-0.12.0		preserves link
mcedit-4.5.51		preserves link	#~/.mc/ini: editor_option_save_mode=0 (Save mode=quick save)
mcedit-4.5.51		unlinks		#~/.mc/ini: editor_option_save_mode=1 (Save mode=safe save)
netscape-4.76		unlinks		#Editor in netscape-communicator
nedit-5.1.1		preserves link
patch-2.5.4		unlinks
rpm-4.0			unlinks		#on "-U" upgrade, at least.
rsync-2.3.2		unlinks		#on server, hardlink is unlinked when a new version sent
vim-5.1			preserves link
wordperfect-7.0		preserves link	#"Original document backup" has no effect; always preserves link.
xedit-3.3.2		preserves link


Contacts and credits.
	Please send comments, suggestions, bug reports, patches, and/or
additions to the filesystem or applications list to William Stearns
<wstearns@pobox.com> .
	Many thanks to Kevin Burton for his constructive suggestions, 
most of which made it into v0.3.0.  Sorry, Kevin, it's still written in
bash.  :-)



README-freedups-v0.5.pl

	freedups-v0.5.*.pl are a Perl replacement for the bash freedups
(0.1-0.4) program.  By caching md5sums, working much more with inodes
instead of filenames, ignoring inodes which have no possibility of being
hardlinked, etc., they run much faster than the bash version, and get
faster still once they have a chance to cache some md5 checksums.

	I welcome feedback, and would especially like to know if you hit
any error cases that tell the program to abort.
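
	The ideas above can be sketched in Python: work on inodes rather
than names, skip anything that can never be linked, and checksum only when
necessary.  This illustrates the approach and is not a port of
freedups-v0.5.  The helper names, the in-memory dict used as the checksum
cache, and its (device, inode, size, mtime) key are all assumptions.  A
matching MD5 sum is treated as "identical" here; a careful tool would
confirm with a byte-by-byte comparison before linking.

```python
import hashlib
import os
import stat
from collections import defaultdict


def md5_of(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_inode_groups(roots, cache=None):
    """Return groups of identical files; each group is a list of name lists,
    one name list per inode.  `cache` maps (dev, ino, size, mtime_ns) to an
    md5 digest and can be reused so unchanged files are not re-read."""
    if cache is None:
        cache = {}
    names = defaultdict(list)                   # (dev, ino) -> [paths]
    info = {}                                   # (dev, ino) -> stat result
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                if not stat.S_ISREG(st.st_mode) or st.st_size == 0:
                    continue                    # can't or shouldn't be linked
                names[(st.st_dev, st.st_ino)].append(path)
                info[(st.st_dev, st.st_ino)] = st
    # Only inodes sharing device, size, owner, group and mode can be linked.
    candidates = defaultdict(list)
    for key, st in info.items():
        candidates[(st.st_dev, st.st_size, st.st_uid, st.st_gid, st.st_mode)].append(key)
    groups = []
    for keys in candidates.values():
        if len(keys) < 2:
            continue                            # nothing to compare: skip hashing
        by_sum = defaultdict(list)
        for key in keys:
            st = info[key]
            ck = (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)
            if ck not in cache:
                cache[ck] = md5_of(names[key][0])   # one read per inode
            by_sum[cache[ck]].append(names[key])
        groups.extend(g for g in by_sum.values() if len(g) > 1)
    return groups
```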


freedups-0.6.14-0.noarch.rpm

Name        : freedups                     Relocations: (not relocatable)
Version     : 0.6.14                            Vendor: William Stearns <wstearns@pobox.com>
Release     : 0                             Build Date: Sun Mar 14 15:24:27 2004
Install Date: (not installed)               Build Host: sparrow
Group       : Applications/File             Source RPM: freedups-0.6.14-0.src.rpm
Size        : 56968                            License: GPL
Signature   : RSA/MD5, Sun Mar 14 15:24:28 2004, Key ID 012334cbf322929d
Packager    : William Stearns <wstearns@pobox.com>
URL         : http://www.stearns.org/freedups/
Summary     : Hardlinks identical files to save space.
Description :
Freedups hardlinks identical files to save space.  For files that are
generally read from and not written to, this can provide
significant space savings with no performance degradation.  In fact,
in a small number of cases, this can speed up the system.

The files in this collection are part of William Stearns' software archive. If any of the links on this page do not work, you may be viewing an incomplete mirror. There is a complete list of the mirror sites at the starting page for this mirror and at the primary mirror.


Generated Sat May 13 02:51:45 EDT 2006 by htmlfilelist version 0.8.4