Published 2022-12-01.
Last modified 2026-03-15.
I've been reorganizing directories across several volumes on an Ubuntu server.
This article documents how I used the commands I found most useful: rsync, ncdu, and fdupes.
The latter two programs are not installed by default.
You can install them this way:
$ sudo apt install -y fdupes ncdu
Display Directory Sizes
$ ncdu /directory
Find Duplicate Files
rdfind
rdfind supports hard links, which remove the stress of accidentally deleting the wrong duplicate.
Hard links only work within a single volume; they cannot span two volumes.
$ sudo apt install rdfind
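Because hard links cannot cross volumes, it can help to check that two paths share a device before linking them. The following is a minimal sketch, assuming GNU stat as shipped with Ubuntu; the same_volume helper and the scratch file names are my own illustration, not part of rdfind.

```shell
# same_volume A B: succeed when both paths are on the same device,
# which is the precondition for hard-linking one to the other.
same_volume() {
  [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# Demonstration in a scratch directory (file names are illustrative only).
demo=$(mktemp -d)
echo "hello" > "$demo/original.txt"
ln "$demo/original.txt" "$demo/link.txt"   # a second name for the same inode

# Both names report the same inode number and a link count of 2.
stat -c '%i %h %n' "$demo/original.txt" "$demo/link.txt"

rm "$demo/original.txt"                    # the data survives under the other name
cat "$demo/link.txt"                       # prints: hello
```

Deleting one name of a hard-linked file leaves the data reachable through the other name, which is why replacing duplicates with hard links is more forgiving than deleting them.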
This is the help message:
Usage: rdfind [options] FILE ...
Finds duplicate files recursively in the given FILEs (directories),
and takes appropriate action (by default, nothing).
Directories listed first are ranked higher, meaning that if a
file is found on several places, the file found in the directory first
encountered on the command line is kept, and the others are considered duplicate.
options are (default choice within parentheses)
 -ignoreempty      (true)| false  ignore empty files (true implies -minsize 1,
                                  false implies -minsize 0)
 -minsize N        (N=1)          ignores files with size less than N bytes
 -maxsize N        (N=0)          ignores files with size N bytes and larger (use 0 to disable this check).
 -followsymlinks    true |(false) follow symlinks
 -removeidentinode (true)| false  ignore files with nonunique device and inode
 -checksum           md5 |(sha1)| sha256 | sha512
                                  checksum type
 -deterministic    (true)| false  makes results independent of order
                                  from listing the filesystem
 -makesymlinks      true |(false) replace duplicate files with symbolic links
 -makehardlinks     true |(false) replace duplicate files with hard links
 -makeresultsfile  (true)| false  makes a results file
 -outputname  name                sets the results file name to "name" (default results.txt)
 -deleteduplicates  true |(false) delete duplicate files
 -sleep              Xms          sleep for X milliseconds between file reads.
                                  Default is 0. Only a few values
                                  are supported; 0,1-5,10,25,50,100
 -dryrun|-n         true |(false) print to stdout instead of changing anything
 -h|-help|--help                  show this help and exit
 -v|--version                     display version number and exit
If properly installed, a man page should be available as man rdfind.
rdfind is written by Paul Dreik 2006 onwards. License: GPL v2 or later (at your option).
version is 1.6.0
Sample usage:
$ rdfind -makehardlinks true /data
Avoid running rdfind on ~/go or /,
which would cause it to try to modify dependencies or read-only files.
Not every version of rdfind has an exclude flag, so
point it at specific subdirectories instead of the root of /data.
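The help message lists a -dryrun option, so one cautious pattern is to preview the result before letting rdfind change anything. This is only a sketch: preview_then_link is a hypothetical wrapper, and the subdirectories in the commented example are placeholders for whatever lives under /data on your machine.

```shell
# preview_then_link DIR...: show what rdfind would do, then ask before doing it.
preview_then_link() {
  # -dryrun true prints the planned actions without touching any file.
  rdfind -dryrun true -makehardlinks true "$@" || return 1

  printf 'Replace duplicates with hard links? [y/N] '
  read -r answer
  if [ "$answer" = "y" ]; then
    rdfind -makehardlinks true "$@"
  fi
}

# Example (placeholder paths): limit the run to two subdirectories.
# Files in the directory listed first are the ones that are kept.
# preview_then_link /data/photos /data/backups
```

A real run also writes results.txt to the current directory unless -makeresultsfile false is given, so it is worth running this from a scratch directory.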
fdupes
The user interface for fdupes is frustrating.
The following command displays duplicate files and interactively asks which ones to delete.
$ fdupes -r1Sd /directory
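Since the interactive prompt is the frustrating part, it may be easier to review the duplicates first without the -d option. The sketch below assumes the -r, -S, and -m options of the fdupes packaged for Ubuntu; review_dupes is a hypothetical helper name.

```shell
# review_dupes DIR: list duplicate sets with their sizes, then print a summary.
# Nothing is deleted because -d is not passed.
review_dupes() {
  fdupes -r -S "$1"   # -r recurses, -S shows the size of each duplicate set
  fdupes -r -m "$1"   # -m summarizes how many duplicates there are
}

# Example: review_dupes /directory
```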
Compare Two Folders For Missing Files
This tip was inspired by an answer on unix.stackexchange.com.
Dry Run
The -n option lists the files that are missing from the destination directory,
but makes no changes to the destination file system.
If that option is not specified, the missing files are copied.
$ rsync -ri --ignore-existing -n /srcDir/ /destDir
There are a few things to note:
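- The first directory is the source, and it must end with a slash (/) so that rsync compares the directory's contents rather than the directory itself.
- The second directory is the target. It is written here without a trailing slash; for a directory target the slash does not change the result.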
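The dry-run output can be reduced to a plain list of missing files by filtering the itemized lines. This is a sketch, assuming rsync's usual itemized format in which a new regular file is reported as a line beginning with ">f"; list_missing is a hypothetical helper.

```shell
# list_missing SRC DEST: print files that exist under SRC but not under DEST.
# Nothing is copied because -n is passed.
list_missing() {
  # ${1%/}/ guarantees exactly one trailing slash on the source.
  rsync -ri --ignore-existing -n "${1%/}/" "$2" |
    # Keep only regular-file lines and strip the itemize column.
    awk '/^>f/ { sub(/^[^ ]+ /, ""); print }'
}

# Example: list_missing /srcDir /destDir
```

The helper adds the source's trailing slash itself, so the caller does not have to remember it.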
Copy Missing Files
Simply do not provide the -n (dry run) option:
$ rsync -ri --ignore-existing /srcDir/ /destDir
Sync Directories
rsync is better than cp for large copies because an interrupted run can be restarted, and files that were already transferred are skipped.
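The following removes files in the dest directory that are not present in the src directory.
The first command is a dry run (-n) that only reports what would change; the second makes the changes.
$ rsync -ahnvx --delete-before src/ dest
$ rsync -ahvx --delete-before src/ dest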
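Deleting from the destination is the risky step, so the dry run can be turned into a simple guard. The sketch below counts the "deleting" lines that the verbose dry run prints and refuses to continue past a limit; count_deletions, mirror_with_guard, and the limit argument are hypothetical names I made up for illustration.

```shell
# count_deletions SRC DEST: number of entries a real run would remove from DEST.
count_deletions() {
  # grep -c still prints 0 when nothing matches, but exits non-zero, hence || true.
  rsync -ahnvx --delete-before "${1%/}/" "$2" | grep -c '^deleting ' || true
}

# mirror_with_guard SRC DEST MAX: sync only if at most MAX deletions are pending.
mirror_with_guard() {
  pending=$(count_deletions "$1" "$2")
  if [ "$pending" -gt "$3" ]; then
    echo "refusing: $pending deletions exceed limit of $3" >&2
    return 1
  fi
  rsync -ahvx --delete-before "${1%/}/" "$2"
}

# Example: mirror_with_guard src dest 20
```

A limit like this catches the common mistake of swapping the source and destination, which would otherwise empty the wrong directory.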