Cheat Sheet - rsync

As the man page says, the default behaviour of rsync is to create a new copy of the file in the destination and to move it into the right place when the transfer is completed. If the destination directory doesn’t exist, rsync will create it (only the last path component, not missing parent directories).

A trailing slash on the source avoids creating an additional directory level at the destination.

rsync -avz src/ dest   # content of ./src/ transferred to ./dest/
rsync -avz src  dest   # content of ./src/ transferred to ./dest/src/

Resume interrupted sync (handle connection loss)

Some people have mentioned that the --partial flag is enough. On its own, --partial only keeps the partially transferred file when rsync is interrupted (by default that file is deleted). On the next run rsync can use the kept file as the basis for its delta transfer, but it still has to re-read and checksum the data it already has. To continue writing where the transfer stopped, add --append or --append-verify when resuming.

--append assumes the data already at the destination is identical to the start of the source file and only sends the missing tail. --append-verify does the same and then verifies the complete file with a checksum, re-sending it if the check fails.

Conclusion: You can interrupt rsync --partial using Ctrl + C and continue the file with rsync --append-verify (or --append) when resuming.

Rsync examples

Misc.

- List files without copying them (add -v, otherwise a dry run prints nothing; -a so that directories are not skipped):
    rsync -av --dry-run src dest

- Show progress per file:
    rsync --progress src dest

- Show global progress:
    rsync --info=progress2 src dest

Remote transfers

Settings in $HOME/.ssh/config are also respected by rsync, which makes commonly accessed systems far easier to use.

rsync -avz -e "ssh -l <user>" <src> <user>@<server>:<target>

Non-standard SSH port:

rsync -e "ssh -p 2222" ...

Use a specific SSH key:

rsync -e "ssh -i $HOME/.ssh/id_rsa_for_rsync" ...

Tunnel through a jump host with key agent forwarding:

rsync -e "ssh -A -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ProxyCommand=\"ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -W %h:%p ${SSH_USER}@${BASTION_HOST}\"" ./deployment/ ${SSH_USER}@${TARGET_HOST}:/var/www/${ENV}/
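
Because rsync honours the SSH client configuration, port, key and jump host can also live in a config file instead of the -e string. The sketch below writes such a file to a scratch location so it can be tried with ssh -F without touching $HOME/.ssh/config. All host names and the user are hypothetical. It uses ProxyJump (OpenSSH 7.3 or newer) in place of the ProxyCommand shown above, and it deliberately leaves host key checking enabled.

```shell
# Write a scratch SSH config; every host name, user and path here is made up.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
Host bastion
    HostName bastion.example.com
    User deploy

Host target
    HostName target.internal.example.com
    User deploy
    Port 2222
    IdentityFile ~/.ssh/id_rsa_for_rsync
    ProxyJump bastion
EOF

# With these entries the rsync call shrinks to a host alias.
# The command is only printed here, not executed.
echo "rsync -avz -e \"ssh -F $cfg\" ./deployment/ target:/var/www/"
```

Once the same entries are placed in $HOME/.ssh/config, the -e option can be dropped entirely.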

Run with elevated permissions:

rsync --rsync-path 'sudo rsync' ...

Copy remote->local, keep attributes, use compression, be verbose and show human readable units:

rsync -avzh -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null" \
    --progress 'preview-host:/var/www/html/mb/app/*' /tmp/

Copy local->remote, keep attributes, be verbose, show progress and keep partial files (-P), run rsync as root on the target, set owner and group to root, delete files that are not in the source path and use checksums to compare files:

rsync --rsync-path 'sudo rsync' -avP \
    -og --chown=root:root \
    --checksum --delete \
    ./deployment/ ubuntu@${TARGET_HOST}:/opt/app/${ENV}/

Filtering

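  • --exclude=important_file.txt - Omits files or directories from being synced.
  • --include=backups/most_recent --exclude='backups/*' - Explicitly includes certain files, folders or patterns that fall inside a broader exclude. The include has to come before the exclude because the first matching rule wins, and the exclude has to target the directory's contents (backups/*) rather than the directory itself (backups/), otherwise rsync never descends into it.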

In the following example, we are excluding the node_modules and tmp directories as well as any .DS_Store files located inside the src_directory:

rsync -a --exclude=node_modules --exclude=.DS_Store --exclude=tmp /src_directory/ /dst_directory/

The second option is to use the --exclude-from argument and specify the files and directories you want to exclude in a file.

rsync -a --exclude-from='/exclude-file.txt' /src_directory/ /dst_directory/

Contents of /exclude-file.txt:

file1.txt
.DS_Store
node_modules
tmp

Exclude filters can be written in a condensed form using the shell's brace expansion (bash, zsh):

# Before: 
rsync -a --exclude 'file1.txt' --exclude 'dir1/*' --exclude 'dir2' src_directory/ dst_directory/
# After:
rsync -a --exclude={'file1.txt','dir1/*','dir2'} src_directory/ dst_directory/
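
The condensed form is expanded by the shell before rsync ever sees it. Assuming bash (plain POSIX sh does not perform brace expansion), the arguments rsync receives can be inspected like this:

```shell
# Brace expansion is done by the shell, not by rsync.
# printf prints each resulting argument on its own line.
expanded="$(printf '%s\n' --exclude={'file1.txt','dir1/*','dir2'})"
echo "$expanded"
```

Each pattern should appear as its own --exclude=... argument, which is exactly the long form.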

Pattern matching

Wildcards can be used in patterns, e.g. --exclude='*.jpg*' omits every file or directory whose name contains .jpg.

It is a little trickier to exclude everything except the files that match a certain pattern. Let’s say you want to transfer only the files ending with .jpg.

One option is to use the following command:

rsync -a -m \
    --include='*.jpg' \
    --include='*/' \
    --exclude='*' \
    src_directory/ dst_directory/

When using multiple include/exclude options, the first matching rule applies.

  • --include='*.jpg' - First we include all .jpg files.
  • --include='*/' - Then we include all directories inside src_directory. Without this, rsync will only copy *.jpg files in the top-level directory.
  • --exclude='*' - Finally we exclude everything that has not matched an earlier rule.
  • -m - Prunes directories that would otherwise end up empty at the destination.

Cleanup

# Automatically delete source files after successful transfer
rsync --remove-source-files -zvh backup.tar /tmp/backups/

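Since --remove-source-files does not remove directories, use the following commands to move files over ssh. /source/ (instead of /source/*) also picks up hidden files, and -mindepth 1 keeps find from deleting /source itself:

rsync -avh --progress --remove-source-files /source/ user@server:/target \
&&  find /source -mindepth 1 -type d -empty -delete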

Logging

rsync ... > /tmp/rsyncbackup.log 2> /tmp/rsyncbackup.errors.log

Useful parameters

-o, --owner                 preserve owner (super-user only)
-g, --group                 preserve group
    --devices               preserve device files (super-user only)
    --specials              preserve special files
-D                          same as --devices --specials
-t, --times                 preserve modification times
-p, --perms                 preserve file/directory permissions
-l, --links                 copy symlinks as symlinks
-u, --update                skip files that are newer on the receiver
-C, --cvs-exclude           auto-ignore files in the same way CVS does
    --progress              show progress during transfer
    --stats                 give some file-transfer stats
    --list-only             list the files instead of copying them
    --bwlimit=KBPS          limit I/O bandwidth; KBytes per second

  • -a, --archive - Archive mode, equivalent to -rlptgoD. This option tells rsync to sync directories recursively, transfer device and special files, and preserve symbolic links, modification times, group, ownership, and permissions. Sometimes you will have to supplement the -a parameter with:
    • -X - Preserve extended attributes, e.g. SELinux contexts may be stored as such attributes on distributions like CentOS/RedHat where these are used by default.
    • -A - Preserve ACLs (Access Control Lists)
  • -z, --compress - This option makes rsync compress the data as it is sent to the destination machine. Use this option only if the connection to the remote machine is slow, since compression costs CPU time on both ends.
  • -P - Equivalent to --partial --progress. When this option is used, rsync shows a progress bar during the transfer and keeps partially transferred files. It is useful when transferring large files over slow or unstable network connections. Without -P or --partial, if the connection drops during a transfer, the file is deleted and you will have to restart from scratch.
  • --delete - Delete files in the destination that don’t exist anymore in the source location. Used when you want to keep an exact replica of the source files/directories. Without this option, files that have been deleted in the source won’t be deleted on the destination, which is preferable for most backup schemes. Keep in mind that the --delete parameter exposes you to the risk of losing the entire backup if used inappropriately (e.g., if you use the wrong source directory or an empty one). An option like --max-delete=3, which makes rsync never delete more than 3 files, can reduce the amount of data you might lose. The number can be adjusted according to your use case.
  • -q, --quiet - Use this option if you want to suppress non-error messages.
  • -e - This option allows you to choose a different remote shell. By default, rsync is configured to use ssh.
  • -v - Verbose mode prints more statistics: what files are currently copied/transferred and summary about bytes transferred and speedup ratio.
  • -r - Copy every object contained in directories and subdirectories. Without this option, directories are skipped and only files are copied. E.g., rsync -v root@example.com:/etc/* /tmp would only copy files from /etc/. When you are copying/transferring a single directory, you have to use this option or the -a parameter, otherwise nothing happens, the directory is simply skipped.
  • -h - Show “human readable” numbers: instead of statistics being shown in bytes, they will be displayed in megabytes, kilobytes, etc., because 9.82M is easier to read than 9,821,016.

Edge Cases

Thin Provisioning

When files get significantly bigger on the other side, Thin Provisioning (TP) is probably enabled on the source system. TP is a method of optimizing the efficiency of available space in Storage Area Networks (SAN) or Network Attached Storage (NAS), and such files are typically stored as sparse files.

E.g.: The source file was only 10GB because of TP being enabled, and when transferred over using rsync without any additional configuration, the target destination was receiving the full 100GB of size. rsync could not do the magic automatically, it had to be configured.

The flag that does this work is -S or --sparse and it tells rsync to handle sparse files efficiently. And it will do what it says! It recreates the holes at the destination instead of writing out the zeros, so source and destination will both have a file that occupies about 10GB.