Q:
Rsync appears hung -- what should I do?
A:
When experiencing a hang or freeze please gather the following
information before killing the rsync process:
- The state of the send/receive queues shown with netstat on the two ends.
- The system call that each of the 3 processes is stuck in (use truss on
Solaris, strace on Linux, etc.).
Try telling rsync on both sides of the connection to send messages to
stderr, which might make the failure message visible; i.e., use:
--msgs2stderr -M--msgs2stderr
That alone might get rsync to stop hanging. Also, if you're using more than
one --verbose (-v) option then I have 2 simple words for you: stop it. If you
need more info on what rsync is changing, use the --itemize-changes option (-i)
and repeat it if you need to see unchanged files. This is a much better way to
go that doesn't fill up the communication pipeline with a large quantity of
debug messages.
See the "rsync-debug" script below for an example of how to grab strace
information from the remote rsync process(es). If you need help, send email to
the mailing list.
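As a rough sketch of the information gathering described above, a small shell function along these lines could be run on each end before killing rsync. The function name rsync_hang_info is made up for illustration, and the sketch assumes a Linux system with pgrep, netstat, timeout, and strace installed (substitute truss on Solaris); attaching to a process may require root.

```shell
# Hypothetical helper: collect hang diagnostics on one end of the transfer.
# Assumes Linux tools (pgrep, netstat, timeout, strace) are available.
rsync_hang_info() {
    pids=$(pgrep -x rsync 2>/dev/null)
    if [ -z "$pids" ]; then
        echo "no rsync processes found"
        return 0
    fi
    # Non-zero Send-Q/Recv-Q values show which side has data stuck in a queue.
    netstat -tn
    for pid in $pids; do
        echo "== rsync pid $pid =="
        # Attach for 5 seconds; the last lines show the system call that
        # the process is blocked in.
        timeout 5 strace -p "$pid" 2>&1 | tail -n 3
    done
}
```

Running rsync_hang_info on both hosts and saving the output gives you something concrete to include in a mailing-list report.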
Q:
Why does my transfer die with something like the following error?
rsync: error writing 4 unbuffered bytes - exiting: Broken pipe
rsync error: error in rsync protocol data stream (code 12) at io.c(463)
or
rsync: connection unexpectedly closed (24 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(342)
A:
This error tells you that the local rsync was trying to talk to the remote
rsync, but the connection to that rsync is now gone. The thing you must
figure out is why, and that can involve some investigative work.
It is a good idea to use the --msgs2stderr options mentioned at the top of
this page to get rsync to output any errors it encounters to stderr instead
of trying to write them down the failing pipeline.
If the connection is via ssh (or other remote-shell command) then you should
run some tests to make sure that you can actually run the remote rsync and that
your shell isn't injecting extraneous output into the rsync stream. For instance,
try running these two commands using whatever HOST (and user) options you need:
echo hi | ssh HOST cat
ssh HOST rsync --version
The first command should output just the string "hi" and nothing else. The
second command should successfully start the remote rsync and report its version.
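The first of those checks can be automated with a sketch like the following. The function name check_clean_shell is hypothetical; it takes the host and, optionally, the remote-shell command to use (ssh is assumed when none is given). Note that the comparison ignores trailing newlines, so it only catches visible extra output.

```shell
# Hypothetical helper: check_clean_shell HOST [remote-shell command...]
# Pipes "hi" through the remote shell and verifies nothing else comes back.
check_clean_shell() {
    host=$1
    shift
    # Default to plain ssh when no remote-shell command is supplied.
    [ $# -gt 0 ] || set -- ssh
    out=$(echo hi | "$@" "$host" cat 2>/dev/null)
    if [ "$out" = "hi" ]; then
        echo "clean"
    else
        echo "not clean: remote shell produced unexpected output"
        return 1
    fi
}
```

For example, "check_clean_shell HOST" tests plain ssh, while "check_clean_shell HOST ssh -p 2222" tests ssh on a different port. A failure usually points at a login script (such as .bashrc or .profile) that prints text for non-interactive sessions.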
If the remote rsync is a daemon, your first step should be to look at the
daemon's log file to see if it logged an error explaining why it aborted the
transfer. Also double-check that the log file is set up correctly, as a
wrong "log file" setting in your rsyncd.conf file can also cause this problem.
You could also halt the daemon and run it interactively using the --no-detach
and --msgs2stderr options and look for errors while someone tries the rsync
copy in another window.
As for the cause of the remote rsync going away, there are several
common issues that people run into:
- The destination disk is full (remember that you need at least the
size of the largest file that needs to be updated available in free
disk space for the transfer to succeed).
- An idle connection caused a router or remote-shell server to close
the connection.
- A network error caused the connection to be dropped.
- The remote rsync executable wasn't found.
- Your remote-shell setup isn't working right or isn't "clean"
(i.e. it is sending spurious text to rsync).
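To rule out the first item in that list, you can compare the free space on the destination against the size of your largest file. The helper below is only a sketch: free_kb is a made-up name, and it assumes a POSIX df whose -P output has the available space in the fourth column.

```shell
# Hypothetical helper: print the free space, in KiB, on the filesystem
# that holds the given directory.
free_kb() {
    # -P forces one line per filesystem; -k reports sizes in 1024-byte units.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

For a remote destination, the same df command can be run through the remote shell, e.g. "ssh HOST df -Pk /dest/dir".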
If you think the problem might be an idle connection getting closed, you
might be able to work around the problem by using a --timeout option (newer
rsyncs send keep-alive messages during lulls). You can also configure ssh to
send keep-alive messages when using Protocol 2 (look for KeepAlive,
ServerAliveInterval, ClientAliveInterval, ServerAliveCountMax, and
ClientAliveCountMax). You can also avoid some lulls by switching from
--delete (aka --delete-before) to --del (aka --delete-during).
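One possible way to combine the rsync timeout with client-side ssh keep-alives is sketched below. The wrapper name rsync_keepalive and the RSYNC override variable are invented for this example, and the values (300 seconds, a 60-second interval, 3 probes) are arbitrary assumptions that you should tune for your network.

```shell
# Hypothetical wrapper: run rsync with an I/O timeout and ssh keep-alives.
# RSYNC is an assumed override variable; it defaults to plain "rsync".
rsync_keepalive() {
    # --timeout is in seconds. The ServerAlive* options make the ssh client
    # probe the server during lulls so idle connections are not dropped.
    "${RSYNC:-rsync}" --timeout=300 \
        -e "ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=3" "$@"
}
```

You would then call it just like rsync, e.g. "rsync_keepalive -av SOURCE HOST:DEST". If you normally pass --delete, consider passing --del instead, as described above.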
If you can't figure out why the failure happened, there are steps
you can take to debug the situation. One way is to create a shell
script on the remote system, such as one named "rsync-debug".
You would use the script like this:
rsync -av --rsync-path=/some/path/rsync-debug HOST:SOURCE DEST
rsync -av --rsync-path=/some/path/rsync-debug SOURCE HOST:DEST
This script enables core dumps and also logs all the OS system calls
that lead up to the failure to a file in the /tmp dir. You can use the
resulting files to help figure out why the remote rsync failed.
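A minimal sketch of what such a wrapper might contain is shown here. It is not necessarily identical to the original script; it simply reuses the strace options from the daemon example on this page. The TRACER and REAL_RSYNC variables are hypothetical overrides added for illustration.

```shell
#!/bin/sh
# Sketch of an "rsync-debug" wrapper (an assumption, not the original script).
# TRACER and REAL_RSYNC are hypothetical override variables.
rsync_debug() {
    # Allow core dumps if the hard limit permits it.
    ulimit -c unlimited 2>/dev/null || true
    # Log all system calls (following children with -f) to a file in /tmp,
    # passing through whatever arguments the local rsync sent.
    "${TRACER:-strace}" -f -t -s 1024 -o "/tmp/rsync-$$.out" \
        "${REAL_RSYNC:-rsync}" "$@"
}
# When saving this as the executable script file, add this final line:
#   rsync_debug "$@"
```

After a failed run, look in /tmp on the remote system for the rsync-PID.out file; the last few system calls before the exit usually show what went wrong.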
If you are rsyncing directly to an rsync daemon (without using a
remote-shell transport), the above script won't have
any effect. Instead, halt the current daemon and run a debug version
with core-dumps enabled and (if desired) using a
system-call tracing utility such as strace, truss, or
tusc. For strace, you would do it like this (the -f option
tells strace to follow the child processes too):
ulimit -c unlimited
strace -f -t -s 1024 -o /tmp/rsync-$$.out rsync --daemon --no-detach
Then, use a separate window to actually run the failing transfer, after
which you can kill the debug rsync daemon (pressing Ctrl-C should do it).
If you are using rsync under inetd, I'd suggest temporarily disabling
that and using the above daemon approach to debug what is going on.