This is mainly a cheat sheet for me to remember. No rocket science here.

It often makes sense to use netcat (nc) to copy a MySQL database between hosts on a trusted network. It bypasses the encryption overhead of SSH and, depending on configuration, can be significantly faster.

Also note that MySQL should be shut down while you copy the data, unless you're copying from a snapshot etc.

So to copy, go to the MySQL data directory on both boxes, e.g. cd /var/lib/mysql, and make sure the target directory is empty. Now on the TARGET server do nc -l 4000 | tar xvf - and on the SOURCE server do tar -cf - . | nc target_ip 4000

Also note – the port you’re using should be open in the firewall.
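The two commands above can be sketched as a self-contained loopback demo. The real nc invocations are shown as comments (target_ip and port 4000 are the article's placeholders); here the tar stream is piped directly on one machine so the example actually runs, with made-up temp directories standing in for /var/lib/mysql.

```shell
set -e
src=$(mktemp -d); dst=$(mktemp -d)
echo "some table data" > "$src/ibdata1"

# TARGET server (start this side first):  nc -l 4000 | tar xvf -
# SOURCE server:                          tar -cf - . | nc target_ip 4000
# Local stand-in for the pair above -- the tar stream is piped directly:
tar -cf - -C "$src" . | tar -xf - -C "$dst"

cmp "$src/ibdata1" "$dst/ibdata1" && echo "copy verified"
```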

26 Comments
Antony Curtis

Warning: TCP may introduce errors in your data due to the inadequate CRC in the packet header. Make sure to take an MD5 or SHA1 of the files at both ends before trusting that the data was transferred correctly.
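A local sketch of that verification, with made-up temp directories and a plain tar pipe standing in for the nc transfer; in real use you would run one checksum pass on each host and diff the two outputs:

```shell
set -e
src=$(mktemp -d); dst=$(mktemp -d)
echo "payload" > "$src/ibdata1"
tar -cf - -C "$src" . | tar -xf - -C "$dst"   # stand-in for the nc transfer

# Checksum every file on both ends, sorted so the lists are comparable
src_sums=$(mktemp); dst_sums=$(mktemp)
(cd "$src" && find . -type f -exec md5sum {} + | sort) > "$src_sums"
(cd "$dst" && find . -type f -exec md5sum {} + | sort) > "$dst_sums"
diff "$src_sums" "$dst_sums" && echo "checksums match"
```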

Arjen Lentz

@Antony I was just reviewing that to make sure, but doesn't tar have a CRC? So with the command lines that Peter posted I reckon it'd be OK.
In many cases, though, the firewall hassles are greater than the SSH overhead 😉

Andrew

Hi,

I think you forgot a "-p" in the netcat command on the target server. It should probably read:

nc -l 4000 | tar xvf -

Andrew

I meant:
nc -l -p 4000 | tar xvf -

Przemek

You need to use '-p' only on Debian-like distros, not on Red Hat ones. nc seems to be slightly different across those families.

Sometimes when you need to limit transfer bandwidth, throttle (http://klicman.org/throttle/) is extremely handy. With this nice little tool you can adjust the transfer speed without stopping it!
Also bar (http://clpbar.sourceforge.net/) is very nice 🙂

Ken

Why not rsync? You get wire-speed performance, incremental updates (e.g., after interruptions) and optional checksumming. You can also set up a standard rsync config on your database machines, enable "rsync --daemon" at will and disable it afterwards.

Of course, if you have sensitive data that's not encrypted in the DB, then scp is a better bet, or even pre- and post-transfer encryption if ssh has to span high-latency links.

$0.02

kb

Przemek

Ken, sure, rsync is great, but if you have something like 400GB of files and just want them copied ASAP over the local network, the tar|nc way is faster and uses much less CPU, because there is no SSL overhead.

Besides, if you need to copy a DB snapshot from a production server, even if you limit the transfer using rsync's bwlimit option, you can't change the speed in the middle of the transfer if the machine's disk I/O gets hot.

You can limit disk transfer at any moment during the copy process using the simple throttle tool.
Just put it between pipes like this (I suggest running it in screen):
tar cf - . | throttle -l /tmp/throttle.ctl -m 50 | bar | nc $destination_host $tcp_port
and to limit the speed to, for example, 20Mbit:
throttle -t /tmp/throttle.ctl -m 20

Ken

Couple of things.

Przemek: rsync does not use SSL. Its data only passes through encryption if you use it in SSH transport mode. Daemon mode is much faster, and will trivially saturate 2x GigE. rsync allows you to interrupt a transfer and continue where you left off, even for large files. Changing bwlimit is easy that way.

Peter: rsync does not require root. But it does require you set up a trivial config file, something like:

% cat c
[db]
path = /path/to/db
use chroot = false
read only = false
% rsync --daemon --port=1873 --config=c
remote % rsync -aHPv --port=1873 sourcedata remotehost::db

(optionally -W if your destination write capacity is significantly lower than network bandwidth)

Especially in a consulting situation, I’d rather have a four line config file in my toolbelt for a very standard command, instead of relying on A) having root access to install nc or throttle if necessary, which are far less standard, or B) having a stash of nc/throttle binaries for every possible architecture (which even these days is still pretty substantial). And the incremental nature of rsync has other flexibilities; for example, rsyncing a database while it’s running, shutting it down, and rsyncing only changed content (useful mainly in niche, low bandwidth situations).

Again, just my two cents; I think nc is the best tool for many applications, but for file-based objects rsync is difficult to beat, IMHO.

Ken.

“Also note – the port you’re using should be open in the firewall.”

Are you really serious? I wonder who are your intended readers that you’re mentioning such an obvious thing?

Joel K.

My Ubuntu 8.04 system didn’t know about bar.

I’ve always used ‘pv’ for essentially the same function.

pv – monitor the progress of data through a pipe

Which appears quite similar.

MattW

in the utility copy vein: if you have a source host, a target host on RFC 1918 address space, and an intermediate bastion host you have to SSH through in order to REACH the target host (since it doesn't have an internet-routable IP), and you don't have a VPN etc., you can pipe tar through chained SSH via:

tar cvf - mysql_dir | ssh -A [email protected] "ssh -A [email protected] 'cd /path/to/target/; tar xvf -'"

Richard Thomas

Rsync is great for live replication copies also

1. rsync with the database running; depending on the last rsync this can take a while
2. Flush tables/cache, rsync again; won't take nearly as long
3. Shut down the master
4. rsync one final time; on a 10GB DB this only takes a couple of seconds
5. Restart the master; downtime is only a few seconds and you have a fresh copy to use for starting a slave back up

Of course you need to delete the proper log files depending on your setup and how you store your log files.

Przemek

Instead of shutting down the master, just use the LVM snapshot method to create a slave or a backup. This way you only need a few seconds of read lock.
If you don't use LVM partitions, I suggest you start using them (optimally separate ones for data and binlogs).

Richard Thomas

As has even been reported on this blog, the LVM snapshot penalty can be extreme; on an overly busy server (and yes, I understand that's a problem of its own) starting an LVM snapshot can cause a cascading failure of the system as it struggles to keep up.

Przemek

1. You do it at off peak time.
2. IMHO if your system dies only because of LVM snapshot overhead – you’re far too close to your machine’s disk I/O limits.
3. IMO always the best solution is to have a spare slave used only for backups, some periodic heavy queries, etc.

Ken

RT: agreed.

Przemek:

1. You’ll find more and more that these days people using MySQL run HA systems (as HA as MySQL gets anyway) for international users and systems, not just a little PHP site in Dubuque. The “night-time trough” is no longer a luxury for a lot of the client base of MySQL.

2. LVM snapshot overhead alone can be 20-50%; startups with limited capex don’t want to pay 20-50% more for their storage hardware to support a method of database backup used solely because a working solution isn’t available from the DB or engine vendor (explain /that/ to a VC). Established companies don’t want to budget in 20-50% more for their expensive SAN setups. We’d all love a few thousand spare TPS or a few 100MB/s of spare throughput, but that money should be going to what makes money, not what loses money. This also doesn’t cover the overhead of saturating the disk subsystem with the *actual copy* with tar/nc or rsync. Been there. Done both. Bought the T-shirt.

3. Absolutely agree; the many other benefits of a slave make the cost worthwhile.

LVM is a very useful tool, but the fact that it is often represented as a panacea by the MySQL community (not singling you out) does a disservice to those relying on that information (who then have it blow up in their face). IMHO; apologies for the rant.

Ken.

Kevin Burton

I agree with #1 …. do NOT do this on production data without checksums.

You WILL get data corruption.

Morgan Christiansson

It is possible to use a less expensive cipher with SSH, which will speed things up significantly.

For example:
rsync -e 'ssh -c blowfish' …
scp -c blowfish …

James Byers

If using Ken’s rsync example and you’re root on both sides, you’ll likely need:

uid = root
gid = root

above

[db]

in the rsync config file to avoid permission problems.


LoadAverage

You can also copy the data directory to multiple servers at once using tee:

Each receiver: same as in the article: nc -l 4000 | tar xvf - [or nc -l -p 4000 | tar xvf - ]
Sender: tar -cf - . | tee >(nc target_host_1 4000) | nc target_host_2 4000

You can add as many additional hosts as you want.
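The fan-out can be sketched locally without nc. This is a hypothetical demo with made-up temp directories; a temp file stands in for the second nc pipe so the example is self-contained, while the real remote form is shown as a comment.

```shell
set -e
src=$(mktemp -d); d1=$(mktemp -d); d2=$(mktemp -d)
stream=$(mktemp)
echo "hello" > "$src/ibdata1"

# Real form: tar -cf - . | tee >(nc target_host_1 4000) | nc target_host_2 4000
# tee duplicates the tar stream; here one copy feeds tar directly and the
# other goes to a file that is extracted afterwards.
tar -cf - -C "$src" . | tee "$stream" | tar -xf - -C "$d1"
tar -xf - -C "$d2" < "$stream"

cmp "$src/ibdata1" "$d1/ibdata1"
cmp "$src/ibdata1" "$d2/ibdata1" && echo "both copies match"
```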