Magically, I've recently "discovered" the Percona Archiver (pt-archiver, part of Percona Toolkit); I've been rolling my own for far too long. The tool is well documented, so I shan't repeat the documentation other than to give an example along with some tidbits.
The archiver can move records to a destination table (the --dest option) or to a file (the --file option). Both are useful; I'll show the file option because it's the most final choice short of outright launching the nukes with --purge. Give it selection criteria (the --where option) and consider including table maintenance (--optimize) if you are moving a lot of data.
For clarity: pt-archiver does a DELETE for each record it archives.
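Because of that DELETE, it's worth seeing what pt-archiver intends to do before letting it loose. Here is a minimal sketch using the --dry-run option, which prints the statements and exits without changing anything. The function name, its parameters and the PT_ARCHIVER override are my own illustration (assumptions, not anything from the tool); the override just lets you swap in echo to inspect the command line.

```shell
# Preview an archive run without touching any rows.
# archive_preview and PT_ARCHIVER are hypothetical names for this sketch.
# Credentials are expected to come from .my.cnf, not from this script.
archive_preview() {
    db="$1"       # source database
    table="$2"    # source table
    where="$3"    # selection criteria, passed to --where
    outfile="$4"  # archive file that a real run would write

    "${PT_ARCHIVER:-pt-archiver}" \
        --source "h=localhost,D=$db,t=$table" \
        --file "$outfile" \
        --where "$where" \
        --dry-run
}

# Example call (not executed here):
#   archive_preview "$DB" "$TABLE" "calib_aimextractor_id > 0" "$BAKFILE"
```

If you want to copy rows out but leave them in place, the documentation also lists a --no-delete option; I haven't needed it myself.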
# dump table from N months ago
This is a drastically simplified version of the script I used to run.

# DB (the source database name) is assumed to be set earlier
DELAGE=6
DELYRMO=`date --date "$DELAGE months ago" +%Y%m`
TABLE="calib_aimextractor_log_history_$DELYRMO"
BAKFILE="/mnt/backups/mysqlserver/archive-$DB-$TABLE.txt"
# do not (!) overwrite file with something (-s) in it already
if [ ! -s "$BAKFILE" ] ; then
    pt-archiver --source h=localhost,D=$DB,t=$TABLE --file "$BAKFILE" --where "calib_aimextractor_id > 0" --optimize s --statistics
else
    echo "$BAKFILE has something in it, dump has been SKIPPED"
    exit 1
fi
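One wrinkle with the DELYRMO calculation: GNU date resolves "N months ago" by subtracting from the month number and then normalising the day, so a run on the 29th to 31st can land in the wrong month (one month before 31 March comes out as early March, not February). A common workaround is to anchor on the 15th first. The sketch below assumes GNU date, as the script does; the function name and its optional second parameter are hypothetical.

```shell
# Print the YYYYMM of the month that is N months before a reference date.
# Anchoring to the 15th keeps month-end dates from spilling into the next month.
# yrmo_months_ago is a hypothetical helper; requires GNU date (--date).
yrmo_months_ago() {
    months="$1"
    ref="${2:-today}"   # optional reference date, defaults to today
    anchor=$(date --date "$ref" +%Y-%m-15)
    date --date "$anchor $months months ago" +%Y%m
}

DELAGE=6
DELYRMO=$(yrmo_months_ago "$DELAGE")
TABLE="calib_aimextractor_log_history_$DELYRMO"
```

If the job only ever runs early in the month, the simpler one-liner in the script is fine.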
- Set the data retention, which in this case is 6 months. The "date" command is useful for generating dates or parts thereof (year, month, day, week, whatever you need) for both file names and search criteria.
- The file target should be a file system location, either local or NFS. The file format is suitable for LOAD DATA INFILE.
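- Gotcha! Loading data files is a risky thing to do and is restricted by default in MySQL: recent versions ship with local_infile off, which blocks LOAD DATA LOCAL INFILE, and server-side loads are subject to secure_file_priv. Typically you load the data onto a non-production server, then manually extract the relevant records and insert them back into prod.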
- Sanity check that you're not stomping on a file that's already there. I prefer to be safer than sorrier.
- Credentials should be in .my.cnf
- Seems obvious when you know to do it, but don't put user creds in scripts, dumbo! I did that too often :(
- Gotcha! If using a --dest table instead of a file target, specify the host (h) and database (D), because otherwise pt-archiver makes some assumptions which may be very wrong.
- Optimize your source (s), especially if a large number of rows are being pulled. Consider also optimizing the destination (d).
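For the restore side mentioned in the list, here is a rough sketch of loading an archive file into a scratch (non-production) server. It assumes the target table already exists there with the same structure, that local_infile is enabled on that server, and that credentials come from .my.cnf. The function name, its parameters and the MYSQL override are hypothetical, made up for this example.

```shell
# Load a pt-archiver --file dump into a table on a scratch server.
# restore_archive and MYSQL are hypothetical names for this sketch.
# Assumes the default tab-separated format, so no FIELDS/LINES clauses.
restore_archive() {
    host="$1"    # scratch server, never prod
    db="$2"      # database on the scratch server
    table="$3"   # existing table with the same columns as the source
    file="$4"    # archive file written by pt-archiver --file

    "${MYSQL:-mysql}" --local-infile=1 --host="$host" "$db" \
        --execute="LOAD DATA LOCAL INFILE '$file' INTO TABLE $table"
}

# Example call (not executed here; the host name is a placeholder):
#   restore_archive scratch-db "$DB" "$TABLE" "$BAKFILE"
```

From there you can pick out the rows you actually need and insert them back into prod by hand.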
There's lots more guidance in the documentation and from other users online. Some like to process rows in larger batches with options such as --limit and --bulk-delete, but the defaults (one row at a time) have been good to me, as this runs relatively fast. Likewise, there are options to check that your slaves don't fall too far behind; again, the default behaviour is fast enough for me, but there are lots of powerful options for tuning pt-archiver.
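For anyone who does want the batched route, this is roughly what it could look like. It's a sketch I have not tuned: the batch size, the lag threshold and the replica host are placeholder assumptions, and the function name and PT_ARCHIVER override are hypothetical. The option names are the ones I know from the pt-archiver documentation; check pt-archiver --help on your version, since newer releases may spell the replica options differently.

```shell
# Archive in batches and pause when a replica falls behind.
# archive_batched and PT_ARCHIVER are hypothetical names for this sketch.
archive_batched() {
    db="$1"        # source database
    table="$2"     # source table
    where="$3"     # selection criteria
    outfile="$4"   # archive file to write
    replica="$5"   # replica host whose lag is checked

    "${PT_ARCHIVER:-pt-archiver}" \
        --source "h=localhost,D=$db,t=$table" \
        --file "$outfile" \
        --where "$where" \
        --limit 1000 \
        --commit-each \
        --bulk-delete \
        --check-slave-lag "h=$replica" \
        --max-lag 5s \
        --statistics
}

# Example call (not executed here; the replica name is a placeholder):
#   archive_batched "$DB" "$TABLE" "calib_aimextractor_id > 0" "$BAKFILE" replica1
```

Test it against a copy of the table first, because --bulk-delete removes each batch with a single DELETE rather than row by row.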
Take backups, test, test, test and you shouldn't need Good Luck :)