Backup and Restore in Cassandra and OpsCenter

Overview
- Snapshot Operations
- Restore Operations
- Commit Log Archiving/Point in Time Restore
- Remote backup
- From both Cassandra and OpsCenter perspectives

Snapshots
Nodetool Snapshot Basics
- Performs a flush, then hard links SSTables to <data_file_directories>/<ks>/<table>/snapshots/<snapshot-name>/
- Under the hood, uses MBeans: org.apache.cassandra.db -> StorageService -> takeSnapshot
More at
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSnapShot.html
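
To make the hard-link step concrete, here is a minimal Python sketch that mimics it on a plain directory tree. It is not Cassandra's implementation: the function names are hypothetical, the directory layout simply follows the path shown on the slide, and the flush that precedes a real snapshot is not modelled.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def snapshot_dir(data_dir: Path, keyspace: str, table: str, name: str) -> Path:
    """Return <data_dir>/<ks>/<table>/snapshots/<snapshot-name>/ (layout from the slide)."""
    return data_dir / keyspace / table / "snapshots" / name


def hard_link_snapshot(data_dir: Path, keyspace: str, table: str, name: str) -> List[Path]:
    """Hard link every regular file of a table directory into a new snapshot directory.

    Hypothetical helper for illustration only; a real snapshot also flushes first.
    """
    table_dir = data_dir / keyspace / table
    target = snapshot_dir(data_dir, keyspace, table, name)
    target.mkdir(parents=True, exist_ok=False)  # fail if the snapshot name is reused
    links = []
    for src in sorted(table_dir.iterdir()):
        if src.is_file():  # skips the snapshots/ subdirectory itself
            dst = target / src.name
            os.link(src, dst)  # same inode, so no data is copied
            links.append(dst)
    return links
```

Because a hard link shares the inode with the original file, the snapshot costs almost no extra disk space until the original SSTable is deleted by compaction.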

Snapshots in OpsCenter
- Under Services -> Backup
- Displays backup history, allows backup and restore.
- Advanced settings we'll cover later
- Backup Service is an Enterprise Feature
More at
http://docs.datastax.com/en/opscenter/5.2/opsc/online_help/services/opscBackupService.html

Snapshots in OpsCenter
- Schedule repeated backups or create ad hoc backup
- Select keyspaces
- Set location (on server vs. S3)
- Uses the MBean to perform the snapshot rather than shelling out.
- Coordinates the snapshot on all nodes.
- Backs up the schema to schema.json
- Keeps a log for audit

Auditable Records

Remote Snapshots
- OpsCenter can also back up to S3.
- Specify S3 bucket name, AWS credentials.
- Optional transfer throttle and compression.
- Not all SSTables need to be backed up each time. Because they are immutable, only part of the data (SSTables not already uploaded) may require it.
- SSTables need to be stored per node to avoid name collisions.
- However, dropping and recreating a table can lead to a naming collision as well; OpsCenter can attach a timestamp.
- If your data is encrypted, make sure that the encryption key is also put somewhere safe.
- OpsCenter backs up schemas.
- Topologies change over time (more on this in restore).
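
The two naming points above (per-node storage, plus a timestamp for recreated tables) and the "only part of the data" point can be sketched in a few lines of Python. The key layout and both function names are assumptions made for illustration; this is not OpsCenter's actual S3 layout.

```python
from typing import Iterable, List, Optional, Set


def remote_key(node_id: str, keyspace: str, table: str, filename: str,
               table_created: Optional[str] = None) -> str:
    """Build a per-node object key (hypothetical layout).

    The node ID keeps identically named SSTables from different nodes apart;
    the optional timestamp keeps a dropped-and-recreated table apart from its
    earlier incarnation.
    """
    table_part = f"{table}-{table_created}" if table_created else table
    return "/".join(["snapshots", node_id, keyspace, table_part, filename])


def files_to_upload(local_keys: Iterable[str], already_uploaded: Set[str]) -> List[str]:
    """SSTables are immutable, so a key that is already stored never needs re-sending."""
    return sorted(k for k in local_keys if k not in already_uploaded)
```

A backup run would build the keys for the current snapshot, list what the bucket already holds, and transfer only the difference.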

Restore Operations
SSTableloader Basics
- Expects the schema to already exist for the SSTables.
- Expects a directory structure different from that created by the snapshot, specifically <Keyspace>/<Table>/<files>
- Can stream data to other nodes; it doesn't just move files into place.
- Leaves files in place as they are restored, possible disk penalty.
More at
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html

Restore Operations
- Select a backup from a list of available snapshots.
- Point in Time restores (more on this later)
- Restore from other location

Restore Operations
- Attempts to recreate the schema or do a schema comparison. The latter is extremely difficult with Thrift.
- Creates symbolic links in a temporary directory to match what SSTableloader expects.
- Logs/audit trail to follow.
- Uses SSTableloader
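
The symbolic-link step can be sketched as follows. This is an illustration under assumptions, not OpsCenter's code: the function names and staging layout are hypothetical, and the only sstableloader option used is `-d` (the list of initial hosts).

```python
import os
import tempfile
from pathlib import Path
from typing import List


def stage_for_sstableloader(snapshot_dir: Path, staging_root: Path,
                            keyspace: str, table: str) -> Path:
    """Symlink snapshot files into <staging_root>/<keyspace>/<table>/.

    Hypothetical helper: the snapshot files stay where they are, and only
    links are created in the layout that sstableloader expects.
    """
    target = staging_root / keyspace / table
    target.mkdir(parents=True, exist_ok=True)
    for src in sorted(snapshot_dir.iterdir()):
        if src.is_file():
            os.symlink(src.resolve(), target / src.name)
    return target


def sstableloader_command(staged_dir: Path, hosts: List[str]) -> List[str]:
    """Build the argument list; -d takes a comma-separated list of initial hosts."""
    return ["sstableloader", "-d", ",".join(hosts), str(staged_dir)]
```

The returned argument list could be handed to `subprocess.run` once the schema exists on the target cluster.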

Remote Restore
- Topologies change over time.
- When topologies shrink, multiple nodes' worth of data will have to be sent to a single node (SSTable naming collisions).

Remote Restore
- When topologies grow, some nodes may be idle during a restore.
- Replacement nodes will have a different host ID and will need to be matched to the host ID of the snapshot.
- OpsCenter handles all of these cases.
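
One way to picture the shrink and grow cases is a simple round-robin assignment of snapshot host IDs to the nodes that exist today. This is only an illustrative sketch with a hypothetical function name; the slides do not describe how OpsCenter actually makes this assignment.

```python
from typing import Dict, List


def assign_snapshot_hosts(snapshot_host_ids: List[str],
                          current_nodes: List[str]) -> Dict[str, List[str]]:
    """Spread the host IDs found in a backup across the current nodes, round-robin.

    Shrunken cluster: some nodes receive several host IDs' worth of data.
    Grown cluster: some nodes receive an empty list and sit idle.
    """
    if not current_nodes:
        raise ValueError("at least one current node is required")
    plan: Dict[str, List[str]] = {node: [] for node in current_nodes}
    for i, host_id in enumerate(snapshot_host_ids):
        plan[current_nodes[i % len(current_nodes)]].append(host_id)
    return plan
```

When one node loads data from several host IDs, keeping each host ID's files in its own staging directory avoids the SSTable naming collisions mentioned above.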

Commit Log Archiving
- Cassandra can execute a script when writing commit log segments.
- Set in commitlog_archiving.properties
http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configLogArchive_t.html
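
As a rough idea of what such a script might look like, here is a minimal Python sketch that copies one segment into an archive directory. The script name, its third argument and the archive location are assumptions for illustration; `%path` and `%name` are the placeholders Cassandra substitutes in `archive_command`.

```python
import shutil
import sys
import tempfile
from pathlib import Path


def archive_segment(segment_path: str, segment_name: str, archive_dir: str) -> Path:
    """Copy one commit log segment into archive_dir, refusing to overwrite."""
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / segment_name
    if dest.exists():
        raise FileExistsError(f"{dest} is already archived")
    shutil.copy2(segment_path, dest)  # copy2 also preserves timestamps
    return dest


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical wiring in commitlog_archiving.properties:
    #   archive_command=/usr/bin/python3 /path/to/archive.py %path %name /backup/commitlog
    archive_segment(sys.argv[1], sys.argv[2], sys.argv[3])
```

Refusing to overwrite is a design choice of this sketch: an archived segment should never change, so a second copy under the same name is treated as an error.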

Commit Log Archiving
- OpsCenter can enable that also, under Services -> Backup Service -> Settings
- OpsCenter can also send these to S3.
http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configLogArchive_t.html

Point in Time Restore
- 2 step operation: restore snapshot, then replay commit logs.
- Find the nearest snapshot that happens prior to the point in time desired, perform a restore.
- Update commitlog_archiving.properties with the location of the commit logs as well as the point in time to restore.
- Restart Cassandra.
More At
http://docs.datastax.com/en//cassandra/2.0/cassandra/configuration/configLogArchive_t.html
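
The first steps of a manual point-in-time restore can be sketched in Python: pick the latest snapshot taken no later than the target time, then render the restore settings for commitlog_archiving.properties. The function names and sample values are hypothetical, and the `cp -f %from %to` restore command is only an example; check the property names and the timestamp format against the documentation for your Cassandra version.

```python
from datetime import datetime
from typing import Dict, Optional

# Timestamp format assumed for restore_point_in_time (yyyy:MM:dd HH:mm:ss).
PIT_FORMAT = "%Y:%m:%d %H:%M:%S"


def nearest_prior_snapshot(snapshots: Dict[str, datetime],
                           target: datetime) -> Optional[str]:
    """Return the name of the latest snapshot taken no later than target, if any."""
    candidates = {name: taken for name, taken in snapshots.items() if taken <= target}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def restore_properties(commitlog_dir: str, target: datetime) -> str:
    """Render the restore-related lines of commitlog_archiving.properties."""
    return "\n".join([
        "restore_command=cp -f %from %to",
        f"restore_directories={commitlog_dir}",
        f"restore_point_in_time={target.strftime(PIT_FORMAT)}",
    ]) + "\n"
```

After the chosen snapshot is restored and the properties file is updated, restarting Cassandra replays the archived commit logs up to the given time.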

PiT in OpsCenter
- OpsCenter can automate the PiT restore process
- Set time (in UTC)
- OpsCenter will verify that it is capable of restoring to that point in time.
- Commit logs or snapshots can be local or on S3

PiT Restore Challenges
- Commit log replays don't stream data around the ring; this makes topology changes difficult to handle.
- Comparing schemas can be tricky if the replay contains schema changes.

Questions?
Feel free to reach out: https://www.linkedin.com/in/philipsdoctor