A long post summarising why I’ve fallen out of love with Arq backup, and some noodlings on the longevity of bits.
Being currently engaged in the digitisation of all my Mum and Dad’s precious photo albums, and relatively recently having become a father myself, I’ve been thinking about backups. A dull topic, admittedly, but my personal photos and videos aren’t like my music collection or, if I’m honest, my work files. They really matter, and they truly are irreplaceable.
When you’re trying to preserve something precious that you want to last longer than, er, yourself, things like these are your enemies:
- Not digitising things. For physical, atoms-and-molecules “things” like prints and negatives, degradation is inevitable; on top of that, there’s always the chance of loss or destruction.
- Keeping only digital copies of things. Obsolescence is inevitable, as is accidental deletion, or worse. If something’s really precious, keep both physical and digital copies.
- Keeping only one copy of things (well, dur!). Copying digital things is easier, cheaper and better than copying physical things.
- Bit rot. Summary: degradation is inevitable in the digital world too, because of physical storage, but some solutions and countermeasures exist.
- Lock-in to companies or products that promise to solve the problem completely for you. They’ll go away sooner or later. Companies will start work on new products and cease updating the one you’re using for backup. Perhaps they’ll be insufficiently motivated to help if you run into trouble: after all, your problem isn’t their problem. One day, programs will stop working when you upgrade your operating system or something.
- Complexity. If you died in an extreme ironing accident (why do people always choose being run over by a bus?), you’d want your nearest and dearest to be able to continue looking at the family photos.
- Inconvenience. If your backup process doesn’t happen automatically, it’ll stop happening eventually, through simple human nature.
There isn’t a solution that combats all of these in one go. An off-site backup is vulnerable to bit rot, even if it’s on a RAID system. Remembering to copy your files somewhere once a month might solve the complexity problem, but it’s far from convenient.
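On the bit rot point specifically, one common countermeasure is to keep a list of checksums alongside your files and re-check it now and then. What follows is only a minimal sketch of that idea in Python, not anything Arq or Amazon does for you; the function names (`sha256_of`, `build_manifest`, `changed_files`) are my own inventions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large videos don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files from an earlier manifest that are now missing or differ."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

You would save the output of `build_manifest` somewhere safe when the photos are first filed, then run `changed_files` against it every so often. Anything it reports for a photo you never edited is a candidate for restoring from another copy.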
There are some solutions that get close. By using Arq plus a local Time Machine backup, I get a sufficiently good backup for my work files, and my music collection. If you use a Mac, Arq is excellent for this stuff. I was using it to back up my photos, too, because it’s a solution that feels long term. (Arq uses your own account on Amazon S3 to store data, and has a published file format. The source code for arq_restore, a command-line utility that restores your files and folders, is available online. All of these make me feel confident, as if my backups are independent of the company and its motives or continued success.)
Recently, though, I lost faith in Arq for the things that matter the most to me. If you hit problems, or ask questions, their support - I’m sorry to say - can be terse. Haystack Software shut the forums that cover Arq (although their website currently still advertises them as “new”). I started to get itchy feet. Finally, my enquiries about a Linux-based restore tool just felt like they went into a black hole. Although the file format is openly published, currently you can only restore your files from Arq onto a Mac or an iOS device.
For me, that represents a form of lock-in - although I admit the reason is slightly nuanced. The total amount of data I have backed up is about 2 weeks’ worth of upload from my house. (I think it’s more useful to think in those terms rather than gigabytes because I expect both my data’s total size and my internet connection’s speed to grow over the years.) That makes it inconvenient to re-back-up everything I’ve got, at least using my own internet connection (the problem would be much smaller if I had a faster uplink to the internet). I can short-term rent a virtual machine in Amazon’s cloud for next to nothing, with a blazing fast internet connection… but not a Mac.
So, slightly reluctantly, I migrated my most precious backups away from Arq. I chose to hire a well-connected Mac for a few days at MacStadium, restored my photos and videos onto it, and pushed them back up to Amazon S3. There they now reside, securely but easily accessible from wherever I like. Other than my own hard-disc backups, I’m in Amazon’s hands - but I already was with Arq, and this is a second-level backup anyway.
Here are some things that make me glad that I did this:
- S3 tools are powerful (for instance, synchronising a local directory with an S3 directory is a one-liner), and amenable to automation, so backups are convenient and simple.
- Objects on S3 are immutable. I’m paying for someone else to worry about data degradation (although I still worry about it). Amazon don’t provide guarantees, but their other customers are much bigger fish than me, so I’m happy to be along for the ride.
- Since photos and videos are “accumulative” - by which I mean that once saved, they don’t tend to get edited - I can speed things up enormously by only considering very recent photos in my regular sync-ups. I just need to check once in a while that everything is still in both places. Compared to the Amazon end, I consider my own copies the more vulnerable. If my local copy of a really old photo got corrupted, under this scheme it wouldn’t be uploaded. If the copy on S3 got corrupted, unencrypted photos and videos have the nice property that you can visually inspect them and determine which is the good copy. (In addition, I could easily flick a switch and have S3 store versioned copies of everything.)
- Having my backups in S3 means that when the ironing incident takes out my house as well as me, I’ve arranged it so my family are only one password away from accessing them. That wasn’t the case with Arq.
- I’m not locked in. Should some future storage solution come into existence (I’m thinking about Amazon Glacier, which came along after I started using Arq, but who knows what else lies ahead), my data are already “up there” - I don’t have to be put off by the prospect of another two-week upload.
- Once you get your stuff in the cloud all sorts of conveniences start to pop up. For instance, I like sharing videos with my family. Videos take an age for me to upload, but no longer do I have to upload once to YouTube and then again to my own backup store. Now, I can use my account at Webfaction to grab from the backup and upload to YouTube from their data centre, many times faster than I could from here.
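To make the “only consider very recent photos, but check everything once in a while” idea a bit more concrete, here is a rough sketch in Python. It is an illustration rather than my actual script: `recent_files` and `compare_listings` are hypothetical names, and the remote listing is assumed to be a plain set of file names obtained from whichever S3 tool you prefer.

```python
import time
from pathlib import Path
from typing import Optional


def recent_files(
    root: Path, max_age_days: float, now: Optional[float] = None
) -> list[str]:
    """Return files under root modified within the last max_age_days days."""
    reference = time.time() if now is None else now
    cutoff = reference - max_age_days * 86400  # 86400 seconds in a day
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime >= cutoff
    )


def compare_listings(
    local: set[str], remote: set[str]
) -> tuple[list[str], list[str]]:
    """Return (names only held locally, names only held remotely)."""
    return sorted(local - remote), sorted(remote - local)
```

The regular job would upload only what `recent_files` returns, which keeps it quick. The occasional full check would feed both complete listings to `compare_listings`; two empty lists mean everything is still in both places, at least by name.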
Here are some links to reading I’ve done around this subject recently:
- Keeping Bits Safe
- Bitrot and atomic COWs: Inside “next-gen” filesystems
- Amazon S3 and Glacier: A Cheap Solution for Long Term Storage Needs
- Drive to Live?
Accompanying photo shows me immediately before a painful tobogganing incident, aged about five. The information represented by those bits is nearly 30 years old… more on the family digitisation project soon!