Backing Up Email: IMAP

Overview
I use Tuffmail IMAP service for my personal email. I have three mailboxes: one for me, one just for a high-traffic mailing list I'm on, and one for Johanne.

I have dozens of email addresses mapped to our mailboxes, and dozens of folders in my mailbox (including a number of folders for mailing lists -- the one with its own mailbox is special).

Because Tuffmail is not Gmail, there is a limit on available disk space. We share a 2 GB quota across all three mailboxes.

I place great faith in Tuffmail's service, so I haven't felt the need to download backups of my email. That is probably lax of me, and if it were easier I would likely do it anyway; maybe I should start. For now, I don't do backups for disaster recovery -- I let Tuffmail handle that.

However, the limited quota does mean that we periodically bump up against it. When that happens, I sort through my accumulated years of mail to see if there are things to download and archive offline.

Tools I Use

 * IMAPSize
 * ImportExportTools extension for Thunderbird
 * imapsync, larch
 * Dovecot
 * raw IMAP -- see "Accessing an IMAP email account using telnet" for a quick howto

IMAPSize is really handy for finding the biggest things to archive.
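For times when IMAPSize isn't available, the same "what's biggest?" question can be roughly answered over plain IMAP. The following is only a sketch using Python's standard imaplib: it asks the server for each message's RFC822.SIZE and totals it per folder. The function names are made up for illustration, and it assumes folder names contain no double quotes.

```python
import imaplib
import re

# Matches the size attribute in a FETCH response such as b'1 (RFC822.SIZE 1234)'.
SIZE_RE = re.compile(rb"RFC822\.SIZE (\d+)")


def total_size(fetch_data):
    """Sum the RFC822.SIZE values in the data list returned by IMAP4.fetch()."""
    total = 0
    for item in fetch_data:
        # imaplib may return bytes, (bytes, bytes) tuples, or None.
        if isinstance(item, tuple):
            item = item[0]
        if not item:
            continue
        match = SIZE_RE.search(item)
        if match:
            total += int(match.group(1))
    return total


def folder_sizes(conn, folders):
    """Return [(folder, bytes)] sorted largest first.

    conn is a logged-in imaplib.IMAP4 (or IMAP4_SSL) connection.
    Folders are opened read-only, so nothing is modified on the server.
    """
    sizes = []
    for name in folders:
        typ, data = conn.select(f'"{name}"', readonly=True)
        if typ != "OK":
            continue  # folder missing or not selectable
        if int(data[0]) == 0:
            sizes.append((name, 0))  # empty folder: nothing to fetch
            continue
        typ, data = conn.fetch("1:*", "(RFC822.SIZE)")
        if typ == "OK":
            sizes.append((name, total_size(data)))
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)
```

To try it, conn would be something like imaplib.IMAP4_SSL("imap.example.com") after a conn.login(user, password); the host name here is a placeholder, not Tuffmail's actual server.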

IMAPSize also has decent download and backup capabilities, and I had been using it for archiving some IMAP folders. However, since it's Windows-only, it requires either a Windows virtual machine or booting into Windows.

ImportExportTools looks really handy, and I just started using it. Since it runs in Thunderbird, it runs nicely on Linux. When I do have IMAPSize running, it's still handy for checking sizes and emptying (delete + expunge) mailboxes.

N.B.: IMAP is slow. It takes a long time to export thousands of messages.

Archiving Methodology
Currently, I do incremental, additive IMAP backups to an Amazon EBS volume, which I keep as a snapshot between runs.

To do so, I create a volume from the most recent snapshot, launch an Amazon EC2 instance, attach the volume, install larch and Dovecot, and then use larch to update my backup. I take a snapshot of the updated volume, then delete the volume and terminate the EC2 instance. I keep a copy of the dovecot.conf file and a simple documentation file on the snapshots to make it easier to set up the clean instance.

Older notes:

 * format: generally mbox, but sometimes eml. When I have a good content-addressed personal archiving system, eml will be better, because it will be easier to avoid keeping duplicate copies of any email. But for now, it's more convenient to move around and store one mbox file instead of many eml files.
 * storage: some on an EBS volume snapshot, most on multiple hard drives at home
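To make the content-addressed idea a bit more concrete, here is a minimal sketch, not a system I actually have. It splits an mbox file into eml files named by the SHA-256 of each message's bytes, so a message that shows up in several archives is stored only once. The function name and the flat output layout are assumptions for illustration.

```python
import hashlib
import mailbox
from pathlib import Path


def explode_mbox(mbox_path, out_dir):
    """Write each message in an mbox to <out_dir>/<sha256>.eml.

    Messages whose bytes are already present are skipped.
    Returns a (written, skipped) tuple.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = skipped = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # message bytes without the "From " line
            digest = hashlib.sha256(raw).hexdigest()
            target = out / f"{digest}.eml"
            if target.exists():
                skipped += 1  # same content already archived
            else:
                target.write_bytes(raw)
                written += 1
    finally:
        box.close()
    return written, skipped
```

This only catches byte-identical copies. Two downloads of the same email that differ in any header would hash differently; keying on the Message-ID header is one alternative, with its own trade-offs.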

When archiving a batch of emails, I try to include in the directory path or file name:
 * server: tuffmail
 * mailbox: e.g., kaminski@istori.com
 * folder names, of course: e.g., Lists/linux-elitists, Shopping/Amazon
 * archive date: e.g., 20091013
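As one possible way to keep those four pieces consistent, a small helper can build the path. This is a sketch of one layout I might use, not a fixed convention: the order of the parts, the date suffix on the file name, and the function name are all assumptions.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_path(server, mailbox, folder, archived_on, ext="mbox"):
    """Build <server>/<mailbox>/<folder parents>/<leaf>-<YYYYMMDD>.<ext>.

    folder is an IMAP-style name with "/" as the hierarchy separator,
    e.g. "Lists/linux-elitists".
    """
    parts = [p for p in folder.split("/") if p]
    if not parts:
        raise ValueError("folder name must not be empty")
    *parents, leaf = parts
    filename = f"{leaf}-{archived_on:%Y%m%d}.{ext}"
    return PurePosixPath(server, mailbox, *parents, filename)


print(archive_path("tuffmail", "kaminski@istori.com",
                   "Lists/linux-elitists", date(2009, 10, 13)))
# prints: tuffmail/kaminski@istori.com/Lists/linux-elitists-20091013.mbox
```

Keeping the metadata in the path rather than only inside the file means it survives even if the archive is later converted between mbox and eml.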

Someday, I'll have tools to untangle all the mbox files and all the separate eml files and, using the metadata clues, (re)construct one big, coherent hierarchy of folders holding all of my historical email.

Other Tools

 * fetchmail
 * formail, from the procmail suite
 * SmartSave extension for Thunderbird
 * a roundup of tools: http://kb.mozillazine.org/Archiving_your_e-mail