emlx Files

For the moment, I’m using Apple’s Mail as my primary email client. It’s bafflingly slow at displaying messages at times (they’re simple text files!), sometimes gets stuck updating, and won’t let me tell it that ihug’s SSL certificate is OK (which is partly ihug’s fault for buying a cheap one instead of something that programs would recognise). Still, it does have some nice features, and it beats any of the other mail clients I’ve tried.

As of Tiger, Mail stores each message in its own emlx file, scattered through various folders under ~/Library/Mail. For use with SpamBayes’ test setup (as well as others, like the TREC one), I need messages in individual files in plain RFC2822 format.

What I needed was a simple export script (much like the existing Outlook export script – except hopefully faster and including attachments) that would create RFC2822 copies of the emlx files in the standard SpamBayes format (ham and spam directories containing a reservoir directory containing messages as individual text files).
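The target layout can be sketched in a few lines of Python. This is only an illustration, not the SpamBayes code: the function name `export_message`, the lowercase `ham`/`spam` directory names, and the zero-padded `.txt` file names are all assumptions made for the example.

```python
import os


def export_message(message: bytes, dest_root: str, is_spam: bool, index: int) -> str:
    """Write one RFC2822 message into <dest_root>/<ham|spam>/reservoir/.

    Directory and file naming here are illustrative assumptions.
    """
    kind = "spam" if is_spam else "ham"
    folder = os.path.join(dest_root, kind, "reservoir")
    os.makedirs(folder, exist_ok=True)  # create ham/reservoir or spam/reservoir on demand
    path = os.path.join(folder, f"{index:06d}.txt")
    with open(path, "wb") as f:  # binary mode keeps the message bytes untouched
        f.write(message)
    return path
```

Writing in binary mode means attachments and odd encodings pass through unchanged, which matters when the copies are used as test data.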

I had thought that this might be quite difficult (take a look at the Outlook export script!) since emlx is a proprietary format. Thankfully, I discovered that the first line is the size of the message in bytes (as text), followed by the RFC2822 message itself, followed by a plist containing various Mail information I’m not interested in (flags, sender, etc.). Nice to see that Apple can keep things simple.
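Given that layout, pulling the message out takes only a few lines. The following is a minimal sketch rather than the script shipped with SpamBayes; the names `split_emlx` and `read_emlx` are made up for the example, and it assumes the byte count on the first line starts counting immediately after that line’s newline.

```python
def split_emlx(data: bytes):
    """Split raw emlx bytes into (rfc2822_message, trailing_plist)."""
    # The first line holds the message length in bytes, written as text.
    first_line, sep, rest = data.partition(b"\n")
    if not sep:
        raise ValueError("no length line found")
    length = int(first_line.strip())  # raises ValueError if not a number
    if length > len(rest):
        raise ValueError("declared length exceeds file size")
    # Everything after the message is the plist of Mail metadata.
    return rest[:length], rest[length:]


def read_emlx(path):
    """Read an emlx file from disk and split it."""
    with open(path, "rb") as f:
        return split_emlx(f.read())
```

For example, `message, plist = read_emlx(path)` gives the raw message bytes, which can be written straight to a text file; the plist part can simply be ignored.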

So the SpamBayes distribution now contains a simple export_apple_mail.py script that will do the job.


2nd November 2005 Python-Dev Summary

The second November Python-Dev summary is now out. This took a while (even though the summary was actually finished a while back), but is the first summary that I’ve actually been able to publish on the pydotorg website myself, without Brett’s help. This should mean that future summaries are quicker to get out.

(Steve and I are working on the December summaries now; end-of-year tasks delayed the first one.)
