Just checking in! Hello!
I have been doing a lot of work on this, and I have to say it is going very well.
I expect it will still be another month or two before I have something to show people, but I am laying some very solid foundations and porting code over as I go.
There are some fundamental philosophical/design changes I can tell you all about though, in the meantime!
Mailpile 1 preloaded a lot of data into RAM to accelerate search results. This worked as intended - my Mailpile (v1) now has over 1 million messages on file, but results are still fast. This had serious downsides though. In particular, the app's startup time grows linearly with the amount of mail processed. It now takes me around 10 minutes to restart my Mailpile, which is frankly unacceptable. The other downside is RAM usage; I really don’t care about most of those million messages, yet Mailpile uses gigabytes of RAM to allow fast access to searches involving them. This isn’t the worst tradeoff in the world, but it’s also not the best.
I am taking a different approach to the metadata index in Moggie. Moggie no longer loads the data directly into RAM on startup; instead it just loads some compact indexes and mmap()s the data files. Startup becomes almost instantaneous, and the plan is to rely on the operating system kernel to cache frequently used metadata in RAM, instead of doing so ourselves. The data structure itself is designed to facilitate this: recently received mail and old mail will (over time) occupy different files and different regions of disk, allowing the kernel to more easily cache the data we are most likely to care about and ignore the rest. We waste some disk space to facilitate fast in-place edits/updates, but make up for it by using a tighter encoding scheme than Mailpile did.
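To make the idea a bit more concrete, here is a minimal sketch of the general technique in Python: a file of fixed-size slots is mmap()ed, and records are read and updated in place, leaving the caching to the kernel. This is not Moggie's actual on-disk format. The record layout (message id, timestamp, flags), the slot padding, and the names `create_store` and `MetadataStore` are all assumptions made up for illustration.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout (NOT Moggie's real format):
# 8-byte message id, 4-byte timestamp, 2-byte flags = 14 bytes.
RECORD = struct.Struct("<QIH")

# Each slot is padded past RECORD.size, trading a little disk space
# for simple offset arithmetic and room for in-place edits.
SLOT_SIZE = 16


def create_store(path, slots):
    """Pre-allocate a zero-filled file with room for `slots` records."""
    with open(path, "wb") as fd:
        fd.truncate(slots * SLOT_SIZE)


class MetadataStore:
    """Reads and updates records through a memory map of the file."""

    def __init__(self, path):
        # Opening is cheap: nothing is read until a page is touched,
        # and the kernel decides which pages stay cached in RAM.
        self._fd = open(path, "r+b")
        self._map = mmap.mmap(self._fd.fileno(), 0)

    def write(self, slot, msg_id, timestamp, flags):
        # In-place update: overwrite the slot, no file rewrite needed.
        RECORD.pack_into(self._map, slot * SLOT_SIZE, msg_id, timestamp, flags)

    def read(self, slot):
        return RECORD.unpack_from(self._map, slot * SLOT_SIZE)

    def close(self):
        self._map.flush()
        self._map.close()
        self._fd.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "metadata.dat")
    create_store(path, slots=1000)
    store = MetadataStore(path)
    store.write(42, msg_id=12345, timestamp=1600000000, flags=1)
    print(store.read(42))
    store.close()
```

In a sketch like this, opening the store costs about the same whether the file holds a thousand records or a million, which is the property that matters for startup time. Splitting old and new mail across separate files, as described above, would simply mean several such maps.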
Overall this will make Moggie’s search performance a little more erratic (especially if people are using spinning-rust storage), but we should make up for it by not wasting CPU and RAM on data we don’t intend to use. The code itself is also smaller: Moggie just does less work in Python space, offloading more to the OS kernel, which is also good for performance.
I’ve been having a lot of fun working on this.
I was testing the code last night. With the new metadata index combined with some carefully optimized mbox loading code I also wrote recently, Moggie can build a fresh metadata index from scratch many times faster than Mailpile 1 could merely load its pre-calculated index from disk.
All of which is to say, progress is being made, and I am becoming more confident in my decision to start from scratch and use Mailpile 1 as a source of code snippets I can use or discard as I like. It is feeling very good right now.