emailbook implemented as a shell script, in Janet and in Hare

2024-04-20

What is emailbook?

emailbook is a little tool to manage a minimalistic address book. An emailbook address book is nothing but a file containing mailboxes (an email address plus an optional display name), each with an optional alias. It's designed to be used with the awesome TUI email client aerc, but it might work with other email clients as well. emailbook can be used both to parse mailboxes from an email (opened in aerc) and to provide a list of matching contacts for autocompletion when typing a recipient (in aerc). The alias is your personal abbreviation and is not visible to anybody else.
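
To illustrate the autocompletion side, here is a minimal POSIX shell sketch. It is not emailbook's actual code. It assumes a hypothetical book format with one contact per line, either `[alias: ]Display Name <address>` or `[alias: ]address`, and prints matches as tab-separated `address<TAB>name` lines, which is the shape aerc's `address-book-cmd` is assumed to expect (check aerc-config(5)). The function name `complete_contact` is made up.

```sh
# complete_contact QUERY BOOKFILE  (hypothetical helper, not part of emailbook)
# Assumed book format, one contact per line:
#   [alias: ]Display Name <address>
#   [alias: ]address
# Prints each matching contact as "address<TAB>name" (name may be missing).
complete_contact() {
    # Case-insensitive, fixed-string match anywhere in the line, so typing
    # an alias, part of a name or part of an address all find the contact.
    grep -i -F -- "$1" "$2" |
    awk '{
        line = $0
        sub(/^[^:<@]*: */, "", line)      # drop an optional "alias: " prefix
        if (match(line, /<[^>]*>$/)) {     # "Display Name <address>"
            addr = substr(line, RSTART + 1, RLENGTH - 2)
            name = substr(line, 1, RSTART - 1)
            sub(/[ \t]+$/, "", name)
            print addr "\t" name
        } else {                           # bare address
            print line
        }
    }'
}
```

Because the whole line is searched, typing an alias finds the contact even though the alias never ends up in the completed address.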

When I was looking for an address book/email address autocompletion for aerc, I found the great little tool aercbook. It works well and is fast (being written in Zig), but it had a few small bugs. As a workaround for one of these issues, I wrote a little wrapper shell script. Then I thought that everything this tool does could be done with a few sed, grep and awk calls. So I rewrote aercbook as a POSIX shell script with slightly different behaviour, added support for encoded strings (MIME encoded-words), and called it emailbook.
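
The "few sed, grep and awk calls" idea can be sketched roughly like this, using only awk and not taken from emailbook itself: read the header section, unfold continuation lines, keep From/To/Cc, and split their values into single mailboxes at commas that are not inside a quoted display name. It deliberately ignores encoded-words, comments and group syntax, and `header_mailboxes` is a made-up name.

```sh
# header_mailboxes [FILE...]  (hypothetical helper, not part of emailbook)
# Prints one mailbox per line from the From/To/Cc headers of an email.
# Reads stdin if no file is given.
header_mailboxes() {
    awk '
    # Split the collected header value at commas outside double quotes.
    function flush(   i, c, q, cur) {
        if (val == "") return
        q = 0; cur = ""
        for (i = 1; i <= length(val); i++) {
            c = substr(val, i, 1)
            if (c == "\"") q = !q
            if (c == "," && !q) { emit(cur); cur = "" }
            else cur = cur c
        }
        emit(cur)
        val = ""
    }
    function emit(s) {
        gsub(/^[ \t]+|[ \t]+$/, "", s)   # trim surrounding blanks
        if (s != "") print s
    }
    { sub(/\r$/, "") }                   # tolerate CRLF line endings
    /^$/ { exit }                        # empty line: end of the header section
    /^[ \t]/ { if (keep) val = val $0; next }   # folded continuation line
    { flush(); keep = 0 }                # a new header field starts
    tolower($0) ~ /^(from|to|cc):/ {
        keep = 1
        val = substr($0, index($0, ":") + 1)
    }
    END { flush() }
    ' "$@"
}
```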

To avoid creating my initial address book by opening every single email I have ever received (or sent) in aerc, I wrote a little script that read all the email addresses from my (old) local emails, which are stored in the maildir format: one email = one file. Calling emailbook once for every email is slow, so I added a mode in which emailbook is started only once and reads a list of email filenames from stdin. This was much faster, but still too slow compared to aercbook (see the numbers below).
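
The difference between the two modes can be sketched in shell as follows. This is a simplified illustration, not emailbook's code: `parse_one_mail` is a hypothetical stand-in for the real per-email work (here it only prints the From header value), and `parse_mail_list` is the batch mode that reads one filename per line from stdin.

```sh
# Hypothetical stand-in for the per-email work: print the From header value.
# "/^$/q" stops reading at the end of the header section.
parse_one_mail() {
    sed -n -e '/^$/q' -e 's/^From: *//p' "$1"
}

# Batch mode: one already running shell reads filenames from stdin, instead
# of the caller starting the whole script again for every email. In this
# sketch, sed is still one extra process per email.
parse_mail_list() {
    while IFS= read -r file; do
        parse_one_mail "$file"
    done
}
```

A maildir could then be fed to it with something like `find ~/Mail/INBOX/cur -type f | parse_mail_list`, where the path is only an example. `IFS=` and `read -r` keep filenames with spaces or backslashes intact.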

Faster implementation in Janet

Some time ago, I learned about an awesome tiny programming language called Janet. Like Lua, it's really small and easy to embed in C programs, but it looks like Lisp. And it's good for standalone scripts, too.

One of Janet's strengths is its built-in support for Parsing Expression Grammars (PEGs). They are like regular expressions, but better: they are more expressive (a PEG can, for example, match nested structures). I had planned to use Janet for a project anyway, and I was curious what it's like to use PEGs instead of regexes and how much faster emailbook could be when (re)written in Janet.

This was the short story behind emailbook-janet.

Get even faster with Hare

My other new favorite programming language besides Janet is Hare. Being a systems programming language, Hare is much like C: small and simple. And as far as I can tell, all its differences from C are changes for the better. Again, I wanted to see how hard or easy it would be to rewrite emailbook in Hare and how much faster it would run.

So I wrote emailbook-hare.

Performance comparison

For each implementation, I measured how long it takes to parse all email addresses in the first 1000 emails of my inbox. I tested in two ways:

Loop over 1000 emails and start emailbook/aercbook once for each.
Give emailbook a list of 1000 email filenames and process them all in one run.

Mode	aercbook	emailbook-hare	emailbook-janet-bin	emailbook-janet	emailbook (shell)
1000 instances	0.9	1.2	4.3	12.5	66.1
1 instance	-	0.13	0.93	0.95	10.3

Durations in seconds.
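
One rough way to read the table: assuming the actual parsing work is the same in both modes, the difference between the two rows divided by the number of emails estimates the fixed cost of starting one process (interpreter start-up, loading the script). This is only an estimate derived from the numbers above, not a separate measurement:

$$
t_\text{start} \approx \frac{t_\text{1000 instances} - t_\text{1 instance}}{1000}
$$

$$
\begin{aligned}
\text{emailbook-hare:} &\quad (1.2 - 0.13)\,\text{s} / 1000 \approx 1.1\,\text{ms} \\
\text{emailbook-janet-bin:} &\quad (4.3 - 0.93)\,\text{s} / 1000 \approx 3.4\,\text{ms} \\
\text{emailbook-janet:} &\quad (12.5 - 0.95)\,\text{s} / 1000 \approx 11.6\,\text{ms} \\
\text{emailbook (shell):} &\quad (66.1 - 10.3)\,\text{s} / 1000 \approx 55.8\,\text{ms}
\end{aligned}
$$

By this estimate, almost all of the 1.2 s that emailbook-hare needs in the 1000-instance run would be process start-up.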

Some remarks:

emailbook-janet-bin is the Janet script compiled into a standalone binary that also contains the interpreter. So it's compiled, but still interpreted :-)
The comparison with aercbook is flawed, because emailbook does more than aercbook, e.g. decoding encoded strings like =?ISO-8859-15?Q?Garc=EDa?= (which stands for "García").
Hare is still a work in progress, and its regular expressions are still slow. That's why I replaced most regexes with custom functions that check the strings byte by byte. Of course, I could try the same in Janet. But then again, the Janet version doesn't use regexes either (it uses PEGs). I have no idea how similar the code would have to be for a fair comparison.
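
Decoding such an encoded string (an RFC 2047 encoded-word) is a good example of that extra work. In the "Q" encoding, `_` stands for a space and `=XX` for a byte given in hex; the resulting bytes are then converted from the named charset, so `=ED` in ISO-8859-15 becomes "í". Below is a minimal POSIX shell sketch of that step, not emailbook's actual code. It assumes `iconv` is installed and handles a single Q-encoded word only (no "B"/base64 words, no runs of adjacent words); `decode_q_word` is a made-up name.

```sh
# decode_q_word WORD  (hypothetical helper, not part of emailbook)
# Decodes one RFC 2047 encoded-word that uses the "Q" encoding to UTF-8,
# e.g. =?ISO-8859-15?Q?Garc=EDa?= -> García. Requires iconv(1).
decode_q_word() {
    rest=${1#"=?"}                   # ISO-8859-15?Q?Garc=EDa?=
    charset=${rest%%"?"*}            # ISO-8859-15
    rest=${rest#*"?"}                # Q?Garc=EDa?=
    encoding=${rest%%"?"*}           # Q
    text=${rest#*"?"}                # Garc=EDa?=
    text=${text%"?="}                # Garc=EDa
    case $encoding in
        Q|q) ;;
        *) return 1 ;;               # "B" (base64) is not handled here
    esac
    # awk turns "_" into a space and "=XX" into the byte 0xXX; LC_ALL=C
    # makes sprintf("%c", n) emit a raw byte. iconv then converts to UTF-8.
    printf '%s\n' "$text" | LC_ALL=C awk '{
        hex = "0123456789ABCDEF"; out = ""; i = 1; n = length($0)
        while (i <= n) {
            c = substr($0, i, 1)
            hi = index(hex, toupper(substr($0, i + 1, 1))) - 1
            lo = index(hex, toupper(substr($0, i + 2, 1))) - 1
            if (c == "=" && i + 2 <= n && hi >= 0 && lo >= 0) {
                out = out sprintf("%c", hi * 16 + lo); i += 3
            } else {
                out = out (c == "_" ? " " : c); i++
            }
        }
        printf "%s", out
    }' | iconv -f "$charset" -t UTF-8
}
```

Leaving the charset conversion to iconv keeps the sketch short; a hand-written byte-by-byte version, as in the Hare implementation, would avoid the extra processes.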

Conclusion

It was fun coding the same tool in three languages. I learned a lot, especially about Janet and Hare. Both languages have great communities, and they helped me make the implementations faster.

I guess the results are not too surprising. I had no doubt that Janet would be faster than the shell script and Hare faster than Janet. But it was interesting to see the actual numbers. And I'm sure there is still plenty of room for improvement in all implementations.

If you like, have a look at all emailbook versions and compare which code is the easiest to understand. If you find a way to make it faster (without making it harder to understand), send me a patch to https://lists.sr.ht/~maxgyver83/emailbook.