Email Analytics


For the past six months I've been drowning in email. I spend a large part of my day responding to messages and filing the incoming ones I consider important. Yet I'm falling behind, and this affects the quality of my work: I sometimes delay responding to important messages. Following Peter Drucker's dictum "If you can't measure it, you can't manage it", I decided to write a tool to analyze my incoming and outgoing email messages.

Thankfully, I've resisted the temptation of using an online service for managing my email, and therefore I have all my email messages stored on my hard disk. They are stored in the relatively simple mbox format. Yet parsing email headers is so complicated that I decided to use Mark Overmeer's excellent Mail-Box email processing Perl package. Through experiments and code reviews I performed last year, I found this package to be the most correct and comprehensive among the libraries available for any language.
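
As a rough illustration of what reading an mbox file involves, here is a minimal sketch written in Python with its standard `mailbox` and `email.utils` modules, not in Perl with Mail-Box. The function name `read_mbox` and the tuple it yields are assumptions made for this example; they are not taken from the script.

```python
import mailbox
from email.utils import getaddresses, parsedate_to_datetime


def read_mbox(path):
    """Yield (sent_at, sender, recipients) for each message in an mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            raw_date = msg["Date"]
            try:
                sent_at = parsedate_to_datetime(str(raw_date)) if raw_date else None
            except (TypeError, ValueError):
                sent_at = None  # malformed Date header
            # getaddresses copes with display names and comma-separated lists.
            senders = [a.lower() for _, a in getaddresses(msg.get_all("From", []))]
            recipients = [
                a.lower()
                for _, a in getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
            ]
            yield sent_at, (senders[0] if senders else None), recipients
    finally:
        box.close()
```

Addresses are lower-cased so that `Alice@Example.com` and `alice@example.com` count as the same correspondent; whether that is appropriate depends on the mail being analyzed.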

For reporting the results I initially planned to use Perl's built-in reporting mechanism. However, I then thought that tables would be easier to create and more readable if they were in HTML, so I opted for that approach. For the first time in my life I used an HTML generation library rather than printing HTML tags by hand. For this I adopted Pete Krawczyk's HTML::AsSubs module. I found it very easy to use, and it made my code much more readable. I also parameterized my functions heavily, which considerably reduced code duplication. For instance, all tables are created by a single subroutine.
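
The idea of a single parameterized table routine can be sketched as follows. This is a Python illustration that builds strings with `html.escape`; it does not use HTML::AsSubs or any other generation library, and the name `html_table` and its parameters are hypothetical.

```python
from html import escape


def html_table(title, headers, rows):
    """Return an HTML table with a caption, a header row, and one row per tuple."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><caption>{escape(title)}</caption><tr>{head}</tr>{body}</table>"


if __name__ == "__main__":
    print(html_table("Emails by hour", ["Hour", "Messages"], [(9, 12), (10, 30)]))
```

Because the title, the column headers, and the rows are all parameters, every table in a report can go through the same function, and escaping happens in one place.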

You can find the source code of the Perl script I wrote here. If you plan to use it on your own email you'll need to customize the script in the places I've marked. Sadly, I lack the time to make it configurable through user options. If you add support for a different mailbox format, please post your changes as a comment to this blog entry, so that others can benefit from them.

The script creates a summary of the following measures:

  • Number of messages
  • Number of recipients
  • Number of senders
  • Number of active days
  • Average messages per day
  • Average messages per month
  • Average messages per folder
  • Average messages per recipient
  • Average messages per sender
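
Most of these measures reduce to counting distinct values and dividing. Here is a minimal Python sketch, assuming each message has already been reduced to a hypothetical `(sent_at, sender, recipients)` tuple and that "per day" means per active day; the script may define its averages differently. The per-folder average is left out because this tuple carries no folder information.

```python
def summarize(records):
    """Summarize (sent_at, sender, recipients) tuples; sent_at may be None."""
    records = list(records)
    total = len(records)
    # An active day is a calendar day with at least one dated message.
    days = {sent_at.date() for sent_at, _, _ in records if sent_at is not None}
    months = {(day.year, day.month) for day in days}
    senders = {sender for _, sender, _ in records if sender}
    recipients = {addr for _, _, addrs in records for addr in addrs}

    def average(count):
        # Guard against empty input so the report never divides by zero.
        return total / count if count else 0.0

    return {
        "messages": total,
        "recipients": len(recipients),
        "senders": len(senders),
        "active_days": len(days),
        "per_day": average(len(days)),
        "per_month": average(len(months)),
        "per_recipient": average(len(recipients)),
        "per_sender": average(len(senders)),
    }
```

Note that undated messages still count toward the total, so they raise the per-day and per-month averages slightly.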

It also produces the following tables:

  • Emails by month
  • Emails by month ordered by volume
  • Emails by day of week
  • Emails by day of week ordered by volume
  • Emails by hour
  • Emails by hour ordered by volume
  • Top 10 folders
  • Top 10 email addresses
  • Emails by folder
  • Emails by folder ordered by volume
  • Emails by address
  • Emails by address ordered by volume
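
Each pair of tables above is the same tally shown twice: once in natural key order and once ordered by volume. A minimal Python sketch of that idea follows; the function name `tally` and its `key` and `top` parameters are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def tally(items, key, top=None):
    """Count items by key(item); return (rows in key order, rows by volume)."""
    counts = Counter(key(item) for item in items)
    in_order = sorted(counts.items())
    by_volume = counts.most_common(top)  # top=None returns every row
    return in_order, by_volume


if __name__ == "__main__":
    stamps = [datetime(2010, 5, 17, 9), datetime(2010, 5, 18, 12)]
    print(tally(stamps, key=lambda t: t.strftime("%Y-%m")))  # by month
    print(tally(stamps, key=lambda t: t.weekday()))          # 0 is Monday
    print(tally(stamps, key=lambda t: t.hour))               # by hour
```

Passing `top=10` gives a "Top 10" listing, and the same function works for folders or addresses if `items` holds those values and `key` is the identity function.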

You can see the (redacted, 350kB) report that the script produced for my work-related folder here.

Through the analysis I found a number of interesting facts:

  • I must process about 80 messages every day to keep my email under control,
  • I send more messages than I receive and file,
  • the number of emails I process each month has been increasing,
  • midday and the days in the middle of the week are the busiest times (a well-known observation),
  • most emails are related to human-resource issues, e-government, and a few tough projects, and
  • most emails come from three people.

Now I need to reflect more deeply on the results to deal effectively with the email deluge.

Last modified: Thursday, May 20, 2010 12:40 am

Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.