Email data collected from my own inbox, with importance set by Gmail's Priority Inbox. I have removed content features (for privacy reasons). The remaining email features are:
me: 'bcc', 'cc', 'to' or 'only'
attachments (number of)
I also added features related to the sender of the email (also called "contact"):
conversations: number of all conversations/threads with contact, capped at 20
%from: proportion of threads started by contact
%read: proportion of those that I read
%starred: proportion of those that I starred
% contact replied: proportion of threads started by me that contact replied to
time contact reply: average time he took to reply
length contact reply: average length of his reply
% me replied: proportion of threads started by contact that I replied to
time me reply: average time I took to reply
length me reply: average length of my reply
in address book: 'TRUE' if sender/contact is in my address book; 'FALSE' otherwise.
See Bootstrapping Machine Learning for more information.