Note: Make sure you also read the Best Practices for using Autosort.

Manuscript contains a sophisticated spam-blocking algorithm that learns how to recognize spam automatically as you train it.

Rather than relying on a fixed set of spam clues (for example, assuming that “mortgage” must mean spam), it learns from your own incoming email. If you work for a bank, “mortgage” probably doesn’t mean spam.

In addition to using positive clues (for example, “V1agra” probably means spam), Manuscript learns from negative clues as well (for example, if the email contains the name of one of your products, it’s much less likely to be spam). Manuscript examines many aspects of the incoming email for clues that could be considered positive signs of spam, negative signs of spam, or neutral. And since you train it, it adapts to the particular stream of email that you receive.
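To make the idea of positive and negative clues concrete, here is a minimal sketch of how a single word could be scored, following the per-token formula from Paul Graham's A Plan for Spam. It is an illustration only: the function name, the sample counts, and the word "widgetpro" are hypothetical, and Manuscript's own modified scoring is not documented here.

```python
from collections import Counter

def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate how strongly one token suggests spam (Graham-style sketch).

    Returns a value near 0.99 for a strong spam clue, near 0.01 for a
    strong not-spam clue, or None if the token is too rare to judge.
    """
    if n_spam == 0 or n_ham == 0:
        return None  # nothing learned yet for one of the two classes
    good = 2 * ham_counts.get(token, 0)  # doubled to bias against false positives
    bad = spam_counts.get(token, 0)
    if good + bad < 5:
        return None  # not enough evidence; treat the token as neutral
    spam_freq = min(1.0, bad / n_spam)
    ham_freq = min(1.0, good / n_ham)
    p = spam_freq / (spam_freq + ham_freq)
    return max(0.01, min(0.99, p))  # never claim complete certainty

# Hypothetical training data: 100 spam and 100 legitimate messages.
spam_counts = Counter({"v1agra": 20})
ham_counts = Counter({"widgetpro": 30})
positive_clue = token_spam_probability("v1agra", spam_counts, ham_counts, 100, 100)
negative_clue = token_spam_probability("widgetpro", spam_counts, ham_counts, 100, 100)
neutral_clue = token_spam_probability("hello", spam_counts, ham_counts, 100, 100)
```

In this sketch a word seen only in spam scores close to 1, a word seen only in legitimate mail scores close to 0, and a word seen too rarely gives no signal at all.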

When you first install Manuscript and turn on Autosort, Manuscript sets up a project named Inbox with three areas: Spam, Not Spam, and Undecided. At first, Autosort has no clues at all about what messages are spam and what messages are not spam. All incoming messages are put straight into the Undecided area.

To train Autosort, you need to teach it about every message in the Undecided area, either by flagging it as spam by clicking the Spam button, or by moving it to the Not Spam area if it’s not spam.

Any time you see a message in the wrong area, take the time to move it to the right area. This will help train Autosort.
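One way to picture why moving a message helps is to think of training as keeping word counts per area. The sketch below is an assumption about how such bookkeeping could work, not a description of Manuscript's internals; the class name, the labels, and the whitespace tokenizer are all hypothetical.

```python
from collections import Counter

class TrainingCounts:
    """Toy per-area word counts that are updated as messages are sorted."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "not_spam": Counter()}
        self.messages = {"spam": 0, "not_spam": 0}

    @staticmethod
    def _words(text):
        # Simplistic tokenizer: unique lowercase words split on whitespace.
        return set(text.lower().split())

    def learn(self, text, label):
        self.tokens[label].update(self._words(text))
        self.messages[label] += 1

    def unlearn(self, text, label):
        self.tokens[label].subtract(self._words(text))
        self.messages[label] -= 1

    def move(self, text, old_label, new_label):
        # Correcting a mistake removes the old evidence and adds the new.
        self.unlearn(text, old_label)
        self.learn(text, new_label)

counts = TrainingCounts()
counts.learn("cheap mortgage offer", "spam")
# The message was actually legitimate, so it is moved to Not Spam.
counts.move("cheap mortgage offer", "spam", "not_spam")
```

After the move, "mortgage" no longer counts as evidence of spam and instead counts as evidence of legitimate mail, which is the effect a correction is meant to have.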

After a few days, you should notice that Autosort is correctly sorting most messages. In the first few days there is a small chance that a few messages will be mistakenly flagged as spam. Don’t worry about this, but do move them into the Not Spam area to help train Autosort.

Once you’ve received a fair amount of both spam and non-spam, typically after a couple of days or about 100 to 200 messages, you’ll find that Autosort is doing a really good job of sorting messages automatically. But no matter how good it gets, it will always be undecided about some messages, and you’ll have to decide those cases yourself.

Autosort tries to be conservative to avoid accidentally flagging a message as spam when it’s not really spam. In practice, we have found that even with an email address that receives hundreds of spam messages a day, it is extremely rare for Autosort to mark a legitimate email as spam. In fact, our experience is that it’s more common for humans to mistake a real email for spam than for Autosort to make this mistake! Unfortunately, there’s always the possibility that a legitimate email from a customer will look so spammy that it gets deleted accidentally. If you are concerned about this, set aside some time to review the spam messages every few days just to be certain nothing legitimate is getting lost. On the whole, though, you’ll find that Autosort does a great job with very few “false positives.”
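A conservative filter can be pictured as one that demands much stronger evidence to call something spam than to leave it for a human. The sketch below shows that idea with two cutoffs. The threshold values and the function name are hypothetical and chosen only for illustration; they are not Manuscript's actual settings.

```python
def sort_message(spam_probability, spam_threshold=0.99, not_spam_threshold=0.2):
    """Pick an area for a message given its estimated spam probability.

    The spam cutoff is deliberately strict, so borderline messages fall
    into Undecided instead of being flagged as spam by mistake.
    """
    if spam_probability >= spam_threshold:
        return "Spam"
    if spam_probability <= not_spam_threshold:
        return "Not Spam"
    return "Undecided"

areas = [sort_message(p) for p in (0.995, 0.90, 0.05)]
```

With cutoffs like these, a message that looks 90% likely to be spam still lands in Undecided, which trades a little manual sorting for fewer false positives.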

To save you time, Manuscript treats emails sorted as spam slightly differently. You will not receive notifications, auto-replies, or escalation reports regarding spam emails. Spam emails are also conveniently hidden from most views of your cases and summary reports, although they are still accessible with the click of a link.

Implementation details

Manuscript implements a modified version of the Bayesian filtering algorithm proposed by Paul Graham in the articles A Plan for Spam and Better Bayesian Filtering, with modifications and improvements designed by Fog Creek technical staff.
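For readers who want a feel for the underlying approach, the sketch below shows the combining step as described in A Plan for Spam: the most telling per-word probabilities are merged into one overall spam probability. This is the published baseline only, with a hypothetical function name; Fog Creek's modifications are not reflected here.

```python
import math

def combined_spam_probability(token_probs, max_tokens=15):
    """Combine per-token spam probabilities into one score (Graham-style).

    Only the most "interesting" tokens are used, meaning those whose
    probability is farthest from the neutral value 0.5.
    """
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)
    interesting = interesting[:max_tokens]
    prod_spam = math.prod(interesting)
    prod_ham = math.prod(1.0 - p for p in interesting)
    return prod_spam / (prod_spam + prod_ham)

# Two strong spam clues outweigh one mildly reassuring word.
mostly_spam = combined_spam_probability([0.99, 0.99, 0.4])
# One strong spam clue and one strong not-spam clue cancel out.
balanced = combined_spam_probability([0.99, 0.01])
```

The sketch assumes each input probability lies strictly between 0 and 1 (for example, clamped to the range 0.01 to 0.99), so the denominator is never zero.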

You can read about the nitty-gritty of how training works in Manuscript.