[Greylist-users] greylist lib in C? + several Q's

William Blunn bill--greylist at blunn.org
Wed Aug 25 02:56:17 PDT 2004


> 1) what is the correct behaviour when you have multiple
> "RCPT TO"s before the DATA command?  My feeling is you should
> create a separate triplet for every user, independent of each
> other.  Is that correct?

Yes.

> But what if some of the recipients
> are already greylisted and others are not?

Depends on when you invoke your greylisting function.

If you invoke it at RCPT time (which is the way I would recommend,
because it gives you the best and strongest rejection), then you return
a separate result for each recipient and it just comes out in the wash
anyway.

> Do you decide
> to either pass the mail or not, for all users, or do you
> reject all recipients except one

Again, if you do it at RCPT time, you can return results for each
recipient.
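
As a rough sketch of what "a result for each recipient" could look like at
RCPT time (this is not Bagley's code; the function name, the set of
already-passed triples and the reply texts are all made up for
illustration; only the 250 and 451 reply codes are standard SMTP):

```python
# Hypothetical reply strings. 250 and 451 are standard SMTP reply codes;
# the wording after them is an assumption.
OK = "250 OK"
TRY_LATER = "451 4.7.1 Greylisted, please try again later"


def rcpt_replies(client_ip, sender, recipients, passed_triples):
    """Return one SMTP reply per RCPT TO, each judged on its own triple.

    passed_triples is assumed to be a set of (ip, sender, recipient)
    tuples that have already got through greylisting.
    """
    replies = {}
    for rcpt in recipients:
        triple = (client_ip, sender, rcpt)
        replies[rcpt] = OK if triple in passed_triples else TRY_LATER
    return replies
```

Each recipient gets its own answer, so a message with a mixture of known
and unknown recipients needs no special handling.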

If you do it somewhere else, you need some way of combining the
greylisting results for all the recipients.  This is an implementation
decision.

The algorithm I used was to take the "most rejectional" result out of
all the results for all of the recipients.  The possible greylisting
results for each recipient are:

  0 ACCEPT
  1 TEMPREJECT
  2 REJECT

Then take the maximum value reached for this message, and that is the
result for the message.
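
A minimal sketch of that "most rejectional" rule, using the numeric values
listed above. The names Result and combine are invented for this example,
and treating an empty recipient list as ACCEPT is an assumption:

```python
from enum import IntEnum


class Result(IntEnum):
    ACCEPT = 0
    TEMPREJECT = 1
    REJECT = 2


def combine(results):
    """Return the most rejectional result over all recipients.

    Falling back to ACCEPT when there are no results is an assumption
    made for this sketch.
    """
    return max(results, default=Result.ACCEPT)
```

So one TEMPREJECT among several ACCEPTs temprejects the whole message, and
a single REJECT outranks everything else.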

> and cause the sender to
> send individual copies to each person so you can make the
> decision on a per user basis?

I am unable to relate your question to my understanding of e-mail
delivery.  Re-tries are done by the sender's MTA.  The sender's MTA will
track the delivery status for each recipient, so if you accept some and
tempreject others, the sending MTA will re-try only the ones which were
temprejected.

> On the other hand unless
> this is coming from a relay, isn't it likely to always be
> good mail anyway?

Bagley includes partial match processing to cover the case where there
are some recognised recipients and some unrecognised recipients.

In most cases, Bagley is able to accept unrecognised recipients provided
that there is also a recognised recipient from the same sending network
and sender e-mail address (see MIDGREY below).

> 2) Is there any danger in *always* doing the temporary
> reject after the DATA command is complete?  I know that the
> whitepaper suggests doing this only for MAIL FROM:<>
> (with some hacks for broken mailers) but for my purposes
> I'd rather like to do it that way all the time.  One
> reason being I want to store the mail, for QA purposes,
> so we can be sure that good mail has not been rejected;

You're not rejecting it, you're temporarily rejecting it.

Your MTA is *always* entitled to tempreject.  The load average could be
too high.  That is what temporary rejection is for.  If the sending MTA
doesn't re-try, then that is *their* problem, *you* are golden, *they*
don't have a leg to stand on.

> another is that I'm considering greylisting *only* if it
> fails a spam test - otherwise it is accepted.

I'm not sure that this is a good idea.  Greylisting tends to be the
better initial triage function as it can weed out a vast proportion of
incoming delivery attempts at very low cost.

> This ought
> to cut down the risk of delays to legitimate mail, which
> appears to be a concern here.

In my experience greylisting generates very few false positives.

Are you a business?  If so, then it is a cost/benefit analysis.

What is the cost of the very small number of false positives which you
would have with greylisting, compared to the cost of the enormous amount
of spam and virus messages which greylisting would stop?

> If we do store the mail
> for a certain time, waiting to see if it is resent, would
> a simple hash function allow me to recognise the same
> mail the next time round or does mail change in small
> ways when it is resent?  (We need to recognise resent
> mail in order to take it out the store, so that any
> remaining mail after the retry expiry delay must be
> the spams that we rejected)
> 
> 3) Has anyone documented all the special cases and little
> tweaks that different greylisting implementations have
> acquired, in one place, or does everyone reinvent the wheel?!

I believe it is the latter.

> 4) What is the longest observed delay between first attempt
> and the retry, for a legitimate sender?

I have seen three days.

Some MTAs re-try based on the target HOST, so if you have a new
greylisting set-up and you get a fair number of messages coming through,
say, an ISP mail relay, then you can get some nasty pathological cases
in the beginning. These usually settle down after a few weeks.

I set the expiry time for unretried triples to one week.

> 5) What is the shortest?

I have seen some that re-try in the order of five seconds.

On my system, I set the minimum re-try period to one minute.
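
To make those two numbers concrete, here is a small sketch of the timing
check, assuming only the first-seen time is stored for an unretried triple.
The function name and the three labels it returns are invented; the one
minute and one week values are the ones quoted above. The current time is
passed in as a parameter so a test can simulate any arrival time:

```python
from datetime import datetime, timedelta, timezone

MIN_RETRY = timedelta(minutes=1)       # minimum re-try period
UNRETRIED_EXPIRY = timedelta(weeks=1)  # expiry for unretried triples


def classify_retry(first_seen, now):
    """Classify a re-try by how long after the first attempt it arrives."""
    age = now - first_seen
    if age < MIN_RETRY:
        return "too_soon"   # tempreject again
    if age > UNRETRIED_EXPIRY:
        return "expired"    # old record is gone; treat as a new triple
    return "accept"


first = datetime(2004, 8, 25, 9, 0, tzinfo=timezone.utc)
print(classify_retry(first, first + timedelta(seconds=5)))
print(classify_retry(first, first + timedelta(days=3)))
```

With these settings the five-second re-try is temprejected again and the
three-day re-try is accepted.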

> 6) How common is it that spammers send to the same people
> from the same IP over an extended period?  (eg 'spam-friendly'
> ISPs, rather than hacked machines)
> 
> 7) Has anyone put together any sort of test harness for
> QA testing a greylisting implementation?  (I'm considering doing
> this too - the library will allow an arbitrary arrival time to
> be entered, rather than just 'now()' - so that a long period
> of activity can be simulated in seconds)
> 
> 8) The whitepaper suggests storing the arrival and expiry times.
> Is there a reason for storing anything other than the arrival
> time?

Not that I am aware of.  It does not seem like a good idea to me.

The Bagley system stores only NOW() based times.

(In fact the Bagley system, unlike every other implementation I have
seen, works entirely in UTC, thereby eliminating the problems which
would otherwise occur at the daylight-savings/non-daylight-savings
changeover.)

In the Bagley system, all expiry times etc., are stored as system-wide
configuration values, and are added/subtracted from database values at
the point of comparison.

The Bagley system also stores a status value against each triple, so
that if, for example, the greylisting delay is extended, you do not get
triples which were previously happy suddenly start temprejecting again.

The status value possibilities are:

  BLACK, WHITE,   (self-explanatory)

  DARKGREY,       (we have not seen this triple before)

  LIGHTGREY,      (this triple has been retried after the greylist period)

  REVERSE,        (we have seen an outgoing message which would match
                  a triple such as this)

  MIDGREY         (we have seen a message which matches the sending
                  network and sender e-mail address of a LIGHTGREY
                  record, which we accepted, but because it has not
                  been re-tried we won't treat this triple as LIGHTGREY)

Once a triple becomes LIGHTGREY, it stays LIGHTGREY, regardless of
whether the greylisting delay is extended.
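
A rough sketch of how a stored status plus a UTC first-seen time could
behave when the delay is changed. This is not the Bagley code: the Triple
class, the check function and its return strings are invented, and the
REVERSE and MIDGREY states are deliberately left out because their exact
handling is not described here:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Triple:
    first_seen: datetime      # UTC time of the first delivery attempt
    status: str = "DARKGREY"  # new triples start out DARKGREY


def check(triple, now, greylist_delay):
    """Decide on one delivery attempt; the delay is applied at comparison time."""
    if triple.status in ("WHITE", "LIGHTGREY"):
        return "ACCEPT"       # LIGHTGREY is sticky: the delay is not re-checked
    if triple.status == "BLACK":
        return "REJECT"
    if triple.status != "DARKGREY":
        # REVERSE and MIDGREY are not modelled in this sketch.
        raise ValueError("status not modelled: " + triple.status)
    if now - triple.first_seen >= greylist_delay:
        triple.status = "LIGHTGREY"
        return "ACCEPT"
    return "TEMPREJECT"


t0 = datetime(2004, 8, 25, 9, 0, tzinfo=timezone.utc)
t = Triple(first_seen=t0)
check(t, t0 + timedelta(minutes=2), timedelta(minutes=1))  # promotes to LIGHTGREY
check(t, t0 + timedelta(minutes=3), timedelta(hours=1))    # still accepted
```

Because the delay is a parameter rather than a stored expiry time, a policy
change applies to every DARKGREY record at once, while records already
promoted to LIGHTGREY are unaffected.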

> The expiry time is calculated by a simple addition of a
> constant, but if you change your policy, wouldn't you want it
> to apply retroactively to all entries in your database rather than
> just new ones being added?

Exactly.

That is what the Bagley system does.

Bill

