[Greylist-users] greylist lib in C? + several Q's

Tue Aug 24 23:36:30 PDT 2004

> 1) what is the correct behaviour when you have multiple
> "RCPT TO"s before the DATA command?  My feeling is you should
> create a separate triplet for every user, independent of each
> other.  Is that correct?  But what if some of the recipients

Yes, that is correct.

> are already greylisted and others are not?  Do you decide
> to either pass the mail or not, for all users, or do you
> reject all recipients except one and cause the sender to
> send individual copies to each person so you can make the
> decision on a per user basis?  On the other hand unless
> this is coming from a relay, isn't it likely to always be
> good mail anyway?

If the triplet is already in the database, mail destined
for that user will be accepted. If the triplet is not in
the database, only mail to that user will be temporarily
rejected. The connecting MTA will retry later for the
recipients that received a temporary failure.
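
The per-recipient behaviour described above could be sketched
roughly as follows. This is only an illustration in Python, not
code from any existing greylisting implementation: the names
`check_recipients` and `known_triplets` are hypothetical, the
reply strings are assumed, and the minimum-delay check is left
out to keep the example short.

```python
from typing import Dict, Iterable, Set, Tuple

# (client IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]

ACCEPT = "250 OK"
DEFER = "451 4.7.1 Greylisted, please try again later"  # assumed reply text


def check_recipients(
    client_ip: str,
    sender: str,
    recipients: Iterable[str],
    known_triplets: Set[Triplet],
) -> Dict[str, str]:
    """Decide each RCPT TO independently and return one reply per recipient."""
    replies: Dict[str, str] = {}
    for rcpt in recipients:
        triplet = (client_ip, sender, rcpt)
        if triplet in known_triplets:
            # Triplet already in the database: accept mail for this recipient.
            replies[rcpt] = ACCEPT
        else:
            # First sighting: record it and temp-fail this recipient only.
            known_triplets.add(triplet)
            replies[rcpt] = DEFER
    return replies
```

With one known and one unknown recipient in the same
transaction, the known one gets a 250 and the other a 451, so
the sending MTA only has to retry the deferred recipient.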

> 2) Is there any danger in *always* doing the temporary
> reject after the DATA command is complete?  I know that the
> whitepaper suggests doing this only for MAIL FROM:<>

From what I remember, it recommends doing the rejection
at the RCPT TO stage.

> (with some hacks for broken mailers) but for my purposes
> I'd rather like to do it that way all the time.  One
> reason being I want to store the mail, for QA purposes,
> so we can be sure that good mail has not been rejected;

If you are creating a generic library, this question
doesn't really need answering here. It is up to the MTA
(or whatever application hooks into the library) to
choose the best place to do the rejection. If you want
to do it after DATA then you can. However, I'd say doing
that only during the early/testing stages would be wise:
you will end up processing a lot of the data twice.

> another is that I'm considering greylisting *only* if it
> fails a spam test - otherwise it is accepted.  This ought
> to cut down the risk of delays to legitimate mail, which

What you are proposing (pre-queue scanning) will not
scale for large environments.

> appears to be a concern here.  If we do store the mail
> for a certain time, waiting to see if it is resent, would
> a simple hash function allow me to recognise the same
> mail the next time round or does mail change in small
> ways when it is resent?  (We need to recognise resent
> mail in order to take it out the store, so that any
> remaining mail after the retry expiry delay must be
> the spams that we rejected)

I don't really see the point of this unless you want to
avoid scanning the message for spam twice. It would be
best to compute the hash strictly on the body of the
message and not include the headers: more and more people
are beginning to use fall-back relays, so headers will
change on re-delivery.
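
A body-only hash along those lines might look like the sketch
below. It is an assumption-laden example, not a tested recipe:
the function name `body_digest` is made up, SHA-256 is an
arbitrary choice of hash, and it only normalises line endings,
so a relay that re-encodes the body would still produce a
different digest.

```python
import hashlib


def body_digest(raw_message: bytes) -> str:
    """Hash only the message body, ignoring every header line."""
    # Headers end at the first blank line; accept CRLF or bare LF endings.
    hits = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = raw_message.find(sep)
        if pos != -1:
            hits.append((pos, len(sep)))
    if hits:
        pos, sep_len = min(hits)  # earliest blank line wins
        body = raw_message[pos + sep_len:]
    else:
        body = b""  # no blank line found: treat as a header-only message
    # Normalise line endings so CRLF and LF copies hash identically.
    body = body.replace(b"\r\n", b"\n")
    return hashlib.sha256(body).hexdigest()
```

Two copies of a message that differ only in their Received:
headers give the same digest, which is what is needed to match
a retry against the stored copy.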

> 3) Has anyone documented all the special cases and little
> tweaks that different greylisting implementations have
> acquired, in one place, or does everyone reinvent the wheel?!

The spec is defined and people choose their own ways of
implementing it.

> 4) What is the longest observed delay between first attempt
> and the retry, for a legitimate sender?

From what I've heard, some broken MTAs have taken as long
as 3 days to redeliver a message.

> 5) What is the shortest?

Again, broken MTAs (and spammers!) will retry 10 times
in quick succession. It's best not to accept a message
until at least 5 minutes after the first attempt.
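
As a rough sketch of that minimum delay, assuming an in-memory
dict keyed by triplet (the name `greylist_check` and the storage
layout are hypothetical; only the 5 minute figure comes from
the advice above):

```python
from typing import Dict, Tuple

# (client IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]

MIN_DELAY_SECONDS = 5 * 60  # the 5 minutes suggested above


def greylist_check(
    first_seen: Dict[Triplet, float],
    triplet: Triplet,
    now: float,
    min_delay: float = MIN_DELAY_SECONDS,
) -> bool:
    """Return True to accept, False to answer with a temporary failure."""
    if triplet not in first_seen:
        # First attempt: remember when we saw it and defer.
        first_seen[triplet] = now
        return False
    # Rapid-fire retries inside the window are still deferred.
    return now - first_seen[triplet] >= min_delay
```

A sender that hammers the server ten times in the first minute
is deferred every time; the first retry at or after the five
minute mark is accepted.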

> 6) How common is it that spammers send to the same people
> from the same IP over an extended period?  (eg 'spam-friendly'
> ISPs, rather than hacked machines)

Yes, this definitely happens. Depending on how your
greylisting daemon is written, you can extract a wealth
of information from your greylisting logs.

> 8) The whitepaper suggests storing the arrival and expiry times.
> Is there a reason for storing anything other than the arrival
> time? The expiry time is calculated by a simple addition of a
> constant, but if you change your policy, wouldn't you want it
> to apply retroactively to all entries in your database rather than
> just new ones being added?

Perhaps I'm not following what you're getting at, but
how would you then expire each individual ticket? Also,
it's extremely useful for auditing/log tracing to see
when the triplet information was last updated.
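
One way to picture a record that keeps its own expiry and a
last-updated time is sketched below. The field names
(`first_seen`, `last_seen`, `expires_at`) and the helper
functions are invented for this example, and the lifetime passed
to `touch` is whatever your policy uses; none of this is taken
from the whitepaper.

```python
from dataclasses import dataclass
from typing import Dict, Hashable


@dataclass
class TripletRecord:
    first_seen: float  # arrival time of the first attempt
    last_seen: float   # last time this triplet was updated (for auditing)
    expires_at: float  # per-record expiry, stored explicitly


def touch(record: TripletRecord, now: float, lifetime: float) -> None:
    """Note a new delivery and push this record's expiry forward."""
    record.last_seen = now
    record.expires_at = now + lifetime


def purge_expired(store: Dict[Hashable, TripletRecord], now: float) -> int:
    """Delete every record whose own expiry has passed; return the count."""
    expired = [key for key, rec in store.items() if rec.expires_at <= now]
    for key in expired:
        del store[key]
    return len(expired)
```

Because each record carries its own `expires_at`, a triplet
that keeps delivering mail can outlive one that was seen once
and never again, and `last_seen` is there when you trace logs.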

Cami