[Greylist-users] greylist lib in C? + several Q's

Thu Sep 2 09:24:22 PDT 2004

"William Blunn" <bill--greylist at blunn.org> writes:
>> So if the email is to be (permanently) rejected for one of the
>> recipients, and accepted for others, you'd issue a reject after the
>> DATA phase?  I hope you don't allow random users to blacklist senders,
>> then.
>
> I think we have a shortcoming of understanding. [...]

Oh, I see.  The blacklist is for use with spambait addresses, not for
letting users say "this address is sending spam".  Thanks for the
clarification.

>>>> and cause the sender to
>>>> send individual copies to each person so you can make the
>>>> decision on a per user basis?
>>>
>>> I am unable to relate your question to my understanding of e-mail
>>> delivery.  Re-tries are done by the sender's MTA.  The sender's MTA will
>>> track the delivery status for each recipient, so if you accept some and
>>> tempreject others, the sending MTA will re-try only the ones which were
>>> temprejected.
>> 
>> ... and won't do so individually.
>
> Not so.
>
> If you TEMPREJECT at RCPT time (the only sensible way of doing it), then
> the sending MTA will track the delivery status of each recipient and
> future delivery attempts will only include recipients where delivery has
> not already been completed.

My mistake -- I confused this with the next question in Graham's
message, which was talking about temporary rejections after the DATA
phase.

> We should put in exceptions for large recalcitrant senders.
>
> However we should not do this alone:

Agreed.  Legitimate senders that are misbehaving deserve some
pestering for that bad behavior.  And user education is key.

> Exim 4 goes one better than this.
>
> Exim 4 checks to see if messages incoming from the remote MTA come in in
> the right order relative to outgoing messages.
>
> So, *before* sending the initial greeting, we check the receive buffer.
>
> If the remote mailer has already said "HELO", we report a
> synchronisation error and drop the connection.
>
> Similarly, we check to see if the remote MTA jumps the gun with MAIL,
> RCPT, DATA.
>
> Any SMTP non-conformance is dealt with in the same way.

I think sendmail does much of this.  I don't know whether it actually
disconnects a site that uses pipelining when the server hasn't offered
it, but I'm pretty sure it will at least log it and delay the session.
I haven't looked at 8.13 much, though; I just wanted to grab the
delayed-greeting code.

>>> The Bagley system also stores a status value against each triple, so
>>> that if, for example, the greylisting delay is extended, you do not get
>>> triples which were previously happy suddenly start temprejecting again.
>> 
>> So you've kind of got a block-expired flag instead of a block-expires
>> time...
>
> No, it's a status value.  It can have several values:

Ah, I see.  With the extra information encoded there, you get much
the same benefit as storing multiple time values.  It's encoded
differently, and with some variations, but the effect is similar.

> There are a couple of relevant time points after the time the triple was
> last seen: [...]
> We could calculate both of these and store both of them, but it is uselessly
> redundant and is not consistent with best database practice.
>
> The proper way to do it is to store the "last seen" time, and then do
> measurements from there using offsets according to what you are trying
> to achieve at the time.

I'm no expert on database design or "best practices", but both schemes
look pretty good to me.  What I'm less clear on is why relatively
static data like a whitelist or blacklist belongs in the same table as
the dynamic greylist data.  Timestamps and perhaps some other fields
don't apply to it, and you might want to share it with other mail
processing tools and/or administer it differently.  Is it better to
search one big, frequently updated table for "sender is A or null,
recipient is B or null, ip_addr is C or null, etc.", or to search a
small, rarely changed table that way and then search a bigger, more
dynamic one (which could be huge at some sites) for exactly one
specific value in each field?  I'd also wonder about storing
statistics in the same table, as relaydelay does, but since you're
updating the last-seen time anyway, at least for the dynamic table...
Actually, I'd like to be able to have shared expiry times but
per-server stats.  But not having written the other mail processing
tools or administrative interfaces, or even any scripts to synchronize
my databases between servers (they're on different networks, so I'm
not going to make them connect to the same database server), I
couldn't say whether that's actually an issue or not.

With your scheme, do you wind up constructing queries to test
different time intervals for different states, or do you just retrieve
the data and then look at the state and timestamp?

Ken