[Greylist-users] Re: A Greylisting idea.

Sun Jun 22 15:41:48 PDT 2003

> > First, lists.  There's a perl module which is quite effective at
> > determining if an email came from a list by examining the headers.
> > Admittedly, you won't have the headers the first time an email is
> > delivered, but you can have the MTA detect it on retry, and have the
> > milter module add a wildcard entry permanently whitelisting that mailing
> > list.
>
> The key problem with that approach is that anything in the headers can be
> easily forged.  Since the code is open, once spammers become wise to this
> method, they can easily adapt their mail headers to take advantage of this.

I think I failed to explain exactly what I was trying to address.  I
wasn't referring to an exception so that list messages get to bypass the
greylist altogether.  I was just thinking that if the module sees a
message that has been accepted (in other words, it would have already had
to have passed the greylist test), then it could update the existing
database entry to use a wildcard match for the LHS of the envelope-from,
to get around the issue with lists that send messages from a different
address every time for bounce tracking.  In fact, since the module also
determines the list software used, we could do this only for list
software that can be configured to send unique Froms.
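
A rough sketch of that wildcard update, in Python for brevity rather than
Perl.  The set-of-tuples store and both function names are made up for
illustration; a real greylist would use its own database schema.  The
entry is only widened if the exact triplet has already been accepted.

```python
def widen_to_wildcard(entries, ip, sender, recipient):
    """Swap an already-accepted exact entry for one matching any local part.

    `entries` is a hypothetical stand-in for the greylist database:
    a set of (ip, sender_pattern, recipient) tuples.
    """
    if (ip, sender, recipient) not in entries:
        return False  # never widen something that has not passed the greylist
    domain = sender.rpartition("@")[2].lower()
    entries.discard((ip, sender, recipient))
    entries.add((ip, "*@" + domain, recipient))
    return True


def matches(entries, ip, sender, recipient):
    """True if the triplet matches an exact entry or a wildcard entry."""
    domain = sender.rpartition("@")[2].lower()
    return ((ip, sender, recipient) in entries
            or (ip, "*@" + domain, recipient) in entries)
```

The IP address, sender domain, and full recipient address still have to
match; only the left-hand side of the envelope-from is relaxed.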

Since this wouldn't affect the first message, I don't think that this
would be too large an exploitable loophole, since just about any exploit
to get around it would still have to be an exact match for the second
attempt, and future attempts would still have to match the IP address,
sender domain, and receiver's full address.  O.K., so the whitelist entry
probably shouldn't be permanent, that might be a loophole, but as it's
self-renewing and the list sends out a message at least as often as the
interval (you mentioned 36 days), that shouldn't be a problem.

The real downside of this is processing time.  Mail::ListDetector requires
a Mail::Internet or MIME::Entity object.  The former is more lightweight,
though I'm using the latter to parse MIME email so that my body rules only
get applied to the correct body parts, which means that I'm sucking down
the entire email, which can be a burden when someone attaches a large file
to an email.  I haven't tried it, but since a Mail::Internet object
doesn't care about the body, you can probably pass just the headers to the
constructor.
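
To illustrate the headers-only idea without claiming anything about the
Perl modules, here is the same approach using Python's standard email
package.  It is not a port of Mail::ListDetector: the check below only
looks for the standard List-* headers (an assumption about what is good
enough for a demo), and the function names are invented.

```python
from email.parser import HeaderParser

# Standard list headers; a real detector would know many more signatures.
LIST_HEADERS = ("List-Id", "List-Post", "List-Unsubscribe")


def header_block(raw):
    """Return only the text before the first blank line (the header block)."""
    normalized = raw.replace("\r\n", "\n")
    return normalized.split("\n\n", 1)[0] + "\n"


def looks_like_list(raw):
    """Parse just the headers and report whether any List-* header is set."""
    msg = HeaderParser().parsestr(header_block(raw))
    return any(msg[name] is not None for name in LIST_HEADERS)
```

Because the body is cut off before parsing, a large attachment never gets
loaded into the parser.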

>> Second, the MTA that blows out if all the RCPT TO:s fail:  What MTA is
>> that?  I've only seen spamware fall down that way.  Have you considered
>> having the milter code override the error code if RCPT TO:s failed?  Is
>> it even possible to do this using the milter interface?
>
> I've noticed at least one system where this happens, and I've been
> trying to get in contact with the administrator of the system that sent
> it to find out about what their mail system is, but no luck yet.

Potentially twisted idea:  If this is in fact an issue, what you should be
able to do is add a dummy address to the recipients list the first time
that a RCPT TO: should be temp-failed and you haven't accepted any of the
addresses, remove the dummy address if you accept an email address
after adding it, and then if/when the data command is sent (sendmail won't
perm-fail it because there is one valid recipient), you can temp-fail the DATA
command, assuming that all recipients were temp-failed.  Hopefully the
MTA isn't dimwitted enough to ignore *ALL* return codes, though you never
can tell.  Any MTA that broken isn't worth talking to, IMO.
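
The bookkeeping for that dummy-recipient trick could look roughly like
this.  It models only the per-transaction state, not the milter API
(whether the milter interface permits adding a recipient at this stage is
not settled here), and the class, the constants, and the dummy address are
all hypothetical.

```python
ACCEPT, TEMPFAIL = "accept", "tempfail"
DUMMY = "greylist-dummy@invalid"  # hypothetical placeholder address


class Transaction:
    """Per-message state for the dummy-recipient idea."""

    def __init__(self):
        self.accepted = []          # real recipients that passed the greylist
        self.dummy_present = False  # True while the dummy props up the envelope

    def on_rcpt(self, address, passes_greylist):
        if passes_greylist:
            self.accepted.append(address)
            self.dummy_present = False  # a real recipient replaces the dummy
            return ACCEPT
        if not self.accepted:
            self.dummy_present = True   # keep one "valid" recipient so DATA is reached
        return TEMPFAIL

    def envelope(self):
        """Recipients as the MTA would see them."""
        return self.accepted + ([DUMMY] if self.dummy_present else [])

    def on_data(self):
        # Temp-fail DATA only when every real recipient was temp-failed.
        return ACCEPT if self.accepted else TEMPFAIL
```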

Now to back up a little bit, here's where I'm coming from, having not
actually installed the software yet.  This is off-topic, and is more my
way of introducing myself to the list.

My current spam filtering/blocking consists of a number of DNSBLs, some
maildrop rules, and custom perl code.  Prior to seeing this paper, my plan
(which hasn't changed much) was to shift to:

  1) conservative DNSBLs reject the email outright.
  2) check a handful of rules that have never produced a false positive
	(helo'ing with *MY* ip address, for example), and reject the
	email outright on failure.
  3) check my version of the greylist (described below) and temp-fail
	the email if it isn't listed.
  4) accept the email, run the rest of my rules in a tag-only mode where
	each failure gets a one-token (word) tag, but no rejection.
  5) let Mozilla or crm114 (or other bayesian or similar checker) handle
	filing the email into the normal inbox or a spam inbox.
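
As a sketch, steps 1 through 4 boil down to an ordered chain of checks.
Every callable passed in below is a placeholder for the real DNSBL lookup,
rule set, or greylist; none of the names come from existing software.
Step 5 happens later in the mail client or classifier, which only sees the
returned tags.

```python
def handle_message(msg, dnsbl_hit, hard_rule_hit, greylist_ok, tag_rules):
    """Return (verdict, tags) for one message.

    dnsbl_hit, hard_rule_hit, greylist_ok: callables taking msg, returning bool.
    tag_rules: list of (tag_word, callable) pairs run in tag-only mode.
    """
    if dnsbl_hit(msg):          # step 1: conservative DNSBLs
        return ("reject", [])
    if hard_rule_hit(msg):      # step 2: rules with no known false positives
        return ("reject", [])
    if not greylist_ok(msg):    # step 3: greylist
        return ("tempfail", [])
    # step 4: accept, and tag instead of rejecting
    tags = [word for word, rule in tag_rules if rule(msg)]
    return ("accept", tags)
```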

The reason I hadn't shifted yet was mostly procrastination, as the base
perl in FreeBSD doesn't support threads.

The way I was thinking of greylisting was to only track from/to/ip addr
until the second message, and then just whitelist the IP address for all
email, on the assumption that if it passed the MTA IQ test for one
transaction, it would pass it for all of them.  Greylisting entries would
be valid for 35-45 days, and would be renewed whenever an email matched them.
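
A minimal in-memory sketch of that scheme, with assumptions flagged: the
36-day lifetime is just one value inside the 35-45 day range, no minimum
retry delay is modeled, pending triplets never expire, and the class and
method names are invented.

```python
import time

DAY = 86400
LIFETIME = 36 * DAY  # assumption: any value in the 35-45 day range would do


class Greylist:
    def __init__(self, lifetime=LIFETIME):
        self.lifetime = lifetime
        self.pending = set()   # (ip, sender, rcpt) triplets seen exactly once
        self.whitelist = {}    # ip -> expiry timestamp

    def check(self, ip, sender, rcpt, now=None):
        now = time.time() if now is None else now
        expiry = self.whitelist.get(ip)
        if expiry is not None:
            if expiry > now:
                self.whitelist[ip] = now + self.lifetime  # renew on every match
                return "accept"
            del self.whitelist[ip]                        # entry has lapsed
        key = (ip, sender, rcpt)
        if key in self.pending:
            # Second attempt for the same triplet: whitelist the whole IP.
            self.pending.discard(key)
            self.whitelist[ip] = now + self.lifetime
            return "accept"
        self.pending.add(key)
        return "tempfail"
```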

By having step 4 add one-word tags to the email, this lets step 5, the
bayesian filtering, decide which tags are actually relevant.  Right now,
my perl rules are running 98-99% correct in detecting spam, and more than
99% in detecting non-spam.  However, by letting the bayesian filtering
determine the weighting of the rules, I expect to just about eliminate the
false positives.  This will also help with the fact that my perl rules
currently look at behavior, not the words themselves, and as the bulkers
modify their behavior to be less noticeable, the words will become more
important.  Right now, there are almost no text words that I'm looking for
within an email; I'm mostly looking at things that no proper MUA would
generate, or what appear to be deliberate attempts to hide:  several
single characters all separated by a space or the same punctuation in the
subject, text drawn in the same color as the background, escape-encoding
alphanumeric characters in a URL, etc.  Hmmm... just thought of a minor
issue totally unrelated to the list/greylisting that I need to fix,
otherwise one UBE in a digest will cause the entire digest to get sent to
the UBE folder.
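
Two of those behavior checks, written as tag-only rules, might look like
the following.  These are illustrative regexes, not the actual perl rules;
the tag words and the four-letter threshold are assumptions.

```python
import re

# Four or more single letters all separated by the same space/punctuation mark.
SPACED = re.compile(r"\b[A-Za-z]([ .,_*-])(?:[A-Za-z]\1){2,}[A-Za-z]\b")
# Any %XX escape; whether it hides an alphanumeric is decided below.
PERCENT = re.compile(r"%([0-9A-Fa-f]{2})")


def spaced_subject(subject):
    return SPACED.search(subject) is not None


def encodes_alnum(url):
    """True if a %XX escape decodes to an ASCII letter or digit."""
    for hexpair in PERCENT.findall(url):
        value = int(hexpair, 16)
        if value < 128 and chr(value).isalnum():
            return True
    return False


def tag_message(subject, urls):
    """Return one-word tags for the bayesian filter to weigh; never reject."""
    tags = []
    if spaced_subject(subject):
        tags.append("SPACEDSUBJ")    # hypothetical tag word
    if any(encodes_alnum(u) for u in urls):
        tags.append("URLESCALNUM")   # hypothetical tag word
    return tags
```

A legitimately escaped space (%20) produces no tag, since no proper
software needs to escape plain letters or digits.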

Though in truth, I expect the rules to become much less important.  The
conservative DNSBLs should catch the mainsleezers/spamhausen, and the
greylist should catch the chickenboners.  The other parts of the filter
would only be needed to catch what falls between the two.
Effective combined arms comes to the spam war :-)