[Greylist-users] Some more data points

Scott Nelson scott at spamwolf.com
Wed Jul 2 10:36:15 PDT 2003

At 09:49 AM 7/2/03 -0500, Evan Harris wrote:
>On Wed, 2 Jul 2003, Scott Nelson wrote:
>> >Is this using greylisting by itself, or were you also using
>> >spamassassin and/or rbl lists?
>> >
>> Greylisting only - I whitelist my own address space,
>> but there wasn't any email from it to the spam traps.
>You'll get much better numbers if you use rbl lists too, since greylisting
>is a complementary technology to them.

Yes, and there are a number of other techniques that would reduce
the spam substantially as well.  SpamAssassin catches about 75%
of what got through.

But in these tests I'm trying to measure the 
effectiveness of greylisting, not block spam.
My goal is to be able to answer the question,
"If a user turns on greylisting and does nothing else,
how much is their spam reduced?".

I'd also like to know how it works in combination with other 
techniques, or at least have a better answer than what can be 
pulled from the vast suppository of knowledge most people seem to use.

>One more very interesting number that I haven't (yet) been able to gauge is
>the number of spams that would not have been blocked by rbl/razor/whatever
>lists if they were accepted when first seen but, because they were delayed,
>are in the rbl lists by the time the greylist block expires.
>Unfortunately, it requires a lot of lookup work at every delivery attempt.
>If you could find a way to get statistics on that from a set of spam-only
>accounts, that would be very interesting.

Well, for a /given/ set of DNSBLs it would be simple for me to look
up the IP and save the results in the log at RCPT time.
Then I could (presumably correctly ;) parse the data out of the logs later.
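
A minimal sketch of that kind of RCPT-time lookup, in Python. It relies on
the usual DNSBL convention (reverse the IPv4 octets, append the zone, and
treat any A record as "listed"). The zone name, the function names, and the
log field format are all hypothetical, not taken from the actual setup.

```python
import ipaddress
import socket

# Hypothetical zone list; substitute the DNSBLs you actually want to measure.
DNSBL_ZONES = ["dnsbl.example.org"]

def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip, zone, resolver=socket.gethostbyname):
    """Return True if the zone answers with an A record for this IP."""
    try:
        resolver(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN and other lookup failures are both treated as "not listed".
        return False

def rcpt_log_fields(ip, zones=DNSBL_ZONES, resolver=socket.gethostbyname):
    """Format one 'zone=0' or 'zone=1' field per zone for the log line."""
    return " ".join(f"{z}={int(is_listed(ip, z, resolver))}" for z in zones)
```

Logging the result on every delivery attempt, not just the first, is what
would let the logs show an IP going from unlisted to listed during the delay.
Note that this sketch cannot tell a timeout from a genuine "not listed"
answer, which a real measurement would probably want to record separately.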

Are there any particular DNSBLs you (or anyone else) are interested in
seeing the data for?  Easy to add them to the list now...

>> ...
>> These are only emails destined for the 100 spam trap accounts
>> that have greylisting turned on.
>I forgot that you were testing on only spam traps.
>> 1000 isn't exactly huge, but +/- more than twice the square root
>> of the sample size is unlikely.
>> A variance larger than 10% should be extremely unlikely.
>> My numbers for the first 6 days are /dramatically/ better - over 90%.
>> That doesn't add up, and since no one else mentioned a similar increase,
>> I'd have to go with "I must have goofed up".
>Spamming is really hit and miss.  How your spam traps were
>"advertised" affects what types of spammers are hitting them.  1000 just
>seems like a very small number to do statistical analysis on.
>Your particular observation could be as simple as the end/beginning of the
>month, and one or two spammers that had your addresses had quotas to fill.
>If one of those used a real mail host (with retries), I think that could
>significantly throw off your numbers.

Actually, the majority of my problem was that I sometimes 
listed a message as "passed" more than once.  Still trying
to track down that bug.
Removing duplicates reduces the "passed" to 203 or roughly 20%.
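
As a rough illustration of both steps (dropping repeated "passed" entries,
then putting an error bar on the resulting rate), here is a Python sketch.
The record layout and the message key are hypothetical, and the margin
assumes messages behave like independent trials, which spam runs often
do not.

```python
import math

def dedupe_passed(records):
    """Keep the first 'passed' record per key, preserving order.

    Each record is a tuple whose first element is a hypothetical message
    key, e.g. a queue ID or an (ip, sender, recipient) triplet.
    """
    seen = set()
    unique = []
    for rec in records:
        key = rec[0]
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def pass_rate_with_margin(passed, total):
    """Return (rate, margin), where margin is two binomial standard errors."""
    rate = passed / total
    margin = 2 * math.sqrt(rate * (1 - rate) / total)
    return rate, margin
```

Taking the sample as 1000 messages, `pass_rate_with_margin(203, 1000)` gives
a rate of 0.203 with a margin of about 0.025, i.e. roughly 20% plus or minus
2.5 points under the independence assumption.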

>Can you do an analysis on the triplets and try to establish spammer
>associations by seeing how many came from the same IP/range of IPs, or how
>many were from/to similar addresses?

I can, and I will, but first I'm going to 
debug my "number of connects/passed" scripts.
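
One possible shape for the "same range of IPs" part of that analysis,
sketched in Python.  It assumes the triplets are already parsed into
(ip, sender, recipient) tuples and uses a /24 as the definition of "range";
both the tuple layout and the prefix length are assumptions.

```python
import ipaddress
from collections import defaultdict

def group_by_network(triplets, prefix_len=24):
    """Map each IPv4 network (as a string) to the triplets seen inside it."""
    groups = defaultdict(list)
    for ip, sender, recipient in triplets:
        # strict=False lets us pass a host address and get its enclosing network.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(net)].append((ip, sender, recipient))
    return dict(groups)

def busiest_networks(triplets, prefix_len=24):
    """Return (network, triplet_count) pairs, largest count first."""
    groups = group_by_network(triplets, prefix_len)
    return sorted(((net, len(ts)) for net, ts in groups.items()),
                  key=lambda pair: (-pair[1], pair[0]))
```

Networks with many distinct triplets are candidates for a closer look; the
same grouping could be keyed on sender domain or recipient instead.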

>> >> In other tests I ran, there was a marked difference in success
>> >> rates when tempfailing after the RCPT rather than after DATA.
>> >> Eyeballing my logs, I notice a lot of instant retries from different
>> >> IPs after failure, usually three times.
>> >
>> >That's why I tend to favor reporting by unique triplets.  That removes these
>> >types of accounting errors, even though I didn't see that many of them.
>> Uh, no.
>> Here's a snippet from my logs;
>Ahh, I missed the part about different IP's in your original comment.
>You're right, with different IP's would automatically cause different
>But your example logs bring up another point in why blocking after RCPT is
>good.  Those kinds of examples are perfect for doing the traffic analysis I
>mentioned in the paper.  It quickly identifies several distributed IP's that
>are very likely part of a spam network.  Using that kind of information in
>real time to populate blacklists with some sort of confidence rating is
>something I hope to be able to develop as more larger sites start using

It would be nice to be able to identify 0wn3d boxen.
Even if we can only identify a few percent of them, it's a huge win IMO.

I was actually rather surprised by the IP hopping.
I've always assumed that most spammers weren't listening to bounces, 
but clearly some of them are paying very close attention indeed.
Makes me wonder if any are tailoring content as well.
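
For anyone wanting to look for the same IP hopping in their own logs, a
rough Python sketch follows.  It flags sender/recipient pairs that were
retried from a different IP shortly after an earlier attempt.  The input
layout (timestamp in seconds, ip, sender, recipient) and the 60-second
window are assumptions, not values from these logs.

```python
from collections import defaultdict

def find_ip_hopping(attempts, window_seconds=60):
    """Return {(sender, recipient): sorted IPs} for pairs that were retried
    from a different IP within window_seconds of the previous attempt.

    attempts: iterable of (timestamp_seconds, ip, sender, recipient).
    """
    by_pair = defaultdict(list)
    for ts, ip, sender, recipient in attempts:
        by_pair[(sender, recipient)].append((ts, ip))

    hoppers = {}
    for pair, events in by_pair.items():
        events.sort()  # chronological order
        ips = set()
        # Compare each attempt with the one immediately before it.
        for (t1, ip1), (t2, ip2) in zip(events, events[1:]):
            if ip1 != ip2 and t2 - t1 <= window_seconds:
                ips.update((ip1, ip2))
        if ips:
            hoppers[pair] = sorted(ips)
    return hoppers
```

The IPs are sorted as strings, which is only for stable output.  Any pair
that shows up here is a hint that the hosts involved may be cooperating,
not proof of it.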

Scott Nelson <scott at spamwolf.com>
