[Greylist-users] Some more data points

Scott Nelson scott at spamwolf.com
Wed Jul 9 01:22:02 PDT 2003


At 12:59 AM 7/9/03 -0500, Evan Harris wrote:
>
>On Wed, 2 Jul 2003, Scott Nelson wrote:
>
>> >One more very interesting number that I haven't (yet) been able to gauge is
>> >the number of spams that would not have been blocked by rbl/razor/whatever
>> >lists if they were accepted when first seen but, since they were delayed,
>> >are in the rbl lists by the time the greylist block expires.
>> >Unfortunately, it requires a lot of lookup work at every delivery attempt.
>>
>> Well, for a /given/ set of DNSBLs it would be simple for me to look
>> up the IP and save the results in the log at RCPT time.
>> Then I could (presumably correctly ;) parse the data out of the logs later.
>>
>> Is there a particular DNSBL(s) you (or anyone else) are interested in
>> seeing the data for?  Easy to add them to the list now...
>
>Well the ones I think are still around are MAPS, SPEWS, NJABL, Spamhaus/SBL
>and Osirusoft.  There may be others though.
>

There are a lot more; someone was kind enough to send me this link:
  http://www.declude.com/JunkMail/Support/ip4r.htm

I'm tempted to just include them all.
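
For anyone who wants to log the same thing, here is a minimal Python
sketch of a DNSBL check that could run at RCPT time. It is an
illustration only, not the code running here: the function names and
the log line format are made up for the example, and the zones passed
in are whatever lists you choose. The one fixed part is the usual
DNSBL convention of reversing the IPv4 octets and appending the
list's zone.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the name to look up: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """True if the DNSBL name resolves, False if the lookup fails.

    Assumption: any resolver failure is treated as "not listed", so a
    timeout is indistinguishable from a clean result in this sketch.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


def log_line(ip, zones, resolve=socket.gethostbyname):
    """Format one (hypothetical) log record naming every zone that lists the IP."""
    hits = [zone for zone in zones if is_listed(ip, zone, resolve)]
    return "dnsbl ip=%s listed=%s" % (ip, ",".join(hits) or "-")
```

The resolver is a parameter so the log parsing can be tested without
touching the network; in real use the default does an ordinary DNS
lookup per zone, which is where the per-attempt cost comes from.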


>> >Can you do an analysis on the triplets and try to establish spammer
>> >associations by seeing how many came from the same IP/range of IPs, or how
>> >many were from/to similar addresses?
>>
>> I can, and I will, but first I'm going to
>> debug my "number of connects/passed" scripts.
>
>Any luck on the analysis yet?
>

Two errors mucked up the data:

In some cases, I was listing a triplet as "passed" both after
RCPT and after the final '.'
I can account for that, and it raises the spam blocked to about 80%.
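
The double count is the kind of thing that is easy to fix when
parsing, provided each log record carries something that identifies
the delivery attempt. A rough Python sketch, assuming records of the
form (key, stage, status); the key and the stage names are
hypothetical and may not match what any real log contains:

```python
def count_passed(events):
    """Count attempts marked "passed", once per key.

    An attempt logged as passed at both the RCPT stage and the
    end-of-DATA stage is counted a single time.
    """
    passed_keys = set()
    for key, stage, status in events:
        if status == "passed":
            passed_keys.add(key)
    return len(passed_keys)


def blocked_fraction(events):
    """Fraction of distinct attempts that never show up as passed."""
    all_keys = {key for key, stage, status in events}
    if not all_keys:
        return 0.0
    return 1.0 - count_passed(events) / len(all_keys)
```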

I also forgot to clear the triplets before restarting the test.
This means many triplets "passed" but were never "added".
Unfortunately, I don't actually record the entire triplet in the
log, so I can't correct for this entirely, but it's at least
another 5%.  Add the uncertainty of the small sample size,
and the final answer is "85% +/- 15% blocked with about 90% confidence".
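
For the sampling part alone, a textbook normal-approximation margin
can be sketched in Python as below. This is not how the figure above
was derived: that number also folds in the logging errors, and the
sample size is not given here, so any n passed in is purely
hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin(p, n, confidence=0.90):
    """Half-width of a two-sided normal-approximation interval.

    p is the observed blocked fraction, n the number of samples.
    The approximation is poor for very small n or p near 0 or 1.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * sqrt(p * (1.0 - p) / n)
```

For instance, margin(0.85, 100) comes out a little under 0.06, so
with a hypothetical 100 samples the sampling error alone would be
about 6 percentage points at 90% confidence.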

I didn't do much over the Fourth of July weekend.
I expect to start again with a clean slate this weekend.

...

Although my testing was not nearly as accurate as I hoped, 
even with pessimistic assumptions greylisting is blocking over 70% of spam.
I encountered only one problem (a mailer with a 12 hour retry).
I didn't see the GroupWise problem, but it's not too hard
to whitelist the problem servers on a case by case basis.
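
To make the whitelisting concrete, here is a small Python sketch of a
greylist check with a per-network whitelist. It is a simplified
illustration, not the production code: the class name, the one hour
minimum delay, and the in-memory dictionary are all assumptions for
the example.

```python
import ipaddress


class Greylist:
    def __init__(self, min_delay=3600, whitelist=()):
        # min_delay is in seconds; 3600 is an arbitrary example value.
        self.min_delay = min_delay
        self.whitelist = [ipaddress.ip_network(net) for net in whitelist]
        self.first_seen = {}  # triplet -> time of first attempt

    def check(self, ip, sender, rcpt, now):
        """Return "accept" or "tempfail" for one delivery attempt."""
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in self.whitelist):
            return "accept"  # known problem server, skip greylisting
        triplet = (ip, sender.lower(), rcpt.lower())
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.min_delay:
            return "accept"  # sender retried after the delay
        return "tempfail"  # new triplet, or retried too soon
```

A mailer with a very long retry interval still gets through on its
second attempt under this scheme, just late; listing its address or
network in the whitelist lets its first attempt through.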

The results are so good, in fact, that I've gotten greylisting
pushed into production use on about 100 servers.


>> It would be nice to be able to identify 0wn3d boxen.
>> Even if we can only identify a few percent of them, it's a huge win IMO.
>>
>> I was actually rather surprised by the IP hopping.
>> I've always assumed that most spammers weren't listening to bounces,
>> but clearly some of them are paying very close attention indeed.
>> Makes me wonder if any are tailoring content as well.
>
>I wouldn't be surprised if they did.  Or at least, generated other
>randomness to add to their mail in order to get around content filters.  But
>they're already getting pretty smart about confusing bayes-type filters.
>
>I'm really hoping someone will design a blacklist that works off of traffic
>analysis, and possibly also doing something more fancy like
>combining razor scoring with IP blacklisting.  There has to be a way to
>increase the accuracy and response time better than current blacklists.  And
>blocking the greylisting way with a tempfail instead of permanent blocks
>leaves a lot more room for correcting possible mistakes in the list.
>
>If I had someplace that would host the server, I might even try to work
>something up myself.
>

Personally, I've never liked the concept of IP == Identity.
As a practical matter, there are too many ways to beat it,
and it also has bad implications for future expandability.

But if there's a good chance of identifying cracked machines
without active testing, and without me having to write code, 
then I'm sure I could line up a server.


Scott Nelson <scott at spamwolf.com>


