[Greylist-users] Some more data points

Wed Jul 2 10:49:39 PDT 2003

On Wed, 2 Jul 2003, Scott Nelson wrote:

> >Is this using greylisting by itself, or were you also using
> >spamassassin and/or rbl lists?
> >
> Greylisting only - I whitelist my own address space,
> but there wasn't any email from it to the spam traps.

You'll get much better numbers if you use rbl lists too, since greylisting
is a complementary technology to them.

One more very interesting number that I haven't (yet) been able to gauge is
the number of spams that would have gotten past rbl/razor/whatever lists if
they had been accepted when first seen, but that, because they were delayed,
are already in the rbl lists by the time the greylist block expires.
Unfortunately, measuring it requires a lot of lookup work at every delivery
attempt.

If you could find a way to get statistics on that from a set of spam-only
accounts, that would be very interesting.
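A minimal Python sketch of how that number could be tallied from spam-trap logs. It assumes the MTA records, at every delivery attempt, whether the client IP was listed at that moment and whether the greylist let the attempt through; the `Attempt` record, its field names and the `bl.example.org` zone are hypothetical. `dnsbl_query_name` follows the usual DNSBL convention of reversing the IPv4 octets and appending the list's zone.

```python
import ipaddress
import socket
from typing import Iterable, NamedTuple

Triplet = tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)


class Attempt(NamedTuple):
    """Hypothetical log record for one delivery attempt."""
    triplet: Triplet
    listed: bool           # was the client IP on the rbl at this attempt?
    passed_greylist: bool  # False = tempfailed, True = greylist let it through


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Usual DNSBL query name: reversed IPv4 octets followed by the zone."""
    octets = ipaddress.IPv4Address(ip).exploded.split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Live lookup: any A record means listed; a resolver error such as
    NXDOMAIN is treated as not listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


def caught_by_delay(attempts: Iterable[Attempt]) -> int:
    """Count triplets that were unlisted on their first attempt but listed
    by the attempt on which the greylist would have accepted them."""
    listed_at_first: dict[Triplet, bool] = {}
    caught: set[Triplet] = set()
    for a in attempts:
        if a.triplet not in listed_at_first:
            listed_at_first[a.triplet] = a.listed
        elif a.passed_greylist and not listed_at_first[a.triplet] and a.listed:
            caught.add(a.triplet)
    return len(caught)
```

The lookup (`is_listed` or the MTA's own rbl check) has to run on every attempt, which is where the cost mentioned above comes from; replaying old logs against a list's current contents would not tell you what it contained at retry time.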

> ...
> These are only emails destined for the 100 spam trap accounts
> that have greylisting turned on.

I forgot that you were testing on only spam traps.

> 1000 isn't exactly huge, but +/- more than twice the square root
> of the sample size is unlikely.
> A variance larger than 10% should be extremely unlikely.
> My numbers for the first 6 days are /dramatically/ better - over 90%.
> That doesn't add up, and since no one else mentioned a similar increase,
> I'd have to go with "I must have goofed up"
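The "twice the square root" rule of thumb can be checked against the binomial standard deviation, under the assumption that each of the roughly 1000 messages is an independent trial with the same chance p of getting through. A quick Python check:

```python
import math


def binomial_sd(n: int, p: float) -> float:
    """Standard deviation of a success count over n independent trials."""
    return math.sqrt(n * p * (1 - p))


n = 1000
two_root_n = 2 * math.sqrt(n)   # the "twice the square root" bound, about 63
worst_sd = binomial_sd(n, 0.5)  # sd is largest at p = 0.5: sqrt(250), about 15.8

print(f"2*sqrt(n) = {two_root_n:.1f} messages ({two_root_n / n:.1%} of n)")
print(f"that bound is {two_root_n / worst_sd:.1f} worst-case standard deviations")
print(f"a 10% swing is {0.10 * n / worst_sd:.1f} worst-case standard deviations")
```

Under independence the bound is about 4 standard deviations even in the worst case, so the rule of thumb is conservative. The weak point is the independence assumption itself: spam tends to arrive in correlated bursts from a few senders.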

Spamming is really hit and miss.  How your spam traps were "advertised"
affects what types of spammers are hitting them.  1000 just seems like a very
small number to do statistical analysis on.

Your particular observation could be explained by something as simple as the
end/beginning of the month, when one or two spammers that had your addresses
had quotas to fill.  If one of those used a real mail host (with retries), I
think that could significantly throw off your numbers.

Can you do an analysis on the triplets and try to establish spammer
associations by seeing how many came from the same IP/range of IPs, or how
many were from/to similar addresses?

> >> In other tests I ran, there was a marked difference in success
> >> rates when tempfailing after the RCPT rather than after DATA.
> >> Eyeballing my logs, I notice a lot of instant retries from different
> >> IPs after failure, usually three times.
> >
> >That's why I tend to favor reporting by unique triplets.  That removes these
> >types of accounting errors, even though I didn't see that many of them.
>
> Uh, no.
> Here's a snippet from my logs;

Ahh, I missed the part about different IPs in your original comment.
You're right, different IPs would automatically produce different
triplets.
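A tiny Python illustration of that point, using made-up log entries from the documentation address range: three instant retries of the same message from three different IPs are three unique triplets, but only one (sender, recipient) pair.

```python
# Three immediate retries of one message, each from a different client IP
# (illustrative addresses only).
attempts = [
    ("192.0.2.10", "promo@sender.example", "trap1@example.net"),
    ("192.0.2.77", "promo@sender.example", "trap1@example.net"),
    ("192.0.2.201", "promo@sender.example", "trap1@example.net"),
]

unique_triplets = set(attempts)
unique_pairs = {(sender, rcpt) for _, sender, rcpt in attempts}

print(len(unique_triplets))  # 3
print(len(unique_pairs))     # 1
```

Counting by (sender, recipient) pair would fold such retries together, at the cost of also merging genuinely separate messages between the same two addresses.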

But your example logs bring up another point about why blocking after RCPT is
good.  Those kinds of examples are perfect for doing the traffic analysis I
mentioned in the paper.  It quickly identifies several distributed IPs that
are very likely part of a spam network.  Using that kind of information in
real time to populate blacklists with some sort of confidence rating is
something I hope to be able to develop as more large sites start using
greylisting.
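The confidence rating is not specified here, so the scoring rule below is invented purely for illustration: a Python sketch that gives a client IP a higher score the more distinct IPs were seen trying the same (sender, recipient) pair. The function name and the `saturate_at` parameter are hypothetical; a real-time version would run this over a sliding window of recent attempts, which is omitted.

```python
from collections import defaultdict
from typing import Iterable

Triplet = tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)


def distributed_sender_scores(triplets: Iterable[Triplet],
                              saturate_at: int = 3) -> dict[str, float]:
    """Hypothetical confidence score per client IP, in [0, 1].

    An IP scores higher when the same (sender, recipient) pair it tried was
    also tried by other IPs; `saturate_at` distinct IPs on one pair gives 1.0.
    """
    if saturate_at < 2:
        raise ValueError("saturate_at must be at least 2")

    ips_per_pair: dict[tuple[str, str], set[str]] = defaultdict(set)
    for ip, sender, rcpt in triplets:
        ips_per_pair[(sender, rcpt)].add(ip)

    scores: dict[str, float] = {}
    for ips in ips_per_pair.values():
        # A lone IP scores 0; each additional IP on the pair raises it linearly.
        score = min(1.0, (len(ips) - 1) / (saturate_at - 1))
        for ip in ips:
            # Keep the strongest evidence seen for this IP across all pairs.
            scores[ip] = max(scores.get(ip, 0.0), score)
    return scores
```

Taking the maximum over pairs means one strongly distributed campaign is enough to flag an IP; averaging instead would be more forgiving of hosts that also send legitimate mail.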

Evan