[Greylist-users] greylisting and per-message sender ids

Ken Raeburn raeburn at raeburn.org
Wed Oct 8 21:42:38 PDT 2003


(OT: I only just subscribed to the list tonight; previously I'd been
reading it through the archive.  A message I sent earlier suggesting
this seemingly obvious idea and asking why it wasn't in use got held
for moderation, and as far as I know, is still in the queue, unless
the moderator deletes on-topic non-member messages without further
notification.)

After three days or so of loading my database (not delaying or
rejecting mail), it holds 916 entries, of which 853 record only one
message sent through.  There are 893 unique sender addresses among
them.  If I ignore the per-message VERP ids, there are 67 unique
addresses, plus 6 per-message-VERP patterns:

    <binutils-return-#-raeburn=raeburn.org@sources.redhat.com>
    <ding-owner+M#@lists.math.uh.edu>
    <gcc-bugs-return-#-raeburn=raeburn.org@gcc.gnu.org>
    <gcc-patches-return-#-raeburn=raeburn.org@gcc.gnu.org>
    <gcc-return-#-raeburn=raeburn.org@gcc.gnu.org>
    <sentto-#-#-#-kr-yahoo=raeburn.org@returns.groups.yahoo.com>

Why not use these patterns in the database instead of the per-message
ids?  Find regexps for the digit sequences in context, and replace
them with a fixed string like "#".  Then message 1234 may be deferred,
but a few hours later, message 1235 will come through right away.
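
As a rough illustration of the idea (a sketch only, not the patch
itself: the helper name normalize_sender and the message numbers in the
sample addresses are made up for the example), two consecutive messages
from the same list collapse to a single database key:

```perl
use strict;
use warnings;

# Hypothetical helper: collapse per-message numeric fields in the
# local part of an envelope sender to "#".
sub normalize_sender {
    my ($addr) = @_;
    my ($local, $host) = split /\@/, $addr, 2;

    # The lookahead leaves the trailing dash available for the next
    # match, so adjacent numeric fields are all replaced in one pass.
    $local =~ s/-[0-9]+(?=-)/-#/g;

    # Addresses without an "@" (such as the null sender "<>") are
    # returned without a host part appended.
    return defined $host ? "$local\@$host" : $local;
}

my $first  = normalize_sender('<gcc-return-1234-raeburn=raeburn.org@gcc.gnu.org>');
my $second = normalize_sender('<gcc-return-1235-raeburn=raeburn.org@gcc.gnu.org>');

print "$first\n";
print $first eq $second ? "same key\n" : "different keys\n";
```

Both addresses normalize to the gcc pattern listed above, so once the
first message has been retried and accepted, the second matches the
existing entry.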

In the 90 minutes or so since I installed the patch included below, 20
messages have come through on 5 of these lists, causing 5 of the
patterns to be added to the database and matched.  Messages from other
lists have come through without problem as well.  (Spam
notwithstanding, the bulk of the email to this machine is list
traffic.)  Only the binutils list hasn't sent me traffic in this time,
and the others should exercise all three substitution statements in
the code below.

Most of the active VERP-using lists I'm on obviously come from a fairly
small number of sites, so there may be other patterns in use elsewhere
that could be added to the substitutions.
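
One way to leave room for such additions would be to keep the
substitutions in a table rather than hard-coding them.  This is only a
sketch under assumptions: @verp_rules and apply_verp_rules are
hypothetical names, not part of relaydelay, and the two rules shown
cover just the forms observed above.

```perl
use strict;
use warnings;

# Hypothetical rule table: each entry is [ pattern, replacement ].
# A site could append patterns for other list software here.
my @verp_rules = (
    [ qr/-[0-9]+(?=-)/, '-#'  ],   # listid-msgnumber-recipient forms
    [ qr/\+M[0-9]+$/,   '+M#' ],   # form seen from lists.math.uh.edu
);

sub apply_verp_rules {
    my ($addr) = @_;
    my ($local, $host) = split /\@/, $addr, 2;

    # Apply every rule to the local part only; the host is untouched.
    for my $rule (@verp_rules) {
        my ($pattern, $replacement) = @$rule;
        $local =~ s/$pattern/$replacement/g;
    }

    return defined $host ? "$local\@$host" : $local;
}

print apply_verp_rules('<ding-owner+M4321@lists.math.uh.edu>'), "\n";
```

Supporting a new pattern then means adding one entry to the table,
with no change to the function itself.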

Ken


--- /root/relaydelay-0.04/relaydelay.pl	2003-10-05 23:40:44.000000000 -0400
+++ /usr/local/sbin/relaydelay.pl	2003-10-08 19:02:38.000000000 -0400
@@ -227,6 +227,44 @@
 }
 
 
+##########################################################################
+#
+# Name transformations:
+#
+#  Some mailing lists use per-message per-recipient envelope sender ids,
+#  a form of VERP that's great for tracking who missed what message, but
+#  particularly annoying for a recipient using Greylisting techniques.
+#
+#  However, the sender ids typically come in a small number of forms,
+#  always using a numeric field for the message number, so we can just
+#  replace those numeric fields with a fixed string.
+#
+#  Possible future enhancement: Make this site-configurable.
+#
+##########################################################################
+
+
+sub do_verp_substitutions {
+    my ($x) = @_;
+    my ($localpart, $hostpart) = split("@", $x, 2);
+
+    # One popular form seems to be
+    # "listid-msgnumber-encodedrecipient@listhost".  The message
+    # number has dashes before and after it.
+    $localpart =~ s/-[0-9][0-9]*-/-#-/g;
+
+    # Do it again in case there are three such blocks strung together,
+    # as in Yahoo Groups sender ids.
+    $localpart =~ s/-[0-9][0-9]*-/-#-/g;
+
+    # The mailer used at lists.math.uh.edu for the Gnus developers'
+    # list uses this form.  Not sure what list package they're using.
+    $localpart =~ s/\+M[0-9][0-9]*$/+M#/;
+
+    return defined($hostpart) ? "$localpart\@$hostpart" : $localpart;
+}
+
+
 #############################################################################
 #
 # Milter Callback Functions:
@@ -316,6 +354,11 @@
     }
   }
 
+  # Wrap this in eval just in case some customization work accidentally
+  # breaks it.  If it blows up, the assignment doesn't happen, and we
+  # quietly move on.
+  eval { $mail_from = &do_verp_substitutions($mail_from); };
+
   # Save our private data (since it isn't available in the same form later)
   #   The format is a comma seperated list of rowids (or zero if none),
   #     followed by the envelope sender followed by the current envelope

