Gmail, spam and blacklisting
I have written before about the issue of Gmail blacklisting, often by Spamcop and I suspect at least occasionally by others like SORBS.

I was reminded of this yesterday when I received a reply to an email I sent to Paul Mooney last week. Paul had written asking for my opinion on something and I had written back within an hour or two but he hadn't seen that reply for 5 days. Why? Because as Paul eventually wrote to me, "your Gmail mail was blocked by postini and I hadn't gone in to check the block list".

Well, I trawled my vast spam archive for Gmail spam samples and found that, since the last time I wrote about this, there have been no further deliveries of Gmail spam here (there have only ever been two). And this is despite the fact that Gmail is whitelisted here and that all Gmail including spam will therefore always be accepted.

So Gmail spam simply isn't a huge problem and those of you able to use whitelists can safely whitelist Gmail (in fact all Google) IPv4. Here's my list of Google IPv4 (from AS15169 and I think complete).
  • [64.233.160-191.*] (CIDR 64.233.160/19)
  • [66.102.0-15.*] (CIDR 66.102.0/20)
  • [66.249.64-95.*] (CIDR 66.249.64/19)
  • [72.14.192-223.*] (CIDR 72.14.192/19)
  • [72.14.224-239.*] (CIDR 72.14.224/20)
  • [216.239.32-63.*] (CIDR 216.239.32/19)
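
For those who would rather test membership in code than by eye, here is a minimal Python sketch using the standard ipaddress module. The function name is_whitelisted is hypothetical, and the ranges are simply the six listed above, so treat them as a snapshot of that list rather than a current one.

```python
import ipaddress

# The six Google IPv4 ranges listed above (a snapshot, not a live feed).
GOOGLE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "64.233.160.0/19",
        "66.102.0.0/20",
        "66.249.64.0/19",
        "72.14.192.0/19",
        "72.14.224.0/20",
        "216.239.32.0/19",
    )
]


def is_whitelisted(ip: str) -> bool:
    """Return True if ip falls inside any of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in GOOGLE_NETWORKS)


if __name__ == "__main__":
    # A Gmail relay address and a Yahoo! relay address.
    for sample in ("64.233.182.192", "68.142.207.103"):
        print(sample, is_whitelisted(sample))
```

Keeping the ranges as network objects means the first and last addresses of each block never have to be worked out by hand.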

However, this ongoing issue with Gmail outbound relays being listed by Spamcop needs to be resolved. To do this, first we must understand it.

When deciding what if any IP to list, Spamcop parses received headers looking for a valid chain. Where the chain breaks (forgery is obvious, or an IP in the chain is a known open proxy for example), Spamcop will list the last valid IP in the chain which is often the IP address in the only unforged received header in a direct-to-MX spam. Spamcop will treat internal handoffs using RFC1918 addresses as valid links in the chain, but it does look for a valid, publicly routable IP at the beginning of the chain.
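
The selection rule just described can be sketched as a toy model in Python. This is not Spamcop's actual code, only an illustration of the behaviour as I understand it: the Hop type, the valid flag and the function names are all hypothetical, and the sample addresses come from documentation ranges.

```python
import ipaddress
from typing import Iterable, NamedTuple, Optional

RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


class Hop(NamedTuple):
    ip: Optional[str]   # IP recorded by this Received header, None if absent
    valid: bool = True  # False where the chain breaks (forgery, open proxy)


def is_rfc1918(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918)


def listing_candidate(hops_newest_first: Iterable[Hop]) -> Optional[str]:
    """Walk from the newest header down and return the deepest public IP
    reached before the chain breaks."""
    candidate = None
    for hop in hops_newest_first:
        if not hop.valid:
            break  # nothing at or below a broken link is trusted
        if hop.ip is None or is_rfc1918(hop.ip):
            continue  # internal handoff: a valid link, but not listable
        candidate = hop.ip
    return candidate


if __name__ == "__main__":
    # Direct-to-MX spam: one genuine header, then a forged one.
    print(listing_candidate([Hop("203.0.113.7"), Hop("198.51.100.1", valid=False)]))
    # Relay, internal handoff, then a public origin at the start of the chain.
    print(listing_candidate([Hop("198.51.100.25"), Hop("10.1.2.3"), Hop("192.0.2.10")]))
```

The point of the model is the last case: if no public IP is recorded at the start of the chain, the candidate stays at whatever public relay was seen further up.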

Most email sent via web mail services like Yahoo! mail or Hotmail (footnote 1) has these two common characteristics:

  1. Some internal handoffs exist in the received chain though these do not always mention an IP and
  2. The true originating IP is recorded at the beginning of the received chain in a header like "Received: from [source.IP] with HTTP"

Here are (slightly obfuscated) received headers from a Yahoo! web email:

Received: from web32006.mail.mud.yahoo.com ([68.142.207.103])
          by my.domino.host (Lotus Domino Release 7.0)
          with SMTP id 2005110909455092-3236 ;
          Wed, 9 Nov 2005 09:45:50 +0000 
Received: (qmail 47738 invoked by uid 60001); 9 Nov 2005 09:45:47 -0000
Received: from [real.originating.IP.address] by web32006.mail.mud.yahoo.com via HTTP; Wed, 09 Nov 2005 01:45:46 PST

Note the first received header (the bottom one, as new headers are correctly prepended), which includes "via HTTP". If this were a spam hitting a Spamcop trap, the IP that would be considered for listing is the one in that header.

Now let's look at a Gmail sample:

Received: from nproxy.gmail.com ([64.233.182.192])
          by my.domino.host (Lotus Domino Release 7.0)
          with ESMTP id 2005110909225569-3192 ;
          Wed, 9 Nov 2005 09:22:55 +0000 
Received: by nproxy.gmail.com with SMTP id a25so22043nfc
        for <me[at]domino>; Wed, 09 Nov 2005 01:22:56 -0800 (PST)
Received: by 10.48.211.9 with SMTP id j9mr91812nfg;
        Wed, 09 Nov 2005 01:22:56 -0800 (PST)
Received: by 10.49.5.11 with HTTP; Wed, 9 Nov 2005 01:22:56 -0800 (PST)

This time the internal handoffs do reveal RFC1918 IP addresses but one thing is missing. That first (bottom again) received header includes no "from", only "by". Thus the true source IP is never revealed and the only publicly routable IP in the whole chain is in that very last received header (the top one) and belongs to Gmail's outbound relay (they call them proxies).

If this were a spam hitting Spamcop's trap, the only IP considered for blacklisting would be that one belonging to Google's outbound relay and this, it seems, is what frequently actually happens.
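
The claim that only one publicly routable IP appears in the Gmail web sample is easy to check mechanically. The following Python sketch pulls every IPv4 literal out of the headers quoted above and discards the RFC1918 ones. The regular expression is deliberately naive and the function name public_ips is hypothetical.

```python
import ipaddress
import re

RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# The Gmail web interface sample from above, verbatim.
GMAIL_WEB_HEADERS = """\
Received: from nproxy.gmail.com ([64.233.182.192])
          by my.domino.host (Lotus Domino Release 7.0)
          with ESMTP id 2005110909225569-3192 ;
          Wed, 9 Nov 2005 09:22:55 +0000
Received: by nproxy.gmail.com with SMTP id a25so22043nfc
        for <me[at]domino>; Wed, 09 Nov 2005 01:22:56 -0800 (PST)
Received: by 10.48.211.9 with SMTP id j9mr91812nfg;
        Wed, 09 Nov 2005 01:22:56 -0800 (PST)
Received: by 10.49.5.11 with HTTP; Wed, 9 Nov 2005 01:22:56 -0800 (PST)
"""

# Naive dotted-quad pattern; good enough for this sample only.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def public_ips(raw_headers: str) -> list:
    """Return the IPv4 literals in raw_headers that are not RFC1918."""
    found = []
    for text in IPV4.findall(raw_headers):
        try:
            addr = ipaddress.ip_address(text)
        except ValueError:
            continue  # looked like an IP but was not one
        if not any(addr in net for net in RFC1918):
            found.append(text)
    return found


if __name__ == "__main__":
    print(public_ips(GMAIL_WEB_HEADERS))
```

On this sample the only address that survives the filter is the one belonging to the outbound relay, which is exactly the address left exposed to listing.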

Now, why does Google not reveal the true source? Gmail's FAQ page attempts an explanation, though it is unsatisfactory:

"Protecting our users' privacy is something we take very seriously. Personal information, including someone's exact location, can be gathered from someone's IP address, so Gmail doesn't reveal this information in outgoing mail headers. This prevents recipients from being able to track our users, or uncover what may be potentially sensitive personal information."

This is flawed for three reasons:

  1. An IP address is not personal information by any reasonable definition of that term
  2. While the HTTP step is obviously not covered by the SMTP protocol standard and a received header recording it is technically not mandatory, a proper header is at the very least courteous, both by common consensus and in keeping with the spirit of RFC2821, and
  3. Gmail's own approach to this alleged privacy issue is inconsistent

What do I mean by that last statement? Check these sample received headers from a Gmail message sent not via the Gmail web interface but from an email client (Mozilla Thunderbird) and submitted via Gmail's inbound MSA.

Received: from xproxy.gmail.com ([66.249.82.204])
          by my.domino.host (Lotus Domino Release 7.0)
          with ESMTP id 2005110909365742-3222 ;
          Wed, 9 Nov 2005 09:36:57 +0000 
Received: by xproxy.gmail.com with SMTP id r21so344547wxc
        for <me[at]domino>; Wed, 09 Nov 2005 01:36:55 -0800 (PST)
Received: by 10.64.185.7 with SMTP id i7mr572065qbf;
        Wed, 09 Nov 2005 01:36:55 -0800 (PST)
Received: from Thunderbird ( [real.originating.IP.address])
        by mx.gmail.com with ESMTP id m3sm849972qbe.2005.11.09.01.36.52;
        Wed, 09 Nov 2005 01:36:55 -0800 (PST)

Here we see both a similar chain of internal handoffs and a valid received header (the bottom one again) that does indeed record the true originating IP.
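
The difference between the two Gmail samples comes down to whether the earliest received header carries a "from" clause. A rough Python sketch of that check follows. The helper names received_headers and origin_clause are hypothetical, only the two earliest headers of each sample are reproduced, and real-world header parsing needs far more care than this.

```python
import re
from typing import List, Optional

# The two earliest headers of the web interface sample.
GMAIL_WEB = """\
Received: by 10.48.211.9 with SMTP id j9mr91812nfg;
        Wed, 09 Nov 2005 01:22:56 -0800 (PST)
Received: by 10.49.5.11 with HTTP; Wed, 9 Nov 2005 01:22:56 -0800 (PST)
"""

# The two earliest headers of the Thunderbird sample.
GMAIL_CLIENT = """\
Received: by 10.64.185.7 with SMTP id i7mr572065qbf;
        Wed, 09 Nov 2005 01:36:55 -0800 (PST)
Received: from Thunderbird ( [real.originating.IP.address])
        by mx.gmail.com with ESMTP id m3sm849972qbe.2005.11.09.01.36.52;
        Wed, 09 Nov 2005 01:36:55 -0800 (PST)
"""


def received_headers(raw: str) -> List[str]:
    """Unfold continuation lines and return Received values, newest first."""
    unfolded = re.sub(r"\r?\n[ \t]+", " ", raw)
    return [
        line[len("Received:"):].strip()
        for line in unfolded.splitlines()
        if line.startswith("Received:")
    ]


def origin_clause(raw: str) -> Optional[str]:
    """Return the 'from' clause of the earliest Received header, if any."""
    earliest = received_headers(raw)[-1]
    match = re.match(r"from\s+(.*?)\s+by\s", earliest)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("web:   ", origin_clause(GMAIL_WEB))
    print("client:", origin_clause(GMAIL_CLIENT))
```

For the web sample there is no clause to return, while the client sample yields the submitting host and its (here obfuscated) address.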

This perhaps explains why you don't see more Gmail spam. You can hide the originating IP if you submit via the web interface, but not if you use a mail client, and that includes all current spamware. Manually submitting large numbers of spams via the web interface is simply not productive, although if the bad guys learn to script it, then we'll have trouble.

Bottom line:

  • Gmail isn't currently a spam haven and can safely be whitelisted
  • Gmail will continue to be blacklisted, not because there is a huge spam problem there but because of a wrong-headed and inconsistent approach to the issue of privacy
  • Gmail could conceivably fall prey to massive abuse (just like Yahoo! mail and Hotmail with their seemingly endless 419 spam), particularly if the bad guys learn to use AJAX


  1. Most but not all Hotmail, as they do sometimes seem to substitute their own NAT/proxy address for the true submission address, though this is currently fairly unusual




See also:
Spam from Gmail
SpamCop lists Gmail
GSpam? - Part 2
Could Gmail spam take off?




Category: GMail

Comments :

1. Matthias Leisi 09/11/2005 20:11:23
Homepage: http://matthias.leisi.net/


Very simple to find all "official" outgoing IP addresses of GMail, since GMail offers SPF records:

[leisi@athena](~) host -t txt gmail.com
gmail.com descriptive text "v=spf1 a:mproxy.gmail.com a:rproxy.gmail.com a:wproxy.gmail.com a:zproxy.gmail.com a:nproxy.gmail.com a:uproxy.gmail.com a:xproxy.gmail.com a:qproxy.gmail.com ?all"

Now, a little bit of command line magic:

for i in `host -t txt gmail.com | sed "s/^.*v=spf1//" | sed "s/a\://g" | sed "s/\?all.*//"`; do host -t a $i; done | sed "s/^.*address //" | sort | uniq | less

turns this into

216.239.56.240
216.239.56.241
[..]
64.233.162.192
64.233.162.193
[..]

GMail also offers Domain Key Signatures covering their Received: line (amongst other data). Checking that signature would make an explicit list of IP addresses obsolete.




2. Chris Linfoot 09/11/2005 21:30:21


I agree the SPF angle is a useful one though as I trust Google in general not to spam me, it is easier just to drop their half dozen or so CIDR blocks into the whitelist.

As for DomainKeys - the issue here is an IP oriented one as we are counteracting a DNSBL listing. We already know from the IP that the message is from Google without looking at the DomainKeys signature and in any case, both of my Gmail spam samples have a valid DomainKeys signature anyway. This proves nothing we don't already know - or did I miss something?




3. Coward 11/05/2007 15:08:31


Spam from Gmail accounts is now ramping up quickly - they apparently take a very slow approach to stopping spammers, as well as protecting them by hiding sender IP addresses, so it's now getting widely abused.
Half of all my spam today is from Google's servers...




4. Chris Linfoot 11/05/2007 15:36:20


Hi Noel Coward. Send me a few samples by email please, complete with headers.




5. Alaa Khalil 12/06/2007 09:39:02
Homepage: http://www.webmasterstc.com


I know this conversation was two years ago, but I need the bottom line, guys: is it possible to get the true originating IP address of an e-mail sent from a Gmail account? If yes, explain by e-mail or here, because I really need it for legal purposes.

any help please.




6. Chris Linfoot 12/06/2007 10:42:07


is it possible to get the true originating IP address of an e-mail sent from a Gmail account

The answer is

No - if sent via the Gmail web interface including Gmail mobile
Yes - if sent via a POP/SMTP mail client like Outlook Express or Thunderbird




7. Dude 26/07/2007 15:57:35


One point: IP addresses ARE personal information, regardless of how accurately you can be tracked down using them. It is a piece of information specifically relating to you and your computer, hence, personal info. IP addresses are also included in (for the UK at least) data protection laws, IIRC.




8. Chris Linfoot 26/07/2007 16:40:58


@7 - Your opinion. I disagree. And the DPA makes no mention of it either.



