
Re: [tor-talk] Tor and Google error / CAPTCHAs.

Hi Alec,

Thanks for your detailed and informative response. I had never heard of "scraping". BTW: are you the Alec Muffett name-checked in Kevin Mitnick's autobiography? I assume so.

It may be of note that when I got the Google error, Amazon also required a CAPTCHA before I could log in to my account. Whoever was using the exit node maliciously was obviously affecting non-Google organizations too.

Since you used to work at Facebook (and I know you've posted on this list before about the FB onion address), I've a couple of questions based on my experiences with FB and Tor.

I'm wondering if FB (and, for that matter, other companies like Google) has some kind of hierarchy of "badness" of IP addresses. For example, for FB, is an exit node "worse" than a SOCKS proxy, which is in turn "worse" than a VPN? I ask because I usually log in to my FB account via a London-based IP provided by my ISP. However, when I try to log in via an exit node with a London IP or via a SOCKS proxy with a London IP, I am asked to verify myself by selecting photos of my friends. I could well understand this if I were logging in from an IP (of any type) in, say, France. But I don't really understand why a London-based IP should be suspicious, since it matches my usual geographical login location. Unless, of course, all exit nodes and known SOCKS proxies are suspicious to FB irrespective of whether they correlate with the user's "normal" IP location (in my case London).

What I am trying to ask is: how does FB (or a similar organisation) decide that an IP is "bad" when it is in the same place as the IP that normally logs in to an account?

I wonder if you have any thoughts on the matter. Thanks!

On 2016-09-24 14:21, Alec Muffett wrote:
On 24 September 2016 at 13:07, <blobby@xxxxxxxxxxxxxxx> wrote:

Question: what are these people actually doing with the exit node IP that
upsets Google?

That's a good question; I don't know about Google specifically, but when I was at Facebook the most common Tor-exit-node-related problem was called "scraping".

Scraping was/is when people with bad intentions hid behind Tor in order to
disguise attempts to access and copy people's public pages, looking for
personal information (names, addresses, pet names, emails, anything...)
which could be correlated somehow and monetised, eg: via phone fraud or

Tor is useful to these people because if they were making such access
attempts from a single IP address, or a single subnet, it would be easy to
track and stop them.

So "scraping", along with other/similar reasons, is why Tor exit nodes have such shitty "IP Reputation" in the tech industry. The Tor exit nodes hide
a bunch of people who are doing scraping.

Of all the big companies in tech, Facebook probably has, in theory, one of the easiest challenges in addressing scraping, because quite a lot of content is only available when one is "logged in" to Facebook. So instead of blocking IP addresses, Facebook can block _accounts_ that scrape. However, that is not a panacea, and fighting scraping at Facebook is still a _massive_ task.

By comparison Google may have an even harder challenge to combat scraping because much of Google content is meant to be available without logging in,
therefore Google rely more heavily upon IP-address as an identifier.

Continuing the spectrum - Cloudflare have an enormously harder challenge
than Google, because they are mostly supplying only "network-level"
services to their customers, so lack knowledge of usernames, user IDs, and (most?) cookies that actual platform-providers might be able to use when
fighting scraping.

If you correlate this spectrum with "corporate friendliness towards Tor", I
think you will see a causative pattern emerge; Tor does great work in
enabling access to these services and platforms for people in need, but it also serves to hide/enable scrapers and other malfeasance. To not recognise
this and instead (for example) to violently beat-up Cloudflare for
"blocking tor" serves only to entrench anti-Tor sentiment.

This is why a few months ago I wrote a blogpost[1] explaining how best I
believe to get more companies to be friendly towards Tor.

Because any amount of denial, public raging and placard-waving is not going
to help.  It needs outreach.  It needs mutual understanding and
communication of benefits.

    - alec


