Re: [tor-talk] Tor and Google error / CAPTCHAs.
On 25 September 2016 at 17:54, <blobby@xxxxxxxxxxxxxxx> wrote:
> Hi Alec,
>
> Thanks for your detailed and informative response. I had never heard of
> "scraping".
Scraping comes in many forms and with many motives and intentions - in the
previous email I managed to outline a couple, but that is no more than a
sketch of one aspect of the topic.
Scraping also raises interesting legal arguments, both pro-and-con - for
instance:
* https://en.wikipedia.org/wiki/Facebook,_Inc._v._Power_Ventures,_Inc.
* http://blog.icreon.us/web-scraping-and-you-a-legal-primer-for-one-of-its-most-useful-tools/
...and Weev:
* http://arstechnica.com/tech-policy/2012/11/internet-troll-who-exploited-att-security-flaw-faces-5-years-in-jail/
...and of course, Aaron Swartz:
* http://www.newyorker.com/tech/elements/when-programmers-scrape-by
...so when I say "many forms and with many motives and intentions", I must
acknowledge "dual use" - that some forms of scraping are benign, or are
protest, or are sharing that which perhaps should be shared.
But here, primarily, I am discussing the forms of scraping which are
third-party-based and exploitative of user data with intent to defraud; or
similar.
> BTW: are you the Alec Muffett name-checked in Kevin Mitnick's
> autobiography? I assume so.
>
Yeah, that was a long time ago. :-)
> It may be of note that when I got the Google error, Amazon also required a
> CAPTCHA in order for me to log in to my account. Whoever was using the exit
> node maliciously was obviously affecting non-Google organizations too.
>
Indeed, that's possible; in fact I should amend my previous post to point
out that "scrapers" - people who scrape - do so through many different
proxy networks, not only Tor; and also that some forms of scraping utilise
(eg:) malicious browser plugins installed by otherwise entirely blameless
people: victims who don't realise that their web browser is now part of
some scraping outfit's infrastructure.
You ask an interesting question about "badness" of IP addresses; long story
short: what you are referring to are "IP reputation databases", which are
used by many people. For instance:
https://github.com/botherder/targetedthreats/blob/master/targetedthreats.rules
…from Claudio Guarnieri (@botherder) is a list of IP-based Snort IOC
(Indicator of Compromise) rules for civil society organisations to use.
tldr: If your organisation sees network traffic matching the list of IOCs
on your network, bad shit may be happening to you.
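For a flavour of what such rules look like, here is a hypothetical rule in
the style of that list - the address, message text and sid are placeholders
I have invented for illustration, not entries from the real file:

```
alert ip $HOME_NET any -> 198.51.100.7 any (msg:"Example IOC - traffic to known-bad address"; classtype:trojan-activity; sid:9000001; rev:1;)
```

Read it as: raise an alert whenever any host on your network ($HOME_NET)
sends IP traffic to that particular address.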
Speaking generally about industry rather than specifically about FB or any
other company: there are only (worst-case) 4 billion IPv4 addresses in the
world (and a few more v6) and since the average hard drive is ~1TB nowadays
it's pretty trivial to build & share databases of how much "badness" is
measured to be emanating from any given IP address.
So that's what tends to happen: it's not (necessarily) a matter of what
kind of software the computer is running (though that is helpful to know) -
nor would it completely matter what country the computer appears to be in
(though some countries _are_ more lax about quenching bad network
neighbourliness).
Instead it's more (though not exclusively) a matter of measuring actual
observed behaviour emanating from given IP addresses.
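To make that concrete: at its simplest, such a database is just a counter
of observed bad events per source address. This is a toy sketch - the
event-recording, thresholds and labels are all invented for illustration;
real systems weight event types, decay scores over time, and share feeds
between organisations:

```python
from collections import Counter

# Toy reputation store: count "bad" events observed per source IP.
bad_events = Counter()

def record_bad_event(ip: str) -> None:
    """Log one observed abusive action (failed login, scrape burst, etc.)."""
    bad_events[ip] += 1

def reputation(ip: str) -> str:
    """Classify an address by how much badness has been seen from it."""
    score = bad_events[ip]
    if score == 0:
        return "clean"
    if score < 10:
        return "suspect"
    return "bad"

# Simulate a scraper hammering from one address, and one stray bad event.
for _ in range(25):
    record_bad_event("198.51.100.7")
record_bad_event("203.0.113.9")

print(reputation("198.51.100.7"))  # bad
print(reputation("203.0.113.9"))   # suspect
print(reputation("192.0.2.1"))     # clean
```

Note that the score is keyed purely on the address and its observed
behaviour - exactly as described above - not on what software or country
sits behind it.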
What happens *after* such information gets collected is more interesting;
some organisations call for network "shunning" à la redlining (
https://en.wikipedia.org/wiki/Redlining) - others enforce CAPTCHAs on IP
addresses which are known to enable scrapers. Yet more do rate-limiting or
temporary bans.
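Those responses amount to a policy keyed on reputation. A rough sketch -
the thresholds and action names here are made up for illustration, and real
deployments tune them per-service and combine many more signals than a
single score:

```python
def choose_response(bad_score: int) -> str:
    """Map an IP's measured badness score to an enforcement action."""
    if bad_score >= 100:
        return "shun"        # drop traffic outright (redlining-style)
    if bad_score >= 20:
        return "temp-ban"    # block for a cooling-off period
    if bad_score >= 5:
        return "captcha"     # challenge before serving content
    if bad_score >= 1:
        return "rate-limit"  # slow the client down
    return "allow"

print(choose_response(0))    # allow
print(choose_response(7))    # captcha
print(choose_response(150))  # shun
```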
An organisation's response to scraping seems typically the product of:
1) the technical resources at its disposal
2) its ability to distinguish scraping from non-scraping traffic
3) the benefit to the organisation of sieving-out and handling the
non-scraping traffic, rather than ignoring it all
I would argue that Facebook was the first to launch a really large onion
site by scoring highly (HHH/HMH) in all three of these categories: big
brains, actual high-signal login credentials, and a million normal people
who want to use Facebook over Tor (especially "at need").
By comparison I would estimate Google as HMM (or HML) and Cloudflare as
HLL; both companies with great people (I know many of them) but with Medium
or Low abilities to sort scraping from non-scraping, and Medium or Low
impetus to do so.
This is why corporate outreach is so important for Tor: to build awareness
and raise perception so that the third factor becomes more important for
other companies to address.
- alec
--
http://dropsafe.crypticide.com/aboutalecm
--
tor-talk mailing list - tor-talk@xxxxxxxxxxxxxxxxxxxx
To unsubscribe or change other settings go to
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk