
Re: Scroogle and Tor



On 02/14/2011 06:29 PM, scroogle@xxxxxxxxxxx wrote:
> Some have wondered why anyone would want to abuse Scroogle
> using Tor. Apart from some malicious types that may be
> doing it for their own amusement, it looks to me like they
> are trying to datamine Google -- arguably the largest,
> most diverse database on the planet.
>
Makes a lot of sense. Actually, I can hardly blame them for wanting to
mine the data. Of course, you make it pretty easily available, as you
detail. I can see why this starts to present a problem.
> I spend a couple hours per day blocking abusers. A huge
> amount of this is done through a couple dozen monitoring
> programs I've written, but for the most part these
> programs provide candidates for blocking only, and
> my wetware is needed to make the final determination.
>
Ouch, that really sucks... time like that adds up fast.

> Now I'm seeing script writers who have solved the SSL
> problem. This leaves me with the user-agent, the search
> terms, and as a last resort, blocking Tor exit nodes.
> If they vary their search terms and user-agents, it can
> take hours to analyze patterns and accurately block them
> by returning a blank page. That's the way I prefer to do
> it, because I don't like to block Tor exit nodes. Those
> who are most sympathetic with what Tor is doing are also
> sympathetic with what Scroogle is doing. There's a lot of
> collateral damage associated with blocking Tor exit nodes,
> and I don't want to alienate the Tor community except as
> a last resort.
>

Well... Google uses the captcha system. Hard to say how well that works.
I doubt anything too simple is going to work here, for many reasons,
including the ones that you specify. How about this... we know you can
(mostly reliably) detect Tor exits.

I think you have your goals wrong. You don't need to stop the scripts
from getting to Google; even Google can't stop that on its own site.
What you need is to make abusive use unprofitable on a scale that matters.

Tor users care about their privacy, right? But you need a way to
differentiate them from the abusers. So how about a temporary
registration system? I get sent to a page with a captcha (or even two
kinds). If I pass, I get a token (set in a cookie, or put in the query
string) that lets me do searches. Maybe I can set when it should expire
(up to a max). Maybe put in a 30-second timeout before it becomes
active, to slow them down some more. Maybe limit the rate of
registrations per IP over time?
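
To make the token idea a bit more concrete, here is a rough Python
sketch. Everything in it is an assumption for illustration: the names
(issue_token, token_is_valid), the limits (5 registrations per IP per
hour, 24-hour maximum lifetime), and the HMAC-signed token format are
mine, not anything Scroogle actually runs. The captcha check itself is
left out; issue_token is meant to be called only after it has passed.

```python
import hashlib
import hmac
import time

# All values below are illustrative assumptions, not Scroogle settings.
SECRET_KEY = b"replace-with-a-random-server-secret"
MAX_LIFETIME = 24 * 3600   # cap on the lifetime a user may request (seconds)
ACTIVATION_DELAY = 30      # token is unusable for its first 30 seconds
MAX_REGISTRATIONS = 5      # registrations allowed per IP per window
WINDOW = 3600              # rate-limit window (seconds)

_registrations = {}        # ip -> timestamps of recent registrations


def _sign(payload):
    """Return a hex HMAC-SHA256 signature over the payload string."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(ip, lifetime, now=None):
    """Issue a token after a solved captcha, or None if the IP is rate limited."""
    now = time.time() if now is None else now
    recent = [t for t in _registrations.get(ip, []) if now - t < WINDOW]
    if len(recent) >= MAX_REGISTRATIONS:
        _registrations[ip] = recent
        return None
    recent.append(now)
    _registrations[ip] = recent

    active_from = int(now) + ACTIVATION_DELAY
    expires = active_from + min(int(lifetime), MAX_LIFETIME)
    payload = f"{active_from}:{expires}"
    return f"{payload}:{_sign(payload)}"


def token_is_valid(token, now=None):
    """Check the signature, the activation delay, and the expiry time."""
    now = time.time() if now is None else now
    try:
        active_from, expires, signature = token.split(":")
        payload = f"{active_from}:{expires}"
        if not hmac.compare_digest(signature, _sign(payload)):
            return False
        return int(active_from) <= now < int(expires)
    except (ValueError, TypeError, AttributeError):
        return False
```

The token is deliberately not tied to the IP it was issued to, since a
Tor user's exit node changes between requests; only the registration
step is rate limited per IP. Because the token is signed, the server
does not have to store it, whether it arrives in a cookie or in the
query string.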

Secondly, have you considered poisoning their stream? If you detect an
obviously abusive script, return randomized cached results. Ruining their
work, rather than just slowing them down, might convince them to move on
and try somewhere else. It is a thought anyway.

> One reason why Scroogle has lasted for more than six
> years is that we are nonprofit, and Google knows by now
> that I don't tolerate abuse. My job is to stop the abuser
> before Scroogle passes their search terms to Google.
> Abusers who use Tor make this more difficult for me.
> Blocking an IP address is easy, but blocking Tor abusers
> without alienating other Tor users is more complex.

It will be sad to see Tor users lose your service (I had actually only
heard the name before this thread; very curious to check it out now).

-Steve

***********************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxxxxxxxx with
unsubscribe or-talk    in the body. http://archives.seul.org/or/talk/