Thus spake Robert Ransom (rransom.8774@xxxxxxxxx):

> On Wed, 25 Aug 2010 20:04:01 -0700
> Mike Perry <mikeperry@xxxxxxxxxx> wrote:
>
> > I also question Google's threat model on this feature. Sure, they want
> > to stop people from programmatically re-selling Google results without
> > an API key in general, but there is A) no way people will be reselling
> > Tor-level latency results, B) no way they can really expect determined
> > competitors not to do competitive analysis of results using private IP
> > ranges large enough to avoid DoS detection, C) no way that the total
> > computational cost of the queries coming from Tor can justify denying
> > so many users easy access to their site.
>
> If Tor exit nodes were allowed to bypass Google's CAPTCHA, someone
> could put up a low-bandwidth Tor exit node and then send their own
> automated queries directly to Google from their Tor exit's IP.

Good point. However I wasn't advocating whitelisting Tor exits, I was
advocating more intelligent treatment of all high user-count IP
addresses, and better mechanisms of rate limiting in general. It's my
understanding that a lot of NATed users also run into these captchas
during search.

To reduce scraping by suspect IPs, their servers could perform all
sorts of browser tests to ensure that there is a full working DOM
supported by javascript, which can be computationally costly to deploy
by scrapers. They can also serve javascript code that performs
semi-large integer factorization in the background and post the
factors back with queries to rate limit scrapers computationally, or
at least tip the cost ratios more in favor of just paying for an API
key.

Perhaps more effective, they could use various metrics to indirectly
estimate the number of humans behind an IP. There are plenty of Google
services and applications they provide that aren't really usable by
bots.
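
To make the factorization idea above concrete, here is a rough sketch
of the challenge/response shape it implies. It is written in Python
for brevity, although the client half would run as javascript in the
browser. Every name here (make_challenge, solve_challenge, verify) and
the bit sizes are assumptions made up for illustration. Nothing here
reflects how Google's systems actually work, and a real deployment
would also need to track which challenges it issued so that answers
cannot be replayed.

```python
import secrets
from typing import Tuple


def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for the small sizes used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def random_prime(bits: int) -> int:
    """Pick a random odd prime with exactly `bits` bits."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_prime(candidate):
            return candidate


def make_challenge(bits: int = 20) -> int:
    """Server side (hypothetical): hand out a semiprime to be factored."""
    return random_prime(bits) * random_prime(bits)


def solve_challenge(n: int) -> Tuple[int, int]:
    """Client side: brute-force a factor. This is the deliberately costly step."""
    if n % 2 == 0:
        return 2, n // 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 2
    raise ValueError("n has no nontrivial factors")


def verify(n: int, p: int, q: int) -> bool:
    """Server side: checking the answer costs one multiplication."""
    return 1 < p < n and 1 < q < n and p * q == n


if __name__ == "__main__":
    n = make_challenge()
    p, q = solve_challenge(n)
    print(n, p, q, verify(n, p, q))
```

The asymmetry is the point: the client pays for a search over
candidate divisors while the server pays for a single multiplication,
and the bits parameter tunes how much each query costs the sender.
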
The rate of use of these non-search services per IP should provide a
strong indicator of human activity behind that IP.

Again, the impression I got was that if they had done the analysis on
the captcha solve rate vs the query rate per IP, the cost/benefit
analysis of the DoS mechanisms they apply, or the cost vs
effectiveness vs user impact of alternatives, they certainly weren't
willing to discuss any of this with us.

They also seemed disinclined to meet to explore any realistic
alternatives we could jointly develop in both Torbutton and the DoS
side to help reduce the captchas and 403s experienced by our users.

-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs