If they cannot detect a crawler that uses Tor, then they cannot detect any other crawler either, such as one that switches IPs as mentioned in another post, or one that uses VPNs or proxies, etc.
So in that case blocking Tor is useless, because the size of the Tor network is not really significant compared to the other means available to crawlers. They probably just chose the easy way, just as the crawlers might have chosen the easy way too (using Tor): by blocking Tor they have solved one problem.
But in fact they have solved nothing if they are not protected against crawlers in general, and if they are protected, the protection would be something like blocking the IP or sending a captcha.
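To make "blocking the IP or sending a captcha" concrete, here is a minimal sketch of what such a per-IP protection could look like. Everything in it is an assumption for illustration: the class name `RateGate`, the thresholds and the three decision strings are hypothetical, and I have no idea what Yelp actually runs.

```python
from collections import defaultdict, deque


class RateGate:
    """Hypothetical per-IP sliding-window gate: allow, captcha, or block."""

    def __init__(self, window_s=60.0, captcha_after=30, block_after=120):
        self.window_s = window_s            # length of the sliding window in seconds
        self.captcha_after = captcha_after  # more hits than this -> captcha
        self.block_after = block_after      # more hits than this -> block
        self._hits = defaultdict(deque)     # ip -> timestamps of recent requests

    def decide(self, ip, now):
        hits = self._hits[ip]
        hits.append(now)
        # Forget requests that fell out of the window.
        while hits and now - hits[0] > self.window_s:
            hits.popleft()
        if len(hits) > self.block_after:
            return "block"
        if len(hits) > self.captcha_after:
            return "captcha"
        return "allow"
```

Note that the gate is keyed on the IP only, so a crawler that spreads its requests over many addresses (Tor exits, VPNs, proxies) stays under the thresholds for each of them, which is the point made above.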
Maybe the exit nodes could implement an anti-crawler feature. Even if the crawler is switching among 1000 exit nodes, I think it is feasible to fingerprint it within the finite space of the Tor network. I don't know if there are studies about this, but an efficient crawler can never behave like a human being or a normal browser.
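As a rough illustration of the fingerprinting idea, and not a studied or deployed design, the sketch below groups requests by attributes that stay the same when the exit changes (header order, User-Agent, Accept-Language) and flags a fingerprint seen on many exits with many requests. The function names, the chosen attributes and the thresholds are all assumptions, and it presumes that whoever does the observing can read the request headers, which is not the case for HTTPS traffic at an exit.

```python
import hashlib
from collections import defaultdict


def fingerprint(headers):
    """Hash attributes that do not change when the client switches exits.

    headers: list of (name, value) pairs in the order they were received.
    """
    order = ",".join(name.lower() for name, _ in headers)
    by_name = {name.lower(): value for name, value in headers}
    parts = [order, by_name.get("user-agent", ""), by_name.get("accept-language", "")]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


def suspects(requests, min_exits=3, min_requests=10):
    """Return fingerprints seen on many exits with many requests.

    requests: iterable of (exit_ip, headers) observations.
    """
    exits = defaultdict(set)
    counts = defaultdict(int)
    for exit_ip, headers in requests:
        fp = fingerprint(headers)
        exits[fp].add(exit_ip)
        counts[fp] += 1
    return {
        fp for fp in counts
        if len(exits[fp]) >= min_exits and counts[fp] >= min_requests
    }
```

A fingerprint this coarse would also match many ordinary users who share the same browser, so in practice it would have to be combined with behavioural signals such as request rate or page coverage.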
This might sound like a kind of censorship, but crawling and spamming the web is probably not the goal of the Tor network. Exit nodes that removed the feature would simply get blocked.
On 20/01/2015 21:24, Roger Dingledine wrote:
> On Tue, Jan 20, 2015 at 03:16:43PM -0500, Greg Norcie wrote:
>> So today I noticed Yelp appears to be blocking Tor. I tried using multiple identities, but get a 503 error every time. Anyone else have this issue?
>>
>> This seems really overbroad... I could understand the argument against letting a Tor user post reviews, but to disallow simple browsing seems like overkill.
>
> See their two mentions in https://blog.torproject.org/blog/call-arms-helping-internet-services-accept-anonymous-users
>
> I think Yelp blocks Tor for a similar reason to why Craigslist blocks Tor -- their value-add is their web pages, and if their competitors "steal" their web pages, they've got nothing else. That kind of business model leads to jealously guarding all of their pages from being crawled or otherwise viewed en masse -- which in turn leads to being angry and bitter at every new Internet technology.
>
> Unfortunately, since this business model is so fragile, I don't have good ideas on how to make it compatible with Tor (or heck, with most of the rest of the Internet).
>
> --Roger
--
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms