On 02/27/2014 03:14 AM, Roger Dingledine wrote:
> On Sun, Feb 23, 2014 at 05:38:23PM +0530, Devang Thakkar wrote:
>> It's Devang here, a coding enthusiast studying at IIT Bombay. I am
>> looking forward to contribute to Tor for the upcoming Google Summer of Code
>> 2014 as a prospective student. So I wanted to know if there was a provision
>> for Web Scraping using Tor. If there is, I would like to know more about it or
>> if there isn't, is it a feasible Summer of Code project?
>
> Web scraping using Tor is usually regarded as a bad thing -- first
> because it loads down the Tor network much more than normal browsing,
> and second because it makes destination websites more likely to get angry
> with Tor. For example, when Bing starts scraping Google over Tor in order
> to improve their search results, Google responds by making it harder to
> crawl Google over Tor, which impacts normal Tor users reaching Google too.
>
> So I think we'd be happy to have a project on how to make website scraping
> through Tor less damaging to destinations and thus to users, but I think
> we're unlikely to find a "make it easier to scrape websites through Tor"
> project exciting.

Inconveniently enough, scraping websites (and hidden services) over Tor
is exactly what a lot of the CMU Tor-related research involves. We have
developed a few in-house tools for it (none of which are anywhere close
to turnkey). We haven't put any serious thought into making it "less
damaging to destinations," but I think we would be interested in helping
with a project along those lines.

Offhand I dunno if there's so much code as best practices documentation
needed, though (what's an appropriate level of rate limiting, you really
ought to run a private entry node, that sort of thing...)

zw
_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev