Re: [tor-talk] How to write program that uses Tor network
This is a lot of really good advice. Thanks. For some reason, I was
thinking C++ would give a measurable performance increase for the spider,
but on reflection that seems really dumb: the network will obviously be
the bottleneck by far. I think I'll still use C++ for the back end though,
since that's where the performance might matter. I'm also thinking about
using Wt for the frontend. It seems like most of the search engines on Tor
can't hold up under load. Do you think that's caused by computational
limits or by upload/download rate limits?
On Oct 1, 2015 8:16 AM, "Apple Apple" <djjdjdjdjdjdjd32@xxxxxxxxx> wrote:
> Asio is only a socket library, which means you would need to build all
> the HTTP logic on top of it. That is not much fun, but everything you need
> to know is documented in the RFCs if you really want to go down that route.
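
To give a sense of what "building the HTTP logic" on a bare socket involves,
here is a minimal sketch in Python (used only for brevity; the same bytes
would be written to an Asio socket). It builds nothing more than a bare
HTTP/1.1 GET request. The function name build_get_request and the host
example.onion are made up for illustration, and a real client would still
need response parsing, redirects, chunked encoding and so on.

```python
def build_get_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(host),   # the Host header is mandatory in HTTP/1.1
        "Connection: close",       # ask the server to close after one response
        "",                        # the empty line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    # example.onion is a placeholder host, not a real service.
    print(build_get_request("example.onion", "/index.html"))
```
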
>
> The best (and easiest) way would be to use an HTTP library made
> specifically for fetching webpages. Curl is a good one. Integrating Tor
> support is simply a matter of setting a SOCKS proxy, the same way you
> configure a web browser to use Tor.
>
> Make sure that your library has an option to proxy DNS as well. If
> fetching bing.com works but an onion site doesn't, then you probably have
> a DNS leak: the hostname is being resolved locally instead of through Tor.
> Curl provides an option to fix this, but it is not enabled by default.
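
A rough sketch of the curl setup described here, driving the curl command
line tool from Python. It assumes a local tor daemon listening on its
default SOCKS port, 127.0.0.1:9050 (Tor Browser's bundled tor normally uses
9150 instead). The helper names curl_via_tor and fetch are made up for this
example. The relevant part is --socks5-hostname, which hands the hostname to
the proxy rather than resolving it locally.

```python
import subprocess

# Assumption: a local tor daemon with its default SocksPort.
TOR_SOCKS = "127.0.0.1:9050"


def curl_via_tor(url, socks_addr=TOR_SOCKS):
    """Build a curl command line that sends traffic and DNS lookups to Tor."""
    return [
        "curl",
        "--silent",
        # --socks5-hostname lets the proxy resolve the name; plain --socks5
        # would resolve it locally, which leaks DNS and breaks onion sites.
        "--socks5-hostname", socks_addr,
        url,
    ]


def fetch(url):
    """Run curl and return the body (needs curl installed and tor running)."""
    result = subprocess.run(curl_via_tor(url), check=True, capture_output=True)
    return result.stdout
```

If you link against libcurl instead, the equivalent should be a proxy URL
with the socks5h:// scheme in CURLOPT_PROXY, or CURLOPT_PROXYTYPE set to
CURLPROXY_SOCKS5_HOSTNAME.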
>
> This is not really related to Tor, but are you sure C++ is the right
> language for this? You will quickly discover that web developers have a
> very easy life. Not a single one of them is capable of writing valid HTML,
> but browsers need to process it anyway (which is why there are so many
> bugs in browsers).
>
> You can get fairly far using regular expressions. You can get somewhat
> further with libtidy and an XML parser. If you are serious, though, I would
> recommend an alternative language such as Ruby + Nokogiri or Python +
> Beautiful Soup, at least to do the HTML parsing.
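
As an illustration of the parsing step, here is a small sketch of link
extraction for a spider. To stay dependency-free it uses html.parser from
the Python standard library rather than Beautiful Soup, so it is a stand-in
for the libraries named above, not the same thing. The names LinkExtractor
and extract_links and the example.onion URLs are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href=...> tags, tolerating sloppy markup."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased, whatever the page used.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Deliberately invalid HTML: unclosed <p>, <a> and <li>, uppercase tags.
    page = '<p>hello<a href="/about">About<li><A HREF="page2.html">next</body>'
    print(extract_links(page, "http://example.onion/dir/index.html"))
```

The parser is event-based, so unclosed tags do not stop it from reporting
the links it sees, which is most of what a crawler needs from this step.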
>
> Of course you can always embed a parser written in another language into
> an existing C++ code base (Python is easy, Ruby is harder but I have done
> it). If you are still at the greenfield stage of the project, you should
> think about this early.
>
> I hope this helps.
> --
> tor-talk mailing list - tor-talk@xxxxxxxxxxxxxxxxxxxx
> To unsubscribe or change other settings go to
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk
>