
Re: [tor-talk] Craigslist now giving Tor the slows, lol



On 06/13/2014 11:51 PM, grarpamp wrote:
> On Sat, Jun 14, 2014 at 12:38 AM, Mirimir <mirimir@xxxxxxxxxx> wrote:
>>> No more variance = tor issue.
>>> Still variance = IP <--> CL stack/path issue, or CL issue alone.
>> As I understand it, you're just getting the HTML. I'm getting the entire
> 
> It was the first time I saw any site serving slow to some tor exits.
> So I removed all variables and went for a single url fetch to confirm...
> no recursion, redirects, embedded elements, robots.txt, or anything else.
> I'm waiting for a slow-affected exit operator to get back to me about a
> test to eliminate the unlikely possibility that the tor software itself is at fault.

That makes sense. I'll add that to the test mix. I gather that you're
using something like liburi-fetch-perl, yes? A little reading tells me
that sites reject curl and wget more often than fetch and lynx. But I'll
use whatever you're using for basic HTML.
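For the basic-HTML case, here is a rough sketch of a single-URL timed fetch through Tor. It is not grarpamp's actual tool (which isn't named here). It assumes Tor's SOCKS port is at 127.0.0.1:9050 (Tor's default SocksPort) and handles plain http:// only. It speaks SOCKS5 directly so the hostname is resolved at the exit. It follows no redirects, pulls no embedded elements, and skips robots.txt. The function names and the default User-Agent are my own placeholders.

```python
import socket
import struct
import time
from urllib.parse import urlsplit

TOR_SOCKS = ("127.0.0.1", 9050)  # assumption: Tor's default SocksPort


def socks5_connect_request(host: str, port: int) -> bytes:
    """SOCKS5 CONNECT (RFC 1928) using the domain-name address type,
    so DNS resolution happens at the Tor exit, not locally."""
    name = host.encode("idna")
    if len(name) > 255:
        raise ValueError("hostname too long for SOCKS5")
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack(">H", port)


def http_get_request(host: str, path: str, user_agent: str) -> bytes:
    """A bare HTTP/1.0 GET for one document; Connection: close so the
    server ends the stream when the body is done."""
    return (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        f"User-Agent: {user_agent}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or fail if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("proxy closed the connection")
        buf += chunk
    return buf


def timed_fetch(url: str, user_agent: str = "Mozilla/5.0",
                proxy=TOR_SOCKS, timeout: float = 600.0):
    """Fetch one plain-http URL via a SOCKS5 proxy.
    Returns (elapsed_seconds, raw_response_bytes incl. headers)."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles plain http:// URLs")
    host, port = parts.hostname, parts.port or 80
    path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    start = time.monotonic()
    with socket.create_connection(proxy, timeout=timeout) as s:
        s.sendall(b"\x05\x01\x00")  # greeting: one method offered, "no auth"
        if recv_exact(s, 2) != b"\x05\x00":
            raise ConnectionError("proxy refused the no-auth method")
        s.sendall(socks5_connect_request(host, port))
        reply = recv_exact(s, 4)  # VER, REP, RSV, ATYP
        if reply[1] != 0:
            raise ConnectionError(f"SOCKS5 CONNECT failed, REP={reply[1]}")
        atyp = reply[3]
        if atyp == 0x01:      # IPv4 bound address + port
            recv_exact(s, 4 + 2)
        elif atyp == 0x04:    # IPv6 bound address + port
            recv_exact(s, 16 + 2)
        elif atyp == 0x03:    # domain-name bound address + port
            recv_exact(s, recv_exact(s, 1)[0] + 2)
        else:
            raise ConnectionError(f"unexpected SOCKS5 address type {atyp}")
        s.sendall(http_get_request(host, path, user_agent))
        chunks = []
        while True:
            data = s.recv(65536)
            if not data:
                break
            chunks.append(data)
    return time.monotonic() - start, b"".join(chunks)
```

Usage would be something like `elapsed, raw = timed_fetch("http://example.com/")`. The elapsed time includes stream setup through the exit, which is part of what we're trying to measure. The 600 s timeout is only there because some loads ran to hundreds of seconds.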

>> page, or at least whatever Midori grabs while pretending to be Firefox.
>> For example, I get http://xvideos.com/ with numerous (X-rated) images ;)
>>
>> Also, I was hitting sites at 1-2 minute intervals
> 
> This may actually be far less than the overall fetch rate from tor users
> to the top50, and certainly insignificant to the site's daily hit count.
> Someone needs to research overall exit traffic sometime too.

Sorry, I wasn't clear. I meant that I might have been overloading my Tor
client with too many simultaneous circuits.

>> craigslist, the greatest loading time was about 500 seconds. So perhaps
> 
> If other sites are loading similarly slow it may be possible to find out
> why or what is being used to do it. CL never replies to support queries.

Fundamentally, CL doesn't care what anyone else thinks ;)

>> 30-60 minutes to 20-40 minutes. That may reduce page-size variance.
> 
> A lot of the top50 use dynamic 'content' so it is expected on those,
> unless fetching single elements.

Again, I'm talking about effects on my client and the VM it's in, not on
Tor relays or websites.
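To check whether client-side changes (fewer simultaneous circuits, different run spacing) actually tighten the numbers, each batch of samples for a site could be summarized and compared before and after. Here is a minimal sketch using Python's standard `statistics` module. The `summarize` name is hypothetical, and using the coefficient of variation (stdev / mean) is my own choice. It makes spread comparable across sites with very different load times or page sizes.

```python
import statistics
from typing import Dict, Sequence


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Summary of one batch of measurements for one site, e.g. load
    times in seconds or page sizes in bytes."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample (n - 1) standard deviation
    return {
        "n": len(samples),
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean if mean else float("nan"),
        "min": min(samples),
        "max": max(samples),
    }
```

Comparing `summarize(before)["cv"]` with `summarize(after)["cv"]` for the same site would show whether a setup change reduced variance. With only a few samples per batch, that comparison is noisy.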

>> There's also wkhtmltopdf. Maybe it does a better job, being lighter even
>> than Midori. But I worry that it also may look less like a browser than
>> command-line Midori.
> 
> I'm not too worried about emulation/hiding unless it affects the results
> being studied, i.e., content or blocking differences depending on the
> supplied User-Agent.

Right.
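One way to check for User-Agent-dependent blocking is to fetch the same URL once per User-Agent string and compare the results. In this sketch the fetch function is passed in, so it can be any single-URL fetcher routed through Tor. It is assumed to return (status_code, body_bytes). Both function names are hypothetical. Many of the top sites serve dynamic content, so body length alone is noisy, and the sketch flags only differing status codes.

```python
from typing import Callable, Dict, Iterable, Tuple

# Assumed shape of a fetcher: (url, user_agent) -> (status_code, body_bytes)
Fetch = Callable[[str, str], Tuple[int, bytes]]


def compare_user_agents(fetch: Fetch, url: str,
                        agents: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Fetch url once per User-Agent; record (status_code, body_length)."""
    results: Dict[str, Tuple[int, int]] = {}
    for ua in agents:
        status, body = fetch(url, ua)
        results[ua] = (status, len(body))
    return results


def agent_dependent(results: Dict[str, Tuple[int, int]]) -> bool:
    """True if status codes differ across agents, i.e. blocking appears
    to depend on the supplied User-Agent."""
    return len({status for status, _ in results.values()}) > 1
```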

>> Once I work out kinks, and collect enough data, I'll write this up
>> somewhere with results for all 50 top sites.
> 
> Good, it seems we are doing some generic things, and we should not
> use this CL-specific thread subject for it anymore :)

Agreed. But I would like a response about fetch (liburi-fetch-perl?).
-- 
tor-talk mailing list - tor-talk@xxxxxxxxxxxxxxxxxxxx
To unsubscribe or change other settings go to
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk