
Re: Chunked must be last Transfer-Encoding problem

Michal <teknet8@xxxxx> wrote:

[Line length adjusted]

> My problem is when i try to communicate thru my script with proxy:
> my $proxy = "";
> my $mech = WWW::Mechanize->new();	# same class as LWP::UserAgent
> $mech->proxy('http',$proxy);
> my $urlsearch = "http://www.pricegrabber.com";
> $mech->get($urlsearch);
> print $mech->content;
> I receive:
> 500 Chunked must be last Transfer-Encoding 'identity'

I can't reproduce this here, but "WWW::Mechanize" sets
"interesting" header values that could cause problems
like this.

"Connection: close, TE" and "Accept-Encoding: identity"
aren't exactly by the book, and if you have the time,
you might want to ask the "WWW::Mechanize" authors
to fix them.

If I remember correctly, "identity" existed in earlier drafts
of the HTTP standard but got (partly) removed in the final one.
It was supposed to mean "no encoding" which is equal to simply
not specifying any encoding.

What's a bit ironic here is that Privoxy versions before
3.0.5 beta had a similar bug and set "Transfer-Encoding: identity"
after dechunking.

It was discovered because it broke "WWW::Mechanize".

> When i change in privoxy to point not to Tor but other http proxy
> everything works fine. When i change $proxy variable to connect directly
> to other HTTP proxy everything works fine. With this code some web pages
> works fine (like http://www.wp.pl) but others (like
> http://pricegrabber.com) returns this error. With other proxies all
> pages works fine.
> Where could be the problem ? What does Tor changes when forwarding HTTP
> packets ? How could i make my script is working correctly with all pages
> with Tor ? (like with other proxy servers) ?

Tor itself doesn't alter HTTP headers but some Tor server
operators consider it a good idea to proxy, rewrite or filter
your requests behind your back and/or to provide you with modified
or outdated content.

The error message you got could be the response of an
intercepting proxy running on the Tor exit node that
didn't like the request headers. It could also be the response
of the target server, but at least for me www.pricegrabber.com
ignores the broken headers. A server-side error also wouldn't
explain why it worked for you without Tor.

If you activate header logging in Privoxy (debug 8)
you might be able to tell where the response really
came from.
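
As a rough sketch, header logging is switched on in Privoxy's
main configuration file. The file name "config" is an assumption
about your installation, and Privoxy has to pick up the changed
configuration before the setting takes effect:

```
# Privoxy main configuration file (assumed to be named "config")
# Debug level 8 logs header parsing.
debug 8
```

The headers should then show up in Privoxy's log file, so you can
compare what your script sent with what came back through Tor.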

In any case, using Privoxy's prevent-compression action
should work around the problem, unless your "WWW::Mechanize"
version has different defects than mine.
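
A minimal sketch of how that could look in an action file. The
file name "user.action" and the host pattern are assumptions on
my part, so adjust them to your setup:

```
# user.action (assumed file name)
# Apply prevent-compression to requests for the affected site.
{ +prevent-compression }
.pricegrabber.com
```

If other sites trigger the same error, a broader pattern such as
a single "/" (which matches all URLs) could be used instead.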

