[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
Re: [Libevent-users] evhttp_encode_uri() fails to escape certain characters
- To: libevent-users@xxxxxxxxxxxxx
- Subject: Re: [Libevent-users] evhttp_encode_uri() fails to escape certain characters
- From: Nick Mathewson <nickm@xxxxxxxxxxxxx>
- Date: Wed, 6 Oct 2010 13:12:22 -0400
- Delivered-to: archiver@xxxxxxxx
- Delivered-to: libevent-users-outgoing@xxxxxxxx
- Delivered-to: libevent-users@xxxxxxxx
- Delivery-date: Wed, 06 Oct 2010 13:12:28 -0400
- Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:content-type; bh=CXuBYECFht1o+377ZXaSoVW3r20N13jNhoPXVHeniOE=; b=IqbDqSAasPWqC7cdJwBcxWcUdiKuSYv10vt8ET7BJYMUlnhPvrJrAomUGcSExlq8yO EZC3LxJTLrqXgDyvyJVFgpDBQ12qOKh150Oq51fMNbK/gchNLRN4p2Y0iXEcu2I8P8FU oPR7F8JHHCHmAfAAtXBjhcOde3vRAZnU6it+M=
- Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type; b=WqKO02fZZKD4jIk1oBBzdHG92Z1FHdyKmjK22E8kjkRPipsgI2h8q6MRiiZh+oWfiU r1pcmwUhgDlsWbxLo6rf20HxwtWvDUl8DALR4KxNcO/dfA2/2W108k61LHim9VOM0NZ1 OSILuM0VsAxolt29Con8WQEnPHKBb0OJTegtc=
- In-reply-to: <4C9D0831.3080201@xxxxxxxxxx>
- References: <4C9D0831.3080201@xxxxxxxxxx>
- Reply-to: libevent-users@xxxxxxxxxxxxx
- Sender: owner-libevent-users@xxxxxxxxxxxxx
On Fri, Sep 24, 2010 at 4:21 PM, Bas Verhoeven <libevent@xxxxxxxxxx> wrote:
> Hello,
>
> While using 'evhttp_encode_uri()' to encode GET & POST fields (I hope this
> is the right function to use) I noticed that a lot of characters are not
> being escaped in the way I would expect them to be escaped.
Hi, all! I'm trying to get a start on fixing http stuff and merging
http fixes. First, though, I need to get up to speed on the evhttp
stuff myself. This is a nice easy bug report, so I'll start out here.
I'm going to say lots of confident-sounding stuff below, but please
take what I'm saying with a grain of salt and correct me when I go
wrong.
The first thing to figure out is, "what is evhttp_encode_uri
*supposed* to do?" Unfortunately, the documentation isn't too
helpful. All it says is,
/**
Helper function to encode a URI.
The returned string must be freed by the caller.
@param uri an unencoded URI
@return a newly allocated URI-encoded string
*/
Well, that makes no sense. "URI-encoding" is a pretty well-defined
operation for individual parts of a URL, but it's not an operation you
can apply to an entire "unencoded URI" [*]. Unfortunately, the
documentation here implies that it takes a URI, which is nonsensical.
So, the documentation is contradictory: does it encode URIs, or does
it URI-encode strings? Well, its behavior is *almost* right for
URI-encoding strings, and there is no such thing as "encoding a URI".
I *THINK* it should say something more like this:
/**
Helper function to encode a string for use in a URI. All characters
that are "unreserved" according to RFC3986 -- that is, ascii A-Z, a-z,
0-9, hyphen, dot, underscore, and tilde -- are left unchanged. All other
characters are replaced by their hex-escaped equivalents.
The returned string must be freed by the caller.
@param str a string to encode
@return a newly allocated URI-encoded string
*/
That's a much more sensible function for us to have. Will it break
anything? Well, if you're building your URL by saying something dumb
like
asprintf(&url, "http://example.com/?q1=%s&q2=%s2", v1, v2);
encoded = evhttp_encode_uri(url);
then you're already in trouble, since evhttp_encode_uri() treats ? as
special. And if you're building your query by saying something dumb
like
asprintf(&query, "?1=%s&q2=%s2", v1, v2);
encoded_query = evhttp_encode_uri(query);
then you're in trouble because evhttp_encode_uri() treats & as
special. But what if somebody is saying something iffy like
asprintf(&query1, "q1=%s", v1);
asprintf(&query2, "q2=%s", v2);
encoded1 = evhttp_encode_uri(query1);
encoded2 = evhttp_encode_uri(query2);
asprintf(&url, "http://example.com?%s&%s", encoded1, encoded2);
?
If they were relying on the previous broken behavior of
evhttp_encode_uri(), changing it to do the right thing will break
them. Of course, their code is already broken if they were relying on
evhttp_encode_uri() actually encoding + characters reliably, so
they're not in good shape either way.
I've looked through the first few pages of google codesearch results
for evhttp_encode_uri, and not found anything that suggests someone is
doing this broken-but-almost-working thing.
So, time to go ahead and make this change? The affected characters
are "!$'()*+,/:=@"
[*] For example, if somebody gives you a string that looks like
http://example.com/?a=b&c=d, and they tell you that it is a URI that
needs encoding, then it's already too late to encode, since you can't
tell whether they meant to have the query parameters be "a" -> "b" and
"c" -> "d", or just "a" -> "b&c=d". Thus, you can't "URI-encode" a
URI[**]
[**] Unless you're URI-encoding the URI in order to to use it as a
query parameter or something.
yrs,
--
Nick
***********************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxxxxxxx with
unsubscribe libevent-users in the body.