[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: [Libevent-users] evhttp_encode_uri() fails to escape certain characters



On Fri, Sep 24, 2010 at 4:21 PM, Bas Verhoeven <libevent@xxxxxxxxxx> wrote:
> Hello,
>
> While using 'evhttp_encode_uri()' to encode GET & POST fields (I hope this
> is the right function to use) I noticed that a lot of characters are not
> being escaped in the way I would expect them to be escaped.

Hi, all!  I'm trying to get a start on fixing http stuff and merging
http fixes.  First, though, I need to get up to speed on the evhttp
stuff myself.  This is a nice easy bug report, so I'll start out here.
 I'm going to say lots of confident-sounding stuff below, but please
take what I'm saying with a grain of salt and correct me when I go
wrong.

The first thing to figure out is, "what is evhttp_encode_uri
*supposed* to do?"  Unfortunately, the documentation isn't too
helpful.  All it says is,
/**
  Helper function to encode a URI.

  The returned string must be freed by the caller.

  @param uri an unencoded URI
  @return a newly allocated URI-encoded string
 */

Well, that makes no sense.  "URI-encoding" is a pretty well-defined
operation for individual parts of a URL, but it's not an operation you
can apply to an entire "unencoded URI" [*].  Unfortunately, the
documentation here implies that it takes a URI, which is nonsensical.
So, the documentation is contradictory: does it encode URIs, or does
it URI-encode strings?  Well, its behavior is *almost* right for
URI-encoding strings, and there is no such thing as "encoding a URI".

I *THINK* it should say something more like this:
/**
  Helper function to encode a string for use in a URI.  All characters
  that are "unreserved" according to RFC3986 -- that is, ascii A-Z, a-z,
  0-9, hyphen, dot, underscore, and tilde -- are left unchanged.  All other
  characters are replaced by their hex-escaped equivalents.

  The returned string must be freed by the caller.

  @param str a string to encode
  @return a newly allocated URI-encoded string
 */

That's a much more sensible function for us to have.  Will it break
anything?  Well, if you're building your URL by saying something dumb
like
    asprintf(&url, "http://example.com/?q1=%s&q2=%s2";, v1, v2);
    encoded = evhttp_encode_uri(url);
then you're already in trouble, since evhttp_encode_uri() treats ? as
special.  And if you're building your query by saying something dumb
like
    asprintf(&query, "?1=%s&q2=%s2", v1, v2);
    encoded_query = evhttp_encode_uri(query);
then you're in trouble because evhttp_encode_uri() treats & as
special.  But what if somebody is saying something iffy like
    asprintf(&query1, "q1=%s", v1);
    asprintf(&query2, "q2=%s", v2);
    encoded1 = evhttp_encode_uri(query1);
    encoded2 = evhttp_encode_uri(query2);
    asprintf(&url, "http://example.com?%s&%s";, encoded1, encoded2);
?

If they were relying on the previous broken behavior of
evhttp_encode_uri(), changing it to do the right thing will break
them.  Of course, their code is already broken if they were relying on
evhttp_encode_uri() actually encoding + characters reliably, so
they're not in good shape either way.

I've looked through the first few pages of google codesearch results
for evhttp_encode_uri, and not found anything that suggests someone is
doing this broken-but-almost-working thing.

So, time to go ahead and make this change?  The affected characters
are "!$'()*+,/:=@"


[*] For example, if somebody gives you a string that looks like
http://example.com/?a=b&c=d, and they tell you that it is a URI that
needs encoding, then it's already too late to encode, since you can't
tell whether they meant to have the query parameters be "a" -> "b" and
"c" -> "d", or  just "a" -> "b&c=d".  Thus, you can't "URI-encode" a
URI[**]

[**] Unless you're URI-encoding the URI in order to to use it as a
query parameter or something.


yrs,
-- 
Nick
***********************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxxxxxxx with
unsubscribe libevent-users    in the body.