[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

[Libevent-users] Incremental search



Hi,

is there anything that I am missing in the buffer interface that would
allow me to do an efficient incremental search when a lot of negative
results are returned on the way? I mean a case of parsing a stream
delimited with some string.

A typical example from the documentation[1]:
#v+
request_line = evbuffer_readln(buf, &len, EVBUFFER_EOL_CRLF);
if (!request_line) {
    /* The first line has not arrived yet. */
} else {
    if (!strncmp(request_line, "HTTP/1.0 ", 9)) {
        /* HTTP 1.0 detected ... */
    }
    free(request_line);
}
#v-

Let's assume this code is in the bufferevent read callback.

Also, let's assume that in that callback I get a lot of negative results
-- data arrives often but it rarely contains the delimiter.

Now let's change that a bit -- I would like to delimit data with some
other character/string. I should probably use evbuffer_search_range().

If I see correctly in case of evbuffer_readln() the search always
happens from the beginning of the chunks chain.

In case of evbuffer_search_range() theoretically I can (optionally)
provide the start evbuffer_ptr[2] but I have a problem finding an
efficient way to update that pointer after every negative result.

The only concept I came up with would look something like this:
#v+
tmp_ptr = evbuffer_search_range (buf, "\t", 1, incr_ptr.pos != -1 ? &incr_ptr : NULL, NULL);
if (tmp_ptr.pos == -1) {
    // negative result

    // quick, read directly from an internal field
    last_char = evbuffer_get_length (buf) - 1;
    if (incr_ptr.pos != -1)
        // requires second (after evbuffer_search_range) partial chain traversal
        evbuffer_ptr_set (buf, &incr_ptr, last_char - prev_last_char, EVBUFFER_PTR_ADD);
    else
        // requires second (after evbuffer_search_range) chain traversal
        evbuffer_ptr_set (buf, &incr_ptr, last_char, EVBUFFER_PTR_SET);
    prev_last_char = last_char;
} else {
    // a hit

    incr_ptr.pos = -1;
    // more hit code
    // ...
}
#v-

This should minimize the impact associated with multiple comparisons of
the same chunks. n checks of chunk 0, n - 1 checks of chunk 1, and so
on... n is the number of new data events until a hit occurs.

On the other hand this would cause one more (partial) traversal of the
chain every time data arrives. Probably still more efficient because
it's just hopping through a list and simple arithmetics compared to the
naive readln-like evbuffer_search() solution.

It also contains a bug of checking the last seen character twice.
Position after the last character is represented with chain == NULL, so
it would be unusable with EVBUFFER_PTR_ADD.

The code would also look a bit different if delimiter were longer than
one character.

Probably it would be perfect if an alternative evbuffer_search_range()
API existed with a non-const start/end evbuffer_ptr updated on every
call so one can continue the search from the last seen position.

Am I missing something?


[1] http://www.wangafu.net/~nickm/libevent-book/Ref7_evbuffer.html
[2] https://github.com/libevent/libevent/blob/a10a6f4ed918ea1432820d99e9373f37f906d6f0/buffer.c#L2708

-- 
Marcin Szewczyk
http://wodny.org
***********************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxxxxxxx with
unsubscribe libevent-users    in the body.