[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]
[Libevent-users] Incremental search
Hi,
is there anything that I am missing in the buffer interface that would
allow me to do an efficient incremental search when a lot of negative
results are returned on the way? I mean a case of parsing a stream
delimited with some string.
A typical example from the documentation[1]:
#v+
request_line = evbuffer_readln(buf, &len, EVBUFFER_EOL_CRLF);
if (!request_line) {
/* The first line has not arrived yet. */
} else {
if (!strncmp(request_line, "HTTP/1.0 ", 9)) {
/* HTTP 1.0 detected ... */
}
free(request_line);
}
#v-
Let's assume this code is in the bufferevent read callback.
Also, let's assume that in that callback I get a lot of negative results
-- data arrives often but it rarely contains the delimiter.
Now let's change that a bit -- I would like to delimit data with some
other character/string. I should probably use evbuffer_search_range().
If I see correctly in case of evbuffer_readln() the search always
happens from the beginning of the chunks chain.
In case of evbuffer_search_range() theoretically I can (optionally)
provide the start evbuffer_ptr[2] but I have a problem finding an
efficient way to update that pointer after every negative result.
The only concept I came up with would look something like this:
#v+
tmp_ptr = evbuffer_search_range (buf, "\t", 1, incr_ptr.pos != -1 ? &incr_ptr : NULL, NULL);
if (tmp_ptr.pos == -1) {
// negative result
// quick, read directly from an internal field
last_char = evbuffer_get_length (buf) - 1;
if (incr_ptr.pos != -1)
// requires second (after evbuffer_search_range) partial chain traversal
evbuffer_ptr_set (buf, &incr_ptr, last_char - prev_last_char, EVBUFFER_PTR_ADD);
else
// requires second (after evbuffer_search_range) chain traversal
evbuffer_ptr_set (buf, &incr_ptr, last_char, EVBUFFER_PTR_SET);
prev_last_char = last_char;
} else {
// a hit
incr_ptr.pos = -1;
// more hit code
// ...
}
#v-
This should minimize the impact associated with multiple comparisons of
the same chunks. n checks of chunk 0, n - 1 checks of chunk 1, and so
on... n is the number of new data events until a hit occurs.
On the other hand this would cause one more (partial) traversal of the
chain every time data arrives. Probably still more efficient because
it's just hopping through a list and simple arithmetics compared to the
naive readln-like evbuffer_search() solution.
It also contains a bug of checking the last seen character twice.
Position after the last character is represented with chain == NULL, so
it would be unusable with EVBUFFER_PTR_ADD.
The code would also look a bit different if delimiter were longer than
one character.
Probably it would be perfect if an alternative evbuffer_search_range()
API existed with a non-const start/end evbuffer_ptr updated on every
call so one can continue the search from the last seen position.
Am I missing something?
[1] http://www.wangafu.net/~nickm/libevent-book/Ref7_evbuffer.html
[2] https://github.com/libevent/libevent/blob/a10a6f4ed918ea1432820d99e9373f37f906d6f0/buffer.c#L2708
--
Marcin Szewczyk
http://wodny.org
***********************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxxxxxxx with
unsubscribe libevent-users in the body.