Re: [tor-bugs] #2687 [Torperf]: Update filter.R to parse Torperf's new .mergedata format
-------------------------+--------------------------------------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: needs_review
Priority: major | Milestone:
Component: Torperf | Version:
Keywords: | Parent:
Points: 4 | Actualpoints:
-------------------------+--------------------------------------------------
Comment (by rransom):
Replying to [comment:11 karsten]:
> Pasting your email and replying to it here:
>
> > What I am saying is, maybe R is just the wrong tool for the really
> > string-heavy stuff. I could write a small parser in C, using lex and
> > yacc, so that the parser can be an efficient state machine. This
> > parser could then be called from an R script. The parser does the
> > front-end string processing and can dump it into the CSV. We then
> > read the CSV into the R code to crunch the stats.
> >
> > Seems like this would use the best features of each tool. I can
> > certainly make my current R approach output to CSV; that is just a few
> > lines of code at the bottom. I was focusing on testing the data
> > structure before producing text output.
Yes, it would be a good approach if R really couldn't parse your file
efficiently.
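For comparison, here is what the R side of that proposed split would look like. This is a hypothetical sketch: the column names and the use of a text connection in place of the parser's CSV file are made up for illustration, not taken from filter.R.

```r
# A C parser would write a CSV; R then does only the vectorized
# number crunching. read.csv() accepts a `text` argument, so we can
# stand in for the file here. Column names are invented.
csv_text <- "starttime,datacomplete
0.0,1.5
0.2,2.0"
mergedata <- read.csv(text = csv_text)   # one fast, vectorized read

# e.g. per-request transfer times, computed over whole columns at once
elapsed <- mergedata$datacomplete - mergedata$starttime
```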
> > Since the input language is so simple and has a regular grammar,
> > the state machine will be super efficient, since there is no need for
> > lookahead or LR parsing the way there would be with a context-free
> > grammar. The advantage is that the state machine would be run byte by
> > byte over the input in a single pass. Very low memory requirement,
> > since you only need to buffer on an as-needed basis. You don't have
> > to read in the characters as a large matrix, which may be what R does.
''Not'' likely. It looks very much like R just reads in one line at a
time, using whatever buffering stdio provides.
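That is, the line-at-a-time pattern is already available in base R via `readLines()` on a connection; only the current line need be in memory. A minimal sketch (the input text here is invented):

```r
# Process a stream one line at a time. Only the current line is
# buffered in R; the connection layer handles stdio-style buffering.
con <- textConnection("a=1
b=2
c=3")
count <- 0
repeat {
  line <- readLines(con, n = 1)   # pull exactly one line
  if (length(line) == 0) break    # end of input
  count <- count + 1              # ...process `line` here...
}
close(con)
```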
> > I don't know how many lines R buffers at once, but with lex and yacc
> > you know the buffer is a small constant size. That way we really know
> > that our O(n) single pass through the text doesn't have any hidden
> > side costs.
It's not a single pass through the text. Each time you process an input
line, you copy all of the preceding lines:
{{{
117 mergedata_vector <- c(mergedata_vector, my.mergedata)
}}}
That's O(n^2^) right there.
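If that vector-growing line is the bottleneck, the standard R fix is to preallocate and assign by index rather than append with `c()`. A minimal sketch of the two patterns (not the actual filter.R code):

```r
n <- 1000

# Quadratic pattern: c() copies everything accumulated so far on
# every iteration, so the loop does O(n^2) total copying.
slow <- c()
for (i in seq_len(n)) slow <- c(slow, i)

# Linear pattern: preallocate once, then assign into place.
fast <- vector("list", n)
for (i in seq_len(n)) fast[[i]] <- i
fast <- unlist(fast)
```

Both produce the same vector; only the preallocated version stays a single O(n) pass.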
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2687#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs