Re: [tor-bugs] #2687 [Torperf]: Write Python version of filter.R to parse Torperf's new .mergedata format (was: Update filter.R to parse Torperf's new .mergedata format)



#2687: Write Python version of filter.R to parse Torperf's new .mergedata format
-------------------------+--------------------------------------------------
 Reporter:  karsten      |          Owner:          
     Type:  enhancement  |         Status:  assigned
 Priority:  major        |      Milestone:          
Component:  Torperf      |        Version:          
 Keywords:               |         Parent:          
   Points:  4            |   Actualpoints:          
-------------------------+--------------------------------------------------
Changes (by karsten):

  * owner:  karsten =>
  * status:  needs_review => assigned


Comment:

 Replying to [comment:18 tomb]:
 > Tentative conclusion:  R is ill suited to significant string
 > manipulation
 > Tentative recommendation: Let R crunch numbers and stats, but do the
 > string manipulation in a different language.

 Okay.  I didn't expect R to be incapable of handling this data format,
 because R is really fast at parsing CSV files, tables, and so on.  But I
 agree with you.  Let's stop trying to use R for this task.

 > Why not move the string manipulation into the programs that provide the
 > .data and .mergedata?

 You mean why not produce both the .mergedata format and another format
 that R can handle more easily?  Why would we need the .mergedata format
 then?  We should agree on a single data format that describes Torperf
 data.

 If we find another format that R can handle more easily, we should only
 use that format.  But we want to make sure that the data format can be
 extended easily.  For example, if we want to add another parameter like
 CBT, we want to do that without breaking old stuff.  Or we might want to
 have some fields show up in only some of the measurements, like hidden
 service substeps, but without writing NA for them in all other
 measurements.  And we want to be able to remove fields that we don't need
 anymore.  The key=value format seems more flexible than CSV here.  See
 #3036 for the Torperf data format discussion.
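
 To illustrate the flexibility point, here is a minimal Python sketch of
 reading key=value lines.  It assumes one measurement per line with
 space-separated key=value pairs, which is an assumption for illustration
 and not a statement of the actual .mergedata layout.  The field names
 START, DATACOMPLETE, and CBT and the sample values are made up, as are the
 helper names parse_line and completion_seconds.

```python
def parse_line(line):
    """Parse one line of space-separated key=value pairs into a dict.

    Assumption: values contain no spaces. Tokens without '=' are skipped.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # not a key=value token, ignore it
        fields[key] = value
    return fields


# Hypothetical sample lines; the second one carries an extra CBT field
# that the first one simply omits (no NA placeholder needed).
lines = [
    "START=1300000000.10 DATACOMPLETE=1300000002.60",
    "START=1300000300.00 DATACOMPLETE=1300000301.25 CBT=1500",
]

measurements = [parse_line(line) for line in lines]


def completion_seconds(measurement):
    """Time from START to DATACOMPLETE, using the made-up field names."""
    return float(measurement["DATACOMPLETE"]) - float(measurement["START"])


# Optional fields are looked up with .get(), so older lines that lack
# the field yield None instead of breaking the parser.
cbt_values = [m.get("CBT") for m in measurements]
```

 A reader written this way keeps working when a field is added, removed, or
 present in only some measurements, whereas a CSV reader depends on column
 positions or a fixed header.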

 So, what can we do about this ticket?  How about we rewrite filter.R in
 Python?  The rest of Torperf is written in Python, so we can expect
 people to have it available.  I'm changing the ticket summary
 accordingly.  If you disagree about Python or rewriting filter.R in it, we
 can always change it to something else.

 Thanks!

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2687#comment:19>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs