
Re: [tor-talk] Double-checking a couple questions about node churn rate



this is "gold" to me.

what i mean is i can do some stats analysis for the torproject if the dataset exists and a question is defined. this churn question is a nice example:

the datasets are identified
instructions exist to parse the data (willing to try)
delimiters or data tags are given ("r" lines, "p" lines)
question clearly stated.

_to give more people access to the data_
a wishlist might include a general tool for parsing the data and perhaps exporting it to common formats (csv, libreoffice spreadsheet, etc).
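
a rough sketch of what such an export tool could look like, in python. this is only an illustration, not an existing tor tool: the function names parse_consensus and to_csv are made up, and the field order assumed for the "r" line (nickname, identity, digest, date, time, address, OR port, dir port) is my reading of the directory spec for the regular consensus and should be checked against real files.

```python
import csv
import io

# Column names are hypothetical; pick whatever suits your stats package.
FIELDS = ["nickname", "identity", "published", "address",
          "or_port", "dir_port", "exit_policy"]

def parse_consensus(text):
    """Collect one row per relay from the "r" and "p" lines of a consensus."""
    rows = []
    current = None
    for line in text.splitlines():
        parts = line.split(" ")
        if parts[0] == "r" and len(parts) >= 9:
            # Assumed layout:
            # r nickname identity digest date time address orport dirport
            current = {
                "nickname": parts[1],
                "identity": parts[2],
                "published": parts[4] + " " + parts[5],
                "address": parts[6],
                "or_port": parts[7],
                "dir_port": parts[8],
                "exit_policy": "",
            }
            rows.append(current)
        elif parts[0] == "p" and current is not None and len(parts) >= 3:
            # "p accept 80,443" or "p reject 1-65535" belongs to the last "r".
            current["exit_policy"] = " ".join(parts[1:])
    return rows

def to_csv(rows):
    """Return the rows as CSV text that a spreadsheet can import."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

to use it, read one consensus file into a string, pass it to parse_consensus, and write the result of to_csv to a .csv file. the csv module quotes fields that contain commas, so a port list like "80,443" stays in one column.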

you might be surprised by the breadth of the pool of people who can help once they can easily import a dataset into whatever stats, matlab or spreadsheet package they happen to know.

decent psychology programmes require a good understanding of statistical analysis, for example. there are many psych students out there. i've seen some social psych profs give matlab homework on social networks.

i guess if you know how social networking data can be amassed and interpreted you can figure out how to obfuscate that data too.  just a thought. 

i'd still like to see tools that put out garbage that looks like data useful to marketing and sales, but i digress. maybe everyone looking average within 0.5 SD can work. since cookies were invented.


_my skills are "menh"_ in the grand scheme of things but i don't mind the work and i take it seriously.

multiple regression i can do fairly easily, but multiple X and multiple Y (e.g. factor analysis, principal components analysis, cluster analysis, discriminant function analysis) would be a project that would take a good reference and some study.

i am willing but skills are old.  

i will try parsing some data on my own and see if i can "massage" the format to enter a stats package or two.  

if a dataset can get into an open/libre/neo office spreadsheet file i can export from there to what i need: CSV, a tab-delimited variant, or perhaps a user-defined delimiter. i'll have to check. my s/w is old but useful.

as i don't know tor well enough to come up with questions of my own, questions like this that help the torproject learn about itself in the wild interest me a lot, if they are helpful to the project's needs.

best
steve


On Oct 7, 2014, at 10:04 PM, Karsten Loesing wrote:

> If you want to do some more analysis, fetch the latest consensus
> tarball(s) and write a script that compares contained fingerprints ("r"
> lines) for the churn question and exit policy summaries ("p" lines) for
> the exit-policy-change question:
> 
> https://collector.torproject.org/archive/relay-descriptors/consensuses/
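
the comparison described above could be sketched roughly like this in python. it is a minimal illustration under assumptions, not Karsten's script: relay_policies and compare are hypothetical names, and it assumes the third field of an "r" line is the relay identity in base64 with the trailing "=" padding removed, which should be verified against the directory spec.

```python
import base64

def relay_policies(text):
    """Map each relay's hex fingerprint to its exit policy summary."""
    policies = {}
    fingerprint = None
    for line in text.splitlines():
        parts = line.split(" ")
        if parts[0] == "r" and len(parts) >= 3:
            identity = parts[2]
            # Restore the base64 padding that is assumed to be stripped.
            raw = base64.b64decode(identity + "=" * (-len(identity) % 4))
            fingerprint = raw.hex().upper()
            policies[fingerprint] = ""
        elif parts[0] == "p" and fingerprint is not None:
            # The "p" line belongs to the most recent "r" line.
            policies[fingerprint] = " ".join(parts[1:])
    return policies

def compare(old_text, new_text):
    """Compare two consensuses: who joined, who left, whose policy changed."""
    old = relay_policies(old_text)
    new = relay_policies(new_text)
    stayed = old.keys() & new.keys()
    return {
        "joined": sorted(new.keys() - old.keys()),
        "left": sorted(old.keys() - new.keys()),
        "stayed": sorted(stayed),
        "policy_changed": sorted(fp for fp in stayed if old[fp] != new[fp]),
    }
```

running compare on consecutive consensuses from the tarball and counting the entries in each list would give per-interval numbers that can go straight into a spreadsheet.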
