Re: [tor-bugs] #6180 [Ooni]: Detecting censorship in HTTP pages
#6180: Detecting censorship in HTTP pages
----------------------------+-----------------------------------------------
Reporter: hellais | Owner: hellais
Type: task | Status: needs_review
Priority: normal | Milestone:
Component: Ooni | Version:
Keywords: SponsorH201206 | Parent:
Points: | Actualpoints:
----------------------------+-----------------------------------------------
Comment(by isis):
> We also talked about having clients tell the backend what it got as a
> response, and having the backend figure out if such a page should be a
> block page or the correct result.
This is similar to what Bismark does: they have the client test node call
back to a server through an SSH tunnel and log in to a restricted shell,
where it sets up a recovery tunnel and does a mysqldump. There was also
a script to email the person whose router is running the tests if no
updates had been made in a while.
Obviously we'd need to deal with several privacy issues, but if we wind up
being allowed to run HSs on Mlab nodes, then we could possibly have the
HTTP comparison done through that.
I have done a bit of research into support vector machines and have of
course studied Bayesian inference, but I'm not a machine learning expert.
I do know, from the experience of spending two years training a lexical
fully-recurrent backpropagating neural network, that training is about as
much fun as punching yourself in the face. And, though I have not worked
with them, and it is also a fast-progressing field, I believe that SVMs
have trouble fitting when the training and data sets are large, because
the radial basis function (I think that's what it's called) doesn't
center on the data points correctly. There is also another thing which is
much, much simpler and easier to train, called a Relevance Vector
Machine, which is basically just the covariance between the training and
experimental sets applied against a Gaussian distribution over a
multidimensional space representing "the test field"; defining the test
field in an optimized fashion is where the kernel trick comes in.
I do not know. I think that if there exists a feasible machine learning
algorithm for computing whether a page has been changed (if that even
happens), or for giving us a regex set matching the block pages, then the
censors would use it to find the pages.
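To make the kernel idea above concrete, here is a toy, stdlib-only sketch
(all feature values and function names are invented for illustration; a
real SVM/RVM would learn weights rather than average kernel scores). The
radial basis function scores how close two page-feature vectors are, and
a page is assigned to whichever class its vector is most similar to:

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2).
    Returns 1.0 when x == y and decays toward 0 as the points separate."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def classify(sample, classes):
    """Assign `sample` to the class whose examples score highest under
    the kernel on average (a crude stand-in for a trained classifier)."""
    def score(examples):
        return sum(rbf_kernel(sample, e) for e in examples) / len(examples)
    return max(classes, key=lambda label: score(classes[label]))

# Made-up feature vectors: (body length in KB, link count, title similarity).
classes = {
    "ok":      [(12.0, 40.0, 0.90), (10.5, 35.0, 0.95)],
    "blocked": [(1.2, 2.0, 0.10), (0.8, 1.0, 0.05)],
}
```

The gamma parameter sets the "radius" the email gestures at: small gamma
means far-away training points still influence the score, large gamma
means only near-identical points count.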
That said, I looked into libraries for hacking on this. There is a thing
called OrangePy which looks pretty good, and I've played with PyBrain
before and it wasn't too bad.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6180#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs