
Re: [tor-talk] High-latency hidden services



According to your description, you intend to reconstitute the page, removing whatever might be dangerous. This is very difficult to do (assuming that you want this page to behave like the real one and not like opening something similar to offline/mypage.html from your disk, and assuming that you want to use the browsers as they are, i.e. without plugins/extensions and without hacking into the code). I have described how it can be done in [1].

But in the end, if the interesting information is just some set of resources to be fetched from this page, then [2] applies and is far easier to do.

You can look at [3] to [6], which are projects that fetch/parse a page on the server side (headless browser, handling JS too) and extract things from it; the same principles apply on the browser side for what people want to do here. When the fetching is coupled with [7], it provides anonymity, whether on the browser or the server side. A rough sketch of that server-side fetch/extract step follows the links below.

[1] https://lists.torproject.org/pipermail/tor-talk/2014-July/033636.html
[2] https://lists.torproject.org/pipermail/tor-talk/2014-July/033697.html
[3] https://github.com/Ayms/node-dom
[4] https://github.com/Ayms/node-bot
[5] https://github.com/Ayms/node-gadgets
[6] https://github.com/Ayms/node-googleSearch
[7] https://github.com/Ayms/node-Tor
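
For illustration, a minimal sketch of that server-side fetch/extract
idea in Python (the projects above do it in node.js with a headless
browser; this assumes a local Tor SOCKS proxy on 127.0.0.1:9050 plus
the requests, PySocks and beautifulsoup4 packages, none of which the
projects themselves use):

    # Fetch a page through Tor's SOCKS proxy and pull a few things
    # out of it.  Proxy address, packages and selectors are
    # assumptions, not taken from the projects linked above.
    import requests                      # pip install requests[socks]
    from bs4 import BeautifulSoup        # pip install beautifulsoup4

    TOR_PROXIES = {"http": "socks5h://127.0.0.1:9050",
                   "https": "socks5h://127.0.0.1:9050"}

    def fetch_and_extract(url):
        html = requests.get(url, proxies=TOR_PROXIES, timeout=120).text
        soup = BeautifulSoup(html, "html.parser")
        title = soup.title.string if soup.title else None
        links = [a["href"] for a in soup.find_all("a", href=True)]
        return title, links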

On 08/07/2014 22:21, The Doctor wrote:

On 07/03/2014 03:16 PM, Seth David Schoen wrote:

> That's great, but in the context of this thread I would want to
> imagine a future-generation version that does a much better job of
> hiding who is downloading which pages -- by high-latency mixing,
> like an anonymous remailer chain.

I realized that too late; thank you for pointing that out.

I've been thinking a bit about this lately, and I think it might be
doable.

A while back I chanced across a description of how Richard Stallman
browses the Net much of the time.  He uses a Perl script which is
executed by Postfix via an e-mail alias.  If the sender's e-mail
address matches one hardcoded in the config file, the script parses
the e-mail for URLs to grab, then uses LWP::UserAgent to download
each URL and e-mails the result back to the script's owner.

The Git repo with the implementation:

git://git.gnu.org/womb/hacks.git
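
Not the actual script (that one is Perl, in the womb/hacks repo
above), but a rough Python analogue of the mechanism as described;
the alias address and allowed sender are made up:

    #!/usr/bin/env python3
    # Rough analogue of the e-mail-driven fetcher described above.
    import re, sys, smtplib, urllib.request
    from email import message_from_bytes, policy
    from email.message import EmailMessage

    ALLOWED_SENDER = "user@example.org"     # hypothetical sender

    # Postfix pipes the raw message to this script via an alias.
    msg = message_from_bytes(sys.stdin.buffer.read(),
                             policy=policy.default)
    if ALLOWED_SENDER not in msg.get("From", ""):
        sys.exit(0)                         # ignore everyone else

    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part else ""
    urls = re.findall(r"https?://\S+", body)

    reply = EmailMessage()
    reply["From"] = "fetcher@example.org"   # hypothetical alias
    reply["To"] = ALLOWED_SENDER
    reply["Subject"] = "Fetched: " + ", ".join(urls)
    pages = []
    for url in urls:
        with urllib.request.urlopen(url, timeout=60) as resp:
            pages.append(resp.read().decode("utf-8", "replace"))
    reply.set_content("\n\n".join(pages))

    with smtplib.SMTP("localhost") as s:    # hand back to the MTA
        s.send_message(reply)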

So... I've been toying with this idea but haven't had time to sit down
and implement it yet:

It would be possible to write a relatively simple utility that runs as
a hidden service; perhaps on the user's (virtual) machine, perhaps on
a known Tor hidden service node.  Perhaps it doesn't use a hidden
service for itself but only listens on the loopback interface on a
high port, and the user connects to http://localhost:9393/ from within
the TBB.  Perhaps any of those options, dependent upon a command line
switch or configuration file setting.  The user connects to the
application and types or pastes a URL into a field.  The utility
accepts the URL, verifies that it's a well-formed URL, and records
it internally, perhaps in a queue.  Every once in a while, on a
pseudorandom basis (computers, 'true' randomness, we've all seen the
mailing list threads), the utility wakes up, picks the oldest URL
out of its queue, and tries to download whatever it points to
through the Tor network.
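
A minimal sketch of that accept/validate/queue/wake-up loop (the
port, the intervals and the choice of requests + PySocks to reach
Tor's SOCKS proxy on 9050 are assumptions for illustration, not part
of the proposal):

    # Queue and pseudorandom fetch loop described above.  Assumes
    # Tor's SOCKS proxy on 127.0.0.1:9050 and the requests + PySocks
    # packages (pip install requests[socks]).
    import queue, random, time
    from urllib.parse import urlparse
    import requests

    TOR_PROXIES = {"http": "socks5h://127.0.0.1:9050",
                   "https": "socks5h://127.0.0.1:9050"}
    url_queue = queue.Queue()      # oldest URL comes out first
    pages = {}                     # url -> raw bytes, kept in RAM

    def enqueue(url):
        """Record a URL only if it looks well formed."""
        parts = urlparse(url)
        if parts.scheme in ("http", "https") and parts.netloc:
            url_queue.put(url)

    def fetch_loop():
        while True:
            time.sleep(random.uniform(30, 600))  # pseudorandom wake-up
            try:
                url = url_queue.get_nowait()
            except queue.Empty:
                continue
            resp = requests.get(url, proxies=TOR_PROXIES, timeout=120)
            if resp.ok:
                pages[url] = resp.content    # handed to the parser later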

If it successfully acquires an HTML page, it could then attempt to
parse it (using something like Beautiful Soup, maybe) to verify that
it was fully downloaded and is valid HTML.  It would also pick
through the parsed tags for things like CSS or images, construct
URLs to download them using the original URL (if the HTML only gives
relative paths to them), and add them to the queue of things to get.
It doesn't seem unreasonable to rewrite the HTML to make links to
those additional resources local instead of remote (./css/foo.css
instead of css/foo.css), so the browser would reference the
downloaded copies of those files.  It also doesn't seem unreasonable
that a particular instance of this utility could be configured to
ignore certain kinds of resources (no .js files, no images, no CSS
files) and snip tags that reference them from the HTML entirely.
When the resources for the page in question are fully downloaded
(none are left in the queue), the user is alerted somehow (which
suggests a personal application, but there are other ways of
notifying users).
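
A sketch of that parse/extract/rewrite step, using Beautiful Soup as
suggested above; the handled tags, the skip list and the local path
scheme are illustrative:

    # Parse the downloaded HTML, queue its sub-resources and rewrite
    # the references to point at local copies.
    from urllib.parse import urljoin, urlparse
    from bs4 import BeautifulSoup

    SKIP_EXTENSIONS = (".js",)   # e.g. an instance that drops scripts

    def extract_and_rewrite(page_url, html, enqueue):
        """Return rewritten HTML; enqueue() is called per resource."""
        soup = BeautifulSoup(html, "html.parser")
        for tag, attr in (("img", "src"), ("link", "href"),
                          ("script", "src")):
            for node in soup.find_all(tag):
                ref = node.get(attr)
                if not ref:
                    continue
                absolute = urljoin(page_url, ref)   # build a full URL
                if absolute.lower().endswith(SKIP_EXTENSIONS):
                    node.decompose()                # snip the tag
                    continue
                enqueue(absolute)                   # download it later
                # Point the tag at the copy served locally.
                node[attr] = "./" + urlparse(absolute).path.lstrip("/")
        return str(soup)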

The timeframe over which an entire page is downloaded could be
extremely long: anywhere from seconds between requests, to requiring
a new Tor circuit for each request, to even weeks or months to grab
a single page.
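
If "a new circuit for each request" means asking Tor for a fresh
identity between fetches, one way to do it (assuming the stem
library and an enabled ControlPort, neither of which is mentioned
above) looks roughly like:

    # Ask for fresh circuits between fetches, assuming Tor's
    # ControlPort is enabled on 9051 and stem is installed.  Tor
    # rate-limits NEWNYM, which suits a slow schedule anyway.
    from stem import Signal
    from stem.control import Controller

    def new_identity(password=None):
        with Controller.from_port(port=9051) as controller:
            controller.authenticate(password=password)
            controller.signal(Signal.NEWNYM)  # new streams, new circuits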

I don't know if such a thing could be written as a distributed
application (lots of instances of this utility spread across a
percentage of the Tor network keeping each other apprised of bits and
pieces of web pages to download and send someplace).  I'll admit that
I've never tried to write such a thing before.  The security profile
of such a thing would certainly be a concern.

Representing each page and its resources in memory would take a
little doing but is far from impossible.  Depending on the user's
threat model it may not be desirable to cache the page and its
resources on disk; they could be held in RAM but made accessible to
the web browser, say, with a simple HTTP server listening on the
loopback on a high port (I'm thinking that instead of
http://localhost:9393/ the user would access
http://localhost:9393/pages/foo).  Alternatively, the user may be
comfortable with creating a subdirectory to hold the resources of a
single page.  This is the technique that Scrapbook uses; it is known
to work and seems very easy to implement:

~/.mozilla/firefox/<profile name>/Scrapbook/data/<datestamp>/<web page
and all resources required to view it stored here in a single directory>
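
A sketch of the RAM-only variant described above; only the port and
the URL shape come from the text, the rest is illustrative:

    # Tiny HTTP server on the loopback answering
    # http://localhost:9393/pages/foo from an in-memory dict,
    # never touching disk.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    pages = {}   # e.g. "foo" -> rewritten HTML as bytes, held in RAM

    class PageHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if not self.path.startswith("/pages/"):
                self.send_error(404)
                return
            body = pages.get(self.path[len("/pages/"):])
            if body is None:
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("127.0.0.1", 9393), PageHandler).serve_forever()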

A problem that would probably arise is Tor circuits dropping at odd
intervals due to the phase of the moon, Oglogoth grinding its teeth,
sunspots, or whatever, and the connection timing out or dropping.
I'm not sure how to handle this yet.  Another potential problem is a
user browsing a slowly downloaded page and clicking a link, which
the browser would then jump to directly, bypassing the slow-download
mechanism entirely.  Warn the user this will happen?  Rewrite or
remove the links?  I'm not sure yet what the Right Thing To Do(tm)
would be.
There are undoubtedly other gotchas that I haven't thought of or run
into yet which others will notice immediately.
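
For the first problem, one possible approach (just an illustration,
not something the text above settles on): retry with growing
pseudorandom delays and re-queue the URL if it still fails.

    # Soften dropped circuits and timeouts: retry with growing
    # pseudorandom delays, and let the caller put the URL back in
    # the queue if it still will not come down.
    import random, time
    import requests

    def fetch_with_retries(url, proxies, attempts=3):
        for attempt in range(attempts):
            try:
                return requests.get(url, proxies=proxies,
                                    timeout=120).content
            except requests.RequestException:
                time.sleep(random.uniform(60, 600) * (attempt + 1))
        return None   # caller re-queues the URL for a later pass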

-- The Doctor [412/724/301/703] [ZS]
Developer, Project Byzantium: http://project-byzantium.org/

PGP: 0x807B17C1 / 7960 1CDC 85C9 0B63 8D9F  DD89 3BD8 FF2B 807B 17C1
WWW: https://drwho.virtadpt.net/

"So many paths lead to tomorrow/what love has lost, can you forgive?"
--The Cruxshadows


--
Peersm : http://www.peersm.com
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms
