
Re: [tor-dev] Brainstorming Domain Fronted Bridge Distribution (was meek costs)

To: tor-dev@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [tor-dev] Brainstorming Domain Fronted Bridge Distribution (was meek costs)
From: Mike Perry <mikeperry@xxxxxxxxxxxxxx>
Date: Tue, 5 May 2015 23:38:24 -0700
In-reply-to: <20150506043648.GO4317@xxxxxxxxxxxxxxxxxxxxx>
References: <20150503043042.GA26428@xxxxxxxxxxxxxxxxxxxxx> <20150506012247.GK20018@xxxxxxxxxxxxxx> <20150506043648.GO4317@xxxxxxxxxxxxxxxxxxxxx>
Reply-to: tor-dev@xxxxxxxxxxxxxxxxxxxx
Sender: "tor-dev" <tor-dev-bounces@xxxxxxxxxxxxxxxxxxxx>

isis:
> Mike Perry transcribed 5.1K bytes:
> > [...]
> >
> > 2. Perhaps cleaner: if BridgeDB itself were accessible through a domain
> > front, we could export its captcha and bridge distribution through an
> > API on this domain front. Once your IP forwarding in
> > https://trac.torproject.org/projects/tor/ticket/13171 is solved,
> > BridgeDB even could still make use of its IP-based hashring logic.
> 
> Maybe don't set the HTTP header name for the forwarded client IP to
> "X-Forwarded-For".  Otherwise, it will probably get overridden by the Apache
> server which acts as a reverse proxy in front of BridgeDB's Twisted servers.
> Just set it to something else, e.g. "X-Domain-Fronted-For".
> 
> Then, on the BridgeDB side, it's easy: I'd need to add logic to BridgeDB to
> handle preferring "X-Domain-Fronted-For", "X-Forwarded-For", then request IP,
> in that order.
> 
> > If we make use of this API in Tor Launcher (and we will, as soon as it
> > exists; I'd even pull a crazy and roll it out in the middle of a
> > stable, given the rapid rate of increase in these costs), users would
> > not need to know the magic incantations to access this front, and new
> > bridges could be obtained behind the scenes for them. All they would
> > have to do is keep solving captchas until something worked (until we
> > also implement some kind of fancy crypto like RBridge).
> 
> Perhaps the "BridgeDB API" part of what you want is the Tor Browser bridge
> distributor that I mentioned in §3.1, SOW.9., in my Statement of Work [0] for
> OTF?

Yes, this is exactly what I want. With respect to SOW.9.1, consider it
feasible! Mission Accomplished! ;)

> Additionally, SOW.9. is actually the chronological precursor to SOW.10., the
> latter of which is implementing rBridge (or at least getting started on it).
> (Work on this is still waiting on OTF to officially grant me the fellowship,
> along with the other prerequisite tasks getting finished.)
> 
> But just to be clear - since it sounds like you've asked for several new
> things in that last paragraph :) - which do you want:
> 
>   1. Tor Browser users use meek to get to BridgeDB, to get non-meek bridges by:
>        1.a. Retrieving and solving a CAPTCHA inside Tor Launcher.
>        1.b. Solving a CAPTCHA on a BridgeDB web page.
> 
>   2. Tor Browser users use BridgeDB's domain front, to get non-meek bridges by:
>        2.a. Retrieving and solving a CAPTCHA inside Tor Launcher.
>        2.b. Solving a CAPTCHA on a BridgeDB web page.
>
> If you want #2, then we're essentially transferring the domain-fronting costs
> (and the DDoS risks) from meek to BridgeDB, and we'd need to decide who is
> going to maintain that service, and who is going to pay for it.  Could The
> Tor Project fund BridgeDB domain fronting?

I proposed two things in my original email. My #1 is your #1.b. My #2 is
your #2.a.

For my #2 (your #2.a), what I want is a separate domain front for
BridgeDB. It makes the most sense to me for Tor to run its own domain
front for this.

If for some reason #2.a can't be done, we could do #1.a and use all of
meek+Tor, but this seems excessive, slow, and potentially confusing for
users (their Tor client would have to bootstrap twice for each bridge
set they test).

I only consider my #1 and #1.b emergency stopgaps, though. In fact, if
any aspect of this process is too slow and/or confusing, we won't
take any load off of meek (unless the browser also starts regularly
yelling at meek users to donate or something).

> As far as maintenance goes, the threat to any of our domain fronts, including
> meek and any BridgeDB domain fronts, from China's Great Cannon waging economic
> counter-counter-warfare by attacking us (like they did to GreatFire.org) is
> something which must be taken into account.  Will the maintainer of this
> service need to wake up to emergency, the-request-rate-is-skyrocketing, emails
> at 4AM to shut the service down? 

I would love to hear how David deals with this risk since the Great
Cannon incident.

Honestly, though, I think this is less likely now. If China wasn't
somehow discouraged from this behavior via some diplomatic backchannel
or just general public backlash, GreatFire.org would probably still be
under attack right now.

Either way, it does seem wise to structure this such that multiple
people can respond to emergencies here, and that individuals like you
and/or David aren't on the hook for the financial damages.

> Or do we already have technical measures to detect DDoS and prevent
> $30,000+/day CDN bills?  Further, what happens when #2 is being
> DDoS-ed?  Should we fallback to #1?  Should we have both, and some
> strategy for balancing between the two?

I think trying to fall back or balance between the two is unlikely to
save us much, and will just introduce excessive implementation
complexity.

If they're going to attack domain fronting usage of Tor, it seems to me
that they will attack both meek and BridgeDB.

> > Now that we have a browser updater, I think it is also OK for us to
> > provide autoprobing options for Tor Launcher, so long as the user is
> > informed what this means before they select it, and it only happens
> > once.
> 
> Probing all of the different Pluggable Transport types simultaneously provides
> an excellent training classifier for DPI boxes to learn what new Pluggable
> Transport traffic looks like.
> 
> As long as it happens only once, and only uses the bridges bundled in Tor
> Browser, I don't see any issue with auto-selecting from the drop-down of
> transport methodnames in a predefined order.  It's what users do anyway.

Oh, yes. I am still against "connect to all of the things at the same
time." The probing I had in mind was to cycle through the transport list
and try each type, except also obtain the bridges for each type from
BridgeDB.

I also think we should be careful about the probing order. I want to
probe the most popular and resilient transports (such as obfs4) first.

> > The autoprobing could then keep asking for non-meek bridges for either a
> > given type of the user's choice, or optionally all non-meek types (with
> > an additional warning that this increases their risk of being discovered
> > as a Tor user).
> 
> If the autoprobing is going to include asking BridgeDB (multiple times?) for
> different types of bridges in the process, whether through a BridgeDB domain
> front or not, then I think there needs to be more discussion:
> 
>   * Do you think you could explain more about the steps this autoprobing
>     entails?

1. User starts a fresh Tor Browser (or one that fails to bootstrap)
2. User clicks "Configure" instead of "Connect"
3. User says they are censored
4. User selects a third radio button on the bridge dialog
   "Please help me obtain bridges".
5. Tor Browser launches a JSON-RPC request to BridgeDB's domain front
   for bridges of type $TYPE
6. BridgeDB responds with a Captcha
7. User solves captcha; response is posted back to BridgeDB.
8. BridgeDB responds with bridges (or a captcha error)
9. Tor Launcher attempts to bootstrap with these bridges.
10. If bootstrap fails, goto step 5.

The number of loops for steps 5-10 for each $TYPE probably requires some
intuition on how frequently we expect bridges that we hand out to be
blocked due to scraping, and how many bridge addresses we really want to
hand out per Captcha+IP address combination.
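
To make the loop in steps 5-10 concrete, here is a rough Python sketch. It is
only an illustration under assumptions: the four callables (fetch_captcha,
solve_captcha, submit_answer, try_bootstrap) and the max_rounds cap are
hypothetical names, not anything that exists in Tor Launcher or BridgeDB, and
the JSON-RPC transport over the domain front is left out entirely.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical hooks; none of these names come from Tor Launcher or BridgeDB.
FetchCaptcha = Callable[[str], Tuple[str, bytes]]   # type -> (challenge id, image)
SolveCaptcha = Callable[[bytes], str]               # image -> the user's answer
SubmitAnswer = Callable[[str, str, str], Optional[List[str]]]  # -> bridges or None
TryBootstrap = Callable[[List[str]], bool]          # bridges -> did Tor bootstrap?


def autoprobe(
    transports: Sequence[str],
    fetch_captcha: FetchCaptcha,
    solve_captcha: SolveCaptcha,
    submit_answer: SubmitAnswer,
    try_bootstrap: TryBootstrap,
    max_rounds: int = 3,  # assumed cap on loops per type; the right value is open
) -> Optional[Tuple[str, List[str]]]:
    """Try one transport type at a time, in the order given."""
    for transport in transports:
        for _ in range(max_rounds):
            # Steps 5-6: ask for bridges of this type, get a captcha back.
            challenge_id, image = fetch_captcha(transport)
            # Step 7: the user solves it and the answer is posted back.
            answer = solve_captcha(image)
            # Step 8: bridges, or None on a captcha error.
            bridges = submit_answer(transport, challenge_id, answer)
            if not bridges:
                continue  # a captcha error still uses up one round
            # Step 9: attempt to bootstrap with what we were given.
            if try_bootstrap(bridges):
                return transport, bridges
            # Step 10: bootstrap failed, so go around again.
    return None
```

Passing the transport list in a fixed order (most resilient type first) keeps
the probing sequential, so bridges of different types are never tested in
combination.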

Later, we can replace Captchas with future RBridge-style crypto, though
we should design the domain front independently from RBridge, IMO.

>   * Is the autoprobing meant to solve the issue of not knowing which transport
>     will work?  Or the problem of not knowing whether the bridges in Tor
>     Browser are already blocked?  Or some other problem?

Both problems at once, though I suspect (or at least hope) that the
current transport types included with Tor Browser are more likely to be
blocked by scraping BridgeDB for IP addresses than by DPI.

If we're shipping transports known to be blocked by DPI, we should be
phasing them out of Tor Browser, and definitely not using them for this
autoprobing business.
 
>   * Does BridgeDB continue to always normally answer with one transport
>     methodname at a time, unless the "russianroulette" meta-transport type is
>     requested?

Yes, only one transport should be tested at a time, to avoid the
possibility of bad transports revealing the IP addresses of the good
ones by testing them in combination.
 
> If we follow BridgeDB's spec, [1] and we wish for the logic controlling
> how Tor Browser users are handled to be separate (and thus more maintainable),
> then this will require a new bridge Distributor, and we should probably start
> thinking about the threat model/security requirements, and behaviours, of the
> new Distributor.  Some design questions we'll need to answer include:
> 
>   * Should all points on the Distributor's hashring be reachable at a given
>     time (i.e., should there be some feasible way, at any given point in time,
>     to receive any and every Bridge allocated to the Distributor)?
>
>   * Or should the Distributor's hashring rotate per time period?  Or should it
>     have sub-hashrings which rotate in and out of commission?
> 
>   * Should it attempt to balance the distribution of clients to Bridges, so
>     that a (few) Bridge(s) at a time aren't hit with tons of new clients?
> 
>   * Should it treat users coming from the domain front as separate from those
>     coming from elsewhere?  (Is it even possible for clients to come from
>     elsewhere?  Can clients use Tor to reach this distributor?  Can Tor
>     Browser connect directly to BridgeDB, not through the domain front?)
> 
>   * If we're going to do autoprobing, should it still give out a maximum of
>     three Bridges per request?  More?  Less?

Personally, I think the domain fronting distributor should behave
identically to the closest equivalent distributor that isn't domain
fronted, both to reduce implementation complexity, and to keep the
system easy to reason about.

Before RBridge is implemented, this would mean using the
X-Domain-Fronted-For header's IP address as if it were the real IP
address, and indexing into the hashrings in the same way as we do with the
web distributor.
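
Roughly, and only as a sketch: the header precedence below is the one
discussed above (X-Domain-Fronted-For, then X-Forwarded-For, then the request
IP), but everything else is an assumption for illustration. The function
names, taking the last comma-separated entry of a header, and the plain
SHA-256 ring are all made up here; this is not BridgeDB's actual hashring
code, which has its own keying and grouping rules.

```python
import hashlib
import ipaddress
from bisect import bisect_left
from typing import List, Mapping, Sequence

# Precedence as discussed in this thread; the request IP is the last resort.
HEADER_PRECEDENCE = ("X-Domain-Fronted-For", "X-Forwarded-For")


def resolve_client_ip(headers: Mapping[str, str], peer_ip: str) -> str:
    """Pick the address to treat as the client's real IP.

    Assumes `headers` is keyed by the exact header names above and that the
    last comma-separated entry is the one added by the nearest proxy.
    """
    for name in HEADER_PRECEDENCE:
        value = headers.get(name)
        if not value:
            continue
        candidate = value.split(",")[-1].strip()
        try:
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            continue  # malformed header value: fall through to the next source
    return peer_ip


def _position(data: str) -> int:
    """Map a string to a point on the ring (illustrative, unkeyed hash)."""
    return int.from_bytes(hashlib.sha256(data.encode("utf-8")).digest(), "big")


def bridges_for(ip: str, bridge_ids: Sequence[str], count: int = 3) -> List[str]:
    """Return up to `count` bridges clockwise from the client's ring position."""
    ring = sorted((_position(b), b) for b in bridge_ids)
    if not ring:
        return []
    start = bisect_left(ring, (_position(ip), ""))
    n = min(count, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(n)]
```

The point is just that the same client address always lands on the same part
of the ring, whichever header it arrived in.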

I could see an argument that the set of bridges held by the domain
fronting distributor should be kept separate from the web distributor,
because heck, way more people should be able to access the domain
fronted version, and maybe we want to drastically reduce the web
distributor's pool because nobody can reach it (except for whitelisted
scrapers and people who don't really need bridges).

However, if you do keep the domain front pool separate from the web
distributor pool, you should ensure that you also properly handle the
case where Tor IP addresses appear in the X-Domain-Fronted-For header.
Again, for this case, I think the simplest answer is "use the same rules
as the current web distributor does", though if the domain front pool is
separate, perhaps the Tor fraction should be much smaller.

> > Would you and/or Isis be able to work on this on the backend? If not,
> > can either of you recommend someone that might be able to help with the
> > domain fronting bits and other bits involved?
> 
> I'm in.  Yawning mentioned wanting to work on this too. :)

Great!

 
> [0]: https://people.torproject.org/~isis/otf-etfp-sow.pdf#subsection.3.1
> [1]: https://gitweb.torproject.org/torspec.git/tree/bridgedb-spec.txt


-- 
Mike Perry
