Hi Pierre,

Thanks for the very well thought-out proposal!

I'm curious about your ideas on the "returning device problem." EFF's
Panopticlick and AmIUnique.org use a combination of cookies and IP
address to recognize returning users, so that their fingerprints are not
"double-counted."

Since these signals will not be available anymore (unless the user opts
in to retain the cookie), I wonder what your ideas would be to address
this issue.

Please find other responses below.

Best,
Gunes

On 2016-03-15 04:46, Pierre Laperdrix wrote:
> Hi Tor Community,
>
> My name is Pierre and I'm really interested in participating in a GSoC
> project this year with the Tor organization. Since I've been working on
> browser fingerprinting for the past two years, I'd love to build a
> Panopticlick-like website to improve the fingerprinting defenses of the
> Tor browser.
>
> I've included below my proposal in case anyone has ideas or suggestions,
> especially on the technical section or on some of the open questions
> that I have. (It should be noted that the Torprinter name is subject to
> change).
>
>
> ******************************************************
>
> Summary - The Torprinter project: a browser fingerprinting website to
> improve Tor fingerprinting defenses
> The capabilities of browser fingerprinting as a tool to track users
> online have been demonstrated by Panopticlick and other research papers
> since 2010. The Tor community is fully aware of the problem and the Tor
> browser has been modified to follow the "one fingerprint for all"
> approach. Spoofing HTTP headers, removing plugins, including bundled
> fonts, preventing canvas image extraction: these are a few examples of
> the progress made by Tor developers to protect their users against such
> threats. However, due to the constant evolution of the web and its
> underlying technologies, it has become a true challenge to always stay
> ahead of the latest fingerprinting techniques.
> I'm deeply interested in privacy and I've been studying browser
> fingerprinting for the past 2 years. I launched the AmIUnique.org
> website 18 months ago to investigate the latest fingerprinting
> techniques. Collecting data on thousands of devices is one of the keys
> to understanding and countering the fingerprinting problem.
> For this Google Summer of Code project, I propose to develop the
> Torprinter website that will run a fingerprinting test suite and collect
> data from Tor browsers to help developers design and test new defenses
> against browser fingerprinting. For users, the website will be similar
> to AmIUnique or Panopticlick: they will get a complete summary with
> statistics after the test suite has been executed. It can be used to
> test new fingerprinting protections as well as to make sure that
> fingerprinting-related bugs were correctly fixed with specific
> regression tests. The expected long-term impact of this project is to
> reduce the differences between Tor users and reinforce their privacy and
> anonymity online. In a second step, the website could open its doors to
> more browsers so that it could become a platform where vendors can
> implement significant changes in their browsers with regard to privacy
> and see the impact first-hand on the website. With the strong expertise
> I have acquired on the fingerprinting subject and the experience I have
> gained by developing the AmIUnique website, I believe I'm fully
> qualified to see such a project through to completion.
>
> Website features
> The main feature of the website is to collect a set of fingerprintable
> attributes on the client and calculate the distribution of values for
> each attribute, like Panopticlick or AmIUnique. The set of tests would
> not only include known fingerprinting techniques but also ones developed
> specifically for the Tor browser.
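As a side note on "calculate the distribution of values for each
attribute": the computation could look roughly like the sketch below. It
is written in Python only for brevity (the proposal itself plans a
Java/Play back-end), and the attribute names, sample values and function
names are made up for illustration. The surprisal figure is similar in
spirit to the "bits of identifying information" that Panopticlick
reports.

```python
import math
from collections import Counter


def attribute_distributions(fingerprints):
    """Count how often each value occurs for each attribute."""
    counts = {}
    for fp in fingerprints:
        for attr, value in fp.items():
            counts.setdefault(attr, Counter())[value] += 1
    return counts


def surprisal_bits(counts, attr, value, total):
    """Self-information of one observed value: -log2(share of fingerprints)."""
    return -math.log2(counts[attr][value] / total)


# Hypothetical sample data, for illustration only.
fingerprints = [
    {"window_size": "1000x600", "timezone": "UTC"},
    {"window_size": "1000x600", "timezone": "UTC"},
    {"window_size": "1000x600", "timezone": "UTC"},
    {"window_size": "1280x720", "timezone": "UTC"},
]

counts = attribute_distributions(fingerprints)
total = len(fingerprints)

# Three of the four devices share one window size; one stands out.
print(counts["window_size"])
# The outlier value is seen in 1 of 4 fingerprints, i.e. 2 bits.
print(surprisal_bits(counts, "window_size", "1280x720", total))
```

An attribute on which every Tor user reports the same value contributes
0 bits, which is the outcome the "one fingerprint for all" approach aims
for.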
> The second main feature of the website would be for Tor users to check
> how close their current fingerprint is to the ideal unique fingerprint
> that most users should share. A list of actions should be added to help
> users configure their browser to reach this ideal fingerprint.
> The third main feature would be an API for automated tests as detailed
> by this page:
> https://people.torproject.org/~boklm/automation/tor-automation-proposals.html#helper-fingerprint
> This would enable automatic verification of Tor protection features
> with regard to fingerprinting. When a new version is released, the
> output of specific tests will be verified to check for any
> evolution/changes/regressions from previous versions.
> The fourth main feature I'd like to include is a complete stats page
> where the user can go through every attribute and filter by OS, browser
> version and more.
> The inclusion of additional features that go beyond the core
> functionalities of the site should be driven by the needs of the
> developers and the Tor community.
> Still, a lot of open questions remain that should be addressed during
> the bonding period to define precisely how each of these features should
> ultimately work.
> Some of these open questions include:
> - How closed/private/transparent should the website be about its tests
> and the results? Should every test be clearly indicated on the webpage
> with its own description? Or should some tests stay hidden to prevent
> spreading usable tests to fingerprint Tor users?

I think the site should be transparent about the tests it runs. Most of
the fingerprinting tests/code will probably run on the client side, where
they can easily be captured by anyone with the necessary skills (even if
you obfuscate them).

> - Should a statistics page exist? Should we give read access to the
> database to every user (for example in the form of a REST API or other
> solutions)?

I think aggregate statistics should be available publicly, but exposing
individual fingerprints publicly may not be necessary.

> - Where should the data be stored? How long should the data be kept? If
> tests are performed by versions, should the data from an old TBB version
> be removed? Should the data be kept a week, a month or more?
> - How should new tests be added: A pull request? A form where
> submissions are reviewed by admins? A link to the Tor tracker?
> - Should the website only be accessible through Tor?
>
> Technical choices
> In my opinion, the website must be accessible and modular. It should
> have the ability to cope with a large number of connections and a large
> volume of data.
> With this in mind and the experience gained from developing AmIUnique, I
> plan on using the Play framework with a MongoDB database. Developing the
> website in Java opens the door to many developers to make the website
> better and more robust after its initial launch, since it is one of the
> most used programming languages in the world. On the storage and
> statistics side, MongoDB is a good fit because it is now a mature
> technology that can scale well with a large volume of data and a large
> number of connections.
> Moreover, the use of SQL databases for AmIUnique proved to be really
> powerful, but the maintenance after the website was launched became a
> tedious task, especially when modifying the underlying model of a
> fingerprint to collect new attributes. A more flexible and modular
> database seems a better choice for maintenance and for
> adding/removing tests.
>
>
>
> Estimated timeline
> You will find below a rough estimate of the timeline for the three
> months of the GSoC.
>
> Community bonding period - Discuss with the mentors and the community
> the set of features that should be included in the very first version of
> the website and clarify the open questions raised in one of the previous
> paragraphs.
>
> 23 May - 27 June: Development of the first version of the website with
> the core features
> Week 1 - Development of the first version of the fingerprinting
> script with the core set of attributes. Special attention will be given
> so that it is fully compatible with the most recent version of the Tor
> browser (and older ones too).
> Week 2 - Start developing the front-end and the back-end to store
> fingerprints with a page containing data on your current fingerprint
> (try adding a view to see how close/far you are from the ideal fingerprint).
> Week 3 - Start developing the statistics page with the necessary
> visualization for the users. Modification of the back-end to improve
> statistics computation to lessen the server load.
> Week 4 - Finishing the front-end development and refining the
> statistics page to surface the most relevant information.
> Adding and testing an API to support automated tests.
> Week 5 - Finishing the first version so that it is ready for deployment.
> Start developing additional features requested by the community
> (REST API? account management?)
>
> 27 June - Mid July:
> Deployment of the first version online for a beta-test with bug fixing.
> Finishing development of additional features requested by the
> mentors/community.
> Defining the list of new features for the second version.
>
> Mid July - 23rd August:
> Adding a system to make the website as flexible as possible to
> add/remove tests easily (A pull-request system? A test submission form
> where admins review tests before they are included in the test suite?)
> Developing additional features for the website.
> Making sure that the website can be opened to more browsers (work done
> at design time to support any browser will be tested here)
> Bug fixing
>
>
> Code sample
> In 2014, I developed the entire AmIUnique.org website from scratch.
> Its aim is to collect fingerprints to study the current diversity of
> fingerprints on the Internet while providing full details to users on
> this subject. It was the first time that I built a complete website from
> the design phase to its deployment online.
> One of the first challenges that I encountered was to build a script that
> would not only use state-of-the-art techniques but would also simply
> work on the widest variety of browsers. Testing a script for a recent
> version of a major browser like Chrome or Firefox is an easy task since
> they implement the latest HTML and JavaScript technologies, but making
> sure that the script runs correctly on older browsers like Internet
> Explorer is another story. Juggling a dozen different virtual
> machines was necessary to obtain a bug-free and stable version of the
> script. A small beta-test was required to make sure that everything was
> good to go for what is now the foundation of the AmIUnique website. All
> of the source code for AmIUnique and my other projects can be
> found on GitHub.
> A second challenge that I faced was to deal with the increasing load of
> users so that the server could return personalized statistics to
> visitors in a timely manner (under 2 to 3 seconds). By having a separate
> entity that updates statistics in real time on top of the database, I
> managed to drastically reduce the server load. With the number of Tor
> users around the world, the website needs to handle a high load of
> visitors and statistics computation from the get-go, and my previous
> experience on that specific task will prove useful.
>
> For the very first version of Torprinter, I plan on testing well-known
> and widespread fingerprinting techniques to make sure that there is no
> variation among Tor users. These include HTTP headers and known
> JavaScript objects.
> There should be no need for any Flash attributes
> since plugins are not present in the Tor browser (thus removing complex
> code in charge of correctly loading the Flash object).
> For this proposal, I have also developed a special page with 7 different
> tests that are mainly targeted at the Tor browser to give an idea of
> what tests can be included that are more suited to Tor users.
> Tests n°5, n°6 and n°7 are broader and also concern the Firefox browser.
> You can find a working version of the script on a special webpage (you
> need to scroll to make the results appear):
> https://plaperdr.github.io/torScript.html
> The script can be found here: https://plaperdr.github.io/assets/tor/tor.js
>
> Test n°1
> Test the size of the current window - As reported by ticket n°14098
> https://trac.torproject.org/projects/tor/ticket/14098
> Test n°2
> Test the support of emoji - As reported by ticket n°18172
> https://trac.torproject.org/projects/tor/ticket/18172
> Test n°3
> Analysis of the "scroll" behavior of the window - As investigated by
> http://jcarlosnorte.com/security/2016/03/06/advanced-tor-browser-fingerprinting.html
> Test n°4
> Test the size of the current fallback font by using the canvas API to
> render some text (no need for user permission, unlike canvas
> extraction) - Custom test
> Test n°5
> Test the difference between OS on the maximum font size - Custom test
> Test n°6
> Test the difference between OS on the Date API - As reported by ticket
> n°15473 https://trac.torproject.org/projects/tor/ticket/15473
> Test n°7
> Test the difference between OS on the Math class - As reported by ticket
> n°13018 https://trac.torproject.org/projects/tor/ticket/13018
>
> ******************************************************
>
> Any remarks, suggestions or ideas are very welcome!
> Pierre
>
>
>
> _______________________________________________
> tor-dev mailing list
> tor-dev@xxxxxxxxxxxxxxxxxxxx
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>
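
On the automated-test API from the website features: the results of tests
like the seven above could feed a regression check that compares what a
new Tor Browser build reports against a stored baseline from the previous
version. The sketch below is only one possible shape for such a check,
written in Python for brevity; the function name, attribute names and
values are all hypothetical and not taken from the proposal or from
tor.js.

```python
def diff_fingerprints(baseline, current):
    """Return {attribute: (baseline_value, current_value)} for every change.

    An attribute that is missing on one side is reported with None there.
    """
    changed = {}
    for attr in sorted(set(baseline) | set(current)):
        old, new = baseline.get(attr), current.get(attr)
        if old != new:
            changed[attr] = (old, new)
    return changed


# Hypothetical results reported by two browser versions.
baseline = {"window_size": "1000x600", "emoji_support": False, "max_font_size": 2000}
current = {"window_size": "1000x600", "emoji_support": True, "max_font_size": 2000}

regressions = diff_fingerprints(baseline, current)
print(regressions)  # {'emoji_support': (False, True)}
```

An empty result would mean the new version reports the same values as
the baseline; anything else points at an attribute to investigate.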