[or-cvs] r15270: Minor changes in protocol description and internal view meas (projects/hidserv/trunk/doc)
Author: kloesing
Date: 2008-06-15 08:35:32 -0400 (Sun, 15 Jun 2008)
New Revision: 15270
Modified:
   projects/hidserv/trunk/doc/report.pdf
   projects/hidserv/trunk/doc/report.tex
Log:
Minor changes in protocol description and internal view measurements. Added a bunch of TODOs.
Modified: projects/hidserv/trunk/doc/report.pdf
===================================================================
(Binary files differ)
Modified: projects/hidserv/trunk/doc/report.tex
===================================================================
--- projects/hidserv/trunk/doc/report.tex	2008-06-15 10:11:17 UTC (rev 15269)
+++ projects/hidserv/trunk/doc/report.tex	2008-06-15 12:35:32 UTC (rev 15270)
@@ -61,50 +61,47 @@
 
 \section{Hidden Service Protocol}
 
-%% \emph{TODO Christian: Your protocol description might also help for
-%% understanding service publication, so it might be useful to have it here in
-%% a separate section.}
-This section describes the protocol for establishing and accessing hidden
-services in detail. An overview is shown in Figure~\ref{fig:hs_overview}.
+Before going into the details of measuring performance of setting up and
+accessing a hidden service, a brief description of the protocol is given.
+This is required for understanding the timings of internal processes and
+explaining possible delays. Figure~\ref{fig:hs_overview} shows an overview
+of exchanged messages.
 
 Consider a user called Bob who wants to offer a location-hidden service.
 All Bob needs to do to establish a hidden service is to set up the actual
-service to start the Tor client, which is configured to provide a hidden service.
-
+service, e.g.\ a website, and start a Tor client, which is configured to provide a hidden service.
+%
 At startup the Tor client builds circuits to three randomly chosen Tor relays,
-which will act as \emph{introduction point} for the service.
-As soon as the circuit is built the hidden server sends an 
+which will act as \emph{introduction points} for the service.
+As soon as a circuit is built the hidden server sends an 
 \texttt{ESTABLISH\_INTRO} cell over the circuit, containing the public key of
-the hidden server. The introduction point associates the circuit with the 
-public key received in the cell. 
-
+the hidden server.
+%
 \begin{figure}
 \centering
-\includegraphics[width=0.8\textwidth]{hs_overview.png}
+\includegraphics[width=0.8\textwidth]{hs_overview.png}\\
+\emph{TODO Christian: The cell names are hardly readable on screen. Can
+you try making them bold or use the font that you used in Fetch RSD?}
 \caption{Overview of hidden service establishment and access}
 \label{fig:hs_overview}
 \end{figure}
-
-During the hidden service access, clients will contact the introduction point and 
-the hidden server's public key is used to find the right circuit, especially
-if the relay acts as rendezvous point for more than one hidden service.
-After accepting its role, the introduction points acknowledges by sending an an 
-\texttt{INTRO\_ESTABLISHED} cell over the circuit. 
-After receiving the acknowledgments the hidden server builds the 
+%
+The introduction point acknowledges by sending an 
+\texttt{INTRO\_ESTABLISHED} cell.
+After receiving acknowledgments the hidden server builds the 
 \emph{rendezvous service descriptor}. This descriptor includes contact 
-information about the three introduction points and the hidden server's public 
-key. The hidden server uploads the descriptor and a unique identifier for the 
-service, the onion address built from a random hash, to the hidden service
+information of the established introduction points and the hidden service's public 
+key. The hidden server uploads the descriptor and the hash value of the
+public key as a unique service identifier, the onion address, to the hidden service
 directory.
-% XXX actually, not random. based on public key, so we can self-sign -RD
 
 If a user called Alice wants to access a hidden service, she must have learned 
 the onion address before out-of-band.
-First the Tor client needs to know how to contact the hidden server at all. 
-To retrieve the rendezvous service descriptor including the contact information
+First the Tor client needs to know the rendezvous service descriptor to contact the hidden server. 
+To retrieve the descriptor including the contact information
 of the introduction points the client sends a request to 
 the hidden service directory, using an anonymous 3-hop circuit.
-
+%
 After receiving the rendezvous service descriptor the Tor client randomly picks 
 one of the introduction points it finds in the descriptor and builds a circuit
 to the introduction point. To accelerate the circuit building, so-called 
@@ -122,8 +119,7 @@
 over the circuit. This cell contains a random rendezvous cookie for 
 identification purpose. The rendezvous point saves the rendezvous cookie and 
 acknowledges its new 
-functionality by replying with a \texttt{RENDEZVOUS\_ESTABLISHED} cell on the 
-same circuit.
+functionality by replying with a \texttt{RENDEZVOUS\_ESTABLISHED} cell.
 
 If the introduction circuit is successfully opened and the rendezvous point has 
 replied with the acknowledgment cell, 
@@ -142,11 +138,11 @@
 this case an existing 3-hop circuit is extended to the rendezvous point. When
 the circuit is open, a \texttt{RENDEZVOUS1} cell is sent down the circuit, 
 including the rendezvous cookie.
-
+%
 Upon receiving the \texttt{RENDEZVOUS1} cell the rendezvous point uses the 
 rendezvous cookie to find the matching client circuit and connects it with the 
 one it just received the cell over. At last the rendezvous point sends a 
-\texttt{RENDEZVOUS2} cell back to the client to notify it that a connection to
+\texttt{RENDEZVOUS2} cell to the client to notify it that a connection to
 the hidden service has been successfully established and is now ready to transfer
 user data.
 
@@ -313,19 +309,8 @@
 \subsection{Internal View Measurements}
 \label{sec:internalview}
 
-%\emph{TODO Christian: Describe setup here, possibly re-use parts of diploma
-%thesis.}
-
-%% In order to measure the substep of accessing a hidden service, log events are 
-%% used, which are generated by Tor during runtime. To have acess to the log files
-%% of the most important roles in the process of hidden service access, a 
-%% measurement scenario is set up, hosting these roles on own computers, but using
-%% public Tor relays for circuit building to make the scenario more realistic in 
-%% terms of latency, compared to measurements in a private local area network.
-
-In order to measure the multiple substeps necessary to access a hidden service,
-including message transfer times, a second
-measurement environment is set up. Tor clients generate log statements during
+A second
+measurement environment is set up to measure the multiple substeps necessary to access a hidden service. Tor clients generate log statements during
 runtime, which are used to gain insight, when certain internal events occur.
 To have access to all log files needed, the most important roles in the 
 process of connection establishment are set up especially for the measurements.
@@ -347,6 +332,10 @@
 which try to access the hidden service 75 seconds after their launch to let 
 them build circuits previously.
 
+\emph{TODO Christian: Mention bug here that selecting a rendezvous by means
+of configuring it in torrc fails. This bug was probably introduced with the
+config option for RendNodes, right? When was this?}
+
 %% The following paragraphs describe the measured substeps in detail and which 
 %% log statements are used to determine the values. An overview of all steps 
 %% is shown in Figure \ref{fig:substeps}. Messages are indicated by solid lines,
@@ -533,28 +522,27 @@
 
 \section{Results}
 
-The \emph{external view} measurements were performed between June 1, 2:50pm
+The \emph{external view} measurements were performed using Tor version
+0.2.1.0-alpha-dev (r14739) between June 1, 2:50pm
 (starting time of first test run) and June 5, 9:35am (starting time of last
 test run), resulting in a total of 1,090 data samples. A tarball with all
 log files is available
-online.\footnote{\url{http://freehaven.net/~karsten/hidserv/test-env.tar.gz}}
-% Put dates in the name or something, or you're going to be forever
-% regretting your choice of filename. :) -RD
+online.\footnote{\url{http://freehaven.net/~karsten/hidserv/perfdata-2008-06-01.tar.gz}}
 
 During evaluation it has turned out that there is a bug in Tor that leads
 to a delay in service publication (see below for details). This made it
 necessary to perform a second set of measurements between June 12, 9:22pm
-and Jun 13 3:31pm with a new test every single minute instead of every five
-minutes. The resulting 1,090 data samples are also available for
-download.\footnote{\url{http://freehaven.net/~karsten/hidserv/test-env2.tar.gz}}
+and June 13, 3:31pm with Tor version 0.2.1.0-alpha-dev (r15153). This time a
+new test was started every single minute instead of every five minutes and
+tests were aborted after uploading the first hidden service descriptor. The
+resulting 1,090 data samples are also available for
+download.\footnote{\url{http://freehaven.net/~karsten/hidserv/perfdata-2008-06-12.tar.gz}}
 
-%%\emph{TODO Christian: Add meta-data of your measurements here; we should
-%%consider making your raw data available, too.}
-
 The \emph{internal view} measurements started on April 22 at 4:00pm and
 finished on May 13, 10:20pm, resulting in a total of 1,200 data samples.
 
-\emph{TODO Christian: Make raw data available, too.}
+\emph{TODO Christian: Make raw data available, too. Which Tor version was
+used (see first line in log files)?}
 
 \subsection{Service Publication}
 
@@ -573,9 +561,6 @@
 \begin{tabular}{rrrrrrr}
 Min. & 1st Qu. & Median & Mean & 3rd Qu. & Max. & StdDev\\\hline
 22.85 & 69.91 & 89.68 & 118.10 & 129.20 & 698.10 & 93.77
-% x <- subset(publtime, inittime+publtime<699000)
-% summary((x$inittime+x$publtime)/1000)
-% sd((x$inittime+x$publtime)/1000)
 \end{tabular}
 \caption{Overall service publication times}
 \label{fig:publtime}
@@ -595,9 +580,6 @@
 \begin{tabular}{rrrrrrr}
 Min. & 1st Qu. & Median & Mean & 3rd Qu. & Max. & StdDev\\\hline
 12.85 & 50.59 & 58.27 & 63.25 & 69.73 & 142.20 & 18.67
-% x <- subset(publtime, publtime<361000)
-% summary(x$publtime/1000)
-% sd(x$publtime/1000)
 \end{tabular}
 \caption{Service establishment times}
 \label{fig:esttime}
@@ -613,14 +595,11 @@
 there are four service establishment times below 30 seconds (13, 18, 27,
 and 28 seconds).
 
-It turned out that the reason for this is a minor bug in the code which is
-now fixed. See SVN revision r15113 for
-details.\footnote{\url{http://archives.seul.org/or/cvs/Jun-2008/msg00231.html}}
-% Include "and the fix was released in Tor 0.2.0.x-rc" too for each time
-% you point to an svn revision -RD
-% Also, it would probably be good to mention in which version each bug
-% introduced. This will make it look great when you fix the bugs that
-% were in since Tor 0.0.6. -RD
+It turned out that the reason for this is a minor bug in the code which was
+in Tor since version 0.0.9pre6 released on November 15, 2004. This bug is
+now fixed in SVN revision
+r15113\footnote{\url{http://archives.seul.org/or/cvs/Jun-2008/msg00231.html}}
+and released in Tor 0.2.1.1-alpha on June 13, 2008.
 This is not meant as confirmation for the usefulness of the 30-second
 delay, but only to make the implementation consistent with the
 specification.
@@ -636,9 +615,12 @@
 An in-depth analysis of the log files has revealed an even more severe bug.
 While setting up a hidden service, some valid introduction circuits were
 overlooked and abandoned. This leads to random delay in establishing
-introduction points and publishing a descriptor. This bug is now fixed in
-SVN revision
-r15149.\footnote{\url{http://archives.seul.org/or/cvs/Jun-2008/msg00268.html}}
+introduction points and publishing a descriptor. This bug was introduced
+with Tor version 0.2.0.14-alpha released on December 23, 2007. It is now
+fixed in SVN revision
+r15149\footnote{\url{http://archives.seul.org/or/cvs/Jun-2008/msg00268.html}}
+and included in Tor 0.2.1.1-alpha and 0.2.0.28-rc which were both released
+on June 13, 2008.
 
 This bugfix made it necessary to perform a second set of measurements with
 special focus on service publication. Figure~\ref{fig:esttime2} shows
@@ -655,9 +637,6 @@
 \begin{tabular}{rrrrrrr}
 Min. & 1st Qu. & Median & Mean & 3rd Qu. & Max. & StdDev\\\hline
 35.55 & 46.73 & 51.99 & 56.73 & 61.42 & 141.60 & 15.82
-% x <- subset(publtime, publtime<3600000)
-% summary(x$publtime/1000)
-% sd(x$publtime/1000)
 \end{tabular}
 \caption{Service establishment times with bugfixed Tor version}
 \label{fig:esttime2}
@@ -719,12 +698,16 @@
 the first descriptor. It appears rather unlikely, that 10 out of 13 circuit
 establishment fail. In fact, it turned out that there is another major bug
 in Tor that completely ignores introduction points when they were
-established by cannibalizing an existing circuit. Even though there may in
-general be other reasons for failing establishment of a circuit, this bug
-was the reason for all 10 failed establishments in the considered case.
-Fixing this bug will probably significantly speed up and reduce variance in
-service publication time.
+established by cannibalizing an existing circuit. This bug was introduced
+in Tor with version 0.2.0.14-alpha released on December 23, 2007. It will
+probably be fixed in the upcoming Tor versions 0.2.0.29(-rc) and
+0.2.1.2-alpha.
 
+Even though there may in general be other reasons for failing establishment
+of a circuit, this bug was the reason for all 10 failed establishments in
+the considered case. Fixing this bug will probably significantly speed up
+and reduce variance in service publication time.
+
 \paragraph{Usefulness of 30-Seconds Delay}
 
 The 30-seconds delay before publishing a descriptor unsurprisingly leads to
@@ -748,6 +731,9 @@
 After reviewing and analyzing all data of the internal measurements, only a 
 selection is presented here.
 
+\emph{TODO Christian: Which data did you use for your evaluations? You
+should mention that here.}
+
 \paragraph{Overall Establishment Time}
 
 Figure~\ref{fig:opentime} shows the distribution of user experienced connection
@@ -775,6 +761,21 @@
 was chosen for extension, that was not open yet. That means, that also the first
 to third hop can be responsible for the failure.
 
+\emph{TODO Christian: Can you confirm that these measurements were
+performed with a Tor version that was \emph{not} affected by the bug
+that was discussed in the previous subsection and that was introduced in
+Tor 0.2.0.14-alpha? If so, mention this fact here.}
+
+\emph{TODO Christian: See Roger's comment that a timeout of one minute
+seems to be too long. Should the timeout be reduced here? And does this
+timeout apply only to the step of connecting to the introduction point or
+to the whole establishment process? In the latter case, would it make sense
+to have more than one timeout for the different phases of connection
+establishment, e.g.\ 20 seconds for connecting to the introduction point
+and 40 seconds for the rest of connection establishment? It might be
+another improvement to further investigate this and recommend new
+timeouts.}
+
 \begin{figure}
 \centering
 \includegraphics[width=0.8\textwidth]{introcirc.png}
@@ -784,8 +785,7 @@
 
 \paragraph{Cell Transfer Over the Same Circuit}
 
-% XXX do you mean \ref{fig:estrend_rendack} here? -RD
-Figure~\ref{fig:introcirc} compares the transfer times of the 
+Figure~\ref{fig:estrend_rendack} compares the transfer times of the 
 ESTABLISH\_RENDEZVOUS and RENDEZVOUS\_ACK cells sent over the same rendezvous
 circuit. Although for very low values below 0.1 seconds the latter is
 faster than the former, there seems to be a linear relationship between the 
@@ -793,8 +793,9 @@
 
 Also interesting is the formation around the acknowledgments with one second, 
 which also occur for ESTABLISH\_RENDEZVOUS cell, that are twice as fast.
-% What might that mean? That's weird. -RD
 
+\emph{TODO Christian: What might that mean? That's weird. -RD}
+
 \begin{figure}
 \centering
 \includegraphics[width=0.8\textwidth]{estrend_rendack.png}
@@ -802,6 +803,10 @@
 \label{fig:estrend_rendack}
 \end{figure}
 
+\emph{TODO Christian: Don't we want to present more results here? Also
+mention possible improvements due to the findings. The discussion section
+was meant to be more like a conclusion, i.e.\ just two or three paragraphs
+long. We might even want to rename it appropriately.}
 
 \subsection{Message Transfer}
 
@@ -831,11 +836,12 @@
 Further, in some cases mean request times are up to two orders of magnitude
 higher than mean response times and vice versa. This is rather surprising,
 because both messages are routed via the same Tor circuit.
-% Perhaps it's because there's a high-volume flow moving in one direction
-% over one of the nodes in the circuit but it's only light in the other?
-% I would imagine that's pretty common actually. Mike Perry can speculate
-% more here. -RD
 
+\emph{TODO Karsten: Perhaps it's because there's a high-volume flow moving in one direction
+over one of the nodes in the circuit but it's only light in the other?
+I would imagine that's pretty common actually. Mike Perry can speculate
+more here. -RD}
+
 At the moment, message transfer times raise more questions than can be
 answered. However, these questions are not directly related to hidden
 services, but also apply to regular Tor circuits. A solution to improve
@@ -870,11 +876,15 @@
 
 Hidden service connections appear to be quite stable, so that there is no
 need to put special focus on it in the attempt to improve the hidden
-service protocol.
+service protocol. Furthermore, in the measurement setup the requested ports
+were chosen randomly and were very likely \emph{not} included in the set of
+long-lived ports (which are by default: 21, 22, 706, 1863, 5050, 5190,
+5222, 5223, 6667, 6697, 8300). For connections to these ports Tor selects
+only high-uptime nodes to reduce the chance that a node will go down before
+the connection is finished. As a consequence, e.g.\ instant messaging
+connections probably exhibit even higher connection stability than measured
+here.
 
-% Also worth mentioning that most of the requested ports were not in
-% LongLivedPorts, so we can expect even higher stability for IM. -RD
-
 \section{Discussion}
 
 Ideas what changes are most likely to improve the overall performance.
@@ -899,8 +909,6 @@
 \item $\cdots$
 \end{itemize}
 
-
-
 \emph{TODO Karsten: When this list is reasonably populated, make two
 paragraphs out of it.}
 \end{document}