
[freehaven-cvs] r1767: Minor cleanup. (doc/trunk/dejector)



Author: rabbi
Date: 2007-03-25 09:44:40 -0400 (Sun, 25 Mar 2007)
New Revision: 1767

Modified:
   doc/trunk/dejector/dejector.tex
Log:
Minor cleanup.


Modified: doc/trunk/dejector/dejector.tex
===================================================================
--- doc/trunk/dejector/dejector.tex	2007-03-23 09:33:52 UTC (rev 1766)
+++ doc/trunk/dejector/dejector.tex	2007-03-25 13:44:40 UTC (rev 1767)
@@ -31,7 +31,7 @@
 
 \title{Guns and Butter: Toward Formal Axioms of Input Validation}
 
-\author{Robert J. Hansen\inst{1} and Meredith L. Patterson\inst{1} and Len Sassaman\inst{2}}
+\author{Meredith L. Patterson\inst{1} and Robert J. Hansen\inst{1} and Len Sassaman\inst{2}}
 
 \institute{The University of Iowa Department of Computer Science\\
 Iowa City, Iowa, 52242 USA
@@ -50,7 +50,7 @@
 
 \begin{abstract}
 
-Input validation has long been recognized as an essential part of a well-designed system, especially when considering defense against an external adversary, yet existing literature gives little in the way of formal axioms for input validation or guidance on putting into practice what few recommendations exist.~\cite{FIXME}. 
+Input validation has long been recognized as an essential part of a well-designed system, especially when considering defense against an external adversary, yet existing literature gives little in the way of formal axioms for input validation or guidance on putting into practice what few recommendations exist~\cite{FIXME}. 
 
 In the domain of network security, the problem of input validation is often avoided by maintaining strict control over the network's perimeter or sensitive systems, and restricting input from untrusted users. However, in cases where a service exists to provide content to untrusted users in response to their requests, this type of protection is infeasible. Such is the case with many dynamically-generated websites or databanks that allow untrusted users to submit SQL queries to a database. We examine existing solutions to a common network threat, the \emph{SQL injection attack}, and agree with existing literature~\cite{FIXME} that input validation is critical in addressing this attack.
 
@@ -181,12 +181,13 @@
 
 \section{Axiomatic Input Validation}
 The conventional wisdom for validating user inputs is to use regular expressions to filter out bad inputs or allow in only good ones. 
-A schism exists over which is the safest way to perform regular expression validation: whether it is better  to identify and allow through only those inputs known to be good, or to to identify and forbid inputs known to be bad.
+A schism exists over which is the safest way to perform regular expression validation: whether it is better to identify and allow through only those inputs known to be good, or to identify and forbid inputs known to be bad~\cite{XXXXFIXME}.
 However, this misses the more important point: for a majority of input situations, regular expressions are computationally insufficient for input validation.
 As Theorem \ref{th:minval} shows, attempts to validate $\Sigma_I^S$ using a mechanism weaker than $M$ will often fail to recognize valid strings or identify invalid ones.
 
 \subsection{Guns or Butter}
-For instance, it is not possible to create a regular expression which will reliably match $a^mb^m$, a language which is well-known to be context-free.
+%rewrite this to be clearer, and avoid unsubstantiated statements.
+It is not possible to create a regular expression which will reliably match $a^mb^m$, a language which is well known to be context-free but not regular.
 Any finite state machine designed to recognize this language will, by the \emph{pigeonhole principle}, either accept some string $a^mb^n$ with $m \neq n$ or reject a sufficiently long $a^mb^m$.
 Attempting to coerce regular expressions into handling this language will inevitably lead to friction between what the regular expression can do, what developers think it can do, and what users need it to do.
 In practice, developers end up erring on either the side of safety or the side of convenience.
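The $a^mb^m$ point above can be made concrete with a short sketch (illustrative Python, not part of the patch; the counter plays the role of the single stack a pushdown automaton adds over a finite-state machine):

```python
import re

def is_anbn(s: str) -> bool:
    """Recognize a^m b^m (m >= 1) with a counter -- the one-stack
    power a pushdown automaton has and a finite-state machine lacks."""
    i, count = 0, 0
    while i < len(s) and s[i] == 'a':   # "push" one a
        i += 1
        count += 1
    while i < len(s) and s[i] == 'b':   # "pop" one a per b
        i += 1
        count -= 1
    return i == len(s) and count == 0 and s != ''

# A regular expression can only check the *shape* a+b+, not the balance:
shape_only = re.compile(r'^a+b+$')

assert is_anbn('aaabbb') and not is_anbn('aaabb')
assert shape_only.match('aaabb')   # the regex accepts an unbalanced string
```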
@@ -195,7 +196,7 @@
 However, we assert that whether an error is made on the side of safety or the side of convenience, it remains an error.
 
 \subsection{Guns and Butter}
-Shifting to a more appropriate tool---in our $a^mb^m$ example, a pushdown automata---enables us to validate inputs as good or bad with perfect accuracy. 
+Shifting to a more appropriate tool---in our $a^mb^m$ example, a context-free grammar---enables us to validate inputs as good or bad with perfect accuracy. 
 No longer do we have to make the tradeoff between security and convenience; we can have both guns and butter. 
 Although the input language may provide a means for malicious behaviour, it is possible to create subsets of the input language in which it is only possible to generate secure strings.
 Lemma \ref{lem:sublanguage-construction} provides a mechanism for generating subsets of context-free languages and Observation \ref{obs:equiv} allows us to extend the lemma to regular languages and finite-state automata.
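The sublanguage construction can be sketched as follows (a hypothetical toy grammar, not the paper's; restricting the grammar to a whitelist of productions is the lemma's idea of generating a subset in which only safe strings can be derived):

```python
from itertools import product

# Hypothetical toy grammar: S -> a S b | ab | x,
# where the production deriving 'x' stands in for an "unsafe" rule.
GRAMMAR = {'S': [('a', 'S', 'b'), ('ab',), ('x',)]}

def sublanguage(grammar, allowed):
    """Keep only whitelisted (nonterminal, production) pairs."""
    return {nt: [p for p in prods if (nt, p) in allowed]
            for nt, prods in grammar.items()}

def generate(grammar, symbol='S', depth=3):
    """All terminal strings derivable from `symbol` within `depth` steps."""
    if symbol not in grammar:          # terminal symbol
        return {symbol}
    if depth == 0:
        return set()
    strings = set()
    for prod in grammar[symbol]:
        parts = [generate(grammar, sym, depth - 1) for sym in prod]
        for combo in product(*parts):
            strings.add(''.join(combo))
    return strings

safe = sublanguage(GRAMMAR, {('S', ('a', 'S', 'b')), ('S', ('ab',))})
assert all('x' not in s for s in generate(safe))  # only safe strings remain
```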
@@ -223,7 +224,7 @@
 In the second approach, a regular expression such as {\tt [a-zA-Z0-9]+} is assumed to cover all possible valid inputs.
 However, both approaches can reject valid inputs, and international encodings present problems which are far too unwieldy for regular expressions to handle in a reasonable fashion.
 
-(FIXME: expand on this here and reference the recent injection issue from http://www.postgresql.org/docs/techdocs.50)
+(FIXME: expand on this here and reference the recent injection issue from {\tt http://www.postgresql.org/docs/techdocs.50})
 
 Having shown that regular expressions are inadequate to the task, we return to Theorems \ref{th:minval} and \ref{th:rangeval}, which say we need to validate over $\Sigma_I^S$ using a mechanism at least as strong as that used to generate valid command strings---in this case, a context-free grammar.
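The rejection of valid inputs is easy to demonstrate with the whitelist pattern quoted above (illustrative Python; the sample names are hypothetical):

```python
import re

# The whitelist pattern quoted in the text:
WHITELIST = re.compile(r'^[a-zA-Z0-9]+$')

# O'Brien and José are perfectly valid names, yet the whitelist
# rejects both the apostrophe and the accented character.
for name in ['alice42', "O'Brien", 'José']:
    print(name, '->', 'accepted' if WHITELIST.match(name) else 'rejected')
```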
 
@@ -240,7 +241,7 @@
 We define, using Lemma \ref{lem:sublanguage-construction}, a sublanguage of {\sc sql} whose only allowed production rules are those generating the sequence {\tt "select rights from table where username='{\sc terminal-string}' and password='{\sc terminal-string}'"}, where {\sc terminal-string} denotes a sequence of characters from which no keyword or other syntactic structure can be derived.
 
 We take whatever command string $C$ we create from splicing together our command string and the user input, then determine whether $C$ is a valid string in our known-safe sublanguage, using a tool of sufficient computational strength to produce all valid strings in that sublanguage.
-In this case, since we know that a context-free grammar is sufficient to generate {\sc sql} strings and our known-good context-free sublanguage generates a subset of all possible valid {\sc sql} strings, we know that a pushdown automata is sufficient for our purposes.
+In this case, since we know that a context-free grammar is sufficient to generate {\sc sql} strings and our known-good context-free sublanguage generates a subset of all possible valid {\sc sql} strings, we know that a context-free grammar is sufficient for our purposes.
 If $C$ is a valid string in our sublanguage, the input is good.
 If it is not, the input is bad.
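The whole-string check described above can be sketched as follows (hypothetical Python, not the paper's implementation; a hand-rolled recognizer for the fixed template stands in for a parser over the known-safe sublanguage):

```python
def is_terminal_string(s: str) -> bool:
    """A terminal string may not contain a quote, so it can never
    close the SQL literal and introduce further syntax."""
    return "'" not in s

def in_safe_sublanguage(command: str) -> bool:
    """Check the whole spliced command C, not just the user fragment:
    an injected quote or keyword pushes C outside the sublanguage."""
    prefix = "select rights from table where username='"
    middle = "' and password='"
    suffix = "'"
    if not (command.startswith(prefix) and command.endswith(suffix)):
        return False
    body = command[len(prefix):-len(suffix)]
    if middle not in body:
        return False
    username, _, password = body.partition(middle)
    return is_terminal_string(username) and is_terminal_string(password)

good = "select rights from table where username='alice' and password='hunter2'"
evil = ("select rights from table where "
        "username='alice' and password='' or '1'='1'")
assert in_safe_sublanguage(good)
assert not in_safe_sublanguage(evil)   # classic injection falls outside
```

Because validation happens on the complete command string rather than on the user-supplied fragments in isolation, there is nothing for an attacker to "escape": any input that changes the syntactic structure of $C$ simply produces a string outside the sublanguage.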
 
