[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

[freehaven-cvs] r1768: misc formatting issues (doc/trunk/dejector)



Author: rabbi
Date: 2007-03-26 02:59:59 -0400 (Mon, 26 Mar 2007)
New Revision: 1768

Modified:
   doc/trunk/dejector/dejector.tex
Log:
misc formatting issues


Modified: doc/trunk/dejector/dejector.tex
===================================================================
--- doc/trunk/dejector/dejector.tex	2007-03-25 13:44:40 UTC (rev 1767)
+++ doc/trunk/dejector/dejector.tex	2007-03-26 06:59:59 UTC (rev 1768)
@@ -52,22 +52,22 @@
 
 Input validation has long been recognized as an essential part of a well-designed system, especially when considering defense against an external adversary, yet existing literature gives little in the way of formal axioms for input validation or guidance on putting into practice what few recommendations exist~\cite{FIXME}. 
 
-In the domain of network security, the problem of input validation is often avoided by maintaining strict control over the network's perimeter or sensitive systems, and restricting input from untrusted users. However, in cases where a service exists to provide content to untrusted users in response to their requests, this type of protection is infeasible. Such is the case with many dynamically-generated websites or databanks that allow untrusted users to submit SQL queries to a database. We examine existing solutions to a common network threat, the \emph{SQL injection attack}, and agree with existing literature~\cite{FIXME} that input validation is critical in addressing this attack.
+In the domain of network security, the problem of input validation is often avoided by maintaining strict control over the network's perimeter or sensitive systems, and restricting input from untrusted users. However, in cases where a service exists to provide content to untrusted users in response to their requests, this type of protection is infeasible. Such is the case with many dynamically-generated websites or databanks that allow untrusted users to submit \textsc{sql} queries to a database. We examine existing solutions to a common network threat, the \emph{\textsc{sql} injection attack}, and agree with existing literature~\cite{FIXME} that input validation is critical in addressing this attack.
 
-We then present basic formal axioms for input validation, and apply them to SQL, where we demonstrate enhanced resistance to injection attacks against databases using our input validation library.
+We then present basic formal axioms for input validation, and apply them to \textsc{sql}, where we demonstrate enhanced resistance to injection attacks against databases using our input validation library.
 
 \end{abstract}
 
 
 \section{Introduction}
  
-SQL injection attacks have been a leading method of network compromise since their discovery in XXXX~\cite{}. Blah blah blah. Numerous methods have been proposed and deployed to hinder or inhibit injection attacks on SQL, to varying success. In this paper, we review the techniques both proposed in the literature, and deployed in the field. There is agreement in the literature that strong input validation is the proper approach to solving injection attacks in general.~\cite{FIXME}. However, the literature is lacking both in a set of formal axioms for input validation, as well as guidance for the implementors of systems concerned with input validation. We present such a framework in Section \ref{framework}, say something else in Section \ref{}, and show our approach as applied to SQL in Section \ref{}. In Section \ref{dejector}, we discuss {\tt Libdejector}, our approach as implemented as a PostgreSQL~\cite{} library.
+\textsc{sql} injection attacks have been a leading method of network compromise since their discovery in XXXX~\cite{}. Blah blah blah. Numerous methods have been proposed and deployed to hinder or inhibit injection attacks on \textsc{sql}, to varying success. In this paper, we review the techniques both proposed in the literature, and deployed in the field. There is agreement in the literature that strong input validation is the proper approach to solving injection attacks in general.~\cite{FIXME}. However, the literature is lacking both in a set of formal axioms for input validation, as well as guidance for the implementors of systems concerned with input validation. We present such a framework in Section \ref{framework}, say something else in Section \ref{}, and show our approach as applied to \textsc{sql} in Section \ref{}. In Section \ref{dejector}, we discuss {\tt Libdejector}, our approach as implemented as a PostgreSQL~\cite{} library.
  
 \section{Injection attacks}
 
 Talk about injection attacks, esp
 
-\subsection{SQL Injection}
+\subsection{\textsc{sql} Injection}
 \subsubsection{regex, etc...}
 
 \section{The need for formal models} 
@@ -133,7 +133,7 @@
 \end{observation}
 
 Hoglund and McGraw\cite{hoglund-mcgraw} note that ``a complex computational system is an engine for executing malicious computer programs delivered in the form of crafted input.''
-Their concern---the inevitable compromise, given enough time, of systems---informs this observation.
+Their concern --- the inevitable compromise, given enough time, of systems --- informs this observation.
 If we assume that systems will be compromised at some point, prudence requires that we expose to our attackers the weakest mechanism possible.
 An attacker given access to a strong mechanism has far more potential for mayhem than one using a weaker mechanism.
 
@@ -162,13 +162,13 @@
 
 \subsection{Injection Formally Defined}
 
-In practice, attackers use injection to retrieve information which they cannot access legitimately or to manipulate the state of some set of information (e.g., unauthorized {\sc insert} or {\sc delete} queries on a relational database).
+In practice, attackers use injection to retrieve information which they cannot access legitimately or to manipulate the state of some set of information (e.g., unauthorized \textsc{insert} or \textsc{delete} queries on a relational database).
 Either the contents of the result set or the contents of the information set will differ from those which a non-injected query would produce.
 This necessarily alters both the semantic meaning and the syntactic structure of the query string.
 
 However, altering syntactic structure need not change semantic meaning.
-Adding any number of repetitions of the clause {\tt and '1' =  '1'} to the {\sc where} clause of an SQL {\sc select} statement will not produce a change in the contents of the result set, because $\forall P$, $P \wedge True \rightarrow P$ by identity.
-Each repetition changes the parse, though, and a sufficiently large number of repetitions could produce observable effects in the state of the machine responding to the query (e.g., increased {\sc cpu} or memory consumption during query processing, since systems which parse user input typically transform the entire parse tree into an in-memory representation and do not resolve tautologies or identities\footnote{Indeed, identifying all tautologies and identities is impossible in practice.
+Adding any number of repetitions of the clause {\tt and '1' =  '1'} to the {\sc where} clause of an \textsc{sql} {\textsc{select} statement will not produce a change in the contents of the result set, because $\forall P$, $P \wedge True \rightarrow P$ by identity.
+Each repetition changes the parse, though, and a sufficiently large number of repetitions could produce observable effects in the state of the machine responding to the query (e.g., increased \textsc{cpu} or memory consumption during query processing, since systems which parse user input typically transform the entire parse tree into an in-memory representation and do not resolve tautologies or identities\footnote{Indeed, identifying all tautologies and identities is impossible in practice.
 While string-literal comparisons are easily identified, given minimal knowledge of a database schema an attacker can still compose comparisons of variable data which always evaluate to $True$, e.g. {\tt user.id = user.id} or {\tt count(id) > -1}.}).
 As unnecessarily increased resource consumption is also harmful, we argue that query alteration which alters syntactic structure but not semantic meaning constitutes injection as well.
 
@@ -196,56 +196,58 @@
 However, we assert that whether an error is made on the side of safety or the side of convenience, it remains an error.
 
 \subsection{Guns and Butter}
-Shifting to a more appropriate tool---in our $a^mb^m$ example, a context free grammar---enables us to validate inputs as good or bad with perfect accuracy. 
+% Where do we define a context-free grammar?
+Shifting to a more appropriate tool --- in our $a^mb^m$ example, a context free grammar --- enables us to validate inputs as good or bad with perfect accuracy. 
 No longer do we have to make the tradeoff between security and convenience; we can have both guns and butter. 
 Although the input language may provide a means for malicious behaviour, it is possible to create subsets of the input language in which it is only possible to generate secure strings.
 Lemma \ref{lem:sublanguage-construction} provides a mechanism for generating subsets of context-free languages and Observation \ref{obs:equiv} allows us to extend the lemma to regular languages and finite-state automata.
 
-\section{Applying Axioms to SQL}
+\section{Applying Axioms to {\sc Sql}}
 \label{sql}
-The Structured Query Language, {\sc sql}, is an {\sc ansi/iso} standard language for interacting with relational databases. 
-User input is in the form of a string in some encoding, typically {\sc ascii} or Unicode; the string is parsed by a context-free grammar, usually one generated by a parser generator such as yacc or bison.
+The Structured Query Language, \textsc{sql}, is an \textsc{ansi/iso} standard language for interacting with relational databases. 
+User input is in the form of a string in some encoding, typically \textsc{ascii} or Unicode; the string is parsed by a context-free grammar, usually one generated by a parser generator such as {\tt yacc} or {\tt bison}.
 
-{\sc sql} offers tremendous flexibility, but this flexibility can be used against implementors as easily as implementors can use it against its problem domain. 
+\textsc{Sql} offers tremendous flexibility, but this flexibility can be used against implementors as easily as implementors can use it against its problem domain. 
 In this section, we show how formal axioms of input validation give improved resistance to a certain class of attacks.
 
-\subsection{SQL Injection}
-{\sc sql} \emph{injection} is a class of attacks against software which interfaces with a database. 
+\subsection{\textsc{Sql} Injection}
+\textsc{Sql} \emph{injection} is a class of attacks against software which interfaces with a database. 
 An innocuous-looking statement, such as {\tt "select rights from table where username =} $I_1$ {\tt and password =} $I_2${\tt "}, can be subverted with carefully-chosen $I_n$ values.
 
 For instance, if $I_1$ is {\tt "root"} and $I_2$ is the string {\tt "' or '1' = '1'"}, the final statement passed to the database is {\tt "select rights from table where username = 'root' and password = '' or '1' = '1'"}. 
 This malformed boolean expression evaluates to true, so root's rights will be granted.
+%Are the quotes right above? FIXME
 
 There are two possible regular-expression-based solutions to this problem: disallow certain characters from user input, or accept only certain patterns.
-(A third approach, \emph{prepared statements}, is similar to the second solution, but only applies to SQL and cannot be extended to other injection vectors such as XML/XPath injection.
+(A third approach, \emph{prepared statements}, is similar to the second solution, but only applies to \textsc{sql} and cannot be extended to other injection vectors such as XML/XPath injection.
 We will discuss this further shortly.)
 Most developers take the first approach, examining input strings for control characters such as {\tt "}, which terminates a string argument prematurely and allows an attacker to inject a malicious boolean substring; {\tt ;}, which terminates a query and allows an attacker to inject additional queries in the same input string; or {\tt --}, which is used to comment out the remainder of a line and bypass any remaining portion of the intended input which would constitute an unparseable fragment.
 The validator either rejects such strings or ``escapes'' those characters so that the database parser does not interpret them as control characters.
 In the second approach, a regular expression such as {\tt [a-zA-Z0-9]+} is assumed to cover all possible valid inputs.
 However, both approaches can reject valid inputs, and international encodings present problems which are far too unwieldy for regular expressions to handle in a reasonable fashion.
 
-(FIXME: expand on this here and reference the recent injection issue from {\tt http://www.postgresql.org/docs/techdocs.50})
+%(FIXME: expand on this here and reference the recent injection issue from {\tt http://www.postgresql.org/docs/techdocs.50})
 
-Having shown that regular expressions are inadequate to the task, we return to Theorems \ref{th:minval} and \ref{th:rangeval}, which say we need to validate over $\Sigma_I^S$ using a mechanism at least as strong as that used to generate valid command strings---in this case, a context-free grammar.
+Having shown that regular expressions are inadequate to the task, we return to Theorems \ref{th:minval} and \ref{th:rangeval}, which say we need to validate over $\Sigma_I^S$ using a mechanism at least as strong as that used to generate valid command strings --- in this case, a context-free grammar.
 
 We do not know what the user will input, and as such, predicting $\Sigma_I^S$ is problematic. 
 However, by Definition \ref{def:syntactic-context}, the upper bound of $\Sigma^S$ is the whole of $S$, so we elect to validate over the entire string.
-(In practice, this is entirely computationally feasible, since $S$ must be parsed by the {\sc sql} parser eventually anyway.)
+(In practice, this is entirely computationally feasible, since $S$ must be parsed by the \textsc{sql} parser eventually anyway.)
 
 So far, we have not validated anything. 
-If we were to put the injected {\sc sql} string through an {\sc sql} parser, it would parse correctly.
+If we were to put the injected \textsc{sql} string through an \textsc{sql} parser, it would parse correctly.
 The string is syntactically valid, but it is semantically invalid, in that it does not achieve the intended results.
-Our challenge is to define a sublanguage of {\sc sql} which only generates strings that have our intended semantic meaning---``safe'' strings, in other words.
+Our challenge is to define a sublanguage of \textsc{sql} which only generates strings that have our intended semantic meaning --- ``safe'' strings, in other words.
 
-Fortunately, defining a safe subset of {\sc sql} for this case is simple. 
-We define, using Lemma \ref{lem:sublanguage-construction}, a sublanguage of {\sc sql} wherein the only allowed production rules are those which generate the sequence {\tt "select rights from table where username='{\sc terminal-string}' and password='{\sc terminal-string}'"}, where {\sc terminal-string} denotes a sequence of characters which derives no keyword or other syntactic structure. (FIXME: this is a shitty way of phrasing that.)
+Fortunately, defining a safe subset of \textsc{sql} for this case is simple. 
+We define, using Lemma \ref{lem:sublanguage-construction}, a sublanguage of \textsc{sql} wherein the only allowed production rules are those which generate the sequence {\tt "select rights from table where username='{\sc terminal-string}' and password='{\sc terminal-string}'"}, where {\sc terminal-string} denotes a sequence of characters which derives no keyword or other syntactic structure. (FIXME: this is a shitty way of phrasing that.)
 
 We take whatever command string $C$ we create from splicing together our command string and the user input, then determine whether $C$ is a valid string in our known-safe sublanguage, using a tool of sufficient computational strength to produce all valid strings in that sublanguage.
-In this case, since we know that a context-free grammar is sufficient to generate {\sc sql} strings and our known-good context-free sublanguage generates a subset of all possible valid {\sc sql} strings, we know that a context-free grammar is sufficient for our purposes.
+In this case, since we know that a context-free grammar is sufficient to generate \textsc{sql} strings and our known-good context-free sublanguage generates a subset of all possible valid \textsc{sql} strings, we know that a context-free grammar is sufficient for our purposes.
 If $C$ is a valid string in our sublanguage, the input is good.
 If it is not, the input is bad.
 
-By so doing, we are secure against {\sc sql} injection attacks.
+By so doing, we are secure against \textsc{sql} injection attacks.
 
 \section{Dejector}
 \label{dejector}

***********************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxx with
unsubscribe freehaven-cvs       in the body. http://freehaven.net/