
[freehaven-cvs] r1765: Initial restructuring. (doc/trunk/dejector)



Author: rabbi
Date: 2007-03-22 08:28:12 -0400 (Thu, 22 Mar 2007)
New Revision: 1765

Modified:
   doc/trunk/dejector/dejector.tex
   doc/trunk/dejector/llncs-rabbi.cls
Log:
Initial restructuring.


Modified: doc/trunk/dejector/dejector.tex
===================================================================
--- doc/trunk/dejector/dejector.tex	2007-03-22 10:51:20 UTC (rev 1764)
+++ doc/trunk/dejector/dejector.tex	2007-03-22 12:28:12 UTC (rev 1765)
@@ -1,171 +1,237 @@
-\documentclass{article}
-
-\usepackage{amsthm}
-
-
-\theoremstyle{plain}		\newtheorem{definition}{Definition}
-\theoremstyle{plain}		\newtheorem{observation}{Observation}
-\theoremstyle{plain}		\newtheorem{theorem}{Theorem}
-\theoremstyle{plain}		\newtheorem{lemma}{Lemma}
-
-\author{Robert J. Hansen, Meredith L. Patterson and Len Sassaman}
-\title{Guns and Butter: Toward Formal Axioms of Input Validation}
- 
-\begin{document}
-\maketitle
- 
-\section{Introduction}
- 
-(Rob had three sections at the beginning which comprise the introduction and motivation; I think all of these are going to need to be rewritten.)
- 
-\section{Framework}
-\subsection{Definitions and Observations}
-We begin with several definitions from elementary computability theory, and draw observations which motivate our work.
-
-\begin{definition}[Validity]
-A string S in a language L is valid if and only if S can be generated from the formal description of L.
-\end{definition}
-
-\begin{definition}[P-language]
-A language L is a P-language if and only if all strings generated by L exhibit the property P.
-\end{definition}
-
-\begin{observation}[Validity of sublanguages]
-All strings in the P-language L' derived from L are valid P-strings in L, regardless of whether L is a P-language.
-\end{observation}
-
-This observation, while quite general, has practical consequences. 
-By constructing a properly constrained sublanguage of an insecure language, we may generate secure strings within that insecure language.
-
-\begin{definition}[M-set]
-The M-set of languages is composed of all languages which require a mechanism of minimum strength M for their generation.
-\end{definition}
-
-\begin{definition}[Syntactic context]
-\label{def:syntactic-context}
-The syntactic context for an input I in a string S is the maximum extent in S to which I can change the syntactic structure (or \emph{parse}) of S, up to an upper bound of the entirety of S.
-\end{definition}
-
-We denote the syntactic context of $S$ for a given input $I$ with the notation $\Sigma_I^S$. 
-When a string refers to a variable, we note that for any variable $V$ defined in a syntactic context $\Sigma^T$, any syntactic context $\Sigma^S$ which references $V$ will incorporate $\Sigma^T$.
-
-\begin{definition}[Semantic context]
-A semantic context for an input I in a string S is the maximum extent to which I can change the semantic meaning of S, up to an upper bound of the entirety of S.
-\end{definition}
-
-We denote the semantic context of $S$ for a given input $I$ with the notation $\Upsilon_I^S$. 
-Our discussion of $\Upsilon_I^S$ will be limited, as semantic analysis is beyond the scope of this paper except insofar as it can be facilitated via syntactic analysis.
-
-\subsection{Input Validation Theorems}
-\begin{theorem}[Minimum validation strength]
-\label{th:minval}
-Given a language L which requires a minimally--strong computational mechanism M to generate strings $\in$ L, an arbitrary string S $\in$ L must be validated using a mechanism at least as strong as M.
-\end{theorem}
-\emph{Proof.} Since validating a string against a language is done by applying the production rules of the language to create the string being validated, if it is possible to validate $S$ using a mechanism weaker than $M$, then $M$ is not the minimally--strong mechanism required to generate $L$'s productions. 
-This contradicts our statement that $M$ is minimally strong.
-
-Note that it is trivial to generate a one-off instance of a computational mechanism weaker than $M$ which validates one or more strings in $L$, but this does not invalidate this theorem.
-Consider $L = \{all strings of the form a^nb^n\}$.
-This language is context-free, but a finite state machine $F$ can be constructed which validates strings in $L$ of some finite length $l$.
-However, $L$ still contains an infinite number of strings which $F$ cannot validate; validating an arbitrary string in $L$ requires a pushdown automata.
-
-\begin{observation}[Maximal validation strength]
-A language L requiring a minimally--strong computational mechamism M should not be validated using a mechanism stronger than M.
-\end{observation}
-
-Hoglund and McGraw\cite{hoglund-mcgraw} note that ``a complex computational system is an engine for executing malicious computer programs delivered in the form of crafted input.''
-Their concern---the inevitable compromise, given enough time, of systems---informs this observation.
-If we assume that systems will be compromised at some point, prudence requires that we expose to our attackers the weakest mechanism possible.
-An attacker given access to a strong mechanism has far more potential for mayhem than one using a weaker mechanism.
-
-\begin{theorem}[Range of validation]
-\label{th:rangeval}
-Given an input I and a string S, validation must occur over at least the range $\Sigma_I^S$.
-\end{theorem}
-\emph{Proof.} Since $\Sigma_I^S$ is defined as the greatest extent to which $I$ has the capability to influence the parse of $S$, validating a range smaller than $\Sigma_I^S$ may result in failure to validate all the syntactic changes $I$ may introduce to $S$.
-Validation of only $I$ is sufficient in only those cases where it can be proven $\Sigma_I^S = I$.
-
-\begin{lemma}[Construction of sublanguages]
-\label{lem:sublanguage-construction}
-Given pushdown automata G and H, where $H = \langle V, T, R, S \in V \rangle$, L(G) is a sublanguage of L(H) if $V(G) \subseteq V(H), T(G) \subseteq T(H), R(G) \subseteq R(H)$ and $S(G) = S(H)$.
-\end{lemma}
-\emph{Proof by Construction.} For any string $w$ in $L(G)$, there exists a production $P(w) \in G$.
-All variables, terminals, and rules used in $P(w)$ are correspondingly in $G$. $S(P(w)) = S(G)$.
-$V(P(w))$ must necessarily exist in $V(G)$, which as a given exists in $V(H)$. 
-Similar arguments exist for $T, R, S$. 
-Thus, $P(w) \in H$ and $w \in L(H)$.
-Consequently, since any arbitrary string in $L(G)$ is also a string in $L(H)$, $L(G)$ is a sublanguage of $L(H)$.
-
-\begin{observation}[Mechanism equivalence]
-\label{obs:equiv}
-Any regular expression, finite-state automata or context-free grammar can be converted into an equivalent pushdown automata, and thus falls under Lemma 1.
-\end{observation}
-
-\section{Axiomatic Input Validation}
-The conventional wisdom for validating user inputs is to use regular expressions to filter out bad inputs or allow in only good ones. 
-A schism exists over which is the safest way to perform regular expression validation: whether it is better  to identify and allow through only those inputs known to be good, or to to identify and forbid inputs known to be bad.
-However, this misses the more important point: for a majority of input situations, regular expressions are computationally insufficient for input validation.
-As Theorem \ref{th:minval} shows, attempts to validate $\Sigma_I^S$ using a mechanism weaker than $M$ will often fail to recognize valid strings or identify invalid ones.
-
-\subsection{Guns or Butter}
-For instance, it is not possible to create a regular expression which will reliably match $a^mb^m$, a language which is well-known to be context-free.
-Any finite state machine designed to recognize this language will either accept some string $a^mb^n$, $m \neq n$ (the \emph{pigeonhole principle}), or reject a sufficiently long $a^mb^m$.
-Attempting to coerce regular expressions into handling this language will inevitably lead to friction between what the regular expression can do, what developers think it can do, and what users need it to do.
-In practice, developers end up erring on either the side of safety or the side of convenience.
-Those who favour the former argue that we must admit only those inputs which a regular expression can verify as valid, and accept the false-positive rate (and its attendant inconvenience) as a necessary expense of security.
-Those who prefer the latter say we must exclude only those inputs known to be bad, and accept the false-negative rate (and the security holes opened up) as a necessary expense of convenience.
-However, we assert that whether an error is made on the side of safety or the side of convenience, it remains an error.
-
-\subsection{Guns and Butter}
-Shifting to a more appropriate tool---in our $a^mb^m$ example, a pushdown automata---enables us to validate inputs as good or bad with perfect accuracy. 
-No longer do we have to make the tradeoff between security and convenience; we can have both guns and butter. 
-Although the input language may provide a means for malicious behaviour, it is possible to create subsets of the input language in which it is only possible to generate secure strings.
-Lemma \ref{lem:sublanguage-construction} provides a mechanism for generating subsets of context-free languages and Observation \ref{obs:equiv} allows us to extend the lemma to regular languages and finite-state automata.
-
-\section{Applying Axioms to SQL}
-The Structured Query Language, {\sc sql}, is an {\sc ansi/iso} standard language for interacting with relational databases. 
-User input is in the form of a string in some encoding, typically {\sc ascii} or Unicode; the string is parsed by a context-free grammar, usually one generated by a parser generator such as yacc or bison.
-
-{\sc sql} offers tremendous flexibility, but this flexibility can be used against implementors as easily as implementors can use it against its problem domain. 
-In this section, we show how formal axioms of input validation give improved resistance to a certain class of attacks.
-
-\subsection{SQL Injection}
-{\sc sql} \emph{injection} is a class of attacks against software which interfaces with a database. 
-An innocuous-looking statement, such as {\tt "select rights from table where username =} $I_1$ {\tt and password =} $I_2${\tt "}, can be subverted with carefully-chosen $I_n$ values.
-
-For instance, if $I_1$ is {\tt "root"} and $I_2$ is the string {\tt "' or '1' = '1'"}, the final statement passed to the database is {\tt "select rights from table where username = 'root' and password = '' or '1' = '1'"}. 
-This malformed boolean expression evaluates to true, so root's rights will be granted.
-
-There are two possible regular-expression-based solutions to this problem: disallow certain characters from user input, or accept only certain patterns.
-(A third approach, \emph{prepared statements}, is similar to the second solution, but only applies to SQL and cannot be extended to other injection vectors such as XML/XPath injection.
-We will discuss this further shortly.)
-Most developers take the first approach, examining input strings for control characters such as {\tt "}, which terminates a string argument prematurely and allows an attacker to inject a malicious boolean substring; {\tt ;}, which terminates a query and allows an attacker to inject additional queries in the same input string; or {\tt --}, which is used to comment out the remainder of a line and bypass any remaining portion of the intended input which would constitute an unparseable fragment.
-The validator either rejects such strings or ``escapes'' those characters so that the database parser does not interpret them as control characters.
-In the second approach, a regular expression such as {\tt [a-zA-Z0-9]+} is assumed to cover all possible valid inputs.
-However, both approaches can reject valid inputs, and international encodings present problems which are far too unwieldy for regular expressions to handle in a reasonable fashion.
-
-(FIXME: expand on this here and reference the recent injection issue from http://www.postgresql.org/docs/techdocs.50)
-
-Having shown that regular expressions are inadequate to the task, we return to Theorems \ref{th:minval} and \ref{th:rangeval}, which say we need to validate over $\Sigma_I^S$ using a mechanism at least as strong as that used to generate valid command strings---in this case, a context-free grammar.
-
-We do not know what the user will input, and as such, predicting $\Sigma_I^S$ is problematic. 
-However, by Definition \ref{def:syntactic-context}, the upper bound of $\Sigma^S$ is the whole of $S$, so we elect to validate over the entire string.
-(In practice, this is entirely computationally feasible, since $S$ must be parsed by the {\sc sql} parser eventually anyway.)
-
-So far, we have not validated anything. 
-If we were to put the injected {\sc sql} string through an {\sc sql} parser, it would parse correctly.
-The string is syntactically valid, but it is semantically invalid, in that it does not achieve the intended results.
-Our challenge is to define a sublanguage of {\sc sql} which only generates strings that have our intended semantic meaning---``safe'' strings, in other words.
-
-Fortunately, defining a safe subset of {\sc sql} for this case is simple. 
-We define, using Lemma \ref{lem:sublanguage-construction}, a sublanguage of {\sc sql} wherein the only allowed production rules are those which generate the sequence {\tt "select rights from table where username='{\sc terminal-string}' and password='{\sc terminal-string}'"}, where {\sc terminal-string} denotes a sequence of characters which derives no keyword or other syntactic structure. (FIXME: this is a shitty way of phrasing that.)
-
-We take whatever command string $C$ we create from splicing together our command string and the user input, then determine whether $C$ is a valid string in our known-safe sublanguage, using a tool of sufficient computational strength to produce all valid strings in that sublanguage.
-In this case, since we know that a context-free grammar is sufficient to generate {\sc sql} strings and our known-good context-free sublanguage generates a subset of all possible valid {\sc sql} strings, we know that a pushdown automata is sufficient for our purposes.
-If $C$ is a valid string in our sublanguage, the input is good.
-If it is not, the input is bad.
-
-By so doing, we are secure against {\sc sql} injection attacks.
-\end{document}
- 
+\documentclass{llncs-rabbi}
+
+
+\usepackage{url}
+%\usepackage{graphics}
+\usepackage{amssymb}
+\usepackage{amsmath}
+%\usepackage{amsthm}
+\usepackage{subfigure}
+\usepackage{epsfig}
+
+% Add \contrib{} from /usr/local/teTeX/share/texmf.tetex/amslatex/tex/latex/amscls/amsproc.cls
+
+\newcommand\V[1]{\overrightarrow{#1}}
+%\theoremstyle{plain}		\newtheorem{definition}{Definition}
+%\theoremstyle{plain}		\newtheorem{observation}{Observation}
+%\theoremstyle{plain}		\newtheorem{theorem}{Theorem}
+%\theoremstyle{plain}		\newtheorem{lemma}{Lemma}
+
+
+% Cut down on whitespace above and below figures displayed at head/foot of
+% page.
+\setlength{\textfloatsep}{3mm}
+% Cut down on whitespace above and below figures displayed in middle of page
+\setlength{\intextsep}{3mm}
+
+
+\setcounter{page}{1}
+
+\begin{document}
+
+\title{Guns and Butter: Toward Formal Axioms of Input Validation}
+
+\author{Robert J. Hansen\inst{1} and Meredith L. Patterson\inst{1} and Len Sassaman\inst{2}}
+
+\institute{The University of Iowa Department of Computer Science\\
+Iowa City, Iowa, 52242 USA
+\email{\{mlpatter,rjhansen\}@cs.uiowa.edu}
+\and
+K. U. Leuven ESAT-COSIC \\
+Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
+\email{len.sassaman@xxxxxxxxxxxxxxxx}
+}
+
+
+\maketitle
+%\pagestyle{empty}
+%\centerline{\LARGE\bf DRAFT --- not for publication}
+%======================================================================
+
+\begin{abstract}
+
+Input validation has long been recognized as an essential part of a well-designed system, especially when considering defense against an external adversary, yet existing literature gives little in the way of formal axioms for input validation or guidance on putting into practice what few recommendations exist~\cite{FIXME}.
+
+In the domain of network security, the problem of input validation is often avoided by maintaining strict control over the network's perimeter or sensitive systems, and restricting input from untrusted users. However, in cases where a service exists to provide content to untrusted users in response to their requests, this type of protection is infeasible. Such is the case with many dynamically generated websites or databanks that allow untrusted users to submit SQL queries to a database. We examine existing solutions to a common network threat, the \emph{SQL injection attack}, and agree with existing literature~\cite{FIXME} that input validation is critical in addressing this attack.
+
+We then present basic formal axioms for input validation, and apply them to SQL, where we demonstrate enhanced resistance to injection attacks against databases using our input validation library.
+
+\end{abstract}
+
+
+\section{Introduction}
+ 
+SQL injection attacks have been a leading method of network compromise since their discovery in XXXX~\cite{}. Blah blah blah. Numerous methods have been proposed and deployed to hinder or inhibit injection attacks on SQL, with varying success. In this paper, we review techniques both proposed in the literature and deployed in the field. There is agreement in the literature that strong input validation is the proper approach to preventing injection attacks in general~\cite{FIXME}. However, the literature lacks both a set of formal axioms for input validation and guidance for the implementors of systems concerned with input validation. We present such a framework in Section \ref{framework}, say something else in Section \ref{}, and show our approach as applied to SQL in Section \ref{}. In Section \ref{dejector}, we discuss {\tt Libdejector}, our approach as implemented as a PostgreSQL~\cite{} library.
+ 
+\section{Injection attacks}
+
+Talk about injection attacks, esp
+
+\subsection{SQL Injection}
+\subsubsection{regex, etc...}
+
+\section{The need for formal models} 
+ 
+\section{Framework}
+\label{framework}
+\subsection{Definitions and Observations}
+We begin with several definitions from elementary computability theory, and draw observations which motivate our work.
+
+\begin{definition}[Validity]
+A string S in a language L is valid if and only if S can be generated from the formal description of L.
+\end{definition}
+
+\begin{definition}[P-language]
+A language L is a P-language if and only if all strings generated by L exhibit the property P.
+\end{definition}
+
+\begin{observation}[Validity of sublanguages]
+All strings in the P-language L' derived from L are valid P-strings in L, regardless of whether L is a P-language.
+\end{observation}
+
+This observation, while quite general, has practical consequences. 
+By constructing a properly constrained sublanguage of an insecure language, we may generate secure strings within that insecure language.
+
+\begin{definition}[M-set]
+The M-set of languages is composed of all languages which require a mechanism of minimum strength M for their generation.
+\end{definition}
+
+\begin{definition}[Syntactic context]
+\label{def:syntactic-context}
+The syntactic context for an input I in a string S is the maximum extent in S to which I can change the syntactic structure (or \emph{parse}) of S, up to an upper bound of the entirety of S.
+\end{definition}
+
+We denote the syntactic context of $S$ for a given input $I$ with the notation $\Sigma_I^S$. 
+When a string refers to a variable, we note that for any variable $V$ defined in a syntactic context $\Sigma^T$, any syntactic context $\Sigma^S$ which references $V$ will incorporate $\Sigma^T$.
+
+\begin{definition}[Semantic context]
+A semantic context for an input I in a string S is the maximum extent to which I can change the semantic meaning of S, up to an upper bound of the entirety of S.
+\end{definition}
+
+We denote the semantic context of $S$ for a given input $I$ with the notation $\Upsilon_I^S$. 
+Our discussion of $\Upsilon_I^S$ will be limited, as semantic analysis is beyond the scope of this paper except insofar as it can be facilitated via syntactic analysis.
+
+\subsection{Input Validation Theorems}
+\begin{theorem}[Minimum validation strength]
+\label{th:minval}
+Given a language L which requires a minimally strong computational mechanism M to generate strings $\in$ L, an arbitrary string S $\in$ L must be validated using a mechanism at least as strong as M.
+\end{theorem}
+\emph{Proof.} Since validating a string against a language is done by applying the production rules of the language to create the string being validated, if it is possible to validate $S$ using a mechanism weaker than $M$, then $M$ is not the minimally strong mechanism required to generate $L$'s productions. 
+This contradicts our statement that $M$ is minimally strong.
+
+Note that it is trivial to generate a one-off instance of a computational mechanism weaker than $M$ which validates one or more strings in $L$, but this does not invalidate this theorem.
+Consider $L = \{\,a^nb^n \mid n \geq 1\,\}$.
+This language is context-free, but a finite state machine $F$ can be constructed which validates strings in $L$ of length at most some finite $l$.
+However, $L$ still contains an infinite number of strings which $F$ cannot validate; validating an arbitrary string in $L$ requires a pushdown automaton.
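The contrast can be made concrete with a short sketch (illustrative code, not part of the paper's own tooling): a finite-state recognizer can only hard-code finitely many balanced strings, while a single counter, standing in for the pushdown stack, validates $a^nb^n$ for arbitrary $n$.

```python
import re

def fsm_accepts(s):
    # A finite-state recognizer enumerates finitely many cases; this one
    # hard-codes n <= 3 and rejects every longer balanced string.
    return bool(re.fullmatch(r"ab|aabb|aaabbb", s))

def pda_accepts(s):
    # A counter emulates the pushdown stack: push on 'a', pop on 'b'.
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False   # an 'a' after a 'b' is never valid
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:
                return False   # more b's than a's so far
        else:
            return False       # alphabet is {a, b}
    return count == 0 and seen_b

print(fsm_accepts("aaaabbbb"))  # False: beyond the finite bound
print(pda_accepts("aaaabbbb"))  # True
print(pda_accepts("aaaabbb"))   # False: unbalanced
```

The counter-based checker is equivalent to a pushdown automaton with a one-symbol stack alphabet, which is all this language requires.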
+
+\begin{observation}[Maximal validation strength]
+A language L requiring a minimally strong computational mechanism M should not be validated using a mechanism stronger than M.
+\end{observation}
+
+Hoglund and McGraw~\cite{hoglund-mcgraw} note that ``a complex computational system is an engine for executing malicious computer programs delivered in the form of crafted input.''
+Their concern---the inevitable compromise, given enough time, of systems---informs this observation.
+If we assume that systems will be compromised at some point, prudence requires that we expose to our attackers the weakest mechanism possible.
+An attacker given access to a strong mechanism has far more potential for mayhem than one using a weaker mechanism.
+
+\begin{theorem}[Range of validation]
+\label{th:rangeval}
+Given an input I and a string S, validation must occur over at least the range $\Sigma_I^S$.
+\end{theorem}
+\emph{Proof.} Since $\Sigma_I^S$ is defined as the greatest extent to which $I$ has the capability to influence the parse of $S$, validating a range smaller than $\Sigma_I^S$ may result in failure to validate all the syntactic changes $I$ may introduce to $S$.
+Validation of only $I$ is sufficient in only those cases where it can be proven $\Sigma_I^S = I$.
+
+\begin{lemma}[Construction of sublanguages]
+\label{lem:sublanguage-construction}
+Given context-free grammars G and H, where $H = \langle V, T, R, S \in V \rangle$, L(G) is a sublanguage of L(H) if $V(G) \subseteq V(H), T(G) \subseteq T(H), R(G) \subseteq R(H)$ and $S(G) = S(H)$.
+\end{lemma}
+\emph{Proof by Construction.} For any string $w \in L(G)$, there exists a derivation $P(w)$ in $G$.
+All variables, terminals, and rules used in $P(w)$ are by definition in $G$, and $P(w)$ begins from the start symbol $S(G) = S(H)$.
+Since $V(G) \subseteq V(H)$, $T(G) \subseteq T(H)$, and $R(G) \subseteq R(H)$, every step of $P(w)$ is also a valid derivation step in $H$.
+Thus, $P(w)$ is a derivation in $H$ and $w \in L(H)$.
+Consequently, since any arbitrary string in $L(G)$ is also a string in $L(H)$, $L(G)$ is a sublanguage of $L(H)$.
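The lemma's structural conditions are directly checkable when grammars are represented as (variables, terminals, rules, start) tuples. The sketch below is illustrative only; the toy grammars and helper name are hypothetical.

```python
def is_sublanguage_candidate(G, H):
    """True if G satisfies the lemma's structural conditions with
    respect to H, which is sufficient for L(G) to be a sublanguage
    of L(H)."""
    V_G, T_G, R_G, S_G = G
    V_H, T_H, R_H, S_H = H
    return V_G <= V_H and T_G <= T_H and R_G <= R_H and S_G == S_H

# H: a toy grammar generating a^n b^n, plus one extra "unsafe" rule.
H = (
    {"S"},                                     # variables
    {"a", "b", "c"},                           # terminals
    {("S", "aSb"), ("S", "ab"), ("S", "c")},   # rules
    "S",                                       # start symbol
)
# G: the same grammar with the unsafe rule S -> c removed.
G = (
    {"S"},
    {"a", "b"},
    {("S", "aSb"), ("S", "ab")},
    "S",
)
print(is_sublanguage_candidate(G, H))  # True
```

Removing rules (and any variables or terminals they alone used) while keeping the start symbol is exactly the construction the lemma licenses.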
+
+\begin{observation}[Mechanism equivalence]
+\label{obs:equiv}
+Any regular expression or finite-state automaton can be converted into an equivalent context-free grammar, and thus falls under Lemma \ref{lem:sublanguage-construction}.
+\end{observation}
+
+\section{Axiomatic Input Validation}
+The conventional wisdom for validating user inputs is to use regular expressions to filter out bad inputs or allow in only good ones. 
+A schism exists over the safest way to perform regular expression validation: whether it is better to identify and allow through only those inputs known to be good, or to identify and forbid inputs known to be bad.
+However, this misses the more important point: for a majority of input situations, regular expressions are computationally insufficient for input validation.
+As Theorem \ref{th:minval} shows, attempts to validate $\Sigma_I^S$ using a mechanism weaker than $M$ will often fail to recognize valid strings or identify invalid ones.
+
+\subsection{Guns or Butter}
+For instance, it is not possible to create a regular expression which will reliably match $a^mb^m$, a language which is well-known to be context-free.
+Any finite state machine designed to recognize this language will either accept some string $a^mb^n$, $m \neq n$ (the \emph{pigeonhole principle}), or reject a sufficiently long $a^mb^m$.
+Attempting to coerce regular expressions into handling this language will inevitably lead to friction between what the regular expression can do, what developers think it can do, and what users need it to do.
+In practice, developers end up erring on either the side of safety or the side of convenience.
+Those who favour the former argue that we must admit only those inputs which a regular expression can verify as valid, and accept the false-positive rate (and its attendant inconvenience) as a necessary expense of security.
+Those who prefer the latter say we must exclude only those inputs known to be bad, and accept the false-negative rate (and the security holes opened up) as a necessary expense of convenience.
+However, we assert that whether an error is made on the side of safety or the side of convenience, it remains an error.
+
+\subsection{Guns and Butter}
+Shifting to a more appropriate tool---in our $a^mb^m$ example, a pushdown automaton---enables us to validate inputs as good or bad with perfect accuracy. 
+No longer do we have to make the tradeoff between security and convenience; we can have both guns and butter. 
+Although the input language may provide a means for malicious behaviour, it is possible to create subsets of the input language in which it is only possible to generate secure strings.
+Lemma \ref{lem:sublanguage-construction} provides a mechanism for generating subsets of context-free languages and Observation \ref{obs:equiv} allows us to extend the lemma to regular languages and finite-state automata.
+
+\section{Applying Axioms to SQL}
+\label{sql}
+The Structured Query Language, {\sc sql}, is an {\sc ansi/iso} standard language for interacting with relational databases. 
+User input is in the form of a string in some encoding, typically {\sc ascii} or Unicode; the string is parsed by a context-free grammar, usually one generated by a parser generator such as yacc or bison.
+
+{\sc sql} offers tremendous flexibility, but this flexibility can be used against implementors as easily as implementors can use it against its problem domain. 
+In this section, we show how formal axioms of input validation give improved resistance to a certain class of attacks.
+
+\subsection{SQL Injection}
+{\sc sql} \emph{injection} is a class of attacks against software which interfaces with a database. 
+An innocuous-looking statement, such as {\tt "select rights from table where username =} $I_1$ {\tt and password =} $I_2${\tt "}, can be subverted with carefully-chosen $I_n$ values.
+
+For instance, if $I_1$ is {\tt "root"} and $I_2$ is the string {\tt "' or '1' = '1'"}, the final statement passed to the database is {\tt "select rights from table where username = 'root' and password = '' or '1' = '1'"}. 
+This malformed boolean expression evaluates to true, so root's rights will be granted.
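The splicing step can be reproduced in a few lines. The helper below is a hypothetical illustration of the vulnerable pattern, not code from any real system:

```python
def build_query(username, password):
    # Naive string splicing, mirroring the vulnerable statement above.
    return ("select rights from table where username = '"
            + username + "' and password = '" + password + "'")

benign = build_query("root", "hunter2")
# An attacker supplies the crafted value from the example as the password:
attack = build_query("root", "' or '1' = '1")
print(attack)
# select rights from table where username = 'root' and password = '' or '1' = '1'
```

The attacker's quote characters close the intended string literal and smuggle a tautological disjunction into the boolean expression.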
+
+There are two possible regular-expression-based solutions to this problem: disallow certain characters from user input, or accept only certain patterns.
+(A third approach, \emph{prepared statements}, is similar to the second solution, but only applies to SQL and cannot be extended to other injection vectors such as XML/XPath injection.
+We will discuss this further shortly.)
+Most developers take the first approach, examining input strings for control characters such as {\tt '}, which terminates a string argument prematurely and allows an attacker to inject a malicious boolean substring; {\tt ;}, which terminates a query and allows an attacker to inject additional queries in the same input string; or {\tt --}, which comments out the remainder of a line, bypassing any remaining portion of the intended statement that would otherwise constitute an unparseable fragment.
+The validator either rejects such strings or ``escapes'' those characters so that the database parser does not interpret them as control characters.
+In the second approach, a regular expression such as {\tt [a-zA-Z0-9]+} is assumed to cover all possible valid inputs.
+However, both approaches can reject valid inputs, and international encodings present problems which are far too unwieldy for regular expressions to handle in a reasonable fashion.
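The second approach's false positives are easy to demonstrate with the allow-list pattern quoted above (the sample inputs are illustrative):

```python
import re

# The allow-list pattern from the text, anchored to the whole input.
ALLOW = re.compile(r"^[a-zA-Z0-9]+$")

print(bool(ALLOW.match("alice")))      # True: accepted
print(bool(ALLOW.match("O'Brien")))    # False: a legitimate surname rejected
print(bool(ALLOW.match("Motörhead")))  # False: non-ASCII input rejected
```

Both rejections are valid inputs in most applications; the second hints at the internationalization problems noted above.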
+
+(FIXME: expand on this here and reference the recent injection issue from http://www.postgresql.org/docs/techdocs.50)
+
+Having shown that regular expressions are inadequate to the task, we return to Theorems \ref{th:minval} and \ref{th:rangeval}, which say we need to validate over $\Sigma_I^S$ using a mechanism at least as strong as that used to generate valid command strings---in this case, a context-free grammar.
+
+We do not know what the user will input, and as such, predicting $\Sigma_I^S$ is problematic. 
+However, by Definition \ref{def:syntactic-context}, the upper bound of $\Sigma^S$ is the whole of $S$, so we elect to validate over the entire string.
+(In practice, this is entirely computationally feasible, since $S$ must be parsed by the {\sc sql} parser eventually anyway.)
+
+So far, we have not validated anything. 
+If we were to put the injected {\sc sql} string through an {\sc sql} parser, it would parse correctly.
+The string is syntactically valid, but it is semantically invalid, in that it does not achieve the intended results.
+Our challenge is to define a sublanguage of {\sc sql} which only generates strings that have our intended semantic meaning---``safe'' strings, in other words.
+
+Fortunately, defining a safe subset of {\sc sql} for this case is simple. 
+We define, using Lemma \ref{lem:sublanguage-construction}, a sublanguage of {\sc sql} wherein the only allowed production rules are those which generate the sequence {\tt "select rights from table where username='{\sc terminal-string}' and password='{\sc terminal-string}'"}, where {\sc terminal-string} denotes a sequence of characters from which no keyword or other syntactic structure can be derived.
+
+We take whatever command string $C$ we create from splicing together our command string and the user input, then determine whether $C$ is a valid string in our known-safe sublanguage, using a tool of sufficient computational strength to produce all valid strings in that sublanguage.
+In this case, since we know that a context-free grammar is sufficient to generate {\sc sql} strings and our known-good context-free sublanguage generates a subset of all possible valid {\sc sql} strings, we know that a pushdown automaton is sufficient for our purposes.
+If $C$ is a valid string in our sublanguage, the input is good.
+If it is not, the input is bad.
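The final check can be sketched as follows. Because this particular sublanguage consists of a single fixed statement shape, it happens to be regular, so a regular expression suffices for this one case; the general construction still requires a pushdown automaton. The pattern and names below are illustrative, and {\sc terminal-string} is approximated as any quote-free run of characters:

```python
import re

# Recognizer for the known-safe sublanguage: one fixed statement shape,
# with TERMINAL-STRING approximated as a run of characters containing no
# single quote. (The grammar in the text is stricter, also excluding
# keywords and other syntactic structure.)
SAFE_SHAPE = re.compile(
    r"^select rights from table where "
    r"username='([^']*)' and password='([^']*)'$"
)

def validate(command):
    # The spliced command C is good iff it lies in the safe sublanguage.
    return SAFE_SHAPE.match(command) is not None

good = "select rights from table where username='root' and password='xyzzy'"
bad = ("select rights from table where "
       "username='root' and password='' or '1' = '1'")
print(validate(good))  # True
print(validate(bad))   # False: the injected disjunction does not fit
```

Note that validation happens over the entire spliced string $C$, as Theorem \ref{th:rangeval} requires, not over the user input in isolation.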
+
+By so doing, we are secure against {\sc sql} injection attacks.
+
+\section{Dejector}
+\label{dejector}
+
+\section{Conclusions and future work}
+\label{conclusions}
+
+\subsection*{Acknowledgments}
+
+
+\end{document}
+ 
  
\ No newline at end of file

Modified: doc/trunk/dejector/llncs-rabbi.cls
===================================================================
--- doc/trunk/dejector/llncs-rabbi.cls	2007-03-22 10:51:20 UTC (rev 1764)
+++ doc/trunk/dejector/llncs-rabbi.cls	2007-03-22 12:28:12 UTC (rev 1765)
@@ -19,7 +19,7 @@
 %%   Right brace   \}     Tilde         \~}
 %%
 \NeedsTeXFormat{LaTeX2e}[1995/12/01]
-\ProvidesClass{llncs-rabbi}[2004/08/17 v2.14
+\ProvidesClass{llncs-rabbi}[2004/08/17 v2.14-1
 ^^J LaTeX document class for Lecture Notes in Computer Science]
 % Options
 \let\if@envcntreset\iffalse
@@ -105,6 +105,7 @@
 \def\propositionname{Proposition}
 \def\questionname{Question}
 \def\remarkname{Remark}
+\def\remarkname{Observation}
 \def\seename{see}
 \def\solutionname{Solution}
 \def\subclassname{{\it Subject Classifications\/}:}
@@ -1139,7 +1140,9 @@
 \spn@wtheorem{question}{Question}{\itshape}{\rmfamily}
 \spn@wtheorem{solution}{Solution}{\itshape}{\rmfamily}
 \spn@wtheorem{remark}{Remark}{\itshape}{\rmfamily}
+\spn@wtheorem{observation}{Observation}{\bfseries}{\itshape}
 
+
 \def\@takefromreset#1#2{%
     \def\@tempa{#1}%
     \let\@tempd\@elt
