Re: [tor-dev] [prop-meeting] [prop#285] "Directory documents should be standardized as UTF-8"

To: isis@xxxxxxxxxxxxxx
Subject: Re: [tor-dev] [prop-meeting] [prop#285] "Directory documents should be standardized as UTF-8"
From: teor <teor2345@xxxxxxxxx>
Date: Tue, 13 Feb 2018 11:03:54 +1100
Cc: nickm@xxxxxxxxxxxxxx, atagar@xxxxxxxxxxxxxx, teor@xxxxxxxxxxxxxx, tor-dev@xxxxxxxxxxxxxxxxxxxx

> On 13 Feb 2018, at 10:55, isis agora lovecruft <isis@xxxxxxxxxxxxxx> wrote:
> 
> A couple outcomes of this:
> 
> 1. What passes for "canonicalised" "utf-8" in C will be different to
>    what passes for "canonicalised" "utf-8" in Rust.  In C, the
>    following will not be allowed (whereas they are allowed in Rust):
>        - NUL (0x00)
>        - Byte Order Mark (0xFEFF)

I want to clarify this point:

The Byte Order Mark is the Unicode scalar value U+FEFF, encoded in
UTF-8 as the bytes 0xEF 0xBB 0xBF.
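
To make the encoding concrete, here is a minimal Rust sketch that
encodes U+FEFF with the standard library and returns the bytes. The
function name bom_utf8_bytes is made up for illustration and is not
part of Tor.

```rust
/// Encode the scalar value U+FEFF as UTF-8 and return the bytes.
/// (Hypothetical helper, for illustration only.)
fn bom_utf8_bytes() -> Vec<u8> {
    // Four bytes is the maximum length of any UTF-8 encoded scalar.
    let mut buf = [0u8; 4];
    // encode_utf8 writes into buf and returns the encoded prefix.
    '\u{FEFF}'.encode_utf8(&mut buf).as_bytes().to_vec()
}
```

Calling bom_utf8_bytes() yields the three bytes 0xEF 0xBB 0xBF, so a
byte-level check for those three bytes matches a check for U+FEFF on
a decoded string.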

Tor's C and Rust UTF-8 implementations must accept and reject exactly
the same inputs.

When we write the C implementation, we must reject NUL for
compatibility with C string functions.

When we write the Rust implementation, we must reject NUL for
compatibility with the C implementation. (Rust's built-in UTF-8
strings accept NUL, so this will require custom code.)

When we write the C and Rust implementations, we must reject BOM
because it's unnecessary: UTF-8 has no byte order to mark. Rejecting
BOM is recommended by the relevant standard. (Rust's built-in UTF-8
strings accept BOM, so this will require custom code.)
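
As a rough sketch of the "custom code" on the Rust side, something
like the following could wrap the standard UTF-8 check. The names
DirUtf8Error and parse_dir_utf8 are hypothetical, and the sketch
assumes the BOM only needs rejecting at the start of the input;
whether U+FEFF should also be rejected elsewhere is not settled here.

```rust
/// Reasons a document is rejected (hypothetical type, not Tor's).
#[derive(Debug, PartialEq)]
enum DirUtf8Error {
    InvalidUtf8,
    LeadingBom,
    ContainsNul,
}

/// Validate bytes as UTF-8, then apply the two extra restrictions.
fn parse_dir_utf8(input: &[u8]) -> Result<&str, DirUtf8Error> {
    // Standard validation: this alone accepts both NUL and BOM.
    let s = std::str::from_utf8(input)
        .map_err(|_| DirUtf8Error::InvalidUtf8)?;
    // Assumption: only a BOM at the very start is rejected.
    if s.starts_with('\u{FEFF}') {
        return Err(DirUtf8Error::LeadingBom);
    }
    // Reject NUL anywhere, so C string functions see the whole input.
    if s.contains('\0') {
        return Err(DirUtf8Error::ContainsNul);
    }
    Ok(s)
}
```

With this sketch, b"a\0b" is rejected as ContainsNul and
b"\xEF\xBB\xBFrouter" as LeadingBom, while "café".as_bytes() is
accepted. A C implementation would need to make the same decisions
byte for byte.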

T