Re: [tor-dev] [prop-meeting] [prop#285] "Directory documents should be standardized as UTF-8"
> On 13 Feb 2018, at 10:55, isis agora lovecruft <isis@xxxxxxxxxxxxxx> wrote:
>
> A couple outcomes of this:
>
> 1. What passes for "canonicalised" "utf-8" in C will be different to
> what passes for "canonicalised" "utf-8" in Rust. In C, the
> following will not be allowed (whereas they are allowed in Rust):
> - NUL (0x00)
> - Byte Order Mark (0xFEFF)
I want to clarify this point:
The Byte Order Mark is the Unicode scalar value U+FEFF, which is encoded
in UTF-8 as the bytes 0xEF 0xBB 0xBF.
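As a quick check of that encoding, here is a minimal Rust sketch that
encodes U+FEFF with the standard library's char::encode_utf8 and confirms
the three bytes. The helper name bom_utf8_bytes is hypothetical, not
existing Tor code.

```rust
/// Hypothetical helper: returns the UTF-8 encoding of U+FEFF (the BOM).
fn bom_utf8_bytes() -> Vec<u8> {
    let mut buf = [0u8; 4];
    // char::encode_utf8 writes the encoding into buf and returns it as a str.
    let encoded: &str = '\u{FEFF}'.encode_utf8(&mut buf);
    encoded.as_bytes().to_vec()
}

fn main() {
    let bytes = bom_utf8_bytes();
    assert_eq!(bytes, vec![0xEF, 0xBB, 0xBF]);
    // The same three bytes decode back to the single scalar value U+FEFF.
    let s = std::str::from_utf8(&bytes).unwrap();
    assert_eq!(s.chars().collect::<Vec<char>>(), vec!['\u{FEFF}']);
    println!("U+FEFF in UTF-8: {:X?}", bytes);
}
```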
Tor's C and Rust implementations of UTF-8 must be identical: they must
accept and reject exactly the same byte sequences.
When we write the C implementation, we must reject NUL for
compatibility with C string functions.
When we write the Rust implementation, we must reject NUL for
compatibility with the C implementation. (Rust already implements
UTF-8 strings that accept NUL, so this will require custom code).
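A minimal Rust sketch of the NUL point, assuming a hypothetical helper
decode_utf8_no_nul (not existing Tor code). std::str::from_utf8 accepts an
embedded NUL, so the extra check has to be written by hand. Because
from_utf8 already rejects overlong forms such as 0xC0 0x80, the only way
NUL can appear in accepted input is as the single byte 0x00.

```rust
/// Hypothetical helper: decode bytes as UTF-8, but also reject NUL (U+0000),
/// which std::str::from_utf8 on its own accepts.
fn decode_utf8_no_nul(bytes: &[u8]) -> Option<&str> {
    let s = std::str::from_utf8(bytes).ok()?;
    if s.contains('\0') {
        None
    } else {
        Some(s)
    }
}

fn main() {
    // The standard library treats an embedded NUL as valid UTF-8...
    assert!(std::str::from_utf8(b"a\0b").is_ok());
    // ...but the stricter helper rejects it.
    assert!(decode_utf8_no_nul(b"a\0b").is_none());
    // The overlong NUL encoding is already rejected by from_utf8.
    assert!(decode_utf8_no_nul(b"\xC0\x80").is_none());
    assert_eq!(decode_utf8_no_nul(b"abc"), Some("abc"));
}
```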
When we write the C and Rust implementations, we must reject the BOM
because it is unnecessary. Rejecting the BOM is recommended by the
relevant standard. (Rust already implements UTF-8 strings that accept
the BOM, so this will require custom code).
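Putting the two rules together, a validator might look like the following
Rust sketch. The names validate_document and DocUtf8Error are
hypothetical, and rejecting U+FEFF anywhere in the text (not only as a
leading BOM) is an assumption of this sketch; a C implementation would
have to make exactly the same choice for the two to stay identical.

```rust
/// Hypothetical error type for this sketch; not part of Tor's code.
#[derive(Debug, PartialEq)]
enum DocUtf8Error {
    /// The bytes are not well-formed UTF-8.
    InvalidUtf8,
    /// The text contains NUL (U+0000).
    ContainsNul,
    /// The text contains U+FEFF anywhere (assumption: not only at the start).
    ContainsBom,
}

/// Hypothetical validator: accepts only well-formed UTF-8 with no NUL
/// and no U+FEFF.
fn validate_document(bytes: &[u8]) -> Result<&str, DocUtf8Error> {
    // std::str::from_utf8 rejects malformed, overlong, and surrogate
    // encodings, but it accepts both NUL and U+FEFF.
    let s = std::str::from_utf8(bytes).map_err(|_| DocUtf8Error::InvalidUtf8)?;
    for c in s.chars() {
        match c {
            '\u{0000}' => return Err(DocUtf8Error::ContainsNul),
            '\u{FEFF}' => return Err(DocUtf8Error::ContainsBom),
            _ => {}
        }
    }
    Ok(s)
}

fn main() {
    // A leading BOM is valid UTF-8 as far as the standard library is concerned...
    let with_bom = b"\xEF\xBB\xBFexample text";
    assert!(std::str::from_utf8(with_bom).is_ok());
    // ...but this stricter check rejects it.
    assert_eq!(validate_document(with_bom), Err(DocUtf8Error::ContainsBom));
    assert_eq!(validate_document(b"a\0b"), Err(DocUtf8Error::ContainsNul));
    assert_eq!(validate_document(b"\xFF"), Err(DocUtf8Error::InvalidUtf8));
    assert_eq!(validate_document(b"example text"), Ok("example text"));
}
```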
T
_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev