> On 14 Feb 2018, at 11:03, Damian Johnson <atagar@xxxxxxxxxxxxxx> wrote:
>
>> For the metrics tools there are some guidelines on this we can follow:
>> https://docs.oracle.com/javase/tutorial/i18n/text/design.html. The other
>> language would be Python (for stem), but Python developers have probably
>> got a good understanding of unicode/str/bytes by now. (In Python 3: when
>> using UTF-8, BOM will not be stripped and will be interpreted as data,
>> and you can have a NUL in a str).
>
> Hi Iain. Actually, for Stem I'm really looking forward to this too.
> Stem has special handling for the contact and platform fields (iirc
> the only spot non-ascii content can presently appear). Stem's parsers
> and API will be simplified once everything is uniformly utf-8. :P
>
> Possibly a stupid question but any reason not to require the whole
> descriptor document to be printable characters?

Requiring printable ASCII throughout the document means that people can't
spell their names and email addresses correctly in contact lines.

Requiring printable unicode introduces a dependency on a particular unicode
version, because we don't know if unallocated blocks will be printable or not.

I think we could make platform lines printable ASCII without losing much.
Unless there are platforms that have non-ASCII names?

T

--
Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
------------------------------------------------------------------------
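
For illustration only, here is a minimal Python 3 sketch of the points raised in this thread. It is not taken from Stem or the metrics tools: the helper name is_printable_ascii and the sample contact bytes are hypothetical. It shows that the plain "utf-8" codec keeps a leading BOM as data, that a str may contain NUL, and that str.isprintable() depends on the Unicode database bundled with the interpreter, which is the version dependency mentioned above.

```python
import unicodedata


def is_printable_ascii(text: str) -> bool:
    """Return True if every character is in the printable ASCII range 0x20..0x7E."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


# Hypothetical sample: UTF-8 BOM, a non-ASCII name, and a trailing NUL byte.
raw = b"\xef\xbb\xbfcontact J\xc3\xbcrgen\x00"

# The plain "utf-8" codec keeps the BOM as the character U+FEFF.
decoded = raw.decode("utf-8")

# The separate "utf-8-sig" codec strips a leading BOM instead.
stripped = raw.decode("utf-8-sig")

print(decoded.startswith("\ufeff"))   # BOM survived as data
print("\x00" in decoded)              # NUL is a legal character in a str
print(stripped.startswith("contact"))

# A printable-ASCII rule is independent of any Unicode version.
print(is_printable_ascii("Tor 0.3.2.9 on Linux"))
print(is_printable_ascii("J\u00fcrgen"))

# str.isprintable() consults the interpreter's bundled Unicode database, so
# its answer for currently unallocated code points can change between versions.
print(unicodedata.unidata_version)
```

The printable ASCII check needs no Unicode tables at all, which is why it is the easier rule to specify for platform lines.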
_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev