Re: gEDA-user: [RFC 3/6] Embedding system revamp
Peter Clifton quite rightly pointed out that my specification for the
'E' object syntax neglected the fact that, while the gEDA file format
requires LF line endings, the MIME specification requires CRLF line
endings.
So this is an alternative specification for the embedded object syntax,
which addresses the problem by defining our own encodings instead of
using MIME's.
File format changes
===================
Embedded objects
----------------
An embedded file has the following syntax:
E <filename> <type> <encoding>
<start-boundary>
<data>
<end-boundary>
The <filename> is the name of the file being embedded, <type> is the
Internet media type [1,2] of the file, and the optional <encoding> is
the encoding used to store the <data>. If <encoding> is not specified,
the "literal" encoding is assumed. Both <type> and <encoding> are
case-insensitive.
<start-boundary> has the form:
[ <token>
and <end-boundary> has the form:
<token> ]
The <token> is a sequence of UTF-8 [3] characters that does not appear
anywhere in <data>. Because the token cannot occur in the data, the
<start-boundary> and <end-boundary> unambiguously mark the extents of
the <data>; this avoids the need for a "size" or "number of lines" field.
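A minimal sketch of writing and reading an 'E' object in Python, to show how the boundary token replaces a length field. It assumes a single space between "[" or "]" and the token, that each boundary sits on its own line, that <filename> contains no whitespace, and that the payload has already been encoded and ends with a line feed (both encodings below guarantee this). The helpers `choose_token`, `write_embedded` and `parse_embedded` are hypothetical, not part of libgeda.

```python
import secrets


def choose_token(data):
    """Return a random hex token (hypothetical scheme) that does not occur in data."""
    while True:
        token = secrets.token_hex(8)  # 16 ASCII hex digits, a subset of UTF-8
        if token not in data:
            return token


def write_embedded(filename, mime_type, encoding, data):
    """Serialise an 'E' object; data is the encoded payload and must end with LF."""
    if not data.endswith("\n"):
        raise ValueError("encoded data must end with a line feed")
    token = choose_token(data)
    header = f"E {filename} {mime_type}"
    if encoding is not None:  # omitted <encoding> means "literal"
        header += f" {encoding}"
    return f"{header}\n[ {token}\n{data}{token} ]\n"


def parse_embedded(text):
    """Parse one 'E' object; return (filename, type, encoding, data)."""
    header, _, rest = text.partition("\n")
    fields = header.split()
    if len(fields) not in (3, 4) or fields[0] != "E":
        raise ValueError("not an 'E' object header")
    filename = fields[1]
    mime_type = fields[2].lower()  # <type> is case-insensitive
    encoding = fields[3].lower() if len(fields) == 4 else "literal"
    start, _, rest = rest.partition("\n")
    if not start.startswith("[ ") or len(start) <= 2:
        raise ValueError("missing or empty start boundary")
    token = start[2:]
    # The token never occurs in <data>, so its first occurrence is the end boundary.
    end = rest.find(token + " ]")
    if end < 0:
        raise ValueError("missing end boundary")
    return filename, mime_type, encoding, rest[:end]
```

For example, `parse_embedded(write_embedded("resistor.sym", "application/x-geda-symbol", None, "v 20080127 1\n"))` gives back the same filename, type and data, with the encoding defaulting to `"literal"`; the media type here is only illustrative. Picking a random token and retrying on collision keeps the writer simple, since the data never needs to be escaped.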
"Literal" encoding
------------------
The "literal" encoding is intended for embedding of text data (for
example, gEDA schematics or symbols). The text must consist of only
valid UTF-8 characters. The NUL character (U+0000) must not appear in
the text.
When the file is written, a line feed character (U+000A) must always be
appended to the text, and when the file is parsed, that final line feed
must always be stripped. [*]
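A short Python sketch of the "literal" rules, assuming the embedded file is handled as raw bytes. `literal_encode` and `literal_decode` are hypothetical names; Python's strict UTF-8 decoder is used to reject invalid UTF-8.

```python
def literal_encode(raw):
    """Encode raw bytes as "literal" data: valid UTF-8, no NUL, plus one trailing LF."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError (a ValueError) if invalid
    if "\x00" in text:
        raise ValueError("NUL is not allowed in literal data; use base-64 instead")
    return text + "\n"  # always append exactly one line feed


def literal_decode(data):
    """Strip the single line feed appended by the writer and return the original bytes."""
    if not data.endswith("\n"):
        raise ValueError("literal data must end with a line feed")
    return data[:-1].encode("utf-8")
```

Because exactly one line feed is always added and always removed, a file that already ends in a newline round-trips unchanged: `literal_decode(literal_encode(b"a\nb\n"))` returns `b"a\nb\n"`.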
"Base-64" encoding
------------------
The "base-64" encoding is intended for embedding of non-text data or of
text which contains the NUL character. The RFC 4648 Base64 encoding
algorithm [4] is to be used, with the following modifications:
1. When encoding the data, space (U+0020) and line feed (U+000A)
characters may be inserted anywhere in the encoded data. A line feed
character must be appended to the encoded data.
2. When decoding the data, any space or line feed characters must be
ignored. If any other character that is not part of the Base64
alphabet (or the "=" padding character) is present in the encoded
data, the data must be discarded.
The same Base64 alphabet as is currently used for encoding image files
should be used. [**]
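A minimal Python sketch of the modified Base64 rules, using the standard `base64` module. It assumes that the alphabet of gEDA's existing image codec is the standard RFC 4648 one ("+" and "/"); the 76-character line width is an arbitrary choice, since the rules allow line feeds anywhere. `base64_encode` and `base64_decode` are hypothetical names.

```python
import base64


def base64_encode(raw, width=76):
    """Base64-encode raw bytes, breaking lines every `width` characters, ending with LF."""
    encoded = base64.b64encode(raw).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return "\n".join(lines) + "\n"  # rule 1: the data always ends with a line feed


def base64_decode(data):
    """Decode, ignoring spaces and line feeds; reject any other stray character."""
    stripped = data.replace(" ", "").replace("\n", "")  # rule 2: ignore SP and LF
    try:
        # validate=True raises binascii.Error (a ValueError) on non-alphabet characters
        return base64.b64decode(stripped, validate=True)
    except ValueError as exc:
        raise ValueError("invalid base-64 data; discarding") from exc
```

For example, `base64_encode(b"hello", width=4)` produces `"aGVs\nbG8=\n"`, and `base64_decode("aGVs bG8=\n")` returns `b"hello"`, whereas a tab anywhere in the input makes the data be rejected.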
[*] This ensures that the <end-boundary> always appears on a line by
itself in the schematic or symbol file.
[**] The idea is to use the current Base64 codec without modification.
[1] "Multipurpose Internet Mail Extensions (MIME) Part Two:
Media Types", RFC 2046, http://tools.ietf.org/html/rfc2046
[2] "MIME Media Types", http://www.iana.org/assignments/media-types/
[3] "UTF-8, a transformation format of ISO 10646", RFC 2279,
http://tools.ietf.org/html/rfc2279
[4] "The Base16, Base32 and Base64 Encodings", RFC 4648,
http://tools.ietf.org/html/rfc4648
--
Peter Brett
Electronic Systems Engineer
Integral Informatics Ltd
_______________________________________________
geda-user mailing list
geda-user@xxxxxxxxxxxxxx
http://www.seul.org/cgi-bin/mailman/listinfo/geda-user