Re: Interview

"Philipp Gühring" wrote:
> On Sat, 04 Sep 1999 you probably wrote:
> >Steve Baker wrote:
> >
> >> A more elegant thing is to store a 'magic number' at the top of the
> >> file.
> >> ALL Binary files should start with a 4 byte magic number anyway so that
> >> you can check that you aren't being fed a file of the wrong kind - and
> >> also so that the Linux/UNIX 'file' command can be set to recognise your
> >> file type using /etc/magic.
> Ok, that sounds good. But there is something else we have to do too:
> Detecting the alignment holes.
> What if the writing platform has 4-byte alignment and the target has
> 2-byte alignment? (64-bit RISC processors often have 4-byte
> alignment.)


It's *EXTREMELY* bad practice to do whole structure reads. You should
ALWAYS read element by element.  It's painful but utterly necessary.
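
A minimal sketch of what element-by-element I/O can look like, with a
magic number check up front. The names (pf_record, pf_write_record,
pf_read_record), the "PFAP" magic value and the choice of a
little-endian on-disk layout are all assumptions made for this
illustration; nothing in this thread fixes them.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical magic number and record layout, for illustration only. */
static const unsigned char PF_MAGIC[4] = { 'P', 'F', 'A', 'P' };

struct pf_record {
    uint16_t a;
    uint32_t b;
    uint16_t c;
};

/* Write the low n bytes of v, least significant byte first. */
static int put_le(FILE *f, uint32_t v, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (fputc((int)((v >> (8 * i)) & 0xFF), f) == EOF)
            return -1;
    }
    return 0;
}

/* Read n bytes (least significant first) and assemble them into *v. */
static int get_le(FILE *f, uint32_t *v, int n)
{
    int i;
    *v = 0;
    for (i = 0; i < n; i++) {
        int ch = fgetc(f);
        if (ch == EOF)
            return -1;
        *v |= (uint32_t)ch << (8 * i);
    }
    return 0;
}

/* Write the magic number, then each member on its own: no padding,
   no dependence on the host's byte order. Returns 0 on success. */
int pf_write_record(FILE *f, const struct pf_record *r)
{
    if (fwrite(PF_MAGIC, 1, sizeof PF_MAGIC, f) != sizeof PF_MAGIC)
        return -1;
    if (put_le(f, r->a, 2) || put_le(f, r->b, 4) || put_le(f, r->c, 2))
        return -1;
    return 0;
}

/* Check the magic number, then read each member on its own. */
int pf_read_record(FILE *f, struct pf_record *r)
{
    unsigned char magic[4];
    uint32_t a, b, c;

    if (fread(magic, 1, sizeof magic, f) != sizeof magic)
        return -1;
    if (memcmp(magic, PF_MAGIC, sizeof magic) != 0)
        return -1;  /* wrong kind of file */
    if (get_le(f, &a, 2) || get_le(f, &b, 4) || get_le(f, &c, 2))
        return -1;
    r->a = (uint16_t)a;
    r->b = b;
    r->c = (uint16_t)c;
    return 0;
}
```

With this layout the record always occupies 12 bytes on disk (4 magic
+ 2 + 4 + 2), whatever sizeof(struct pf_record) happens to be on the
machine doing the writing or the reading.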

> Let's think about that too, then we can implement both techniques.
> I suggest the following thing:
> The first 5 Bytes are the Magic code. We should not use 1234, because
> someone else will have the same idea.

1234 was only an example...you should certainly pick something more

> We should use something that
> means Penguin-File-API. "PFAPI" would be an idea..
> Why 5 bytes? Because then there will be a hole for the next integer.
> We should use a short:
> typedef struct
> {
>    char magic[5];
>    short aligndetector;
>    /* any data ... */
> } File_Structure;
> File_Structure fs;
> memset(&fs, 0, sizeof(fs));
> memcpy(fs.magic, "PFAPI", 5);
> fs.aligndetector = 0x1234;
> fwrite()...
> With this combination, we should be able to detect everything.


No way! For example: Some machines align 'double' on 8 byte boundaries,
others on 4, yet others on 2 and yet others on 1.
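
To see why a single 'short' detector cannot cover this, here is a
small sketch that reports how one particular compiler aligns several
types. The probe structs and function names are made up for this
example, and the numbers it prints will differ from one platform and
compiler to the next, which is exactly the point.

```c
#include <stddef.h>
#include <stdio.h>

/* Probe structs: the offset of v shows where this compiler places a
   member of that type when it follows a single char. */
struct probe_short  { char c; short  v; };
struct probe_int    { char c; int    v; };
struct probe_double { char c; double v; };

size_t short_alignment(void)  { return offsetof(struct probe_short, v); }
size_t int_alignment(void)    { return offsetof(struct probe_int, v); }
size_t double_alignment(void) { return offsetof(struct probe_double, v); }

/* Print size and observed alignment for each probed type. */
void print_alignments(void)
{
    printf("short : size %lu, aligned to %lu\n",
           (unsigned long)sizeof(short), (unsigned long)short_alignment());
    printf("int   : size %lu, aligned to %lu\n",
           (unsigned long)sizeof(int), (unsigned long)int_alignment());
    printf("double: size %lu, aligned to %lu\n",
           (unsigned long)sizeof(double), (unsigned long)double_alignment());
}
```

Knowing where a 'short' lands after a 5-byte array says nothing about
where a 'double' would land, so each type would need its own detector.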

I have seen at least one compiler that changed the order of structure
elements in order to fit short variables into the padding gaps of
others (although I believe that's not strictly legal C)...eg:

   struct xxx
   {
      short A ;
      int   B ;
      short C ;
   } ;

...gets stored in memory with B first, then A, then C.
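
A quick way to check what a given compiler does with this structure
is to print the member offsets and the total size. This is only a
sketch; the helper names are invented for the example, and the output
depends entirely on the compiler and target.

```c
#include <stddef.h>
#include <stdio.h>

struct xxx {
    short A;
    int   B;
    short C;
};

/* Bytes in the in-memory struct that belong to no member. */
size_t xxx_padding_bytes(void)
{
    return sizeof(struct xxx) - (sizeof(short) + sizeof(int) + sizeof(short));
}

/* Print where each member sits and how much of the struct is padding. */
void print_xxx_layout(void)
{
    printf("A at %lu, B at %lu, C at %lu\n",
           (unsigned long)offsetof(struct xxx, A),
           (unsigned long)offsetof(struct xxx, B),
           (unsigned long)offsetof(struct xxx, C));
    printf("sizeof = %lu, padding = %lu\n",
           (unsigned long)sizeof(struct xxx),
           (unsigned long)xxx_padding_bytes());
}
```

Any non-zero padding count means that an fwrite() of the whole
structure puts bytes into the file that are not data, at positions
another compiler may not agree on.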

Padding may be introduced to take structures that are close to
L1 cache-line boundaries up to the nearest...all sorts of
possibilities exist.

C++ programs may add hidden stuff like virtual function tables
that will be DISASTEROUS if read into another program (or even
another run of the same program).

Just don't do that - OK!  The problems it creates are almost
impossible to diagnose by people trying to port your code
(or even just compile it with some other compiler).

Even where the compiler can be told to not pad certain structures,
that often imposes severe performance penalties - and in any case
is simply not possible for some machine architectures.

Steve Baker                  http://web2.airmail.net/sjbaker1
sjbaker1@airmail.net (home)  http://www.woodsoup.org/~sbaker
sjbaker@hti.com      (work)