
Re: Interview

Bert Peers wrote:
> Steve Baker wrote:
> > A more elegant thing is to store a 'magic number' at the top of the
> > file.
> > ALL Binary files should start with a 4 byte magic number anyway so that
> > you can check that you aren't being fed a file of the wrong kind - and
> > also so that the Linux/UNIX 'file' command can be set to recognise your
> > file type using /etc/magic.
> This sounds interesting; since I don't know zip about unix files, I wonder
> if this is actually a standard you're elaborating on, or just something you're
> proposing for PFile only ?  I mean, is every Unix file starting with this kind
> of magic number ? 

Every binary file is *supposed* to.  But it's not enforced anywhere.

> And is it a convention to make sure that swapping
> is always detectable ?

No - that's something *I* came up with when parsing some kinds of
binary file that exist with both endianness.

> (ie restrict the numbers to shorts where the 4 bytes
> make up a strictly increasing sequence or something)

No - don't use a short - that was just an example to make it easier
to understand.  Most people make their 4 byte magic number be three
"representative" ASCII characters plus a fourth byte - which (for
various reasons) is best with the most significant bit set.
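
As a rough sketch of that convention (not taken from any real format):
the letters 'P', 'F', 'L' and the fourth byte 0x89 below are made-up
values for illustration, and make_magic(), magic_byteswap() and
magic_is_swap_detectable() are hypothetical helper names.  One reason
commonly given for setting the top bit is that plain 7-bit ASCII text
can never contain such a byte.

```c
#include <stdint.h>

/* Pack four bytes into one 32-bit magic number, first byte in the
   most significant position. */
uint32_t make_magic(unsigned char a, unsigned char b,
                    unsigned char c, unsigned char d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}

/* Reverse the byte order of a 32-bit value. */
uint32_t magic_byteswap(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* A magic number only reveals byte swapping if it changes when its
   bytes are reversed; a byte palindrome would look the same either way. */
int magic_is_swap_detectable(uint32_t magic)
{
    return magic != magic_byteswap(magic);
}
```

With those made-up values, make_magic('P', 'F', 'L', 0x89) gives
0x50464C89, which passes the swap-detectable check.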

If you look in /etc/magic, you'll find it knows about the magic
number conventions for hundreds of obscure file formats.

Some formats only start with two bytes of magic number. SGI image
files (for example) only have 0x01 and 0xDE to identify them.

Quite a few Windoze file formats work like that too. ".wav" files
start with 'RIFF', ".bmp" files start with 'BM'.

> > When you read a file, first read the magic number. If you see 0x1234,
> > don't byte swap - if you see 0x3412, you need to swap.
> > The joy of this is that you don't have to byte swap when WRITING files,
> > so it halves the amount of work you have to do.  Each machine writes
> > files in its native format, but can read them whether swapped or not.
> Neat !!
> Note that there's no need to use a file of course to merely detect swapping;
> from the Quake2 source :
> void Swap_Init (void)
> {
>  byte swaptest[2] = {1,0};
> // set the byte swapping variables in a portable manner
>  if ( *(short *)swaptest == 1)

That's a way to figure out whether the computer you are running
on is big or little endian. It doesn't tell you whether or not
you have to swap: you could have a little endian machine reading
a big endian file or vice versa.

With my approach, you don't care which case you have - you just look
at the file, and if it needs swapping, you swap it.
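
Something along these lines would do it, as a minimal sketch only.
MY_MAGIC is a made-up value, and swap32(), check_magic() and read_u32()
are hypothetical names rather than anything from PFile.  It assumes the
writer dumped the magic number, and every later 32-bit field, with a
plain fwrite() in its own native byte order.

```c
#include <stdint.h>
#include <stdio.h>

#define MY_MAGIC 0x50464C89u   /* hypothetical: 'P','F','L',0x89 */

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Takes the magic number exactly as fread() delivered it.
   Returns 0 if the file matches our byte order, 1 if every field
   needs swapping, -1 if this is not our kind of file at all. */
int check_magic(uint32_t magic_as_read)
{
    if (magic_as_read == MY_MAGIC)         return 0;
    if (magic_as_read == swap32(MY_MAGIC)) return 1;
    return -1;
}

/* Read one 32-bit field, swapping only if check_magic() said so.
   Returns 0 on success, -1 on a short read. */
int read_u32(FILE *fp, int need_swap, uint32_t *out)
{
    uint32_t v;
    if (fread(&v, sizeof v, 1, fp) != 1)
        return -1;
    *out = need_swap ? swap32(v) : v;
    return 0;
}
```

The reader calls check_magic() once on the first four bytes and passes
the result to every read_u32() after that.  Nowhere does it ask what
the endianness of the machine itself is.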

Steve Baker                  http://web2.airmail.net/sjbaker1
sjbaker1@airmail.net (home)  http://www.woodsoup.org/~sbaker
sjbaker@hti.com      (work)