
[f-cpu] spec draft about booting F-CPU



hello,

this is a first version.
comments, enhancements, flaws... can be discussed on this list.

have a nice day,
YG
http://f-cpu.seul.org/new/F-CPU_boot.txt
created Sat Oct 12 01:15:53 CEST 2002 by whygee@f-cpu.org


************** In case you didn't know ************** 

F-CPU is a set of specifications that describe a
family of microprocessors and their reference implementations.
The Freedom CPU project is a community of volunteers that
work on defining these specifications, write the reference
source code, develop all the necessary files in order for
F-CPU to become a serious and long-lasting alternative
to the existing microprocessor families.


************** Introduction ************** 


This file contains a preliminary overview of the mechanisms
used by F-CPU for "booting".
 - It covers BIST-time, monitor-time and kernel setup-time
   libraries, communications and troubleshooting.
 - It applies to a single CPU with a minimum number of external
   devices. Multi-CPU booting is not addressed yet and it must be
   handled at a higher level.
 - The goal of this "spec" is to be very simple to understand,
   to implement and to use on any implementation of F-CPU, whether
   it is a software simulation, an FPGA or a full-custom chip, and
   whatever the core family (not limited to FC0).
 - It covers a "least common denominator" way to start an
   operating system, providing enough features to allow extensions.


************** Requirements **************


This being said, here are the minimum requirements.
They should be relatively easy to obtain, whether in software,
FPGA or ASIC.
  - a working F-CPU core (huh, not ready yet)
  - one or more external RAM interfaces
     (typically SDRAM, or any available or necessary technology)
  - EEPROM (probably FLASH with any necessary controller
     to handle fine-grained access)

The interconnection is not specified. However, access
to the EEPROM must be transparent (it must not require any
preliminary configuration).

In short, to implement this specification, you need
only a few components that are easy to find and assemble. 
An FPGA starter kit board should contain them, and
they are also the common parts of a minimum "F-CPU module".

Finally, these different implementations should only
need a single common debug and troubleshooting tool
(this also reduces the coding efforts).


************** boot-time I/O channel **************


Starting an operating system (even minimal)
on F-CPU requires 3 steps after power-up and/or reset :
 - BIST
 - initialisation from EEPROM
 - kernel initialisation and startup

Before this finishes, there is _no_ way to communicate
with outside devices or the user, as no other device
has been initialised yet. Support of peripherals is
deliberately not standardised : it would bloat
the EEPROM and make this spec too complex to
implement. Support of video, disk,
keyboard or network must be handled by additional,
user-provided software, as these peripherals might not
be implemented or can evolve.

Yet, the 3 "init" steps require a means to report status
and get commands from external tools (when necessary).

This can be achieved by mapping a very
simple character-based interface in the Special Register
("SR") space. This avoids the use of memory-mapped
communication (difficult to track with an external probe,
since the F-CPU core caches a lot)
and is straightforward to implement. It also remains
independent from the external architecture and its
evolutions.

2 SRs are created :
[RO, 2 bits] SR_CONS_STATUS : contains handshaking bits
[RW, 8 bits] SR_CONS_DATA   : where the CPU reads or puts a byte.

The chosen protocol is a handshake with some limited
HW support. After BIST is successful, these special
registers are reset to 0.

The protocol is the same in both directions :
the "sender" waits for the "data ready" flag
to be cleared, then writes a byte in the
data register. This action sets the "data ready"
flag. The "receiver" waits for this flag to
be set, and reads the data register : this
resets the flag to 0.

The set and reset are handled in hardware,
thus reducing the protocol complexity a bit.

Data in SR_CONS_DATA are "multiplexed" (from the
core's point of view) :
reading SR_CONS_DATA always returns the contents
of the receive buffer, writing to it writes to the
output buffer. These two buffers are independent
and have a single handshake flag each. The two
handshake flags are visible from both ends of the
channel.

                  |
              ----|----< DIN
              |   |
INTERNAL _____|   |
DATA BUS      |   |
              ----|----> DOUT
                  |

From the F-CPU core's point of view, this
takes only a few instructions :

read_char :
  loopentry r1;                define the start of the wait loop
    get SR_CONS_STATUS, r2;    read the handshaking flags
    andi 1, r2, r2;            isolate the "data in ready" flag
  jmp.0 r2, r1;                if nothing ready, try again
                               (ok, i could have used the LSB condition)
  get SR_CONS_DATA, r2;         read the input character (and clear the flag)

write_char :
  loopentry r1;                define the start of the wait loop
    get SR_CONS_STATUS, r2;    read the handshaking flags
    andi 2, r2, r2;            isolate the "data out ready" flag
  jmp.1 r2, r1;                if buffer busy, try again
  put SR_CONS_DATA, r3;        write the output character (and set the flag)

This code implies that :
 - SR_CONS_DATA is 8-bit only
 - bit 0 of SR_CONS_STATUS is the "data in ready" flag
    and it is set to 1 when there is something to read
 - bit 1 of SR_CONS_STATUS is the "data out ready" flag
    and one can write when it is cleared (0).
Other behaviours are undetermined and you are not
encouraged to play with them (though i guess that
this protocol will be enhanced, but it will lose
its simplicity).

The protocol is roughly the same for the "host",
or whatever is connected to the other end of the channel.
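
To make the handshake easier to follow, here is a small
software model of the channel, written in Python. It is only
a sketch : the class and method names are invented for the
illustration, and only the two SR names and the meaning of
bits 0 and 1 come from the text above. Since the model is
single-threaded, read_char and write_char perform one pass of
the wait loop and report "try again" instead of spinning.

```python
class ConsoleChannel:
    """Toy model of the boot console: two 8-bit buffers, two flags.

    Method names are illustrative assumptions; only the SR names
    and the flag bit positions come from the spec text.
    """
    DATA_IN_READY = 1    # bit 0 of SR_CONS_STATUS: a byte waits for the CPU
    DATA_OUT_READY = 2   # bit 1 of SR_CONS_STATUS: a byte waits for the host

    def __init__(self):
        self.status = 0  # both SRs read 0 after a successful BIST
        self.din = 0     # receive buffer (host -> CPU)
        self.dout = 0    # output buffer (CPU -> host)

    # --- CPU side: what "get" and "put" on the SRs would do ---
    def get_status(self):
        return self.status

    def get_data(self):
        self.status &= ~self.DATA_IN_READY   # reading clears "data in ready"
        return self.din

    def put_data(self, byte):
        self.dout = byte & 0xFF              # SR_CONS_DATA is 8 bits wide
        self.status |= self.DATA_OUT_READY   # writing sets "data out ready"

    # --- host side: the same handshake, mirrored ---
    def host_send(self, byte):
        if self.status & self.DATA_IN_READY:
            return False                     # CPU has not read the last byte
        self.din = byte & 0xFF
        self.status |= self.DATA_IN_READY
        return True

    def host_receive(self):
        if not self.status & self.DATA_OUT_READY:
            return None                      # nothing to read yet
        self.status &= ~self.DATA_OUT_READY
        return self.dout


def read_char(chan):
    """One pass of the read_char loop; None means "try again"."""
    if not chan.get_status() & chan.DATA_IN_READY:
        return None
    return chan.get_data()


def write_char(chan, byte):
    """One pass of the write_char loop; False means "buffer busy"."""
    if chan.get_status() & chan.DATA_OUT_READY:
        return False
    chan.put_data(byte)
    return True
```

A simulator could expose such an object through a named pipe,
while real hardware would implement the same two flags in logic.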

From there, it is easy to write some more code that handles
character buffers like a UNIX console, or whatever.
Adding support for a timer will provide asynchronous
communications, but it is out of the scope of this spec :
the most important goal is that any software can
interact with a user, or at least display booting
information, before the classical I/O peripherals
are initialised by an operating system.

The hardware implementation is very simple from
the SR side. It provides an 8+8+2-bit interface
to the outside world, which can then be transmitted
to a host using many kinds of links, including :
 - "parallel printer port" cable
 - RS-232
 - JTAG
 - a named pipe or a /dev entry (thus providing a single
   interface to simulated, emulated or built versions)
or it can simply remain disconnected. Otherwise,
it provides a simple means to
 - output boot messages
 - debug low-level drivers
 - select a kernel (if a multiboot utility is written)
 - upload or download kernel images to/from EEPROM
 - or simply connect a dumb alphanumeric LCD + keypad

Now that we can examine and control the CPU's activity,
let's proceed to the real stuff : booting to some kernel,
or whatever.


************** boot environment **************


Among all the golden rules that are necessary for
F-CPU to never suffer from compatibility issues, one
is : to never define a fixed memory address map.
If there was a definition of a device mapped at
a certain address, then the addition of devices
would make software and hardware more complex in
the future. All device mapping addresses and the
control registers are mapped in the Special Register
space, which does not communicate with the memory
addressing space, thus ensuring fine-grained
protection, simplifying the address decoders and
keeping the pipelines from complex interactions
with some configuration changes. Using SRs for
defining the address map also helps when devices
are hotswapped, for example.

There is one exception, though : the instruction
stream must start somewhere in the memory space
and it is logical to start at address 0.
Some architectures start at 0xFFFFFsomething,
but F-CPU pointers have no MSB. Starting at
address 0 ensures that any F-CPU compliant core
can boot the same code without porting effort.

The EEPROM is mapped from address 0x0 and there
is no size limit. However the code it contains
must know this size. No need to mention that
all the protection bits are cleared and all
the resources are available to the boot code.

Another important fact concerning the booting
environment is that upon booting, the core
has no other temporary storage location than the
register set. The EEPROM is read-only at this
time, the cache is probably working but useless,
and the external private RAM is not initialised.
So the first thing to do, when starting at
address 0, is to configure and initialize the
RAM controller(s). It depends a lot on the
available technology so it won't be described
precisely. Let's hope that it is not too complex,
though, so the 63 usable registers are enough.

The boot code must detect the available SDRAM
controllers through the SRs. There can be more
than one SDRAM port and some steps can be
performed in parallel (for example when scanning
the chips for integrity checking).

For each controller, the boot code reads the
HW parameters off the SDRAM chips and configures
the controller to match these : size, number of banks,
wait cycles, interleaving, precharge and most
importantly : the base address. If there are
several controllers, the base addresses must
be contiguous.
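
The "contiguous base addresses" rule can be sketched in a few
lines of Python. This is only an illustration : the function
name is invented, and the address where the RAM region starts
is an assumption, since this spec does not fix it.

```python
def assign_base_addresses(sizes, start):
    """Give each RAM controller a base address so that the
    regions follow each other without gap or overlap.

    sizes : detected RAM size behind each controller, in bytes
    start : first address of the RAM region (an assumption here,
            the spec does not define it)
    Returns the list of base addresses and the first free address.
    """
    bases = []
    base = start
    for size in sizes:
        bases.append(base)
        base += size          # the next controller starts where this one ends
    return bases, base


MIB = 1 << 20
bases, end = assign_base_addresses([64 * MIB, 128 * MIB], 0x10000000)
print([hex(b) for b in bases], hex(end))
```

The real boot code would do the same additions with registers
only, since no RAM is usable at that point.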

Then, turn the Dcache off and start writing
and reading the RAM to check its integrity.
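
As a rough idea of what "writing and reading the RAM" can mean,
here is a Python sketch of a simple pattern test. The choice
of patterns, the 64-bit word size and the function name are
assumptions made for the example; this spec does not mandate
any particular test.

```python
WORD_MASK = (1 << 64) - 1  # assumption: the RAM is tested in 64-bit words


def check_ram(write_word, read_word, n_words):
    """Write patterns to every word, read them back, and
    return the sorted indexes of the words that failed."""
    patterns = (
        lambda i: 0xAAAAAAAAAAAAAAAA,   # alternating bits
        lambda i: 0x5555555555555555,   # the complement
        lambda i: i & WORD_MASK,        # index as data, to catch aliased words
    )
    bad = set()
    for pattern in patterns:
        for i in range(n_words):
            write_word(i, pattern(i))
        for i in range(n_words):
            if read_word(i) != pattern(i):
                bad.add(i)
    return sorted(bad)
```

The two callbacks stand for uncached accesses to the RAM, which
is why the Dcache must be off during this step.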

Both the SRs involved and the boot code can
change in the future, so the priority is to
keep the interface clean and simple, rather
than add more and more definitions. The goal
is to have a cachable memory area at the end
of this process.

As mentioned in the introduction, this document
doesn't address multi-CPU configuration. Still, the
base address of the private memory areas must not
collide with other processors ! There are several
workarounds :
  - include the multi-CPU setup in the boot code
     ==> this would be superfluous because it's
     the kernel's job, and the boot code would
     become overly large.
  - assign a unique CPU number to each processor in the
    system (à la SHARC) to compute the base address
     ==> there would be collisions or holes if the system
     is not homogeneous (not the same amount of RAM
     for all CPUs) and we would like the memory space
     to be contiguous (all the RAMs form a unique block)
  - include the memory configuration in the EEPROM
     (the base address would be computed by the kernel,
     then written to EEPROM) ==> the system configuration
     could change between 2 boots and the addresses would
     have to be recomputed (though it's unlikely)
     ==> Another problem is that the boot EEPROM could
     be used and read by several CPUs at a time, so the cores
     can't boot in parallel at the risk of mapping their
     RAMs to the same address --> boot must be serialised
  - some inter-CPU communication channel could be created
     and mapped to the SR  ==> the protocol could be too
     complex and not portable, as it depends on the system's
     topology and the available HW
Choose your camp, according to the system's design and
environment.
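
The problem with the "CPU number" approach can be shown with a
bit of arithmetic. The Python sketch below assumes that each
CPU gets base = CPU number * slot size, with addresses counted
from the start of the RAM region; both the formula and the
names are assumptions, not part of this spec.

```python
def layout_by_cpu_number(ram_sizes, slot_size):
    """Place the RAM of CPU n at n * slot_size and report
    what goes wrong between neighbouring CPUs.

    Returns (holes, collisions):
      holes      : list of (start, length) unused address ranges
      collisions : list of (cpu, next_cpu) pairs whose RAMs overlap
    Only adjacent CPUs are compared, which is enough to show
    the problem.
    """
    holes, collisions = [], []
    for cpu in range(len(ram_sizes) - 1):
        end = cpu * slot_size + ram_sizes[cpu]
        next_base = (cpu + 1) * slot_size
        if end < next_base:
            holes.append((end, next_base - end))
        elif end > next_base:
            collisions.append((cpu, cpu + 1))
    return holes, collisions


MIB = 1 << 20
# 3 CPUs with 64, 32 and 64 MiB: the smaller RAM leaves a 32 MiB hole
print(layout_by_cpu_number([64 * MIB, 32 * MIB, 64 * MIB], 64 * MIB))
```

With a slot as large as the biggest RAM there are holes, with
a smaller slot there are collisions, so this scheme only gives
a contiguous block when all the CPUs have the same amount of RAM.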


************** bootstrapping some software **************


Now, the CPU can access the EEPROM and a contiguous area
of faster RAM. The core's most important features are
configured (IRQs are off, protection is disabled, etc.)
and it's time to dig into what's left in the EEPROM.

All this process can be punctuated by messages sent to
SR_CONS_DATA. This means that the EEPROM has some code
that knows how to do this, a kind of a library that manages
a dumb microconsole. To make life easier, this code can
be reused by some other parts of the software remaining
in the EEPROM.

Another library manages the allocation of blocks in
the private RAM. It's a kind of low-level malloc() and
free() that can be used by other software, and that can
use the microconsole code, for example, to output debug
messages.
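
The memory library is not written yet, so the following Python
sketch only shows one possible shape for it : a first-fit
allocator with a sorted free list. The class name, the
method names and the 32-byte alignment (chosen with 256-bit
cores in mind) are all assumptions.

```python
class BootHeap:
    """First-fit allocator over one contiguous RAM area (sketch)."""

    def __init__(self, base, size, align=32):
        # base is assumed to be a multiple of align
        self.align = align
        self.free_list = [(base, size)]  # sorted (address, length) pairs
        self.used = {}                   # address -> allocated length

    def malloc(self, nbytes):
        # round the request up to a multiple of the alignment
        need = -(-nbytes // self.align) * self.align
        for idx, (addr, length) in enumerate(self.free_list):
            if length >= need:
                if length == need:
                    del self.free_list[idx]
                else:
                    self.free_list[idx] = (addr + need, length - need)
                self.used[addr] = need
                return addr
        return None                      # out of memory

    def free(self, addr):
        length = self.used.pop(addr)     # KeyError on an invalid address
        self.free_list.append((addr, length))
        self.free_list.sort()
        merged = []                      # merge neighbouring free blocks
        for a, l in self.free_list:
            if merged and merged[-1][0] + merged[-1][1] == a:
                merged[-1] = (merged[-1][0], merged[-1][1] + l)
            else:
                merged.append((a, l))
        self.free_list = merged
```

A real version would keep its lists inside the RAM it manages
and could report failures through the microconsole.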

The last library is a set of ROMFS-like low-level handling
routines that can read files inside a simplified file system
located in the EEPROM. This library requires malloc()
functions provided by the memory library in order to load
"files" to the RAM before executing binaries from there
(the files in the F-ROMFS can be unaligned to save some space,
and 256-bit versions will be rather strict about code
and data alignment). The FROMFS code has been started
already, but is not complete.

All these things will certainly require trap handlers
during development and debug, for example to catch
invalid addresses, invalid opcodes or alignment faults
to name a few possible coding errors. They can be removed
later, but they are recommended in case a binary run off
FROMFS contains flaws => the user will be happy to
know why the computer hangs. These handlers can be
replaced by other code before or when the kernel installs
itself.

When the RAM initialisation is completed, the code calls
a function from the FROMFS library, asking to execute a
file off the EEPROM, for example "runme.first". Then,
the user's choice prevails.

To sum up : the EEPROM contains 5 parts :
 - the initialisation code (init trap handlers+IRQ...,
      init SDRAM controller(s), then call FROMFS code)
 - the message printing/reading library
 - the memory allocation library
 - the fromfs library
 - the fromfs image

The first 4 parts are provided by the F-CPU project, as well
as some debugging tools and fromfs image manipulation software.
They come more or less in that order. Since they are provided
by F-CPU, they are "free software" and can be compiled by any user,
so the entry points are known and can even be controlled.
This means that the addresses of the functions depend on the
version of the software, but it's not important because the
symbol table can be easily exported and reused during the
kernel's compilation.


************** other software **************


The FROMFS specification is described in a different file
and it can change anyway, but it is primarily a dumb
file system : each "file" is described with an entry
in the file table. A name, some attributes, a size and
an offset in the image are the minimal properties.

From there, the provided FROMFS library can locate,
open and seek into a file given its name. Using the malloc
library, a file can be loaded into RAM and then executed.
This software can in turn execute some other software
located in the FROMFS, load data files, allocate more
memory, communicate with the user through the microconsole
and more importantly : add new features and detect more
devices to extend the reach of the software.
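
Here is a small Python sketch of the "locate then load" step.
It only uses the four properties listed above (name,
attributes, size, offset) and does not try to guess the
on-EEPROM layout, which belongs to the FROMFS specification.
The type and function names are invented for the example.

```python
from typing import NamedTuple, Optional, Sequence


class FileEntry(NamedTuple):
    """One entry of the file table (in-memory form only;
    the binary layout is left to the FROMFS spec)."""
    name: str
    attributes: int
    size: int     # length of the file in bytes
    offset: int   # position of the file inside the image


def find_entry(table: Sequence[FileEntry], name: str) -> Optional[FileEntry]:
    """Linear search of the file table by name."""
    for entry in table:
        if entry.name == name:
            return entry
    return None


def load_file(image: bytes, table: Sequence[FileEntry],
              name: str) -> Optional[bytes]:
    """Return a copy of the named file, or None if it is absent.
    The copy stands for the RAM block the malloc library would provide."""
    entry = find_entry(table, name)
    if entry is None:
        return None
    if entry.offset + entry.size > len(image):
        raise ValueError("file table entry points outside the image")
    return image[entry.offset:entry.offset + entry.size]
```

Two entries with different names and the same offset and size
behave like hard links to the same file.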

The first possibility is to simply link a Linux kernel
with the existing "libraries". The first messages will
be output to the microconsole and the kernel will enumerate
all the known devices before redirecting the messages to
them. The rest of the story is well known. The same works 
with microkernels as well. Just name the kernel as
"runme.first" or hardlink it.

Another possibility is to choose between several kernels
or kernel parameters, with software like GRUB or LILO.
This would use the provided facilities to select the
boot parameters and fetch the correct "file" from the
EEPROM. The multiboot utility would be hardlinked to
"runme.first" and each kernel image can keep a distinct
name.

However, in case no "microconsole" is connected, this
might be less practical than expected. Some HW detection
software, or "device driver", must be installed to allow
GRUB to use the screen, the keyboard and any mass storage
device. Then the device driver would be named "runme.first"
but it is getting a bit complex now !

Some basic command or script interpreter can be programmed
to run the desired software, in the order specified in a
"file".

Yet another option is to exploit the microconsole as a
communication link, and download an image to execute.
Though it's a bit slow (the link is not designed for
high-speed communication, with a maximum of 1M bit/s)
it can spare some FLASH room in a large multi-CPU system
or it can be used when developing new kernels (instead
of writing to the EEPROM each time).

There are certainly other possible uses. It is even
possible to design a boot system like those of SPARC
or ALPHA, but if this is not needed, it still works.


************** conclusion **************


This specification is very important both for the
software and hardware development of the F-CPU project,
which is still in its infancy and not completely determined.

Defining a minimal "console port" and the necessary SRs
is important when designing the core. This specification
can influence the existing files, but much care is taken
to avoid any impact.

The definition of the bootstrap procedure is also critical
for the development of the first SW tools : simulator,
emulator, debuggers, compilers and so much more. Making
these tools independent from the target (SW or HW), and providing
a flexible, powerful and simple interface for booting a CPU,
lowers the coding effort and makes F-CPU suitable for
more applications. At the same time, the architecture stays
independent from them and can evolve without compatibility
issues.

Finally, these guidelines are open enough so that
somebody can code and boot whatever software he wants, whether
it is a monolithic kernel, a microkernel, a custom
application or simply a toy program.