
Re: [f-cpu] special register map



hi,

cedric wrote:

>Hi,
>
>	I am currently starting to add a map for the special registers, and it would be nice
>if somebody could clarify some of them. Here is the list :
>	MAX_SIZE
>	SIZE_0
>	SIZE_1
>	SIZE_2
>	SIZE_3
>	MAX_CHUNK_SIZE
>
>Cedric
>
Until F-CPU v1, the SR map, like the opcode map, is defined only by
symbolic addresses. However, the SR map will evolve a lot more than the
opcode map in the future, so the files must always use the symbolic
names, not their numeric values.

These names and some preliminary values are defined
in the file f-cpu/configuration/f-cpu_config.vhdl.in in the snapshots.
I have attached it to this mail.

The SR map is not "cleanly" managed yet. The corresponding unit has
not even been written. More work is needed to take the SR map out of
the "general" configuration files, maybe even with a tool that
creates a configuration file according to the user's needs.

For those too lazy to read the vhdl.in file, here are some pointers :
 - MAX_SIZE is the number of bytes per register.
 [ However, the file is unclear : it seems to indicate that it is the
    log2 of the number of bytes. I would rather use this second version,
    but we'll have to see what this involves in SW, because the
    raw (non-logarithmic) value is used in the source. ]

- SIZE_0, SIZE_1, SIZE_2, SIZE_3
are the widths (in bytes) of the chunks
selected by the size flag of the instructions.

For example, when add.0 is executed, the
chunk width selected by the size flag is 8 bits
(SIZE_0 is 1 byte by default). This width can
be reprogrammed when registers wider than 64 bits
are available, so it could mean 128 or 512 bits
if necessary.
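As a sketch of what the size flag selects, here is a small C model of the SIZE_x registers. It assumes the raw byte format of the current sources and the default values 1, 2, 4 and 8 bytes (SR_SIZE_0_val to SR_SIZE_3_val); the array and function names are hypothetical and do not correspond to any existing F-CPU software interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software copy of SIZE_0..SIZE_3, in bytes (raw format).
   Defaults follow SR_SIZE_0_val..SR_SIZE_3_val in f-cpu_config.vhdl.in. */
static uint32_t size_sr[4] = { 1, 2, 4, 8 };

/* Chunk width in bits selected by size flag `flag` (0 for add.0, etc.). */
uint32_t chunk_bits(unsigned flag)
{
    assert(flag < 4);
    return size_sr[flag] * 8u;
}

/* Number of chunks of that width in a register of `max_size` bytes,
   i.e. how many lanes the SIMD form of an instruction would cover. */
uint32_t chunks_per_register(uint32_t max_size, unsigned flag)
{
    assert(flag < 4 && size_sr[flag] != 0 && size_sr[flag] <= max_size);
    return max_size / size_sr[flag];
}

/* Reprogram SIZE_x, e.g. to make flag 3 mean 512 bits (64 bytes).
   Only allowed here when registers are wider than 64 bits, since the
   config file says SIZE_x is hardwired if MAX_SIZE <= 8. */
void set_size_sr(unsigned flag, uint32_t bytes, uint32_t max_size)
{
    assert(flag < 4 && max_size > 8 && bytes <= max_size);
    size_sr[flag] = bytes;
}
```

On a 64-bit core, flag 0 gives eight 8-bit chunks and flag 3 a single 64-bit chunk; on a hypothetical 2048-bit core, `set_size_sr(3, 64, 256)` would make flag 3 select four 512-bit chunks.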

- MAX_CHUNK_SIZE is the maximum size of a single
chunk. Currently it is hardwired to 8 bytes (64 bits).

These SRs may or may not be hardwired, depending on
the version. Usually, MAX_SIZE and MAX_CHUNK_SIZE
are hardwired (unless there is some instruction emulation).
The application reads MAX_SIZE to determine whether more than
64 bits per register are available and, if needed,
modifies SIZE_0-3 to the required widths.
These are "private" registers which, when modifiable,
are saved across context switches.
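The setup sequence described above could look like the following C sketch. The struct, the function name and the policy of reusing SIZE_3 for a wider chunk are assumptions made for illustration; only the rule that SIZE_x becomes writable when MAX_SIZE exceeds 8 bytes comes from the config file.

```c
#include <stdint.h>

/* Hypothetical in-memory view of the size-related SRs (raw byte format). */
struct size_srs {
    uint32_t max_size; /* MAX_SIZE: bytes per register, usually hardwired */
    uint32_t size[4];  /* SIZE_0..SIZE_3: chunk widths in bytes           */
};

/* Return the size flag (0..3) whose SIZE_x equals `bytes`, reprogramming
   SIZE_3 if no flag matches and the SRs are writable (MAX_SIZE > 8).
   Returns -1 when the requested width cannot be provided. */
int select_size_flag(struct size_srs *sr, uint32_t bytes)
{
    if (bytes == 0 || (bytes & (bytes - 1)) != 0)
        return -1;                     /* widths are powers of two */
    for (int flag = 0; flag < 4; flag++)
        if (sr->size[flag] == bytes)
            return flag;
    if (sr->max_size > 8 && bytes <= sr->max_size) {
        sr->size[3] = bytes;           /* arbitrary choice for this sketch */
        return 3;
    }
    return -1;                         /* SIZE_x hardwired, or chunk too wide */
}
```

Once reprogrammed, the new SIZE_3 value is part of the task's private state and would be saved and restored at the next context switch.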


The format is obviously a problem : the sources indicate that
this is the raw number of bytes, but the logarithm would be
more useful in practice. The values of SIZE_0-3 could fit
in 16 bits if the widths were stored as logarithms :
each 4-bit field could represent widths from 8 to 256K bits,
more than enough for future CPU generations. This packed
16-bit word would then be stored in the CMB for context switches.
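The 4-bit logarithmic encoding proposed above can be sketched in C as follows. The field order (SIZE_0 in the low nibble) and the function names are assumptions; a field value v stands for a width of 8 << v bits, so 0 gives 8 bits and 15 gives 256K bits.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four log2(chunk width in bytes) values into one 16-bit word.
   Assumed layout: SIZE_i occupies bits 4*i .. 4*i+3. */
uint16_t pack_sizes(const uint8_t log_bytes[4])
{
    uint16_t word = 0;
    for (unsigned i = 0; i < 4; i++) {
        assert(log_bytes[i] <= 15);                 /* 4 bits per field */
        word = (uint16_t)(word | (log_bytes[i] << (4 * i)));
    }
    return word;
}

/* Width in bits encoded by field `i` of a packed word: 8 << v. */
uint32_t field_width_bits(uint16_t word, unsigned i)
{
    assert(i < 4);
    unsigned v = (word >> (4 * i)) & 0xFu;
    return UINT32_C(8) << v;
}
```

With the default widths (1, 2, 4 and 8 bytes, i.e. logs 0 to 3) the packed word is 0x3210.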

Another point is that these registers are mainly used during
program setup, when loop counts and data widths are computed.
With raw byte counts, the application has to divide the iteration
counts, whereas with logarithms a simple shift is enough, since the
register width is always a power of two. Computing the log of a value
is not very easy, but computing the exponent from the log is
straightforward (a simple shift).
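A minimal C illustration of the difference, assuming the register width is a power of two; the function names are hypothetical. Converting a byte count to its logarithm needs a loop here (a count-leading-zeros instruction would also do), while the reverse conversion is a single shift.

```c
#include <assert.h>
#include <stdint.h>

/* Raw format: the width is stored in bytes, so a division is needed
   (count is assumed to be a multiple of the width). */
uint32_t loops_from_bytes(uint32_t count, uint32_t reg_bytes)
{
    assert(reg_bytes != 0);
    return count / reg_bytes;
}

/* Log format: the width is stored as log2(bytes), a shift is enough. */
uint32_t loops_from_log(uint32_t count, unsigned log_reg_bytes)
{
    assert(log_reg_bytes < 32);
    return count >> log_reg_bytes;
}

/* bytes -> log2 needs a loop (bytes must be a power of two). */
unsigned log_of_bytes(uint32_t bytes)
{
    assert(bytes != 0 && (bytes & (bytes - 1)) == 0);
    unsigned n = 0;
    while (bytes > 1) {
        bytes >>= 1;
        n++;
    }
    return n;
}

/* log2 -> bytes is a single shift. */
uint32_t bytes_of_log(unsigned n)
{
    assert(n < 32);
    return UINT32_C(1) << n;
}
```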

For example, take a program that loops over an array of 256 bytes :
using SIMD and the F-CPU "abstract" model, the loop count is
computed as L = 256 >> MAX_SIZE (MAX_SIZE being stored as a log2).
If L is zero (the registers are wider than 256 bytes),
then a SIZE_x must be programmed to cover 256 bytes, and L = 1.
Then, the loop can run L times, and each data item is processed
using the size flag corresponding to SIZE_x.
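The loop setup just described, written as a C sketch. It assumes MAX_SIZE holds log2 of the register width in bytes (the proposed format, not the current one) and that the array size is a power of two; the struct and function names are hypothetical, and actually writing SIZE_x is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct loop_plan {
    uint32_t loops;        /* L: number of loop iterations             */
    bool     program_size; /* true if a SIZE_x SR must be reprogrammed */
    uint32_t size_x_bytes; /* value to write into that SIZE_x (bytes)  */
};

/* Compute L = array_bytes >> max_size_log, as in the 256-byte example. */
struct loop_plan plan_loop(uint32_t array_bytes, unsigned max_size_log)
{
    assert(max_size_log < 32);  /* shifting a uint32_t by >= 32 is undefined */
    struct loop_plan p = { array_bytes >> max_size_log, false, 0 };
    if (p.loops == 0) {
        /* Registers are wider than the array: program a SIZE_x to the
           array size and run a single iteration. */
        p.program_size = true;
        p.size_x_bytes = array_bytes;
        p.loops = 1;
    }
    return p;
}
```

On a 64-bit core (MAX_SIZE = 3) the 256-byte loop runs 32 times; with 512-byte registers (MAX_SIZE = 9) it runs once after SIZE_x is set to 256 bytes.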

Obviously, there is a problem, but I have no time left to modify the sources.
I would like to meet Cédric to discuss it with him.

YG
-- source : f-cpu/configuration/f-cpu_config.vhdl.in
-- destination : f-cpu/vhdl/configuration/f-cpu_config.vhdl
-- (c) Yann GUIDON, oct. 21, 2000 <whygee@f-cpu.org>
--
-- v0.2 : Michael Riepe changed F_RANGE
-- v0.3 : YG specified the user-modifiable constants + GPL
-- v0.4 : MR proposed LOGMAXSIZE, YG added the ROP2 constants.
-- v0.5 : nov. 17, 2000, YG added SR_IRQ_BASE, SR_TRAP_BASE,
--         SR_SYSCALL_BASE, SR_URL etc.
-- v0.6 : nov. 26, 2000, YG moved some SR stuff to /VHDL/EU_sr
-- v0.7 : dec. 31, 2000, YG added SR_MAX_CHUNK_SIZE and SR_TLBMISS_BASE
-- v0.8 : aug. 19, 2001, YG hacked for m4 preprocessing.
--        run f-cpu/configure.sh to update this file.
-- v0.9 : aug. 28, 2001 : YG + MR modified some stuff.
--        MR hinted the "eval(radix)" trick, status is satisfying.
-- v0.10 : dec. 31, 2001 : restored MAX_CHUNK_SIZE, SR_TLBMISS_BASE
--  and SR_LAST_SR which disappeared during the transition to m4 format.
-- v0.11 : june 2002 : adding a new type for the register addresses.
--
-- This package defines the "characteristic widths" of
-- the internal units. Please respect the restrictions.
--
-- #### The SR code should be moved to another file ! ####
--
-- **************************************************************
-- WARNING : All the user-modifiable values are defined in the 
-- f-cpu/configuration/f-cpu_user.m4 file.
-- **************************************************************
--
--  * LOGMAXSIZE : Log2 of the Size of the registers in bytes.
--  Can be any integer above or equal to 2. 2 corresponds to
--  a 32-bit implementation, 3 corresponds to a 64-bit version.
--  This is the most important parameter, the first with
--  which one can play. Be careful anyway. The 32-bit version will
--  not work yet.
--
--  * L1LogLines : Log2 of the NumBer of cache Lines (MUST be EVEN)
--  This parameter determines how many L1 cache lines are implemented.
--  It must be >=4 and _even_ because of the particular LRU mechanism
--  used for this prototype. Allowed values are 4, 6 or 8 (that is :
--  16, 64 or 256 lines, or 512 bytes, 2KB or 8KB). More would correspond
--  to an L2 cache... but is possible if you have enough resources.
--
--  * L1ABwidth : Address Bus width, in 32-byte chunks (32+5 bits = 128GB)
--  This determines the width of the address comparator of every L1
--  cache line. Be careful, too many bits might require a LOT of resources.
--  A reasonable value for a small design would be 16 (2MB of addressable
--  physical memory), adapt as required. Warning : this parameter
--  also determines how many address bits are physically implemented.
--

LIBRARY ieee;
    USE ieee.std_logic_1164.ALL;

package FCPU_config is

------------------------------------------------------
-- Most important F-CPU constants :
------------------------------------------------------

-- Number >=2, 3 corresponds to 64-bit registers
  constant LOGMAXSIZE : natural := __DEF_LOGMAXSIZE;
    -- defined in f-cpu/configuration/f-cpu_user.m4

-- Size of the registers in bytes
  constant MAXSIZE : natural := 2**LOGMAXSIZE;

-- Size of the registers in bits.
  constant UMAX : natural := MAXSIZE * 8;

-- Range of a register width declaration.
  subtype F_RANGE is natural range UMAX-1 downto 0 ;

-- shortcut for a very common declaration.
  subtype F_VECTOR is std_ulogic_vector(F_RANGE) ;

-- MAX_CHUNK_SIZE in bits. This should not change.
  constant MAX_CHUNK_SIZE : natural := 64 ;

------------------------------------------------------
-- Definition of a register address :
------------------------------------------------------

-- defines a 6-wire address :
   subtype t_reg is std_ulogic_vector(5 downto 0);
   -- moved from f-cpu/vhdl/scheduler/scheduler_definitions.vhdl

-- defines the integer value thereof :
   subtype reg_number is natural range 0 to 63;

------------------------------------------------------
-- Some architectural constants, bound to FC0 only :
------------------------------------------------------

------------------------------------------------------
-- L1 Caches (split instructions and data)
------------------------------------------------------

-- Data Bus width, or width of each cache line (32 bytes)
  constant L1DBwidth  : natural := 256 ;

-- Address Bus width, in 32-byte chunks (32+5 bits = 128GB)
  constant L1ABwidth  : natural := __DEF_L1ABwidth ;

-- Log2 of the NumBer of cache Lines (MUST be EVEN)
-- (small number for the first attempts)
  constant L1LogLines : natural := __DEF_L1LogLines ;

-- NumBer of cache Lines (2**L1LogLines)
  constant L1NBlines  : natural := 2**L1LogLines ;

------------------------------------------------------
-- The Special Registers : (adapted from SR.h)
--
-- (please check f-cpu/configuration/f-cpu_sr.m4 !)
--
-- What the user should modify when implementing the core :
-- * SR_NUMBERS_val  should be updated when the
--     number of implemented SR changes.
-- * SR_FAMILY_val   specifies the type of core (FC0, FC1 etc).
--     This is meant to be used for selecting particular code
--     sections that are optimized for certain cores.
-- * SR_STEPPING_val specifies the mask revision, for example.
-- * SR_URL_val contains the Internet URL where the source, 
--     software and documentation are stored (64 char max.)
--
-- DO NOT MODIFY the other constants unless the specifications
-- change. New SRs will appear soon. Stay tuned.
------------------------------------------------------

-- last SR :
  constant SR_LAST_SR      : natural := 27;

-- number of SRs that are implemented in this model
  constant SR_NUMBERS      : natural := 0;
  constant SR_NUMBERS_val  : natural := SR_LAST_SR;

-- F-CPU core number. remark : 0xFC0 = 4032d :-)
  constant SR_FAMILY       : natural := 1;
  constant SR_FAMILY_val   : natural := __DEF_SR_FAMILY_val;

-- revision/implementation
  constant SR_STEPPING     : natural := 2;
  constant SR_STEPPING_val : natural := __DEF_SR_STEPPING_val;

-- in bytes, a power of two >= 3
  constant SR_MAX_SIZE     : natural := 3;
  constant SR_MAX_SIZE_val : natural := MAXSIZE;

-- Size attribute 0, hardwired if SR_MAX_SIZE <= 8
  constant SR_SIZE_0       : natural := 4;
  constant SR_SIZE_0_val   : natural := 1;

-- Size attribute 1, hardwired if SR_MAX_SIZE <= 8
  constant SR_SIZE_1       : natural := 5;
  constant SR_SIZE_1_val   : natural := 2;

-- Size attribute 2, hardwired if SR_MAX_SIZE <= 8
  constant SR_SIZE_2       : natural := 6;
  constant SR_SIZE_2_val   : natural := 4;

-- Size attribute 3, hardwired if SR_MAX_SIZE <= 8
  constant SR_SIZE_3       : natural := 7;
  constant SR_SIZE_3_val   : natural := 8;

-- SIMD chunk size, hardwired.
  constant SR_MAX_CHUNK_SIZE     : natural := 8;
  constant SR_MAX_CHUNK_SIZE_val : natural := MAX_CHUNK_SIZE/8;

-- R/W, Value is dynamic, incremented every cycle.
  constant SR_CYCLE        : natural := 9;

-- Protected, R/W, Controls the paged memory.
  constant SR_PAGING       : natural := 10;

-- Protected, R/W, general status and mode bits which control other enable bits.
  constant SR_CONTROL      : natural := 11;

-- IRQ, TRAP and SYSCALL jump tables : all are R/W in protected mode - only.
  constant SR_IRQ_BASE     : natural := 12;  -- For the hardware interrupt requests
  constant SR_IRQ_SIZE     : natural := 13;
  constant SR_TRAP_BASE    : natural := 14;  -- faults and system errors
  constant SR_TRAP_SIZE    : natural := 15;
  constant SR_SYSCALL_BASE : natural := 16;  -- OS system call
  constant SR_SYSCALL_SIZE : natural := 17;
  constant SR_TLBMISS_BASE : natural := 18;  -- TLB miss trap

-- The URL registers must be modified to point to the manual, sources, patches etc.
-- The registers are hardwired, format is ASCII and padded with 0s.
  constant SR_URL          : natural := 19;
-- 64 characters max. for a 64-bit version, 32 chars for a 32-bit version...
  constant SR_URL_size     : natural := 8;
  constant SR_URL_val      : string  := __DEF_SR_URL_val;
    -- defined in f-cpu/configuration/f-cpu_user.m4

-------------------------------------------------------
-- The ROP2 unit : these constants specify the
-- correspondence between the binary code and the actual
-- operation. These data are copied here for convenience
-- only, for example if you want to make an assembler in
-- VHDL. Check the file ROP2.vhdl for more information.
--------------------------------------------------------

  constant ROP2_DIRECT_MODE : std_ulogic_vector(1 downto 0) := "m4_eval(__DEF_ROP2_DIRECT_MODE,2,2)";
  constant ROP2_AND_MODE :    std_ulogic_vector(1 downto 0) := "m4_eval(__DEF_ROP2_AND_MODE,2,2)";
  constant ROP2_OR_MODE :     std_ulogic_vector(1 downto 0) := "m4_eval(__DEF_ROP2_OR_MODE,2,2)";
  constant ROP2_MUX_MODE :    std_ulogic_vector(1 downto 0) := "m4_eval(__DEF_ROP2_MUX_MODE,2,2)";

  constant ROP2_AND   : std_ulogic_vector(2 downto 0) := "m4_eval(__DEF_FUNCTION_AND,2,3)";
  constant ROP2_ANDN  : std_ulogic_vector(2 downto 0) := "m4_eval(__DEF_FUNCTION_ANDN,2,3)";
  constant ROP2_XOR   : std_ulogic_vector(2 downto 0) := "m4_eval(__DEF_FUNCTION_XOR,2,3)";
  constant ROP2_OR    : std_ulogic_vector(2 downto 0) := "m4_eval(__DEF_FUNCTION_OR,2,3)";
  constant ROP2_NOR   : std_ulogic_vector(2 downto 0) := "m4_eval(__DEF_FUNCTION_NOR,2,3)";
  constant ROP2_XNOR  : std_ulogic_vector(2 downto 0) := "m4_eval(__DEF_FUNCTION_XNOR,2,3)";
  constant ROP2_ORN   : std_ulogic_vector(2 downto 0) := "m4_eval(__DEF_FUNCTION_ORN,2,3)";
  constant ROP2_NAND  : std_ulogic_vector(2 downto 0) := "m4_eval(__DEF_FUNCTION_NAND,2,3)";

  constant ROP2_VALUE_AND   : std_ulogic_vector(3 downto 0) := "m4_eval(__DEF_FUNCTION_VALUE_AND,2,4)";
  constant ROP2_VALUE_ANDN  : std_ulogic_vector(3 downto 0) := "m4_eval(__DEF_FUNCTION_VALUE_ANDN,2,4)";
  constant ROP2_VALUE_XOR   : std_ulogic_vector(3 downto 0) := "m4_eval(__DEF_FUNCTION_VALUE_XOR,2,4)";
  constant ROP2_VALUE_OR    : std_ulogic_vector(3 downto 0) := "m4_eval(__DEF_FUNCTION_VALUE_OR,2,4)";
  constant ROP2_VALUE_NOR   : std_ulogic_vector(3 downto 0) := "m4_eval(__DEF_FUNCTION_VALUE_NOR,2,4)";
  constant ROP2_VALUE_XNOR  : std_ulogic_vector(3 downto 0) := "m4_eval(__DEF_FUNCTION_VALUE_XNOR,2,4)";
  constant ROP2_VALUE_ORN   : std_ulogic_vector(3 downto 0) := "m4_eval(__DEF_FUNCTION_VALUE_ORN,2,4)";
  constant ROP2_VALUE_NAND  : std_ulogic_vector(3 downto 0) := "m4_eval(__DEF_FUNCTION_VALUE_NAND,2,4)";

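-- Integer logarithms (see the package body). Declaring them here
-- makes them visible to the units that use this package.
  function ilog (x : natural; base : natural := 2) return natural;
  function ilog2 (x : natural) return natural;

end FCPU_config;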


package body FCPU_config is

-- The use of the ilog functions is not recommended inside
-- the synthesisable processes, they are provided for
-- convenience only. Usually, the logarithm is provided
-- and exponentiation is performed on it (it's much simpler).

-- integer logarithm (rounded up) [MR version]
function ilog (x : natural; base : natural := 2) return natural is
  variable y : natural := 1;
begin
  while x > base ** y loop
    y := y + 1;
  end loop;
  return y;
end ilog;

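-- integer logarithm (rounded up) [YG version]
-- y counts how many times z is doubled, so it must start at 0 :
-- starting at 1 (as in the first version) returned ceil(log2(x)) + 1.
function ilog2 (x : natural) return natural is
  variable y : natural := 0;
  variable z : natural := 1;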
begin
  while x > z loop
    y := y + 1;
    z := z + z;  -- you can notice the "little" enhancement :o)
  end loop;
  return y;
end ilog2;

------------------------------------------------------
-- Some useful wrappers or functions could be included
-- here if they are necessary for the rest of the project.
------------------------------------------------------

end FCPU_config;
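To settle the off-by-one question about the integer logarithm functions of the package body, here is a C transcription of their loops, with the starting value of the result counter y as a parameter (in their original form, both VHDL functions start y at 1). The function names are hypothetical. With y starting at 1, the doubling loop of ilog2 returns ceil(log2(x)) + 1, and the ilog loop never returns less than 1, so ilog(1) gives 1 instead of 0; starting y at 0 gives the rounded-up logarithm in both cases.

```c
#include <stdint.h>

/* Reference value: ceil(log2(x)) for 1 <= x <= 2^32. */
unsigned ceil_log2(uint64_t x)
{
    unsigned y = 0;
    while ((UINT64_C(1) << y) < x)
        y++;
    return y;
}

/* Transcription of the ilog2 doubling loop: z starts at 1 and doubles
   while x > z; y counts the doublings starting from y_start.
   The loop keeps z == 2^(y - y_start), so for x >= 1 it returns
   y_start + ceil(log2(x)). */
unsigned ilog2_loop(uint64_t x, unsigned y_start)
{
    unsigned y = y_start;
    uint64_t z = 1;
    while (x > z) {
        y++;
        z += z;
    }
    return y;
}

/* Transcription of the ilog loop: smallest y >= y_start such that
   base^y >= x (base >= 2, small values only: no overflow check). */
unsigned ilog_loop(uint64_t x, uint64_t base, unsigned y_start)
{
    unsigned y = y_start;
    for (;;) {
        uint64_t p = 1;
        for (unsigned i = 0; i < y; i++)
            p *= base;              /* p = base ** y */
        if (x <= p)
            return y;
        y++;
    }
}
```

For example, `ilog2_loop(8, 1)` returns 4 whereas log2(8) is 3, and `ilog_loop(1, 2, 1)` returns 1 whereas the rounded-up log2(1) is 0.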