[f-cpu] ROP2 unit
hi,
here is yet another ROP2 unit version :-)
I have merged the MUX4 (which was done with explicit
boolean operations) with the FANOUT loop.
I will probably write another, more flexible
testbench for the unit and maybe start to write
vectors to put in the BIST unit.
The rest doesn't change : rop2_xbar.vhdl and the
other files are the same.
WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--------------------------------------------------------------------------
-- f-cpu/vhdl/eu_rop2/rop2_unit.vhdl - ROP2 Execution Unit for the F-CPU
-- Copyright (C) 2000-2001 Yann GUIDON (whygee@f-cpu.org)
--
-- v0.2: Michael Riepe reorganized the main for-generate loop
-- + corrected the lookup table (wrong op for ORN)
-- v0.3: YG replaced UMAX/8 with MAXSIZE :-)
-- v0.4: 11/17/2000, YG wants to rewrite the unit with MR's gate library ...
-- -> abandoned. We stick to high-level coding.
-- v0.5: 8/12/2001, YG modifies the interface, the names, adds MUX,...
-- Sun Aug 12 01:16:11 2001: still untested but it includes
-- the latest updates to the FC0 core.
-- Tue Aug 21 08:45:16 2001: trying to make something that works reasonably.
-- Mon Sep 3 08:49:45 2001: YG fixed some silly compile bugs :-/
-- vanillaHDL script and testbench added.
-- Sun Oct 7 05:39:23 2001: changed ROP2 function to MUX4
-- Mon Oct 8 01:39:45 2001: merged SELECT with the FANOUT loop.
--
--------------------------BEGIN-VHDL-LICENCE-----------------------------
-- This program is free software; you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation; either version 2 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program; if not, write to the Free Software
-- Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
---------------------------END-VHDL-LICENCE------------------------------
--
-- This is a first working and stable version for this unit.
-- It should be easily synthesizable, but synthesis has not been attempted yet.
-- What matters most today is that it compiles and behaves correctly.
-- Warning : this code is and should remain purely combinatorial,
-- there is no latching here, it must be done at another level.
-- Furthermore, the function lookup table is now moved earlier
-- in the pipeline, in parallel with the Xbar cycle : look at the
-- f-cpu/vhdl/eu_rop2/rop2_xbar.vhdl file
-- The big fanout problem (propagation of the opcode from 1 to 64 bits)
-- overlaps the Xbar cycle so we can make a nice "signal tree".
-- Finally, only byte combines are possible so far. The COMBINE
-- instruction is still not completely specified in the manual.
--------------------------------------------------------------------------
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.all;
LIBRARY work;
USE work.FCPU_config.ALL;
Entity EU_ROP2 is
port(
ROP2_in_A,
ROP2_in_B,
ROP2_in_C : in F_VECTOR; -- the 3 operands
ROP2_function_bit0,
ROP2_function_bit1, -- pre-buffered boolean function bits
ROP2_function_bit2,
ROP2_function_bit3 : in Std_ulogic_vector((MAXSIZE *2) downto 0); -- fanout=4
ROP2_mode : in Std_ulogic_vector(1 downto 0); -- 2 function bits from the instruction
-- Combine_size : in Std_ulogic_vector(1 downto 0); -- unused ATM. Byte chunks only.
ROP2_out : out F_VECTOR -- the result
);
end EU_ROP2;
Architecture arch1 of EU_ROP2 is
signal
partial_ROP,
partial_OR,
partial_AND,
partial_MUX : F_VECTOR; -- the partial results.
begin
--------------------------------------------------------------------------
-- ROP2 cycle : (combinational part only)
--------------------------------------------------------------------------
-- 1 : last fanout for the function bits + the ROP2 operator itself :
-- (the former loop is merged with the ROP2/MUX operation)
mux_loop : for i in UMAX-1 downto 0 generate
signal t: std_ulogic_vector(1 downto 0);
begin
t <= ROP2_in_A(i) & ROP2_in_B(i);
with t select
partial_ROP(i) <=
ROP2_function_bit0(i/4) when "00",
ROP2_function_bit1(i/4) when "10",
ROP2_function_bit2(i/4) when "01",
ROP2_function_bit3(i/4) when others; -- "11"
end generate mux_loop;
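-- Worked example, derived only from the select statement in mux_loop above
-- (the real lookup table lives in rop2_xbar.vhdl and is not shown here,
-- so the encodings below are what this wiring implies, not a copy of it) :
--   A B | bit driving partial_ROP(i)
--   0 0 | ROP2_function_bit0
--   1 0 | ROP2_function_bit1
--   0 1 | ROP2_function_bit2
--   1 1 | ROP2_function_bit3
-- Each boolean function is therefore just its truth table spread over
-- the four function bits :
--   function        bit3 bit2 bit1 bit0
--   A and B           1    0    0    0
--   A or B            1    1    1    0
--   A xor B           0    1    1    0
--   A and (not B)     0    0    1    0
--   A nor B           0    0    0    1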
-- YG> i hope that this will be recognized as a MUX4 operator,
-- instead of the decomposed version used before (kept for
-- historical reasons) which is probably slower and heavier :
-- partial_ROP <=
-- ((not ROP2_in_A) and (not ROP2_in_B) and local_function_3)
-- or ((not ROP2_in_A) and ( ROP2_in_B) and local_function_2)
-- or (( ROP2_in_A) and (not ROP2_in_B) and local_function_1)
-- or (( ROP2_in_A) and ( ROP2_in_B) and local_function_0);
-- Btw, I have found an optimal nMOS circuit layout that performs this
-- function in one of Graham's books : "Introduction to nMOS and CMOS
-- Systems Design" by Amar Mukherjee, Prentice-Hall International Editions,
-- ISBN 0-13-490939-9, see pages 42 (fig. 3.13) and p170 (fig. 5.18).
-- However, it doesn't work for CMOS.
-- 2 bis : the MUX
partial_MUX <=
(ROP2_in_A and ( ROP2_in_C))
or (ROP2_in_B and (not ROP2_in_C));
-- 3 : partial ORs and ANDs on the byte chunks :
BYTE_COMBINE : for i in MAXSIZE-1 downto 0 generate
partial_OR(8*i+7 downto 8*i) <= "11111111" when
partial_ROP(8*i+7 downto 8*i) /= "00000000"
else "00000000";
partial_AND(8*i+7 downto 8*i) <= "11111111" when
partial_ROP(8*i+7 downto 8*i) = "11111111"
else "00000000";
end generate BYTE_COMBINE;
-- YG> I'm still uncertain about the best way to write a multi-size version.
-- YG> Plus, the added latency might ruin the ROP2 unit's performance.
-- YG> So the multi-size version is dropped until it becomes necessary.
-- YG> Let's stick to plain bytes...
-- YG> Note : rop2.eps contains a trick to relieve the fanout (1->8) problem.
-- 4 : final selection stage :
with ROP2_mode select
ROP2_out <=
partial_ROP when ROP2_DIRECT_MODE,
partial_AND when ROP2_AND_MODE,
partial_OR when ROP2_OR_MODE,
partial_MUX when others; -- MUX
-- YG> warning : huge fanout ! 1->64 for 4 signals, i hope that the synthesiser
-- will generate the proper buffer tree.
end;
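--------------------------------------------------------------------------
-- A minimal self-checking testbench sketch for the unit above, in the
-- spirit of the "more flexible testbench" mentioned in the mail. It is
-- not the testbench shipped with the unit. Assumptions :
--  * the names tb_rop2_sketch, dut, stimulus and all local signals and
--    constants are hypothetical, chosen for this example only;
--  * F_VECTOR, MAXSIZE and the ROP2_*_MODE constants come from
--    FCPU_config, which is not shown here; F_VECTOR is assumed to be
--    at least 8 bits wide with a descending range ending at 0, as the
--    byte slices in BYTE_COMBINE suggest;
--  * the 10 ns settle delay is arbitrary (the unit is combinational).
-- Function bits follow the mux_loop wiring : bit0 is selected for
-- A=0,B=0, bit1 for A=1,B=0, bit2 for A=0,B=1, bit3 for A=1,B=1.
--------------------------------------------------------------------------
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
LIBRARY work;
USE work.FCPU_config.ALL;

Entity tb_rop2_sketch is
end tb_rop2_sketch;

Architecture sketch of tb_rop2_sketch is
  constant ZEROS     : F_VECTOR := (others => '0');
  constant ONES      : F_VECTOR := (others => '1');
  constant BIT0_ONLY : F_VECTOR := (0 => '1', others => '0');
  constant LOW_BYTE  : F_VECTOR := (7 downto 0 => '1', others => '0');
  signal A, B, C, Y : F_VECTOR;
  signal f0, f1, f2, f3 : Std_ulogic_vector((MAXSIZE *2) downto 0);
  signal mode : Std_ulogic_vector(1 downto 0);
begin
  dut : entity work.EU_ROP2
    port map (
      ROP2_in_A => A, ROP2_in_B => B, ROP2_in_C => C,
      ROP2_function_bit0 => f0, ROP2_function_bit1 => f1,
      ROP2_function_bit2 => f2, ROP2_function_bit3 => f3,
      ROP2_mode => mode,
      ROP2_out  => Y);

  stimulus : process
  begin
    C <= ZEROS;  -- only used by the MUX mode, which is not exercised here

    -- AND function : only function bit 3 is set.
    f0 <= (others => '0'); f1 <= (others => '0');
    f2 <= (others => '0'); f3 <= (others => '1');
    A <= ONES; B <= BIT0_ONLY;
    mode <= ROP2_DIRECT_MODE;
    wait for 10 ns;
    assert Y = BIT0_ONLY report "direct AND failed" severity error;

    -- Byte OR-combine : byte 0 is non-zero, so it becomes all ones.
    mode <= ROP2_OR_MODE;
    wait for 10 ns;
    assert Y = LOW_BYTE report "OR combine failed" severity error;

    -- Byte AND-combine : no byte is all ones, so the result is zero.
    mode <= ROP2_AND_MODE;
    wait for 10 ns;
    assert Y = ZEROS report "AND combine failed" severity error;

    -- XOR function : function bits 1 and 2 are set.
    f1 <= (others => '1'); f2 <= (others => '1'); f3 <= (others => '0');
    A <= ONES; B <= ZEROS;
    mode <= ROP2_DIRECT_MODE;
    wait for 10 ns;
    assert Y = ONES report "direct XOR (A=1,B=0) failed" severity error;

    B <= ONES;
    wait for 10 ns;
    assert Y = ZEROS report "direct XOR (A=1,B=1) failed" severity error;

    wait;  -- end of stimulus
  end process;
end sketch;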