
Re: [f-cpu] Interim solution to prototype the core (and some register set and DDR design considerations)



On 2015-04-12 18:22, Nikolay Dimitrov wrote:
> Hi Yann,

Hello Nikolay,

>> What do you guys think ?
> Well, I'm not an expert on Actel toys,
No problem with that.

I'm very familiar with the previous generation (ProASIC3),
which I currently use for customer projects, and I'm trying
to catch up with the new chips (which blow my mind; it will
take a while to get used to them).

> but this one seems nice (hehe,
> it's hard to believe these words are coming from a Xilinx guy :D).
So it's a good thing :-D

If you find similar Xilinx modules, please let us know.

> I just have some practical questions:
>
> 1. Is this chip supported by a free version of the Actel tools?

AFAIK, the largest chip supported by the free tools is the 050,
with about 50K LUT4, which is one of the two available versions
of the module. The other version is a smaller 10K LUT chip, which
may be too tight: FC0 requires a lot of SRAM blocks (for associative
arrays, the multiported register set, and the Fetcher and LSU, which
work as a Level-0 memory cache).

One of the many features that excite me in the new generation
(Igloo2 is just a SmartFusion2 with a few hard blocks disabled,
including the ARM core) is the modified SRAM block granularity:
 - the previous generation (such as the ProASIC3 and Fusion chips
     on my proto boards) has dual-ported SRAM blocks of 512 bytes
     (also FIFO capable) and you must do everything with them.
 - the new one has 2KB blocks (still dual-ported and FIFO capable) as well
     as smaller and faster 2R1W 64-entry "microSRAM" blocks.

Oh wait.

The F-CPU has 64 registers, do you believe in coincidences ?
You'd need only four of these microSRAMs to implement a basic F-CPU core,
and more for the advanced 2R2W or 3R1W instructions, but that's something
I'm still figuring out. The microSRAM is almost a direct match to F-CPU,
how cool is that ?
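
A rough sketch of the arithmetic behind that count. It assumes 64-bit registers and a microSRAM organised as 64 entries x 18 bits with 2 read ports; neither figure is stated in this message, so treat both as assumptions to check against the datasheet. The function name and its parameters are made up for the illustration.

```python
import math

def microsrams_needed(reg_bits=64, sram_width=18, read_ports=2, sram_read_ports=2):
    """Blocks needed for one 64-entry register set with a single write port.

    All default values are assumptions, not figures taken from the message.
    """
    # Blocks placed side by side to cover the register width.
    slices = math.ceil(reg_bits / sram_width)
    # Whole copies of that row, written in parallel, to add read ports.
    copies = math.ceil(read_ports / sram_read_ports)
    return slices * copies

print(microsrams_needed())              # 2R1W register set
print(microsrams_needed(read_ports=3))  # 3R1W register set
```

Under these assumptions the 2R1W case needs 4 blocks, which matches the count given above, and a third read port doubles it because a whole second copy is required.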

The Xilinx and Lattice chips have a patented feature that turns a
4-input LUT (16 configuration bits, called "LUT16" below) into a
16x1-bit SRAM cell, which is VERY handy for register sets and
fine-granularity structures (which FC0 uses and abuses, because
I was targeting ASIC when I designed it).

For 64 entries, it requires 4 LUT16 per bit for storage and more for multiplexing.
[BTW this assumes a single-threaded F-CPU; an SMT version with 4 threads would need more.]
For the 2-write capability, I am considering a multi-banked register set
with 4 banks of 16 entries, where the bank is selected by the two LSBs of
the register address (so write conflicts are trivial to detect and compilers
don't have crazy scheduling rules to respect).
The 3R means this is replicated 3 times, so for each bit,
the 5+ LUT16 turn into 15+ LUT16.
My worry at this point is the fanout, and the length and number of wires,
for the register write ports...
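
The banking scheme and the LUT count can be sketched as a small behavioural model. This is only an illustration, not FC0 code: it assumes the bank index is the two low-order bits of the 6-bit register number, and every name in it (bank_of, write_conflict, luts_per_bit, BankedRegisterSet) is hypothetical.

```python
NUM_BANKS = 4        # from the text: 4 banks of 16 entries
BANK_ENTRIES = 16

def bank_of(reg):
    """Bank index: assumed to be the two low-order bits of the register number."""
    return reg & (NUM_BANKS - 1)

def write_conflict(reg_a, reg_b):
    """Two writes in the same cycle collide only when they target the same bank."""
    return bank_of(reg_a) == bank_of(reg_b)

def luts_per_bit(entries=64, lut_bits=16, read_ports=3, mux_luts=1):
    """LUT16 cells per register bit: storage plus mux, replicated per read port.

    mux_luts=1 is only the lower bound implied by the "5+" in the text.
    """
    storage = entries // lut_bits
    return (storage + mux_luts) * read_ports

class BankedRegisterSet:
    """Behavioural model of a 2-write register set (illustrative only)."""

    def __init__(self):
        self.banks = [[0] * BANK_ENTRIES for _ in range(NUM_BANKS)]

    def read(self, reg):
        return self.banks[bank_of(reg)][reg >> 2]

    def write2(self, reg_a, val_a, reg_b, val_b):
        # The conflict check is a 2-bit comparison, hence "trivial to detect".
        if write_conflict(reg_a, reg_b):
            raise ValueError("both writes target bank %d" % bank_of(reg_a))
        self.banks[bank_of(reg_a)][reg_a >> 2] = val_a
        self.banks[bank_of(reg_b)][reg_b >> 2] = val_b

regs = BankedRegisterSet()
regs.write2(4, 111, 5, 222)      # banks 0 and 1: accepted
print(regs.read(4), regs.read(5))
print(write_conflict(4, 8))      # registers 4 and 8 both live in bank 0
print(luts_per_bit())            # lower bound for 3 read ports
```

With low-order bits as the bank index, neighbouring register numbers land in different banks, so a compiler only has to avoid pairing two destinations whose numbers are equal modulo 4.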

This is why I want to "play" with a subset of FC0 that
only supports 1W2R or 1W3R.


SMT would be pretty handy. 2 simultaneous threads would
help boost the utilisation of the units and hide lost cycles; 4 threads
would be better and would also mirror the 4-banked
2-write register set. So if we have a 1W2R core, we can use
the 64-entry microSRAMs as they are, but a 2W-capable core would
use only 16 entries of each (and the 4 simultaneous outputs must
be multiplexed, adding wire delays). The remaining 48 entries
can be used for other threads, as a cache for context switches,
or things like that.
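
One possible way to lay this out, as a sketch only: each of the 4 banks is one 64-entry microSRAM, a context (thread) owns a 16-entry slice of every bank, and the remaining 3 slices hold other contexts. The function name and the bit assignment (context number in the two upper address bits) are assumptions made for the example, not something FC0 defines.

```python
NUM_BANKS = 4
ENTRIES_PER_CONTEXT = 16   # 64 registers spread over 4 banks
NUM_CONTEXTS = 4           # 64-entry microSRAM / 16 entries per context

def microsram_address(context, reg):
    """Map (context, register number) to (bank, entry inside that bank)."""
    if not 0 <= context < NUM_CONTEXTS:
        raise ValueError("context out of range")
    if not 0 <= reg < NUM_BANKS * ENTRIES_PER_CONTEXT:
        raise ValueError("register out of range")
    bank = reg & (NUM_BANKS - 1)            # two low-order bits pick the bank
    entry = (context << 4) | (reg >> 2)     # context selects a 16-entry slice
    return bank, entry

print(microsram_address(0, 63))   # last register of context 0
print(microsram_address(1, 0))    # first register of context 1
```

Switching context then only changes the two upper address bits, which is what would make shadow loading of another thread's registers cheap.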

So far, FC0 is not SMT-capable: it was designed for single-thread
performance, and the 4 register sets will be used in more
trivial ways, such as shadow loading of contexts from a different
thread. Pretty handy for debug ;-)


> 2. Is there a Linux version of these tools?

There should be. I haven't gone as far as using it lately,
though I'm sure they have progressed a lot since I last tried.

I stick to the Windows version because "it works" (that's the
least I ask of such a critical piece of code) and the older tools
have very useful features that were removed from the latest
generation of tools, and they are important for my work.

But yes, I should investigate that again.

Did Xilinx clean up their tools lately ?
The last time I tried, it was still a Stephen King kind of horror,
and honestly it's the reason why I don't use their chips.

I had a lucky try with SiliconBlue before they were acquired by Lattice,
so it's a manufacturer I consider. Since SB started from scratch,
the tool was lean, it worked without too many hassles on Linux,
and I hope Lattice didn't mess with it too much.

> 3. Is it possible to instantiate and use the DDR controller with plain
> Verilog/VHDL, without any proprietary stuff (code generators, encrypted
> netlists, etc)?

Encryption of Actel/Microsemi devices is controlled by the user.
Actel even recently turned off encryption of their own softcores.
3rd parties can still provide encrypted netlists to be added to user designs
but it's irrelevant to our case.

The SF2's DDR controller is a "hard block" that should be seen as
a VHDL/Verilog instance (I should check). Microsemi certainly provides
a "code generator interface" (I haven't used it with the SmartFusion2
yet), just like it provides code generation "wizards" for the other chips.

For example, in the older tools, you have "pretty" windows
to configure your RAM arrays (sizes, widths, port properties...),
but they generate (ugly) VHDL that instantiates a VHDL entity from
their proprietary library. Their library is fairly well documented
and the interface/pins can be used directly. I routinely clean up
the generated code and tweak their wrappers by hand, but
it would work without that.

Since the proprietary/hard block is documented (and at worst could
be "reversed" from the generated VHDL wrappers),
it's easy to abstract it and wrap it in a portable way.
Normally, F-CPU would have a more or less generic DRAM
interface with target-specific implementations.

 - The SF2 "architecture" would be "just" one wrapper for the hard block.

 - A pure VHDL version would also be written and simulated with GHDL.

 - Sebastien Bourdeauducq made MilkyMist with a Xilinx chip and uses
   high performance DDR with optimised custom soft controllers
   so we could try to collaborate with him on this subject,
   though he uses Verilog :-/

 - Xilinx raised the bar with the 7th generation chips and I guess
   they provide similar hard blocks.

 - Altera and Lattice have their own solutions too...

Note that, for performance reasons, the memory interface should
expose as many properties of the underlying controller as possible, unlike
"generic" Wishbone or AMBA. The "Fetcher" and "LSU" are key
to the core's performance because they adapt the Load/Store instructions
to the attached memory chips, without losing precious cycles
on a generic 32-bit controller.

As a reminder, FC0 was designed for the SDRAM chips of the y2k era,
with their 4 banks. The number of DRAM banks should be matched
by the same number of lines inside the LSU and Fetcher for optimal
use of interleaving etc.
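
To illustrate why matching the two numbers helps, here is a toy model of line-to-bank interleaving. It assumes 32-byte lines and a bank index taken from the address bits just above the line offset; both are hypothetical choices for the example (a real controller may map bank bits differently), and the function names are made up.

```python
NUM_BANKS = 4      # the 4 banks of y2k-era SDRAM mentioned above
LINE_BYTES = 32    # hypothetical line size, not taken from the message

def bank_for(addr):
    """DRAM bank (and matching LSU/Fetcher line buffer) serving this address."""
    return (addr // LINE_BYTES) % NUM_BANKS

def banks_touched(start, n_lines):
    """Banks hit by n_lines consecutive lines starting at address start."""
    return [bank_for(start + i * LINE_BYTES) for i in range(n_lines)]

print(banks_touched(0, 4))    # a sequential stream visits every bank once
print(banks_touched(96, 3))   # and wraps around after the last bank
```

With one line buffer per bank, a sequential stream can keep every bank busy in turn, so one bank can be precharged or activated while another delivers data.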

> 4. Does it have something like DLL/PLL, in order to create all needed
> internal clocks from the 12 MHz reference clock?

How could it not ? :-)

More info at
http://www.microsemi.com/products/fpga-soc/soc-fpga/smartfusion2
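
As a generic illustration of deriving an internal clock from the 12 MHz reference, a PLL output can be modelled as f_out = f_ref * mul / div. The sketch below only searches for a small integer ratio; the max_div limit and the function name are hypothetical, and the real constraints (VCO range, allowed divider values) must come from the vendor documentation and tools.

```python
from fractions import Fraction

def pll_ratio(f_ref_mhz, f_target_mhz, max_div=64):
    """Closest mul/div ratio (with div <= max_div) to reach the target frequency.

    Returns (mul, div, resulting frequency in MHz as a Fraction).
    """
    wanted = Fraction(f_target_mhz) / Fraction(f_ref_mhz)
    ratio = wanted.limit_denominator(max_div)
    return ratio.numerator, ratio.denominator, Fraction(f_ref_mhz) * ratio

mul, div, f_out = pll_ratio(12, 100)
print(mul, div, float(f_out))
```

For a 100 MHz target this gives a multiply-by-25, divide-by-3 ratio, which is exact.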

Oh, and they (finally) got "math blocks" and other features that the competitors already had.

> Regards,
> Nikolay

yg
*************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxx with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/