
Re: [f-cpu] F-CPU SoC (and F-GPU)



On 2015-04-03 07:26, Nikolay Dimitrov wrote:
Hi Yann,
Hello !

I just checked, and I have these boards:
http://www.microsemi.com/products/fpga-soc/design-resources/dev-kits/fusion/fusion-advanced-development-kit

Long story short: asynchronous SRAM (ASRAM) is already soldered on. It's only 32 bits wide, but it's still pretty nice for this older generation; we'll feast on SDRAM with a different FPGA later :-)

I visited a trade show today and I was reminded of
the importance of the design tools. We might use various FPGAs, but the
development software should at least be totally free, very cheap to
implement, easy to hack and uniform for everybody. It's possible to
use FPGAs and the necessary tools from the 4 vendors, but this should
not be necessary for more than VHDL synthesis and bitstream
generation. So they should at least have a common emulation/debug
interface.
Sounds very reasonable. I remember seeing an open-source TAP controller
(probably on OpenCores), which could be used as such an interface.

My current setup does not go through JTAG; it just uses an SPI interface
(from the RPi B/B+) and one GPIO.

In a 16-bit design, I just send 16 bits of address, then send or receive 16 bits of data. I'll let you extrapolate what it would be for a 32-bit system
and a 64-bit one :-)
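A minimal sketch of the transaction framing described above, assuming the address word goes out first, MSB first, followed by the data word, both as wide as the bus. The helper name `spi_frame` is hypothetical, and the thread doesn't say how reads and writes are told apart (the extra GPIO is one possibility), so that part is left out; for a read, the data phase is simply zero bytes clocked out while the FPGA drives MISO.

```python
from typing import Optional


def spi_frame(width_bits: int, address: int, data: Optional[int] = None) -> bytes:
    """Build the bytes clocked out for one bus transaction (hypothetical helper).

    Assumption: address first, then data, both `width_bits` wide, MSB first.
    For a read (data is None) the data phase is filled with dummy zero bytes;
    the byte received during that phase would be the word read back.
    """
    if width_bits % 8:
        raise ValueError("width must be a multiple of 8 bits")
    nbytes = width_bits // 8
    limit = 1 << width_bits
    if not 0 <= address < limit:
        raise ValueError("address does not fit in the bus width")
    payload = 0 if data is None else data
    if not 0 <= payload < limit:
        raise ValueError("data does not fit in the bus width")
    return address.to_bytes(nbytes, "big") + payload.to_bytes(nbytes, "big")


# The extrapolation to wider systems: the frame length doubles with the width.
for width in (16, 32, 64):
    print(width, len(spi_frame(width, 0x10)))  # 16 4, then 32 8, then 64 16
```

With this framing, each transaction on a 64-bit system costs 16 bytes on the wire, four times the 16-bit case.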

I write portable code where the raw access is split from
the application logic. This logic is then easily ported to the YASEP
and others (for example my algo to flash SPI memories or my parser
for the hex files).
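One way to picture the split between raw access and application logic, as a sketch only: the application code below talks to a hypothetical `Bus` interface with `read16`/`write16`, and a dictionary-backed `MemoryBus` stands in for a real backend (a Raspberry Pi SPI link, a YASEP, ...). None of these names come from the YASEP sources, and word addressing over a 16-bit address space is an assumption.

```python
from typing import Protocol


class Bus(Protocol):
    """Raw access layer (hypothetical): the only part that changes between targets."""

    def read16(self, address: int) -> int: ...

    def write16(self, address: int, value: int) -> None: ...


class MemoryBus:
    """Stand-in backend backed by a dict, to test application logic without hardware."""

    def __init__(self) -> None:
        self.cells: dict[int, int] = {}

    def read16(self, address: int) -> int:
        # Unwritten locations read back as 0.
        return self.cells.get(address & 0xFFFF, 0)

    def write16(self, address: int, value: int) -> None:
        self.cells[address & 0xFFFF] = value & 0xFFFF


# Application logic: written only against the Bus interface, so it can be
# reused unchanged whichever backend is plugged in.
def write_block(bus: Bus, base: int, words: list[int]) -> None:
    for offset, word in enumerate(words):
        bus.write16(base + offset, word)


def verify_block(bus: Bus, base: int, words: list[int]) -> bool:
    return all(bus.read16(base + i) == w for i, w in enumerate(words))


bus = MemoryBus()
write_block(bus, 0x100, [1, 2, 3])
print(verify_block(bus, 0x100, [1, 2, 3]))  # True
```

Porting then means writing a new class with the same two methods; `write_block` and `verify_block` never change.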
What's the actual issue with the RPi?
They changed their breakout connector from the B to the B+
(along with many cool things that actually create
backwards-compatibility problems),
and now they've changed the CPU (a "modern" quad-core).
I can live with recompiling for a different CPU generation,
but the I/O registers have certainly been thrown out of the closed window.

If you intend to share this RPi
helper code under an open license, we could probably take a look and help.
It will be published, yes, at least on the YASEP site.
There is already an older version there:
http://ygdes.com/~whygee/yasep2014/C/src/
I have since enhanced it and found a stoooopeeeeed bug.

And can you get rid of the proprietary JTAG?
From what I know, it's possible, but it's a lot of work.

For example, ACME made the Colibri board,
http://www.acmesystems.it/COLIBRI250
an evolution of the FoxVHDL board that plugs onto their Fox CPU board.
The CPU board has a setting to boot in a special mode where it sets up
a web server, waits for the bitstream to be uploaded (with an HTTP POST
request) and flashes the data.
This reuses already-published code, under the name DirectC,
so that an MPU can reprogram the FPGA array. This is the kind
of ease of use I want, and indeed I'm doing more and more over HTTP.

There are some
open-source JTAG interfaces which could probably be used to replace it:
- BusBlaster: FTDI-based, has an on-board CPLD to translate voltages
between host and target
(http://dangerousprototypes.com/docs/Bus_Blaster_v3_design_overview)
I avoid USB as much as possible.
I don't want to spend time managing the drivers for different platforms.

- BusPirate: a PIC24-based universal hacking/debugging tool; there's a
firmware version for it that supports JTAG
(http://dangerousprototypes.com/docs/Bus_Pirate_v3.6)
PIC24 is another proprietary system.
Not the worst, I admit. But when you add USB, ... -_-

These are low-cost, made by several companies and quite
hackable, so even if something doesn't work for our needs, nothing
prevents us from fixing it.
The JTAG interface will always be available, anyway, so others may use
the tools they have. But I don't want to continue using USB.
The world is slowly turning to Ethernet.

The address decoder can be done by the FPGA; why make it more complex
than needed? :-D
I mentioned the external address decoder because you said that the
Actel FPGA is I/O-limited.
See the link at the beginning; it contains better information than
my bad memory :-)

The FPGA's pins have programmable driver strength.
I know this. The question is not only whether a single pin will handle the current, but whether the chip as a whole will handle the non-trivial
current of all the I/Os toggling at the same time.

Apparently the board has a 32-bit-wide bus and 20 address bits,
so it "should" work. After all, it's made by the chip's manufacturer :-)

Take a use case: 16 data lines + 24 address lines (enough to address
32 MiB of RAM as 2^24 16-bit words) make 40 I/O lines. If each one draws,
let's say, 20 mA peak, that's 800 mA of peak current through the chip's
power-supply bonding wires. As the I/O current increases, the chip's GND starts to
bounce and to inject noise into the other on-chip subsystems, which is
not nice. And it gets worse for a wider interface. I'm not familiar with
Actel FPGAs, so I can't say whether they're designed to handle such
currents across multiple I/O pins; please share your experience.
According to the manufacturer, ground bounce is a problem for QFP parts,
due to their longer wires. BGA has shorter connections to ground and
is said to be immune (I don't remember which page says it, but I remember
that Actel addressed it at the time).
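The arithmetic of the use case above, as a quick sketch. It assumes word addressing (each address selects one word as wide as the data bus), the illustrative 20 mA per line, and the worst case where every line switches at once; it says nothing about how much ground bounce a given package actually tolerates. The function names are made up for this example.

```python
def addressable_bytes(address_lines: int, data_lines: int) -> int:
    """Capacity with word addressing: 2**address_lines words of data_lines bits."""
    return (1 << address_lines) * data_lines // 8


def peak_io_current_ma(io_lines: int, per_line_ma: float) -> float:
    """Worst case: every I/O line switches at the same time at its peak current."""
    return io_lines * per_line_ma


data, addr = 16, 24
print(addressable_bytes(addr, data) / 2**20)  # 32.0 (MiB)
print(peak_io_current_ma(data + addr, 20))    # 800 (mA)
print(peak_io_current_ma(64 + addr, 20))      # 1760 (mA) for a 64-bit data bus
```

The last line shows why a wider interface makes things worse: 88 lines at the same 20 mA more than double the peak current.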

For the SRAMs, I have a wide array of asynchronous and synchronous
chips. For example, hundreds of 512 KB synchronous SRAMs (256K x 16) that
run at 133 MHz; some chips in my collection run even faster
(2 MB @ 200 MHz), but they're significantly more expensive, so I only have
small quantities.
Yep, the faster ICs should have lower parasitic capacitance, as they're
designed to run at higher clock speeds. If we use several such ICs in
parallel, this could be a working solution for us.

I was thinking about this issue, and my opinion is that it would be
better to use the narrowest (8-bit?) SRAMs with the highest possible
capacity and to connect their address lines in parallel. As the system
RAM address lines are unidirectional, it would be easy to put
low-delay buffers/amplifiers on them to increase the drive strength.

Drive strength goes up to 24 mA with or without emphasis, but adding a buffer
on the address bus costs a nanosecond or two...
I wouldn't put more than 4 loads on the address bus (with a good layout).
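To see how the narrow-SRAM idea interacts with the 4-load limit, here is a small sketch counting how many chips share the address lines for a 64-bit data bus. It assumes a single bank where every chip in the width slice is one load on each address line; `MAX_ADDRESS_LOADS` is just the rule of thumb quoted above, not a datasheet figure.

```python
import math

MAX_ADDRESS_LOADS = 4  # rule of thumb from the thread, assuming a good layout


def chips_on_address_bus(bus_width_bits: int, chip_width_bits: int) -> int:
    """Single bank: every chip in the data-width slice sees the shared address lines."""
    return math.ceil(bus_width_bits / chip_width_bits)


for chip_width in (8, 16, 32):
    n = chips_on_address_bus(64, chip_width)
    verdict = "OK" if n <= MAX_ADDRESS_LOADS else "needs buffering"
    print(f"x{chip_width}: {n} chips -> {verdict}")
# x8: 8 chips -> needs buffering
# x16: 4 chips -> OK
# x32: 2 chips -> OK
```

With x8 parts, the 64-bit bus puts 8 loads on each address line, which is where the buffers (and their extra nanosecond or two) come in; x16 parts stay within the limit.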

For now, there are a couple of CY7C1061AV33-10ZXI chips;
4 MB is not much, but there is enough room to implement some caches.

The back of the board has other Flash chips; I wonder why,
because there is already 1 MB of superfast on-chip Flash.

Aww, crashing is a totally unrelated feature, which we can implement
even on FLASH-based storage :D, hehe.
But you'd have to copy the image to RAM, which is alterable :-P

F-CPU needs a fast 64-bit-wide data bus.
I totally agree. If we take for example 50MHz as the SoC bus clock,
this will allow 381 MiB/s peak bandwidth,
How do you get this curious number?

Well, by counting on my fingers :D. 64-bit SDR bus (I haven't seen an FPGA
architecture supporting on-chip DDR clocking)
A3P and Fusion have DDR and differential I/Os :-)
With proper layout, they can run at up to 700 Mbps per pin.
The newer Igloo2/SmartFusion2 go even faster;
I don't remember how fast.

handles 8 bytes per clock.
At 50 MHz the on-chip bus will have theoretical peak bandwidth of:

50e6 * 8 / 2^20 = 381.469 MiB/s
Why would you divide by 2^20?
Why use MiB instead of MB? We don't work with MiHz :-D

A -10 speed grade means a 10 ns access time; add all kinds of propagation delays
and you get a 64 or 66 MHz clock (roughly 15 ns per cycle), or 256 MB/s at 64 MHz on the board's 32-bit bus.
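The bandwidth figures above, worked through as a sketch. It assumes single-data-rate transfers (one bus-width word per clock) and no wait states, so these are peak numbers. The point of contention is only the unit: 50 MHz x 8 bytes is 400 MB/s in decimal units, which is the same quantity as 381.47 MiB/s.

```python
def peak_bandwidth(clock_hz: float, bus_bytes: int) -> float:
    """Peak bytes/s of a single-data-rate bus: one bus_bytes transfer per clock."""
    return clock_hz * bus_bytes


soc = peak_bandwidth(50e6, 8)           # 64-bit SoC bus at 50 MHz
print(soc / 1e6, "MB/s")                # 400.0 MB/s (decimal, like MHz)
print(round(soc / 2**20, 3), "MiB/s")   # 381.47 MiB/s (binary)

cycle_ns = 1e9 / 64e6                   # -10 SRAM: 10 ns access + propagation
sram = peak_bandwidth(64e6, 4)          # the board's 32-bit external bus
print(cycle_ns, "ns", sram / 1e6, "MB/s")  # 15.625 ns 256.0 MB/s
```

Keeping everything in decimal units (MHz and MB/s) avoids the 2^20 conversion entirely.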

That board was dimensioned for an ARM softcore, not a 64-bit SIMD beast.
But at least I got them. And it makes me consider even more
a scaled-down 32-bit subset of the F-CPU to get our feet wet.
If it can do more, it can also do less :-D

By the way, I missed one thing - do you propose using a multiplexed
system bus, or one with separate address/data lines?
No multiplexed bus, please :-D
Ideally, a direct connection from the FPGA to the RAM chips.

We already lost some time trying to support several VHDL
simulators, and it did dumb the source code down. This is solved now
thanks to GHDL.
Well, as I said - if someone is motivated to port the code to his
favorite FPGA, let her/him do it.
There would be no reason not to :-)

It's the responsibility of this maintainer
to support his port when patches are flying upwards and downwards. If
there are generic fixes/changes, he can propose a patch to the parent
maintainer (you?) and negotiate. We've been doing this on a daily basis
for years, for open-source and commercial projects, and it works. The
upstream developers (you?) are not forced to accept patches that would
break things or take the design in an undesired direction. So it's
win-win: be open, but practical.
I hope I am :-)

I'm pushing hard in one direction, hoping to get actual, functional results.
I've had time to play with the YASEP, and I've seen what works and
what doesn't. I hope it helps F-CPU the same way.

Have you seen the Papilio boards (http://papilio.cc/)? There are several
different designs, but all of them are open-hardware.
From what I can see there,
http://papilio.cc/index.php?n=Papilio.Hardware
they are more suited for the YASEP.

Btw, the Papilio designs use FTDI-based on-board JTAG (just like
the BusBlaster, but cheaper). This JTAG is supported quite well on
Linux/Windows (don't know about *BSD).
Xilinx chips are programmed a bit differently: you could just
program an SPI chip and reboot, instead of using JTAG to load the fabric.

Actel/Microsemi's Flash-based arrays work differently; they're also slower to program...
I can deal with it for YASEP development but for larger designs,
I understand that people will prefer SRAM-based arrays from
Xilinx, Altera or Lattice. That's why the guest/application FPGA
will be on a module (yet to be defined).

If someone has all the
tools for Altera, Lattice or Xilinx, that's great! I can't use all of
them; it takes too much time...
(and money, I forgot)

Absolutely. That's the idea for the community - everyone adds value. If
you don't like Xilinx or Altera, and/or don't have tools/experience to
work with them, some other buddy can port the design to some low-cost
board and even provide a synthesized bitstream on a regular basis for the
community members. That would be nice!
We can all hope :-D
but I'm too old for this ;-)
Meanwhile, I'm trying to progress.

Also, having multiple maintainers with Actel/Xilinx/Altera/etc.
experience will definitely help to keep the code generic (remember how
many times you've seen an open-source Verilog or VHDL design
instantiating vendor-specific IP blocks?).
The same problem appeared when supporting several VHDL simulators,
because of inconsistent support and interpretations of the LRM...
I believe in testing code on as many platforms as possible, but:
 - when it comes to HW, it becomes unbearably expensive
 - it's also unbelievably time-consuming and halts actual development :-/

For now:
 - I have several AFS1500 boards to play with. The external SRAM is not
    huge, but there are enough gates for a non-trivial CPU; see
    http://www.microsemi.com/products/fpga-soc/fpga/fusion#product-tables
    The AFS1500 has 38K 3-input gates, 1 MiB of high-speed 32-bit Flash
    (for code or lookups) and 30 KiB of dual-access SRAM (split into
    60 blocks, some of which will be used for the register set, etc.)

      Of course, it would be nice to have simulatable VHDL source code
      in the first place, right? But at least we have a first hardware target.
      The AFS1500 is enough for some design-space exploration. FPGAs change
      a lot of rules in architecture/design; F-CPU was designed for ASICs,
      and I'll have to adapt many things...

 - The host/base system will start with the AFS600, implementing the YASEP.
    It has been on my roadmap for a very long time, so I know how to get there.
    As usual, the slow part is the software... The JavaScript system
    needs a big update!

 - When everything works, it's possible to switch to more powerful
    and cost-effective chips, so the source code must be portable
    (this is the same imperative as 15 years ago, with added hardware twists).
    The Igloo2 will be available in QFP144 for about $20, so it's very
    well suited for the base system. Modules could use a SmartFusion2,
    because its internal configuration can be upgraded through a basic
    serial port; no need for a JTAG dongle, or any dongle at all :-)

That's all for now. Let's sleep now.
yg
*************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxx with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/