
Re: [f-cpu] F-CPU SoC (and F-GPU)



Hi Yann,

On 04/03/2015 03:32 AM, whygee@xxxxxxxxx wrote:
On 2015-04-02 13:47, Nikolay Dimitrov wrote:
Hi Yann,
Hi again ! I'm catching up on my emails but you'll find a lot of
agreements in this message :-D

I was also thinking about using YASEP as a "helper" - it can
provide quite handy functionality, including remote debugging and
host-assisted I/O. It could probably be very handy for preparing
the system memory before the F-CPU boots (and thus removing the
need for boot-rom), so system developers can easily develop and
test the bootloader.

If there are not many resources in the FPGA, the "helper" can be
YASEP on another board, or just any rpi/rpi2/riotboard/beaglebone.

That's the spirit :-) Indeed the YASEP and the F-CPU together create
a system that is reminiscent of the CDC6600, with the bulk of
processing and the applications run by the central system (it was a
60-bit system with several parallel units and a scoreboard, what a
coincidence) and the I/Os, boot, debug, scheduling and all IRQs
managed by the PPU (Peripheral Processing Unit, with funny
similarities to the YASEP).

A system with YASEP and F-CPU will have a different organisation, the
F-CPU would have its own scheduler and there could be several of
them, the YASEP can handle more sophisticated tasks (it was initially
designed for managing a small embedded system with some protocols in
real time). But they are complementary, and it's only normal that
they share as much as possible. So now I'm trying to merge things and
bring new stuff to the F-CPU.

I visited a trade show today
http://www.salons-solutions-electroniques.com/ and I was reminded of
the importance of the design tools. We might use various FPGAs but the
development software should at least be totally free, very cheap to
implement, easy to hack and uniform for everybody. It's possible to
use FPGAs and the necessary tools from the 4 vendors but this should
not be necessary for more than VHDL synthesis and bitstream
generation. So they should at least have a common emulation/debug
interface.

Sounds very reasonable. I remember seeing an open-source TAP controller
(probably on opencores), which can be used as such an interface.

So far I use a RPi B or B+ to access my design (there is also the
JTAG port but a dedicated, proprietary probe is used to reflash the
FPGA) but the new Raspberry 2 breaks all my code. RPi is "cheap" and
"widespread" but they break their user base every now and then as
they wish. It's a good platform to get people started but it's not a
stable base. I write portable code where the raw access is split from
the application logic. This logic is then easily ported to the YASEP
and others (for example my algo to flash SPI memories or my parser
for the hex files).
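
A minimal sketch of that split between raw access and application logic, in Python for illustration only. The names RawSpi, FakeSpi and read_jedec_id are hypothetical and are not taken from the actual helper code. The only outside fact assumed is that most SPI NOR flash chips answer the JEDEC "Read ID" command (0x9F) with three identification bytes.

```python
from typing import Protocol, Tuple

JEDEC_READ_ID = 0x9F  # standard "Read JEDEC ID" opcode on most SPI NOR flashes


class RawSpi(Protocol):
    """Raw access layer: the only part that differs between RPi, YASEP, etc."""

    def transfer(self, tx: bytes) -> bytes:
        """Clock out tx with chip-select held low, return the bytes clocked in."""
        ...


def read_jedec_id(port: RawSpi) -> Tuple[int, int, int]:
    """Application logic: portable, knows nothing about the host board."""
    rx = port.transfer(bytes([JEDEC_READ_ID, 0, 0, 0]))
    # The first received byte is clocked in while the opcode goes out: ignore it.
    return rx[1], rx[2], rx[3]


class FakeSpi:
    """Hypothetical stand-in for real hardware, useful for testing on a PC."""

    def __init__(self, ident: Tuple[int, int, int]) -> None:
        self._ident = bytes(ident)

    def transfer(self, tx: bytes) -> bytes:
        if tx and tx[0] == JEDEC_READ_ID:
            # One dummy byte, then the ID bytes, padded or cut to len(tx).
            return (b"\x00" + self._ident + bytes(len(tx)))[: len(tx)]
        return bytes(len(tx))


if __name__ == "__main__":
    print(read_jedec_id(FakeSpi((0xEF, 0x40, 0x16))))
```

Porting to another host would then mean writing one new class with a transfer() method, while read_jedec_id() and the rest of the logic stay untouched.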

What's the actual issue with the rpi? If you intend to share this rpi
helper code under an open license, we could probably take a look and
help. And can you get rid of the proprietary JTAG? There are some
open-source JTAG interfaces, which could probably be used to replace it:
- BusBlaster: FTDI-based, has on-board CPLD to translate voltages
between host-target
(http://dangerousprototypes.com/docs/Bus_Blaster_v3_design_overview)
- BusPirate: PIC24-based universal hacking/debugging tool, there's a
firmware version for it supporting JTAG
(http://dangerousprototypes.com/docs/Bus_Pirate_v3.6)

These are low-cost and quite hackable, and several companies
manufacture them, so even if something doesn't work for our needs,
nothing prevents us from fixing it.

The AFS1500 is a little beast so I bought several boards for
workshops to present the YASEP. But there is room for a
reasonable F-CPU (though the internal wire delays won't allow it
to run fast). There are enough I/O pins so an external memory
board could be hacked (I have a bunch of SRAMs).
Love the idea of SRAMs. But there's a practical limit to how much
SRAM you can put on the system by adding more chips. There are 2
big issues:
- adding an address decoder for multiple SRAM banks will
introduce delay
The address decoder can be done by the FPGA, why make it more complex
 than needed ? :-D

It's because you said that the Actel FPGA is IO limited, that's why I
mentioned the external address decoder.

- having multiple SRAM banks means much higher capacitance on the
data lines, which will load both the FPGA and SRAM IO drivers.
Probably the FPGA is a tough beast and can source lots of milliamps
to charge the parasitic capacitance, but not sure about the SRAM IO
drivers. This needs to be verified.
The FPGA's pins have programmable driver strength.

I know this. The question is not only whether the single pin will handle
the current, but whether the chip as a whole will handle the non-trivial
current of all the IOs toggling at the same time.

If we make a use case: 16 data lines + 24 address lines (that will
handle 32 MiB RAM), that's 40 IO lines, each handling let's say 20mA peak
current, which makes 800mA peak current across the chip's power supply
bonding wires. As the IO current increases, the chip GND starts to
bounce and to introduce noise in the other on-chip subsystems, which is
not nice. And it gets worse for a wider interface. I'm not familiar with
Actel FPGAs, so can't say whether they're designed to handle such
currents across multiple IO pins, please share your experience.
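
The arithmetic of this use case can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 20 mA per pin is the assumed figure from the paragraph above, not a datasheet value, and the function names are made up for illustration.

```python
def addressable_mib(address_lines: int, data_lines: int) -> float:
    """Memory reachable with the given address and data bus widths, in MiB."""
    bytes_per_word = data_lines // 8
    return (2 ** address_lines) * bytes_per_word / 2 ** 20


def worst_case_io_current_ma(data_lines: int, address_lines: int,
                             peak_ma_per_pin: float) -> float:
    """Peak supply current if every bus pin switches at the same moment."""
    return (data_lines + address_lines) * peak_ma_per_pin


if __name__ == "__main__":
    # 16 data lines + 24 address lines, 20 mA assumed peak per pin.
    print(addressable_mib(24, 16), "MiB")
    print(worst_case_io_current_ma(16, 24, 20.0), "mA")
```

Doubling the data bus to 32 lines with the same assumptions already pushes the worst case past 1 A, which is why the ground bounce question matters for a 64-bit interface.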

For the SRAMs, I have a wide array of asynchronous and synchronous
chips. For example, hundreds of 512KB synchronous SRAM (256K*16) that
run at 133MHz, some chips in my collection run even faster
(2MB@200MHz) but they're significantly more expensive so I only have
small quantities.

Yep, the faster ICs should have lower parasitic capacitance as they're
designed to run with higher clock speeds. If we use several such ICs in
parallel, this could be a working solution for us.

I was thinking about this issue and my opinion is that it would be
better to use the narrowest (8-bit?) SRAMs with the highest possible
capacity and to connect their address lines in parallel. As the system
RAM address lines are uni-directional, it would be easier to put there
low-delay buffers/amplifiers, in order to increase the drive strength.
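
To make the idea concrete, here is a small Python sketch of a 64-bit word spread over 8-bit SRAMs that all see the same address. The little-endian lane order and the function names are assumptions for illustration, nothing here is decided for the F-CPU.

```python
from typing import List


def address_line_fanout(bus_width_bits: int = 64, chip_width_bits: int = 8) -> int:
    """Number of SRAM chips each address line has to drive in parallel."""
    return bus_width_bits // chip_width_bits


def split_into_byte_lanes(word: int, bus_width_bits: int = 64) -> List[int]:
    """Byte stored by each 8-bit chip; lane 0 holds the least significant byte."""
    lanes = bus_width_bits // 8
    return [(word >> (8 * i)) & 0xFF for i in range(lanes)]


def join_byte_lanes(lanes: List[int]) -> int:
    """Rebuild the bus word from the bytes returned by the chips."""
    word = 0
    for i, byte in enumerate(lanes):
        word |= (byte & 0xFF) << (8 * i)
    return word


if __name__ == "__main__":
    lanes = split_into_byte_lanes(0x0123456789ABCDEF)
    print(address_line_fanout(), [hex(b) for b in lanes])
    print(hex(join_byte_lanes(lanes)))
```

Each data pin then sees a single SRAM load, while each address line sees eight, which is the place where a buffer would go.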

But performance is not yet a problem, as long as the F-CPU can't
even decode an instruction in VHDL...

The internal Flash is 1MB and can feed a RISC CPU's instruction
decoder at 100MHz
This could be nice for a boot-rom, but not sure that this is an
actual advantage for a general purpose CPU. Unless we make
something like a modern Amiga :D.
This chip is aimed at embedded systems, Flash bits use less space
than SRAM and it could be used for other purposes, like huge lookup
tables :-) Anyway it is a good place to store a real-time kernel. It
would be un-crashable :-)

Aww, crashing is a totally unrelated feature, which we can implement
even on FLASH-based storage :D, hehe.

For F-CPU, there is a need for a fast 64-bit wide data bus.
I totally agree. If we take for example 50MHz as the SoC bus clock,
this will allow 381 MiB/s peak bandwidth,
How do you obtain this curious number ?

Well, by counting on my fingers :D. A 64-bit SDR bus (I haven't seen an
FPGA architecture supporting on-chip DDR clocking) handles 8 bytes per
clock. At 50 MHz the on-chip bus will have a theoretical peak bandwidth of:

50e6 * 8 / 2^20 = 381.469 MiB/s
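
The same calculation as a small Python sketch, so other clock rates or bus widths can be tried quickly. The function name and the transfers_per_clock parameter (1 for SDR, 2 for a hypothetical DDR bus) are illustrative only.

```python
def peak_bandwidth_mib_s(clock_hz: float, bus_width_bits: int,
                         transfers_per_clock: int = 1) -> float:
    """Theoretical peak bandwidth in MiB/s, ignoring all protocol overhead."""
    bytes_per_transfer = bus_width_bits // 8
    return clock_hz * bytes_per_transfer * transfers_per_clock / 2 ** 20


if __name__ == "__main__":
    # 64-bit SDR bus at 50 MHz, as in the calculation above.
    print(f"{peak_bandwidth_mib_s(50e6, 64):.3f} MiB/s")
```

This is an upper bound only: wait states, arbitration and refresh (for DRAM) are not modelled.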

By the way, I missed one thing - do you propose using a multiplexed
system bus, or one with separate address/data lines?

which is definitely cool to start with. Unfortunately wide buses
are expensive (routing), and also address decoders for wide buses
are expensive (delay),
decoders are in the FPGA ;-)

Please revisit my explanation about the external SRAMs.

so would be good to move the low-speed IP cores behind a bus
bridge, and leave only high-speed IP cores on the wide bus.
we're not bound by design features found in other platforms like
PCs. Slow peripherals can have their own interface, some devices are
mapped to the Special Registers (IRQ controller, DMA, console/debug
serial port...) No need of a bridge, because the wide RAM bus should
be used only for RAM.

What do you mean by "wide"?

So I would just leave it as is - let the guy who is porting
the F-CPU SoC to this specific board to take care of the board
specifics and to document which SoC signal goes where, and
that should work OK.
yup, however for our team's progress, we need a uniform platform
because otherwise we'd lose a lot of time dealing with each
person's board details...

I have a slightly different opinion on this topic. The project is
about the freedom of choice, so let's not limit people.
I agree. However the project can't do everything as well. Either we
support all the brand names and all the models, or we design a CPU.
We have lost some time already trying to support several VHDL
simulators and it did dumb the source code down. This is solved now
thanks to GHDL.

Well, as I said - if someone is motivated to port the code to his
favorite FPGA, let her/him do it. It's the responsibility of this
maintainer to support his port when patches are flying upwards and
downwards. If there are generic fixes/changes, he can propose a patch to
the parent maintainer (you?) and negotiate. We've been doing this stuff
on a daily basis for years for open-source and commercial projects and it
works. The upstream developers (you?) are not forced to accept patches
which will break things or change the design in an undesired direction.
So it's
win-win: be open, but practical.

I hope a similar situation does not arise with the target board and
we'd rather use an open source design, but none exists with our
requirements.

yet.

Have you seen the Papilio boards (http://papilio.cc/)? There are several
different designs, but all of them are open-hardware.

Btw, the Papilio designs are using FTDI-based on-board JTAG (just like
the BusBlaster, but cheaper). This JTAG is supported quite well with
Linux/Windows (don't know about *BSD).

It's a matter of arranging the work flow. You seem to be fine to
take care of the mainline version of F-CPU, which is being
developed and tested against what you have at hand (the Actel
stuff). There could be another developer, who's responsible for
let's say Xilinx port of the SoC, and his task will be to maintain
the Xilinx port, like merging updates from your mainline code,
accepting/maintaining patches from other guys using Xilinx boards,
and proposing back generic patches to the mainline. If you have
experience with any major open-source project work-flow, this would
be the same.
Right. I'm stepping up in this role now, because I have the means and
 experience now, but all help is welcome. If someone has all the
tools for Altera, Lattice or Xilinx, it's great ! I can't use all of
them, it takes too much time...

Absolutely. That's the idea for the community - everyone adds value. If
you don't like Xilinx or Altera, and/or don't have tools/experience to
work with them, some other buddy can port the design to some low-cost
board and even provide a synthesized bitstream on a regular basis for the
community members. That would be nice!

Also, having multiple maintainers with Actel/Xilinx/Altera/etc
experience will definitely help to keep the code generic (remember how
many times you have seen an open-source Verilog or VHDL design
instantiating vendor-specific IP blocks?).

So my proposal for F-GPU is to prototype it then build 8 or 12
of them for the team's most active developers. The cost would be
bearable and the risk is low because the F-GPU is repurposable.
Yes.
Another invaluable benefit is the control we'd have over parts
sourcing, manufacture and all the details that go into making a
decent PCB.

yg

Regards,
Nikolay
*************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxx with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/