
Re: [f-cpu] I'm still in the warmup phase ;-)



hi,

Beat Steiner wrote:

I'm speaking German too - but let's stick to English to avoid excluding others.
yup, it's an English mailing list.
and the German list is "dead" and non-technical.

should get a simple design running before adding fancy engines.
we've been there before :-)

I can imagine very well -- LOL
well, just have a look at the early mailing list archives and then tell
us if it corresponds to what you imagined :-)


We already have such an engine, sort of:

# load parameters...
loadcons $0x7770632d30303066, r1 # that's "f000-cpu"
loadcons $0x7070707070707070, r2 # that's 8 times a 'p'
# and here we go:
xnor.and.b r1, r2, r3
# matching bytes are 0xff, others 0x00
notice the single-cycle operation :-P
Hmmm... XNOR makes equal bits 1 and unequal bits 0, that's well known.
So "and.b" is an FC0 specific feature ANDing all bits in their respective byte?
right !
look at http://f-cpu.seul.org/new/19c3-presentation.pdf (page 11 to 17)
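
For readers without the slides at hand, here is a minimal software sketch of what the xnor.and.b example above computes. It is only an emulation in Python under the semantics described in this thread (XNOR the two 64-bit values, then AND-reduce the bits inside each byte); the function name xnor_and_b and the constant MASK64 are made up for illustration and say nothing about how the FC0 hardware does it.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # keep everything within 64 bits

def xnor_and_b(a, b):
    """Emulate a byte-wise equality test on two 64-bit values.

    Each result byte is 0xff where the corresponding bytes of a and b
    are equal, and 0x00 otherwise.
    """
    x = ~(a ^ b) & MASK64          # XNOR: equal bits become 1
    result = 0
    for i in range(8):             # AND-reduce each byte separately
        byte = (x >> (8 * i)) & 0xFF
        if byte == 0xFF:           # all 8 bits equal -> bytes match
            result |= 0xFF << (8 * i)
    return result

r1 = 0x7770632d30303066            # "f000-cpu", least significant byte first
r2 = 0x7070707070707070            # eight times 'p'
r3 = xnor_and_b(r1, r2)
print(hex(r3))                     # only the 'p' byte (byte 6) matches
```

With these two constants only byte 6 (counting from the least significant byte) holds 0x70 in both operands, so r3 is 0x00ff000000000000.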

Looks great! So in order to check whether a needle is in a haystack
I'll just race through the memory and compare the xnor.and.b result to zero, 8 chars
at a time. If nonzero, locate it precisely. If we use this in strncmp-like functions,
we'll run databases and such pretty fast. WOW!
well, in databases, the largest loss of time is in fetching data.
so it won't help unless you have a huge cache.
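
The search loop described above (scan 8 chars at a time, test the match mask against zero, then locate the hit precisely) could be sketched in software roughly as follows. This is a Python emulation under assumptions: match_mask stands in for xnor.and.b, memory words are read little-endian, and the names find_byte and match_mask are hypothetical. It shows the control flow only and makes no claim about real F-CPU timing.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def match_mask(word, pattern):
    """Stand-in for xnor.and.b: 0xff in every byte where word == pattern."""
    x = ~(word ^ pattern) & MASK64
    mask = 0
    for i in range(8):
        if (x >> (8 * i)) & 0xFF == 0xFF:
            mask |= 0xFF << (8 * i)
    return mask

def find_byte(haystack, needle):
    """Return the index of the first byte equal to needle, or -1."""
    pattern = int.from_bytes(bytes([needle]) * 8, "little")
    full = len(haystack) - len(haystack) % 8
    for off in range(0, full, 8):              # 8 chars at a time
        word = int.from_bytes(haystack[off:off + 8], "little")
        mask = match_mask(word, pattern)
        if mask:                               # nonzero: a byte matched
            lowest_bit = mask & -mask          # isolate the lowest set bit
            return off + (lowest_bit.bit_length() - 1) // 8
    for i in range(full, len(haystack)):       # leftover tail, byte by byte
        if haystack[i] == needle:
            return i
    return -1

print(find_byte(b"f000-cpu", ord("p")))
```

Because the words are assembled little-endian, byte i of the mask corresponds to haystack[off + i], so the lowest set byte of the mask is the first match in memory order.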

I still have to get used again to that Intel-like opcode stuff and reverse-reading
the bytes in the code -- the good old Endian issues. You already agreed on an
opcode for the FC0?
the endianness of the opcode was defined long ago: it is the order of
least effort when writing an assembler.
look at the latest version of the manual
http://f-cpu.seul.org/cedric/unstable/F-CPU_manual-0.2.7c-en-color.pdf
(page 74)

however, the portability problem is very critical.

[...] The written VHDL code
will need many modifications before running on another board, let
alone another family of FPGA, and not even considering another brand.
Ooops! Sounds like we need an abstraction layer.
VHDL is already a good abstraction layer, and certainly the most suitable.

And implies that if we get it running on CPLDs, we can't use the same VHDL code for the ASIC -- Horrible!
in fact the problem lies with performance and fitting the features in the devices.
the laws of electronics say that the more optimised, the more device-dependent
the implementation and the less portable the enhancements.
our current code defines a platform-agnostic algorithm that is more or
less targeted at ASICs, so it may be painful to use it for FPGA.
However, VHDL allows a given block to be written in different ways,
with the most suitable version selected at the last stage. so for example,
the adder can be replaced by a platform-specific implementation of
this unit, as long as the interface is the same.

I don't know VHDL yet. I thought it's a connect-this-to-that language that
is independent of the physical layout. Or was that something SPICE-like?
well, get a few books about this language, it's not a superficial language and
requires some effort, but it's really worth it.

>Do we need more pseudo-registers for memory access?

>what do you mean here ?

That the CPU core looks at the memory through a "window" that looks like
two registers. This can significantly reduce opcode space.
F-CPU doesn't use "windows".
it uses plain load/store operations.
so you can have up to 63 valid pointers if you want.

Example: r60 and r61 are the memory access window. Load the address into r60.
If we read from r61, we read the memory at the address given in r60. Write works
in a similar way.
just like in the CDC6600 :-P
http://f-cpu.seul.org/whygee/CDC/Grishman_CDC6000AsmLangPgmg.pdf
http://f-cpu.seul.org/whygee/CDC/DesignOfAComputer_CDC6600.pdf
(my best recommended read for any CPU-designer-wannabe)

Instead of
r01 := [0x00000ha150373135]
we issue
r60 := 0x00000ha150373135
r01 := r61
Disadvantage: it is harder to see at a glance which MOVE commands take
many clock cycles.
We will need at least 2 (better 3 or 4) windows of that kind, sacrificing up to 8 registers.
F-CPU doesn't use this scheme, though it could be adapted.
in fact the memory access mechanisms work more or less with that in mind
BUT without sacrificing fixed registers. The cost is that there must be a mechanism
for "associating" a pointer-register to an entry in the load/store unit ("LSU").
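
To make the proposed "window" scheme concrete, here is a small behavioural model of it in Python. This models the proposal quoted above, not what F-CPU does (F-CPU uses plain load/store). The class name, the dict-based memory and the 64-entry register file are assumptions made for illustration; only the roles of r60 (address) and r61 (data) come from the example in this thread.

```python
class WindowedRegisterFile:
    """Toy model: r60 holds an address, r61 is a window onto memory."""

    ADDR_REG = 60   # write the address here
    DATA_REG = 61   # reads/writes here go to memory[r60]

    def __init__(self, memory):
        self.memory = memory        # dict: address -> value
        self.regs = [0] * 64

    def write(self, reg, value):
        if reg == self.DATA_REG:
            # writing the window register stores to memory
            self.memory[self.regs[self.ADDR_REG]] = value
        else:
            self.regs[reg] = value

    def read(self, reg):
        if reg == self.DATA_REG:
            # reading the window register loads from memory
            return self.memory.get(self.regs[self.ADDR_REG], 0)
        return self.regs[reg]

memory = {0x1000: 42}
rf = WindowedRegisterFile(memory)
rf.write(60, 0x1000)          # r60 := address
rf.write(1, rf.read(61))      # r01 := r61  (an implicit load)
rf.write(61, 7)               # r61 := 7    (an implicit store)
print(rf.read(1), memory[0x1000])
```

The model also shows the disadvantage mentioned above: the move from r61 looks like any other register move, although it hides a memory access.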


* The majority of computer users need data movers rather than computers (i.e. machines performing calculations).

>but did you hear about the "memory bandwidth barrier" ?
>F-CPU is designed to be good at computing, it's a fact.
>[...]
Yep. Maybe we later add a unit outside the FC0 doing
memory block moves, avoiding occupying the CPU's
memory bus. I think the AMIGA had such an engine.
memory block moves are not the problem.
handling different virtual memory spaces (-> page misses), IRQs and cache misses
is a much heavier burden than simple/straight DMA
(where only a few registers and a counter are needed).
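
As a rough illustration of why straight DMA is the easy part, the following Python sketch models a block mover with nothing more than a source register, a destination register and a counter. The class and field names are invented for this example, and it deliberately ignores everything named above as the real burden (virtual memory, IRQs, cache misses).

```python
class SimpleDMA:
    """Toy block mover: three registers and a byte-wise copy step."""

    def __init__(self, memory):
        self.memory = memory   # a bytearray standing in for RAM
        self.src = 0           # source address register
        self.dst = 0           # destination address register
        self.count = 0         # remaining bytes to move

    def start(self, src, dst, count):
        self.src, self.dst, self.count = src, dst, count

    def step(self):
        """Move one byte; return True while the transfer is still busy."""
        if self.count == 0:
            return False
        self.memory[self.dst] = self.memory[self.src]
        self.src += 1
        self.dst += 1
        self.count -= 1
        return self.count > 0

    def run(self):
        while self.count:
            self.step()

ram = bytearray(b"f-cpu...........")
dma = SimpleDMA(ram)
dma.start(src=0, dst=8, count=5)
dma.run()
print(ram)
```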

YG
