
[f-cpu] X-Bar replacement and PoC of massively parallel computing, hints?



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

During a lecture on microcontrollers I had an idea for the following architecture:


Controller <-> RAM
    |
 +--+--+-----+-----+ ...
 |     |     |     |
ALU1  ALU2  ALU3  ALU4 

Every ALU is very simple: it has 2 connections to a fast 2-wire serial
bus and 8 lines to hardwire the ALU address. Every ALU has a serial bus
de/encoder and a queue/register set of 16 fixed-length entries, and
implements basic 8-bit functions such as "add", "shift"/"ror"/"ashift",
"xor", "and", "or" and "not".

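A minimal Python sketch of one such ALU may make the operation set concrete. The post does not fix the details, so the following are assumptions: every shift or rotate moves by one bit, "shift" is a logical shift left, "ashift" is an arithmetic shift right, and the flags are carry, zero and negative. The function name `alu` is made up for this illustration.

```python
MASK = 0xFF  # the post specifies 8-bit operands


def alu(op, a, b=0):
    """Return (result, carry, zero, negative) for one 8-bit operation."""
    a &= MASK
    b &= MASK
    carry = 0
    if op == "add":
        full = a + b
        carry = full >> 8              # bit 8 is the carry out
        result = full & MASK
    elif op == "shift":                # assumed: logical shift left by one
        carry = a >> 7
        result = (a << 1) & MASK
    elif op == "ror":                  # assumed: rotate right by one
        carry = a & 1
        result = (a >> 1) | ((a & 1) << 7)
    elif op == "ashift":               # assumed: arithmetic shift right by one
        carry = a & 1
        result = (a >> 1) | (a & 0x80)
    elif op == "xor":
        result = a ^ b
    elif op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    elif op == "not":
        result = ~a & MASK
    else:
        raise ValueError(f"unknown operation: {op}")
    return result, carry, int(result == 0), result >> 7
```

For example, `alu("add", 200, 100)` wraps around to 44 with the carry flag set. The eight operations listed fit exactly into a 3-bit operation field.
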
The controller hands tasks to the ALUs; each ALU executes the tasks in
its queue and stores the results back in the queue. The controller has an
init phase that checks which ALUs are on the bus. After that it decodes
instructions from RAM, checks which ALU is idle and sends it a queue.
Then it asks whether an ALU is ready and, if so, delivers the next tasks.
It cycles through all ALUs on the bus in this way.
The signals on the serial bus are driven by the controller:

"getAdress": query an ALU address and store in a buffer whether that ALU
exists or not
"getQueue": the ALU should deliver its queue
"getReady": is the ALU ready, i.e. is its queue done?
"putQueue": deliver a queue to the ALU
"Reset": reset (clear) the ALU
"setSleep": the ALU should go to sleep or wake up now

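The controller cycle described above can be simulated in a few lines of Python. This is only a sketch under assumptions: the bus is modelled as a dict from address to ALU object, each ALU executes one task per simulated time step, and the names `Alu`, `run` and `OPS` are invented here. The method names mirror the bus signals from the list.

```python
from collections import deque

QUEUE_SIZE = 16  # queue length given in the post

# Small subset of the 8-bit operations, enough for the simulation.
OPS = {
    "add": lambda a, b: a + b,
    "xor": lambda a, b: a ^ b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
}


class Alu:
    def __init__(self, address):
        self.address = address
        self.pending = []
        self.results = []

    def put_queue(self, tasks):        # "putQueue"
        assert len(tasks) <= QUEUE_SIZE
        self.pending = list(tasks)
        self.results = []

    def get_ready(self):               # "getReady"
        return not self.pending

    def get_queue(self):               # "getQueue"
        done, self.results = self.results, []
        return done

    def step(self):
        """Execute one queued task (stand-in for one ALU time slice)."""
        if self.pending:
            op, a, b = self.pending.pop(0)
            self.results.append(OPS[op](a, b) & 0xFF)


def run(bus, tasks):
    """Dispatch (op, a, b) tasks over the bus; return results in task order."""
    # Init: probe all 256 possible addresses ("getAdress").
    present = [addr for addr in range(256) if addr in bus]
    if not present:
        raise RuntimeError("no ALU answered on the bus")
    todo = deque(enumerate(tasks))
    in_flight = {}                     # address -> indices of dispatched tasks
    results = {}
    while todo or in_flight:
        for addr in present:           # cycle through all ALUs on the bus
            unit = bus[addr]
            if not unit.get_ready():
                continue
            if addr in in_flight:      # collect the finished queue
                for idx, value in zip(in_flight.pop(addr), unit.get_queue()):
                    results[idx] = value
            batch = [todo.popleft() for _ in range(min(QUEUE_SIZE, len(todo)))]
            if batch:                  # hand the idle ALU its next queue
                in_flight[addr] = [idx for idx, _ in batch]
                unit.put_queue([task for _, task in batch])
        for addr in present:           # let simulated time advance
            bus[addr].step()
    return [results[i] for i in range(len(tasks))]
```

With two ALUs, `run({3: Alu(3), 7: Alu(7)}, [("add", i, 1) for i in range(40)])` returns the 40 sums in task order. The sketch ignores the serial transfer time, which is exactly the timing concern raised for the real bus.
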
The structure of a queue entry (32 bits) is:
ID (8 bits: the same as the hardwired ALU address)
C0N (3 bits: carry, zero and negative flags of the result)
OPCODE (21 bits: operation + data, e.g. "add": 3 bits -> operation,
8+1 bits -> Data1, 8+1 bits -> Data2)
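
The field widths add up to 8 + 3 + 21 = 32 bits, with 21 = 3 + 9 + 9 for the opcode. A possible packing is sketched below in Python. The bit order (ID in the top byte, Data2 in the lowest bits) is an assumption, since the post only gives the widths, and the meaning of the ninth data bit is not stated, so the data fields are treated as opaque 9-bit values. `pack_entry` and `unpack_entry` are hypothetical helper names.

```python
def pack_entry(alu_id, flags, op, data1, data2):
    """Pack one queue entry into a 32-bit word (assumed field order)."""
    assert 0 <= alu_id < 1 << 8    # ID: hardwired ALU address
    assert 0 <= flags < 1 << 3     # C0N: carry, zero, negative
    assert 0 <= op < 1 << 3        # operation selector
    assert 0 <= data1 < 1 << 9     # 8+1 bits
    assert 0 <= data2 < 1 << 9     # 8+1 bits
    opcode = (op << 18) | (data1 << 9) | data2   # 21 bits in total
    return (alu_id << 24) | (flags << 21) | opcode


def unpack_entry(word):
    """Split a 32-bit word back into (alu_id, flags, op, data1, data2)."""
    return (
        (word >> 24) & 0xFF,
        (word >> 21) & 0x7,
        (word >> 18) & 0x7,
        (word >> 9) & 0x1FF,
        word & 0x1FF,
    )
```

Packing and unpacking are inverses, so `unpack_entry(pack_entry(0x12, 0b101, 0b011, 0x1FF, 1))` gives back `(0x12, 0b101, 0b011, 0x1FF, 1)`.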

The controller tracks the tasks with a 16-bit ID built from the ALU
address and the queue ID. The serial bus can carry up to 256 ALUs. If the
timing on the bus is critical, the queue size on the ALUs should be enlarged.

With many simple, cheap ALUs it is possible to do a parallel sort,
multiplication (as parallel additions), speculative calculation of
branches and so on. If there are fewer ALUs, the controller simply needs
longer to calculate something (to collect all the results).
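
As an illustration of "multiplication as parallel additions", the Python sketch below builds the partial products of an 8-bit multiply and sums them pairwise in a tree. All additions within one round are independent, so the controller could give each of them to a different ALU. This is one possible decomposition, not necessarily the intended one, and it glosses over the fact that the partial sums are wider than 8 bits (a real version would have to split them into 8-bit adds with carry). `tree_multiply` is a made-up name.

```python
def tree_multiply(a, b, width=8):
    """Multiply via shifted partial products summed in a pairwise tree.

    Returns (product, rounds), where rounds is the number of addition
    rounds needed if every addition in a round runs on its own ALU.
    """
    # One partial product per bit of b: a shifted left, or zero.
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    rounds = 0
    while len(partials) > 1:
        # Each pair below is independent of the others in this round.
        summed = [partials[i] + partials[i + 1]
                  for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:          # odd one out moves on unchanged
            summed.append(partials[-1])
        partials = summed
        rounds += 1
    return partials[0], rounds
```

For 8-bit operands the tree needs 3 rounds (4, then 2, then 1 additions) instead of 7 sequential additions. With fewer than 4 ALUs the first round has to be serialized, which matches the remark that fewer ALUs only make the calculation take longer.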

advantages:

- simple ALUs
- scalable
- a simple 2-wire bus system is less critical than a parallel bus or an
x-bar system
- easily extensible

disadvantages:

- complex controller and compiler
- timing of the serial bus



Is this an extensible concept?

Could we use this concept of a 2-wire bus with a serial de/encoder on
every unit in f-cpu instead of the x-bar?
Admittedly, on the one hand every unit grows by a serial de/encoder; on
the other hand there is no problem with logic depth, routing paths and
so on.

For the serial de/encoder we need a shift register plus a small amount of logic...
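
A behavioural sketch of such a de/encoder in Python: the sender shifts a word out bit by bit, the receiver shifts the bits into a register and signals when a full word has arrived. The word width of 32 bits (one queue entry) and MSB-first order are assumptions, clocking and start/stop framing on the 2-wire bus are left out, and `serialize` and `ShiftRegister` are illustrative names only.

```python
def serialize(word, width=32):
    """Encoder side: yield the bits of word, most significant bit first."""
    return [(word >> i) & 1 for i in range(width - 1, -1, -1)]


class ShiftRegister:
    """Decoder side: collect serial bits into a parallel word."""

    def __init__(self, width=32):
        self.width = width
        self.value = 0
        self.count = 0

    def clock_in(self, bit):
        """Shift one bit in; return True when a full word has arrived."""
        mask = (1 << self.width) - 1
        self.value = ((self.value << 1) | (bit & 1)) & mask
        self.count += 1
        return self.count % self.width == 0
```

Feeding `serialize(word)` into `clock_in` one bit at a time reproduces `word` in `value` after 32 clocks. The extra logic per unit is then mainly the bit counter and the address compare.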

Any hints?

Bye Andreas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE7wgHMClxplZklbgERAqkPAJ43GDkm8uJtGLQhpPV1uAxl5YwgJgCcCg9B
t1vIA0MXCQOLIeyAxjhEae0=
=g5aq
-----END PGP SIGNATURE-----
