
Re: [f-cpu] Re: FC0 XBAR



On Thu, 2 Aug 2001, Yann Guidon wrote:

> hi and welcome to the list,

Hi, thanx!
 
> > > Juergen Goeritz wrote:
> > > > On Wed, 1 Aug 2001, Yann Guidon wrote:
> > > >
> > > > Hi !
> > > > > > Don't you do a strictly synchronous design of all the control
> > > > > > signals?
> > > > >
> > > > > it's fully synchronous, but i have no safe evidence that the
> > > > > critical datapath is short enough.
> > > >
> > > > Evidence will only come with a real synthesis and
> > > > layout. It depends on where you place the blocks on the
> > > > chip. Newer technologies like 0.18 um may introduce additional
> > > > signal delays when the route between source and drain gets
> > > > longer.
> > >
> > > i am aware of this. I have no idea of how the synthesiser will
> > > modify the datapath, it could end up worse than what is written.
> > > i know that there are some nasty surprises awaiting us.
> > 
> > You will get different surprises with every technology.
> > FPGAs, for example, require completely different synthesis
> > compared to ASIC technologies. Thus you may get
> > problems where you don't expect them to arise. If you
> > look at LEON, there is a fast address generator added for
> > FPGAs so that it still completes in a single cycle.
> 
> my worry is that compilers sometimes want to play the "smart ass"
> and optimize things that should be left.

Sometimes you can tell them not to optimize. Or you can
add a hand-coded macro to the implementation library.
Is that what you want? I don't think so, because then
you have to carry all the 'implementation issues' with
you - like LEON does today.

> > > > > > > However, i don't know if it is possible to read and write the
> > > > > > > same register in a single clock cycle.
> > > > > > That should work, because LEON has a similar strategy for read
> > > > > > and write of the registers, one read port and one write port.
> > > > > ?
> > > >
> > > > The SPARC architecture describes a register set that provides
> > > > read and write to the register bank in a single cycle, since
> > > > it's a pipelined architecture capable of executing an instruction
> > > > each cycle.
> > >
> > > now the problem is when the written register is the same as the read register.
> > > gut feeling tells me that the signal couldn't propagate fast enough.
> > > some kind of bypass could become necessary.
> > 
> > Yes, but the write is usually delayed by at least one cycle, isn't it?
> it depends, but it doesn't solve the problem : the delay only "moves"
> the problem from one cycle to another...

You could use a dual-port RAM. Those are usually capable of
two accesses in a single clock cycle. This becomes more of a
problem if two ports are not enough, though.
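
To make the same-address case concrete, here is a small behavioural
sketch in Python of a memory with one read port and one write port.
The class name DualPortRam and the read_first flag are made up for
illustration and do not describe any vendor macro; a real macro
documents its own collision behaviour, which is exactly the open
question here.

```python
class DualPortRam:
    """Behavioural model: one read and one write access per clock cycle."""

    def __init__(self, depth, read_first=True):
        self.cells = [0] * depth
        # Hypothetical knob: what the read port sees on an address collision.
        self.read_first = read_first

    def cycle(self, read_addr, write_addr=None, write_data=None):
        """Run one clock cycle and return the value seen on the read port."""
        if self.read_first:
            # The read port samples the cell before the write lands.
            value = self.cells[read_addr]
            if write_addr is not None:
                self.cells[write_addr] = write_data
        else:
            # Write-first: the write lands, then the read port samples.
            if write_addr is not None:
                self.cells[write_addr] = write_data
            value = self.cells[read_addr]
        return value
```

With read_first=True a colliding read returns the old contents, so
the new value is only visible one cycle later; with read_first=False
the read already sees the new value.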

> > So the compiler could take care of not using the same register
> > as source in the next instruction(s) that must be written to
> > first. But this implies that you know exactly how the pipeline
> > is constructed and how it works inside the compiler.
> this is not considered : the compiler and the binaries should
> be independent from the microarchitecture (at least, have a minimal
> compatibility). introducing such a constraint on the compiler is
> not desirable and not easy either...

Didn't you tell me before that the compiler has to take
special care of the architecture or was it just for vector
processing?

> we CAN detect when the bank is accessed both for read and write.
> we can even delay the instruction that does that (but it's not desirable).
> however, in some cases it might be that the hardware doesn't need
> such a measure. it depends too much on the silicon characteristics...

It seems to be necessary only in the case of a register clash,
i.e. the same register used for read and write in the same clock
cycle. In this case it could be possible - just thinking - to
bypass the write data to the read side, so that the register is
only accessed for write but the write data is also passed to the
read bus?
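
The pass-through idea above can be sketched in a few lines of Python.
This is only a thought model under my own assumptions: the name
BypassedRegisterFile and the single read port are invented for the
example and say nothing about how FC0 actually does it.

```python
class BypassedRegisterFile:
    """One read port, one write port, with a write-to-read bypass."""

    def __init__(self, count):
        self.regs = [0] * count

    def cycle(self, read_reg, write_reg=None, write_data=None):
        """Run one clock cycle and return the value put on the read bus."""
        if write_reg is not None and write_reg == read_reg:
            # Register clash: the storage is accessed for write only and
            # the write data is forwarded straight to the read bus.
            value = write_data
        else:
            # No clash: ordinary read from the storage array.
            value = self.regs[read_reg]
        if write_reg is not None:
            self.regs[write_reg] = write_data
        return value
```

The comparison of the two register numbers is the only extra logic;
whether the added multiplexer fits into the cycle time is again a
question for synthesis.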

> > > i'll analyse this issue when the C is transformed into VHDL.
> > Do you simulate the register accesses as well?
> i start from the idea that the hardware CAN read and write the same data.
> it is easier to handle because there is no special case.
> i could then add a 1-cycle write latency feature, in the future,
> but it makes the instruction decode/issue more complex...

Me won't go for latency - me want it simple :-)
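
To show why a write latency complicates issue, here is a rough Python
sketch of the bookkeeping the issue logic would need. The function
issue_cycles and the rule it uses (a register written by an
instruction issued at cycle t becomes readable at cycle
t + 1 + write_latency) are my assumptions for the example, not the
F-CPU scheme.

```python
def issue_cycles(program, write_latency=1):
    """Return the issue cycle of each (dest, sources) instruction.

    Assumed rule: a register written by an instruction issued at cycle t
    is readable from cycle t + 1 + write_latency onwards.
    """
    ready_at = {}   # register number -> first cycle its new value is readable
    cycle = 0
    issued = []
    for dest, sources in program:
        # Stall until every source register has become readable.
        for src in sources:
            cycle = max(cycle, ready_at.get(src, 0))
        issued.append(cycle)
        ready_at[dest] = cycle + 1 + write_latency
        cycle += 1
    return issued


# r1 = f(r2, r3); r4 = f(r1); r5 = f(r6)
program = [(1, (2, 3)), (4, (1,)), (5, (6,))]
print(issue_cycles(program, write_latency=0))  # [0, 1, 2]
print(issue_cycles(program, write_latency=1))  # [0, 2, 3]
```

With no latency the dependent instruction issues back to back; with
one cycle of latency the issue stage has to track the pending write
and insert a stall.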

> > > > > > > i think that the register
> > > > > > > bank will be "generated" with some design tools, but there might
> > > > > > > be a problem here.
> > > > > > Usually yes. Some macro of the library of the target chip.
> > > > > can you explain ?
> > > >
> > > > You take some macros from the vendor libraries that are
> > > > optimized for speed and area. You do not synthesize
> > > > regular structures because it will waste a lot of space.
> > > > You could also add some (S)(D)RAM structures onto the chip
> > > > (but dynamic RAM not with all technologies). Some Ethernet
> > > > switch devices have up to several megabytes directly on-chip
> > > > as packet buffers. LEON is an example of how you do it.
> > >
> > > usually, this kind of "hard macro" is specified and characterised.
> > > but this changes from vendor to vendor, so the question is still
> > > annoying.
> > 
> > The question?
> the question whether the macro will allow simultaneous read and write
> to the same register(s).

-> dual port access memory.

> > If you look at the plot of an ASIC, the areas
> > with synthesized structures are not as densely filled as the
> > hard macro areas.
> that's almost obvious.

It may come from some grid they use in the ASIC: with
all the wiring they can't make use of every grid
position. And then there is crosstalk and so on. It may
be a science of its own. But that's just a guess ;-)

Talk to me
JG

*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/